.: how the SQL Server hierarchyid data type works (kind of) | 2011-02-05 06:31AM :.
Yesterday, I stumbled across the SQL Server hierarchyid data type. It stores the position of an object within a tree. I was intrigued because I've developed similar mechanisms for efficiently querying hierarchical data, and I wanted to see how it compares. I'm always happy to offload complexity to other software when I can trust it to do the job right, and I trust SQL Server.
A hierarchyid is very compact. It can store a position within a tree of 100,000 nodes with a branching factor of 6 in fewer than 39 bits on average. Interestingly, it supports arbitrary insertions and deletions without ever needing to update any rows (besides the one being inserted or deleted): you can always generate a new hierarchyid before or after any other ID, or between any two sibling IDs. Hierarchyids are designed so that sorting a set of them puts them in the order in which they would be visited in a depth-first traversal of the tree, which is the usual traversal order.
Hierarchyids are represented in text with the following scheme:
I'll assume that you're familiar with bits and bytes. A hierarchyid is a bit string which is left-packed into bytes to allow them to be compared directly (i.e. by comparing byte values). Trailing zeros are not part of the ID. If you look at the binary values corresponding to hierarchyids, you'll notice some patterns. (Note that the hex values often do not match the bit strings. This is because the hex values represent the actual bytes stored, while the bit strings have had trailing zeros removed.)
In particular, within certain ranges there are bits that increment in the usual way, and there are bits that seem to be constant.
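To make the left-packing concrete, here is a minimal Python sketch of turning a bit string into stored bytes and back. The function names are hypothetical. The unpacking step assumes that every ID ends in a 1 bit, so the padding zeros can simply be stripped.

```python
def pack_bits(bits: str) -> bytes:
    """Left-pack a bit string such as '01011' into bytes, filling the unused
    low-order bits of the last byte with zeros."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


def unpack_bits(data: bytes) -> str:
    """Turn stored bytes back into the bit string, dropping the padding.
    Assumes the ID's last real bit is a 1."""
    return "".join(format(byte, "08b") for byte in data).rstrip("0")


# /1/ is 01011 in the patterns discussed in this post; it becomes 01011000.
print(pack_bits("01011").hex())                   # 58
print(unpack_bits(bytes.fromhex("5ac0")))         # 0101101011
print(pack_bits("01001") < pack_bits("01011"))    # True: bytes compare directly
```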
The values for /0/ through /15/ are straightforward enough, but when we get to /16/, something strange happens. The bits that increment appear to get broken up. When you look at further ranges, you see other apparent breakups, where bits are grouped and separated by zeros. But the bits increment normally, with carry between groups. What is going on here?
First of all, as you can see, a varying number of bits is used to encode the numbers. This is useful, as many trees are simply binary trees, and those that aren't typically have small branching factors. But then the format needs a way to know how many bits are in a particular number. It seems that hierarchyids use a prefix-free code at the beginning of each number to distinguish them: 01…, 100…, 101…, 110…, 1110…, 11110…, etc. So when it sees 01 or 100, it knows it needs to read three more bits; when it sees 101, it needs to read four more bits; and so on.
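Here is a minimal Python sketch of how a decoder could use these prefixes to find where each number ends. It only covers the three patterns this post spells out for /0/ through /15/ (01xx1, 100xx1, 101xxx1). The names `CODES`, `read_number` and `read_all` are hypothetical and are not part of SQL Server.

```python
# (prefix, number of value bits, smallest value) for /0/ through /15/.
CODES = [("01", 2, 0), ("100", 2, 4), ("101", 3, 8)]


def read_number(bits: str, pos: int) -> tuple[int, int]:
    """Decode the number starting at bits[pos]; return (value, next position)."""
    for prefix, width, low in CODES:
        if bits.startswith(prefix, pos):
            start = pos + len(prefix)
            end = start + width  # index of the trailing 1 bit
            if end >= len(bits) or bits[end] != "1":
                raise ValueError(f"malformed number at bit {pos}")
            return low + int(bits[start:end], 2), end + 1
    raise ValueError(f"prefix at bit {pos} is not covered by this sketch")


def read_all(bits: str) -> list[int]:
    """Read numbers back to back; each prefix says how many bits follow."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = read_number(bits, pos)
        values.append(value)
    return values


print(read_all("01011100001"))  # [1, 4]
```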
Second, given that the endings in the broken-up patterns seem to be the same (always 0y1xxx1), I suspect that the values are computed in multiple stages. First, the value of 1xxx1 is computed. This is added to 8y, and then the result is added to 16z (if it exists), which is added to 128a or 64a (if it exists), etc. Finally, the result is added to a constant that marks the beginning of its range (01xx1 starts at 0, 100xx1 at 4, etc.). So perhaps the numbers can be broken down this way: prefix (n+0)* y1xxx1. That is, a prefix followed by a number (possibly zero) of groups, each of which contains some bits followed by a zero, and ending with y1xxx1.
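Here is a minimal sketch of that reading for the range just past /15/. It assumes the layout 110 xx 0 y 1 xxx 1, inferred from /16/ being 110000010001 and from the 0y1xxx1 ending described above. That gives six free bits, so the range would cover /16/ through /79/. The function names are hypothetical. The constant 0 and 1 are simply skipped, and the free bits are read as one ordinary binary number. That is why they appear to carry between groups.

```python
def encode_16_to_79(n: int) -> str:
    """Assumed layout 110 xx 0 y 1 xxx 1: six free bits holding n - 16,
    interleaved with a constant 0, a constant 1 and the trailing 1."""
    if not 16 <= n <= 79:
        raise ValueError("n must be in 16..79")
    free = format(n - 16, "06b")
    return "110" + free[0:2] + "0" + free[2] + "1" + free[3:6] + "1"


def decode_16_to_79(bits: str) -> int:
    """Skip the constant bits and read the free ones as a single number."""
    if len(bits) != 12 or not bits.startswith("110"):
        raise ValueError("not a 12-bit code with prefix 110")
    free = bits[3:5] + bits[6] + bits[8:11]
    return 16 + int(free, 2)


print(encode_16_to_79(16))  # 110000010001
print(encode_16_to_79(23))  # 110000011111
print(encode_16_to_79(24))  # 110000110001 (the carry skips the constant 1)
```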
What I don't understand is why the numbers have such a strange pattern. Why not simply use 01xx1, 100xx1, 101xxx1, 1110xxxxxxxxxxxxx1, etc? Why separate the groups with zeros? It seems like a waste of space, but I assume there's a good reason. There's at least a good reason for the 1 bit at the end, which I'll get to later.
Let's consider negative numbers. Signed integers on computers are normally represented in two's complement. But in two's complement, negative numbers are greater than positive numbers when you compare their bit strings. That is, the unsigned value of a negative integer's bits is greater than the unsigned value of a positive integer's. But hierarchyids need negative numbers to have bit string values less than those of positive numbers. One way to do this is to use an offset (excess-K) representation, where 0 is represented as a 1 followed by zeros. For instance, with four bits, 0 is represented as 1000, 1 through 7 are 1001 through 1111, and, crucially, -1 is 0111, -2 is 0110, etc. Let's look at some more bits.
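A small Python sketch comparing the two representations for 4-bit values (the function names are hypothetical). Sorting the bit strings as text puts offset-binary values in numeric order. Two's complement puts the negative numbers after the positive ones.

```python
def offset_binary(n: int, width: int = 4) -> str:
    """Store n + 2**(width-1), so 0 becomes 1000 and -1 becomes 0111."""
    bias = 1 << (width - 1)
    if not -bias <= n < bias:
        raise ValueError("n does not fit in the given width")
    return format(n + bias, f"0{width}b")


def twos_complement(n: int, width: int = 4) -> str:
    """Keep the low `width` bits of n, as ordinary signed integers do."""
    return format(n & ((1 << width) - 1), f"0{width}b")


values = list(range(-8, 8))
print(sorted(values, key=offset_binary) == values)  # True
print(sorted(values, key=twos_complement))
# [0, 1, 2, 3, 4, 5, 6, 7, -8, -7, -6, -5, -4, -3, -2, -1]
```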
It doesn't really use the scheme I expected. You can see that the prefixes continue: 01…, 0011…, 0010…, 00010…. I think negative bit strings are longer than positive ones because there are likely to be far fewer negative numbers, so they can make better use of prefix code bits this way. The patterns are quite similar to the positive case, with a prefix code, followed by some zero-separated groups of bits, followed by 1xxx1.
Let's take a break from negative numbers for now, and look at that trailing 1 bit. First, let's examine the way numbers are combined to form separate levels (e.g. /1/2/) and fractions (e.g. /1.1/). Thankfully, this is relatively easy to understand.
The parent/child relationship is represented by simple concatenation. /0/ is 01001 and so /0/0/ is 01001 01001. This makes sense. Next, look at the dot patterns. At first, it seems that the last number in a dotted sequence is in its normal form, and the previous numbers are their normal forms plus one. This is quite clever. Since all the normal forms end in 1, adding one yields a string ending in zero. Given that /0/ is 01001, /0.X/ is 01010 X, and /1/ is 01011, you can see that all IDs /0/*/ will be less than all IDs /0.X/*/, which are themselves less than /1/.
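A minimal Python sketch of the parent/child rule, using only the /0/ through /15/ patterns (01xx1, 100xx1, 101xxx1). The function names are hypothetical. Each level is a self-delimiting code, so a parent's bits are a prefix of its children's bits, and comparing the bit strings gives depth-first order.

```python
def encode_number(n: int) -> str:
    """Encode 0..15 with the patterns 01xx1, 100xx1 and 101xxx1."""
    if 0 <= n <= 3:
        return "01" + format(n, "02b") + "1"
    if 4 <= n <= 7:
        return "100" + format(n - 4, "02b") + "1"
    if 8 <= n <= 15:
        return "101" + format(n - 8, "03b") + "1"
    raise ValueError("this sketch only handles 0..15")


def encode_path(path: tuple[int, ...]) -> str:
    """Parent/child is plain concatenation: /0/0/ -> 01001 01001."""
    return "".join(encode_number(n) for n in path)


paths = [(1,), (0, 2), (0,), (2, 0), (0, 0)]
print(sorted(paths, key=encode_path))  # [(0,), (0, 0), (0, 2), (1,), (2, 0)]
```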
Things get complicated when we get to /3.0/, however. /3/ is 01111, and adding one would normally produce 10000, but it instead produces 100000. I believe this is to handle the following scenario. Consider /3.4/. If the 3. produced 10000, then the result would be 10000100001. Because /4/ (100001) is a prefix of that string, it would sort after /4/, but it's supposed to be less. Let's look at another case. /15/ is 1011111, but /15.0/ produces 110000010000 01001. This time it isn't simply a case of tacking on another zero. However, /16/ is 110000010001, so I think we can reformulate the /X.Y/ rule as “output /X+1/ minus one, and then output /Y/”. So for /3.0/, we take /3+1/ (i.e. /4/), which is 100001, and subtract one to get 100000. This works for all the cases we've seen so far.
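A minimal Python sketch of the reformulated dot rule. It only knows the patterns discussed so far: 01xx1, 100xx1 and 101xxx1 for /0/ through /15/. It adds an assumed 110 xx 0 y 1 xxx 1 layout for /16/ through /79/, inferred from /16/ being 110000010001. Negative numbers are not handled, and all names are hypothetical.

```python
def encode_number(n: int) -> str:
    if 0 <= n <= 3:
        return "01" + format(n, "02b") + "1"
    if 4 <= n <= 7:
        return "100" + format(n - 4, "02b") + "1"
    if 8 <= n <= 15:
        return "101" + format(n - 8, "03b") + "1"
    if 16 <= n <= 79:  # assumed layout 110 xx 0 y 1 xxx 1
        free = format(n - 16, "06b")
        return "110" + free[:2] + "0" + free[2] + "1" + free[3:] + "1"
    raise ValueError("this sketch only handles 0..79")


def encode_level(parts: list[int]) -> str:
    """Encode one level such as 3.4: each part but the last is written as
    /X+1/ minus one (same width), and the last part is written normally."""
    out = []
    for x in parts[:-1]:
        nxt = encode_number(x + 1)
        out.append(format(int(nxt, 2) - 1, f"0{len(nxt)}b"))
    out.append(encode_number(parts[-1]))
    return "".join(out)


def encode_path(text: str) -> str:
    """Encode a textual path such as '/3.4/1/' level by level."""
    levels = [lvl for lvl in text.strip("/").split("/") if lvl]
    return "".join(encode_level([int(p) for p in lvl.split(".")]) for lvl in levels)


print(encode_path("/3.0/"))   # 10000001001
print(encode_path("/15.0/"))  # 11000001000001001
print(encode_path("/3/") < encode_path("/3.4/") < encode_path("/4/"))  # True
```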
Now we have enough information to parse and generate hierarchyids and combine them, using parent/child and sibling relationships, assuming we know the pattern for each prefix code, and those can be discovered empirically. Still unsolved is the mystery of why the patterns are so strange, and in particular, why there are seemingly constant zeros embedded within them. To see if we can figure it out, let's try removing them and seeing if we run into trouble. We'll design our own scheme, with the following patterns: 01xx1, 10xxxx1, and 110xxxxxxxx1. For negative numbers, we'll use the following: 001xx1, 0001xxx1, and 00001xxxxxxxxxx1. (These were chosen more to help detect problems than for efficient coding.)
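A minimal Python sketch of this scheme. It reads the patterns as 2, 4 and 8 value bits for non-negative numbers and 2, 3 and 10 for negative ones. The text gives only the bit patterns, so the value range assigned to each pattern is an assumption: each pattern takes the next block of integers in order, so lower numbers always get lower codes. The names are hypothetical.

```python
# (prefix, value bits, smallest value) in increasing order of value.
# The ranges are an assumption; each code is prefix + value bits + trailing 1.
SCHEME = [
    ("00001", 10, -1036),  # 00001xxxxxxxxxx1: -1036 .. -13
    ("0001", 3, -12),      # 0001xxx1:         -12 .. -5
    ("001", 2, -4),        # 001xx1:           -4 .. -1
    ("01", 2, 0),          # 01xx1:            0 .. 3
    ("10", 4, 4),          # 10xxxx1:          4 .. 19
    ("110", 8, 20),        # 110xxxxxxxx1:     20 .. 275
]


def encode_number(n: int) -> str:
    for prefix, width, low in SCHEME:
        if low <= n < low + (1 << width):
            return prefix + format(n - low, f"0{width}b") + "1"
    raise ValueError(f"{n} is outside this sketch's range")


def decode_path(bits: str) -> list[int]:
    """Split concatenated codes back into numbers, one per level."""
    values, pos = [], 0
    while pos < len(bits):
        for prefix, width, low in SCHEME:
            if bits.startswith(prefix, pos):
                start = pos + len(prefix)
                end = start + width  # index of the trailing 1
                if end >= len(bits) or bits[end] != "1":
                    raise ValueError(f"malformed code at bit {pos}")
                values.append(low + int(bits[start:end], 2))
                pos = end + 1
                break
        else:
            raise ValueError(f"unknown prefix at bit {pos}")
    return values


path = [-1036, -5, -1, 0, 19, 275]
bits = "".join(encode_number(n) for n in path)
print(decode_path(bits) == path)  # True
```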
Is this a valid scheme? It seems so. I reason that, given two bit strings, either the prefix codes of the first number match or they don't. If they don't match, then, because the code is prefix-free, the two prefixes must differ at some bit. Since we've chosen increasing codes for higher numbers, that bit decides the comparison correctly. If the prefix codes do match, then the first two numbers are the same length in bits and are exactly lined up, so they can be directly compared. If they are equal, then we can examine the next pair of numbers and repeat this logic. If one string runs out first, it is an ancestor of the other and correctly sorts first. The scheme can also reuse the same logic for combining IDs. So why, then, does SQL Server use an excessively complex and seemingly wasteful formula? I honestly don't know. If I had to venture a guess, it would be that the zero bits allow creating IDs between two other IDs without adding more bits, but I haven't observed this, and I've looked for it.
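The argument can also be checked by brute force. This sketch reuses the same hypothetical encoder, with the same assumed value ranges. It builds every path of depth 1 to 3 over values at the edges of each pattern's range and confirms that sorting the bit strings matches sorting the paths. Python's tuple comparison, which puts a parent before its children, stands in for depth-first order. Comparing the bit strings as text mirrors comparing the left-packed bytes, as long as every code ends in a 1 bit.

```python
from itertools import product

# Same assumed (prefix, value bits, smallest value) table as the scheme sketch.
SCHEME = [("00001", 10, -1036), ("0001", 3, -12), ("001", 2, -4),
          ("01", 2, 0), ("10", 4, 4), ("110", 8, 20)]


def encode_number(n: int) -> str:
    for prefix, width, low in SCHEME:
        if low <= n < low + (1 << width):
            return prefix + format(n - low, f"0{width}b") + "1"
    raise ValueError(f"{n} is outside this sketch's range")


def encode_path(path: tuple[int, ...]) -> str:
    return "".join(encode_number(n) for n in path)


# Both ends of every range, so every pair of prefix codes gets compared.
edges = [-1036, -13, -12, -5, -4, -1, 0, 3, 4, 19, 20, 275]
paths = [p for depth in (1, 2, 3) for p in product(edges, repeat=depth)]

# Tuples compare element by element, with a parent before its children,
# which is depth-first order; the encoded bit strings should agree.
print(sorted(paths, key=encode_path) == sorted(paths))  # True
```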
The above scheme can be optimized. The order of siblings is relevant when using hierarchyids, and tracking it uses substantial space. Most systems storing hierarchical data only need to know about ancestry relationships. We can redesign the above code so that it doesn't keep track of order, which makes it substantially more efficient. We'll use the following patterns: 0xx, 100xx, 101xxx, 110xxxxx, 1110xxxxxxxx, 11110xxxxxxxxxxx, and 11111xxxxxxxxxxxxxxxxxxx.
Although I haven't put much work into optimizing it, this provides a good balance between overall capacity and efficiency with typical branching factors. The hypothetical 100,000 node tree with a branching factor of 6 that a hierarchyid can handle using about 39 bits on average takes only 24 bits on average with this scheme, a savings of almost 40%. Note that 000 is unused. This is partly due to SQL Server's disgusting behavior of ignoring trailing zeros when comparing varbinary values. (It considers it true that 0x10 = 0x1000 = 0x100000!) So you can't use codes without any 1 bits in them, or else SQL Server may fail to compare them correctly. But also, strings containing 6 or more contiguous zero bits would be ambiguous. (E.g. 0x00 could be /0/ or /0/0/.)
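A minimal Python sketch of the order-free code. It reads the patterns as 2, 2, 3, 5, 8, 11 and 19 free bits. The text lists the patterns but not which child index maps to which code. The assignment below is an assumption: indexes are handed out in order, and 0xx skips 000. The names are also hypothetical. With this assignment, indexes 0 to 2 take 3 bits and 3 to 6 take 5 bits. Every code contains a 1 bit, so two different paths never differ only by trailing zeros.

```python
# (prefix, free bits, first usable free-bit value). 0xx skips 000, so it
# starts at 01; the other patterns use every combination of their free bits.
PATTERNS = [("0", 2, 1), ("100", 2, 0), ("101", 3, 0), ("110", 5, 0),
            ("1110", 8, 0), ("11110", 11, 0), ("11111", 19, 0)]


def encode_index(n: int) -> str:
    """Map a child index (0, 1, 2, ...) to its code, smallest codes first."""
    if n < 0:
        raise ValueError("indexes are non-negative in this scheme")
    low = 0
    for prefix, width, first in PATTERNS:
        count = (1 << width) - first  # how many indexes this pattern holds
        if n < low + count:
            return prefix + format(n - low + first, f"0{width}b")
        low += count
    raise ValueError("index too large for this sketch")


def decode_path(bits: str) -> list[int]:
    """Split concatenated codes back into child indexes."""
    values, pos = [], 0
    while pos < len(bits):
        low = 0
        for prefix, width, first in PATTERNS:
            count = (1 << width) - first
            if bits.startswith(prefix, pos):
                start = pos + len(prefix)
                field = bits[start:start + width]
                if len(field) < width or int(field, 2) < first:
                    raise ValueError(f"invalid code at bit {pos}")
                values.append(low + int(field, 2) - first)
                pos = start + width
                break
            low += count
        else:
            raise ValueError(f"truncated code at bit {pos}")
    return values


print([encode_index(i) for i in range(6)])
# ['001', '010', '011', '10000', '10001', '10010']
print(decode_path("00110010011"))  # [0, 5, 2]
```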
If you know that your data is a binary tree, you can use just one bit per level. (You might want to use an int or bigint column to store the ID, then.) You can do lots of things to adapt the scheme to your data, but I think the best idea would be to use hierarchyids when you need to keep track of order among siblings (although the first scheme we developed can handle it better, if you adjust its coding), and to create a second general scheme for the common case of trees where order among siblings doesn't matter.
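A minimal Python sketch of one bit per level in an integer. The text doesn't say how the depth is recorded. This sketch assumes a leading 1 bit as a sentinel, since otherwise /0/ and /0/0/ would both be 0. Under that assumption, a signed 64-bit bigint holds paths up to 62 levels deep, and an ancestry test becomes a shift and a compare. The function names are hypothetical.

```python
def encode_binary_path(path: list[int]) -> int:
    """Pack a binary-tree path (each step 0 or 1) into an int, one bit per
    level, below an assumed leading sentinel 1 bit. The root is 1."""
    value = 1
    for step in path:
        if step not in (0, 1):
            raise ValueError("binary tree steps must be 0 or 1")
        value = (value << 1) | step
    return value


def is_ancestor(a: int, d: int) -> bool:
    """True if node a is a proper ancestor of node d: drop d's extra low
    bits and see whether what remains is a."""
    shift = d.bit_length() - a.bit_length()
    return shift > 0 and d >> shift == a


node = encode_binary_path([0, 1])  # 0b101
print(node, is_ancestor(encode_binary_path([0]), node))  # 5 True
```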
As a closing note, I'll mention that I have probably made some mistakes, perhaps serious ones, while creating this document, and if anybody can find a flaw in the ideas behind the schemes I developed and/or explain the reason for the hierarchyid's oddities, I would greatly appreciate it.
.: In-depth and informative | 2015-05-07 07:23AM :.
Thank you for such an in-depth article. This is a lot more than restating Books Online!
I am a SQL architect. I started using hierarchyid about a year ago, and I continue to be fascinated by it. I use it in an ERP system for Bills of Material, and so far it has stood up to the factory loads of our customers. Your article really explains how it works.
Thanks for your time and expertise.
an anonymous Vladimir Moldovanenko
.: RE: In-depth and informative | 2015-05-08 03:31PM :.
I'm glad you found it interesting. :-)
.: Converting hierarchyid to Postgres | 2016-04-29 02:04AM :.
I just used this info to convert the hierarchyid stuff in the sample AdventureWorks database over to Postgres. I wrote a little function to convert the binary values back into a materialized path. It would probably help others wanting to convert databases from MSSQL to Postgres. Code available here:
an anonymous Lorin Thwaits
.: RE: Converting hierarchyid to Postgres | 2016-04-29 03:16AM :.
Nice work. :-) I'm glad somebody found the information useful.