# Bit-tree: a data structure for fast file processing.

Virtually all modern-day file search techniques use some sort of B-Tree structure. In fact, Douglas Comer in the "The Ubiquitous B-Tree" [3" states: "The B-Tree is, de facto, the standard organization for indexes in a database system." There are many variants of the B-Tree as first described by Bayer and McCreight [1]. This article describes one such variant called a "Bit-Tree." Understanding any of these variants requires understanding of the basic B-Tree model. We will describe B-Trees and a more detailed description can be found in Knuth [6].A "Bit-Tree" is designed to pack more information into leaf nodes. The denser this information can be compacted, the lower the height of the tree with an attendant reduction in the number of input-output operations required to search the tree. As a by-product, less space is required for the tree.

This article will restrict attention to the most common file operations: "get," "add," "delete," and "get equal or greater" (i.e., get the record with key equal to or greater than, the argument key). If key updating is allowed, it is implemented as an "add" followed by a "delete."

Bit-Tree

History

Prefix compression seems to have been first used in a computer program by Robert L. Patrick at Convair Fort Worth circa 1953 in various application programs on the IBM 704.

IBM implemented a method on the IBM System/38 [5] which used bits to distinguish between keys but also carried along enough bytes of the key so that at any point the key could be reconstructed from left to right. This assured that after a search, such as a Prefix B-Tree search, if the search key was not found, then the position found was the correct place to insert a new entry. This contrasts with the Bit-Tree, in which it is necessary to scan right or left to find out where to insert the new entry. The Bit-Tree, however, requires less storage and consequently lower-level trees.

Distinction bits were first used by Donald Morrison in the Practical Algorithm To Retrieve Information Coded in Alphanumeric (PATRICIA) [7]. PATRICIA used a binary tree structure (sometimes called a "trie," from reTRIEval) when the entire trie could be contained in memory. Since storage was not an issue, these tries used left and right pointers and flags. The term trie is often used when entries relate to the representation of keys rather than the relationship of keys.

B-Trees

This description will depart from that given in [1] in that it will describe a B-Tree that more closely represents B-Tree implementations used in contemporary data processing.

A B-Tree consists of a number of nodes. Each node contains a number of pointers separated by keys. Hence, there is one more pointer in a node than there are keys. The keys are in ascending order. Every node is either a branch or a leaf node. The pointers in a branch node point to other nodes. The pointers in a leaf node are relative record numbers (RRN) that specify the corresponding ordinal record numbers. One node in the tree is the root node. Starting at the root, the number of nodes traversed before encountering a leaf node is called the height of the tree. All leaf nodes are at the same distance from the root. The number of entries in a node, called the order of the tree, is determined by the node size. If key compression is employed, however, the order of different nodes in a tree may vary.

Searching a B-Tree for a record with a particular key is straightforward. Starting with the root node, search each node encountered from left to right until a key is found that is equal to or greater than the argument key. Then, if the node is not a leaf node, take the pointer preceding that key to find the next node to search. After completing the search of a leaf node, the preceding pointer is the relative record number of the data record sought.

If the root node can reside in memory, it is easy to see that the number of reads necessary to find a record is equal to the height of the tree.

Adding a new record to the file is slightly more complicated. Search the tree as before. Then move the pointer that was found and all the entries to the right of it one position (i.e., one key length plus one pointer length) to the right. Insert the record number and the key of the new record into the vacant position. If the addition of this data causes the total data to exceed the node size, the node is split into two nodes.

To split a node, remove the middle key, called the split key. Now write all information to the left of the split key as one node, and all information to the right of the split key as another new node. Then insert a pointer to the new node, and the split key into the parent of the original node. If this causes the parent to become overfull, the parent is then split in the same manner and so on. Finally, if this causes the root to split, create a new node consisting of pointers to the two halves of the previous root node, separated by the last split key. This is how the tree grows in height as the file gets larger. Notice that this technique assures that all leaf nodes are at the same level.

In practice, nodes are split when reading down the tree, due to an improvement suggested by Guibas and Sedgwick [4]. This not only simplifies the program, but more important, simplifies the locking protocol required when more than one user is adding a record in the same vicinity of the tree.

Before discussing the compacting technique a Bit-Tree processor uses for leaf nodes, it should be noted that any conventional technique can be used to represent information in the branch nodes. The recommended way to do this is the method described by Bayer and Unterauer in "Prefix B-Trees" [2]. "Prefix B-Trees," however, is a misnomer, since the compression method used could be used in "non-Tree" searching and is not restricted to tree structures (perhaps the authors were being too modest). Several important points are discussed in [2], and we will now briefly highlight them.

Prefix B-Trees

The first point is that in order to search a file by key, it is not necessary to have a list of keys, but rather to be able to distinguish between keys. Suppose for example, that a search has been narrowed down to two keys, "JOSE FLORES" and "JOSEPHSON FLOWERS", so that now it is necessary to decide which to choose. Typically an index sequential access method (ISAM) file would have the key "JOSEPHSON FLOWERS" and this would be sufficient to make the distinction, that is, if the key being searched for is less than this key, take the first entry, otherwise, take the second entry. Notice, however, that any quantity greater than "JOSE FLORES" and equal to or less than "JOSEPHSON FLOWERS" would suffice equally well to enable the correct entry to be selected. For instance the quantity "JOSEPHINE" would do as the distinguishing key.

The second point is that "front end" or "prefix" compression should be used. Suppose "JOSEPHSON FLOWERS" is the first key in a node, and "JOMEINI" is the first key in the prior node. Then the leading two bytes in JOSEPHSON FLOWERS" are redundant. After prefix compression, the partial key could be "SEPHSON FLOWERS" (or could just as well be "SEPINE").

The third point is that "rear end" or "suffix" compression should be used. It is only necessary to keep enough information in the leading entry in a node to distinguish it from the last entry in the prior node. In the current example, suppose the last entry in the node preceding the node starting with "JOSEPHSON FLOWERS" was "JOSE FLORES". The first byte that differs in these keys is byte 5, or the "P" in "JOSEPHSON FLOWERS" (or in "JOSEPHINE"). Hence, with the prefix compression already determined, the partial key in this case would be "SEP".

This example is shown graphically in Figure 1. Most implementations would keep certain "housekeeping" information such as "node type," "node length," at the beginning of the node. This is shown as "prolog" in Figure 1. In that figure, "RNP" refers to Relative Node Pointer," which is described in the conclusion.

The final point is that when splitting a node, the split point can be chosen to maximize the amount of suffix compression. Notice from the previous paragraph that suffix compression is only determined by the last key in the previous node and the first key in the succeeding node. Instead of splitting a node exactly in the middle, suppose a certain range of entries is considered for a possible split point. Then the split point chosen should be between the two keys having the least number of identical leading bytes.

Bit-Trees

Any indexed file search technique uses key information and relative record numbers, (RRNs), to find the desired record. Simply stated, key information is examined in order to find the RRN for the record with the key desired (if it exists). In Bit-Tree leaf nodes, the key information used is the distinction bit between two consecutive keys. The distinction bit is defined as the ordinal number of the most significant bit which differs in the two keys, normally with a bias. The purpose of the bias is to reserve distinction bit values of zero so that they can be used to find the beginning and end of nodes. This fact will be used in some of the algorithms described later.

It may seem that Bit-Trees are the logical extension of Prefix B-Trees taken to the conclusion, however, this is not the case. Both Bit-Trees and Prefix B-Trees lose information about the keys. When the search of a Prefix B-Tree node is complete, however, only insignificant information about the last key tested is unknown. That is, the correct position in the current node has been found for any search. Unfortunately, this is not true for Bit-Trees. When the search of a Bit-Tree node is complete, the correct position in the node has only been found if a record with that key is in the file. This of course is disastrous when searching branch nodes for a key that is not in the file (e.g., before making an "add") since one must know which node to search next. It is for this reason that distinction bits are only used in leaf nodes.

Bit-Tree

Any technique that loses information about the keys cannot assure (without reading the data file) that the key searched for is in the file. Using a Bit-Tree or a Prefix B-Tree search, it is at least assured that the RRN found is the correct one when the record is in the file. On the other hand, if no such record is in the file, and it is now to be added, the RRN in the position in the node found (by either of these techniques) is used to read the data record, so that the search key can be compared with the actual key in the data record to determine a "found" or "not found" result.

Bit-Tree Algorithms

Let D(K,L) represent the distinction bit (hereafter called the D-Bit) between the keys K and L where K does not equal L. Then if the bits are numbered from left to right starting a 0, and [unkeyable] represents exclusive or, D(K,L) can be defined formally as: D(K,L) = b + m - 1 - [log[X. sub. 2] (K [unkeyable] L) ] where m is the key length in bits, b is the bias, and [x] means the integer part of x. An example of this is shown in Figure 2 for the (EBCDIC) keys "K" (hex D2) and "P" (hex D7).

If a certain leaf node points to n + 1 records, let R[X.sub. i], (i = 0,...,n), represent the i'th RRN in the node. Let [K . sup. i) represent the key of the record specified by [R. sup. i[. Then, as is usual, the [R. sub. i] <[KK. sub i+1]. Between each pair of RRNs there must be a distinction bit.

Let [D. sub. i, (i = 1,...,n), represent D([K.sub.i-,Ki]). Since the keys are in ascending order and this is the highest-order bit change, it must be from a 0 to a 1, so the [D.sub.i] bit is always on in [K.sub.i]. Then the distinction bits [D. sub.i = 100, [D. sub. i] bit is always on in [K.sub.i]. Then the distinction bits [D.sub.i] = 100, [D.sub.k] = 112 and [D.sub.j] = 92 would appear as shown in Figure 3. Before describing the search technique, it is necessary to make some observations. Suppose that for some i and j with i<j, we have a) [D.sub.j < D.subi and b)] [D.sub.i<D.sub.k] for all k in the range [i + 1,j - 1]. That is, [D.sub.j] is the first D-Bit following [D.sub.i] which is smaller than [D.sub.i] which is smaller than [D.sub.i. The following theorem is required:

THEOREM 1: For the sequence of D-Bits just mentioned, the [D.sub.i] bit is on in [K.sub.i]+1,] ...[K.sub.j-1].

PROOF: Since [D.sub.i+1] through [D.sub.j-1] are all larger than [D.sub.i,] none of the lower (more significant) bits could have changed in [K.sub.i+1] through [K.sub.j-1]. Therefore the [D.sub.i] bit must be on in each of those corresponding keys.

THEOREM 2: If [D.sub.i = D.sub.j] there must be a [D.sub.k < D.sub.i with i<k<j] (see Figure 4).

PROOF: If there is no smaller distinction bit between [D.sub.i] and [D.sub.j], then from Therorem 1, the [D.sub.i] bit must be on in [K.sub.i]+1 through [K.sub.j]-1. Then however, the [D.sub.i] bit would be on in [K.sub.j-1] so that [D.subj] could not equal [D.sub.i.

In particular, [D.sub.i] can never equal [D.sub.i+1].

Consequently, to search a leaf node for a particular key "K", assign [R.sub.0] to the variable [R.sub.found] and then for each [D.sub.i] starting with i = 1, test the [D.sub.i] bit in K. If it is on, assign [R.sub.i] to the variable [R.sub.] found and continue. If it is off, skip the following distinction bits until a smaller one is found. This is because we know from Theorem 1 that the [D.sub.1] bit is on for each of these entries. When the end of the node is reached, the relative record number in [R.sub.found] is the record number of the record wanted if the record is in the file. This is proved by Theorem 3, which is found in appendix A.

Now read the record and compare its key with K. If it is equal, the record is found, otherwise it is not found. If the record is found and the operation is a "get " or "update" (assuming update of keys is not permitted), no further references to the Bit-Tree are required.

If the record was found and the operation was a "delete", then the node must be modified. Suppose record number [R.sub.i] is to be deleted. Then [D.sub.i] and [D.sub.i+1] must also be deleted since they refer to the key of [R.subi] which will no longer exist. This must all be replaced with the distinction bit between [K.sub.i-1] and [K.sub.1]. No records, however, have to be read due to the following theorem:

THEOREM 4: When [R.sub.i] is deleted the new distinction bit between [K.sub.i]-1] and [D.sub.i+1].

PROOF: There are two cases according to whether [D.sub.E<D.sub.i] 1] or [D.sub.i]> D.sub.i+1]. (Notice that [D.sub.i] cannot equal [D.sub.i+1] because of Theorem 2.)

CASE 1: [D.sub i<D.sub.i+1 If [D.sub.i<D.sub.i+1] If [D.sub.[D.sub.i+1] then the [D.sub.i] bit is off in [K.sub.i-1] and is on in [K.sub.i]. Since [D.sub.<D.subi+1], however, [D.sub.1] is also on in [K.sub.i] 1] by Theorem 1). Furthermore, by definition, no lower number bit differs between [K.sub.i-1] and [K.sub.i+1]. Hence [D.sub.i] is the distinction bit between [K.sub.i-1] and [K.sub.i+1]. CASE 2: [D.sub.i>D.sub.i+1 If [D.sub.i>D.sub.i 1] then [D.sub.i 1] on in [K.sub.i+1 and off in [K.sub.i]. The distinction bit between [K.sub.i 1] and [K.sub.] was [D.sub.i], however, which means that all lower bits in [K.sub.i-1] and [K.sub.i were the same. Hence [D.sub.i+1] was off in [K.sub.i-1 also. Consequently [D.sub.i+1 is the distinction bit between [K.sub.i-1 and [K.sub.i+1].

Combining the results of Case 1 and Case 2, if [R.sub.i] is deleted, the resulting distinction bit between [K.sub.i-1] and [K.sub.i+1] is the smaller of [D.sub.i] and [D.sub.i+1.

Suppose the record was not found, and the operation was a "get equal or greater". First compute the distinction bit, D, between the search key, K, and [K.sub.i], the key found by the search routine. If K > [K.sub.i] then clearly K belongs to the right of [K.sub.i]. Therefore, scan the entries following [D.sub.i], skipping those whose distinction bits are greater than D because, like [K.sub.i], they too will have the D-bit off (or else their distinction bit would be D) and hence are also smaller than K. The record desired is the first following record with a distinction bit smaller than D. Notice that a distinction bit equal to D cannot occur because then the entry [R.sub.i] would not have been selected.

Conversely, suppose K < [K.sub.i], Then clearly K belongs to the left of [K.sub.i]. Scanning backwards starting with [D.sub.i], let [D.sub.j] be the first distinction bit found that is smaller than D. Then, since the D bit is on in [K.sub.i] it must be on in all of the keys [K.sub.j] through [K.sub.i] because no distinction bits as small as D were found in this interval. Since the D bit is on in [K.sub.j] and off in K, < D [K.sub.j]. Also, since [D.sub.j] is off in [K.sub.j-1] and on in K, [K.subj] D K. Therefore [R.sub.j] is the record number wanted. (This argument holds even if i = j.)

It is because of the preceding algorithm that distinction bits should be biased. Since distinction bit values are recorded in an unsigned field, the smallest possible value is zero. Biasing will preclude Then the first record number in the node can be preceded by a zero and the last record number followed by a zero. This assures that the above forward and backward searches for a smaller distinction bit will be successful. These two entries could be called [D.sub.0] and [D.sub.n+1].

Following a "not found" condition, an "add" to the file proceeds as in the last case for a "get equal or greater". When the next higher key is found, insert the distinction bit computed above, D, and the new record number at that point.

Node Splitting

The merit of B-Trees is based on the concept of splitting nodes. A split strategy was described for Prefix B-Trees. This showed how to find where to split a node in order to minimize the length of the partial key to be inserted into its parent node. A similar technique can be used with Bit-Trees, although it is by no means obvious, because the keys are not visible. It would be too inefficient to read each candidate key from the data file to find the one that generates the shortest partial key for the parent.

Fortunately, this need not be done. The most efficient split key is [K.sub.i] where i is such that [D.sub.i] is a minimum in the range of candidate split points. (Notice that [D.sub.i] is assured to be unique due to Theorem 2.) What is of concern here is suffix compression. The last byte of the split key that needs to be included in the parent is the first byte that differs from the previous key. By picking the smallest D-bit, we assure that the number of bytes needed is minimized, even without knowing the key of [R.sub.i-1]. Specifically the last byte that needs to be included from the split key is byte [B.sub.j] where J = [D.sub.i/8] for 8-bit bytes biased at 8.

Conclusion

The author has implemented a Bit-Tree for the IBM System/36. IBM imposes the following restrictions on keyed files: key length less than 121 bytes, file size less than 16,000,000 records. Hence an RRN can be contained in 3 bytes. Also, a tree for a 16 million record file will have less than 65,536 nodes so that an RNP can fit into 2 bytes. The Bit-Tree mentioned has the following characteristics: Nodes are 2,048 bytes. A prolog of 16 bytes is used so that 2,032 bytes are available for entries. For keys up to 30 bytes, leaf node entries are 4 bytes (1-byte distinction bit, 3-byte RRN) and 5 bytes for keys from 30 to 120 bytes (2-byte distinction bit). Almost all files encountered have keys of 30 bytes or less.

Prefix B-Tree type branch nodes are used. Empirical studies show that the compressed keys used in branch nodes average 2 bytes, regardless of key length. Then a branch node entry that averages 6 bytes is comprised as follows: 1 byte specifying bytes in partial key, 1 byte specifying where key ends, 2 bytes for partial key (on average) and 2 bytes for an RNP. (Notice if this technique was used for leaf nodes, entries would have to be 1 byte longer since RRNs would have to be used instead of RNPs.) Therefore, a maximum of 2,032/6 or 338 entries can be contained in a branch node. Also, a minimum of 169 entries and an average of 253 entries can be contained in a branch node.

As a result, all files encountered so far fit into three-level trees. With this implementation, the maximum entries per leaf node is 508 while the minimum is 254 so that the average should be 381. A branch (root) node must be full for a two-level tree to split into a three-level tree, and the preceding paragraph shows that this happens when it has 338 entries which would point to 339 leaves. Hence three-level trees would start at about 339 x 382 = 129,498 entries. Since the root node is always held resident, a search of files of up to 129,498 records normally requires one input operation to determine the desired record, and one more to access it.

If prefix B-tree type nodes were used for the leaves, then an average entry would require 7 bytes, and three-level trees would be reached for files of about 339 x 291 = 98,649 records. Consequently, for files between about 98,649 and 129,498 records, using B-Tree type nodes would require an additional disk access.

The preceding paragraph, however, uses the average length of a B-Tree type entry. Whereas the entry length of a Bit-Tree entry is fixed at 4 bytes (for keys up to 30 bytes), the entry length for a B-Tree type entry can approach K/2 where K is the key length. This can be seen by considering a file comprised of pairs of keys differing in the last byte, and one pair differing from the next in one of the first bytes.

Finally, a fair amount of computing time is spent searching leaf nodes. Because Bit-Tree entries are a fixed length, the coding can be much more efficient. In the implementation described here, the number of cycles to inspect a Prefix B-Tree entry is 184 while the number of cycles to inspect a Bit-Tree entry is 42. Furthermore, inspecting Prefix B-Tree entries requires instruction modification on the IBM System/36 whereas inspecting Bit-Tree entries does not. This can be disastrous for reentrant programs.

Using a Bit-Tree, an additional input operation must be done for an "add" operation. (This additional input operation must also be done for a "get". Notice that in this case, this is exactly the same operation that would be done next anyway.) In most applications, however, "get" operations happen far more often than "add" operations.

Appendix A

THEOREM 3: If the entry for the key Q (Query key) is in the leaf, this leaf search algorithm will find it: i.e., (R)found will be set to the record with key Q.

PROOF: Since R(0) is assigned to (R)found at the start of the search, there will be some key F (Found key) in [(R.sub.found]. The proof is by reductio ad absurdum; We assume that Q [is not equal to] F and show that F could not have been chosen by the algorithm.

CASE 1. Let Q < F. Since the keys are in ascending order, If Q is the [i.sup.th.] entry and F is the [j.sub.th.] entry, i < j. By definition, all bit positions to the left of D(Q,F) are the same in Q and F. This implies that bit positions 0, 1, 2, ..., D(Q,F) - 1 are identical for all keys in the [i.sup.th.] through [j.sup.th.] entries. Hence for i < k [is less than or equal to] j, we have [D.sub.k.] [is greater than or equal to] D(Q,F).

It is clear that there must exist an entry m, i < m [is less than or equal to] j, such that [D.sub.m.] = D(Q,F), since the D(Q,F) bit must change from 0 in Q to 1 in F at some point in the sequence of keys.

Now consider two subcases.

1. Suppose the algorithm looks at [D.sub.m] (i.e., it checks the [m.sup.th.] entry). Then it will find that the ([D.sub.m.])[D.sup.th.] (that is, the D(Q,F)[D.sup.th.) bit is 0 in the key Q for which we are searching. It will then skip all following entries until it finds a distinction bit which is strictly less than D(Q,F). Hence if m < j, the algorithm will skip the [j.sup.th.] entry. If m = j, it will not make that entry [R.sub.found.] since the ([D.sub.m.])[D.sup.th.] bit is 0 in Q. In either case, F (the [j.sup.th.] entry) would not have been assigned to [R.sub.found], which is contrary to our assumption.

2. Suppose the algorithm does not look at the [m.sup.th.] entry. The only reason would be that the algorithm is skipping all entries until it finds some distinction bit which is less than [D.sup.m.] (which is equal to D(Q,F)). Then it would also skip the [j.sup.th.] entry, and F would not be chosen, again contrary to our assumption.

CASE 2. Suppose F < Q. Let F be the [i.sup.th.] entry and Q be the [j.sup.th.] entry, so again i < j. Reasoning as in the beginning of Case 1 we see that for i < k [is less than or equal to], we have [D.sub.k.] [is less than or equal to] D(Q,F), and that there exists an m, with i < m [is less than or equal to] j, such that [D.sub.m.] = D(Q,F).

Again, consider two subcases:

1. Suppose the algorithm looks at the [m.sup.th.] entry. Then since the [D.sub.m.] bit is on in Q, the algorithm at that point assigns the [m.sub.th.] entry to [R.sub.found.]. Thus F could not have been the value chosen.

2. Suppose the algorithm does not look at the [m.sup.th.] entry. This can happen only if the last distinction bit looked at, [D.sub.k.], was less than [D.sub.m.]. All the bits in the 0, 1, 2, . . .,

D(Q,F) - 1, however, are the same for the keys in the [i.sup.th.] through [j.sup.th.] entries, so k < i. Thus, the algorithm must also have skipped the [i.sup.th.] entry, again contrary to our consumption.

References

[1]. Bayer, R., and McCreight, E. Organization and maintenance of large ordered indexes. Acta Inform. 1, 3 (1972), 173-189.

[2]. Bayer, R., and Unterauer, K. Prefix B-Trees. ACM Trans. Datab. Syst. 2, (Mar. 1977), 11-26.

[3]. Comer, D. The ubiquitous B-Tree. Comput. Sur. 2, 2 (June 1979), 122.

[4]. Guibas, L., and Sedgewick, R. A dichromatic framework for balanced trees. In Proceedings of the 19th Symp Foundations of Computer Science, 1978, 8-21.

[5]. Howard and Borgendale, System/38 machine indexing support, IBM System/38 Technical Developments, 1980, 67-69.

[6]. Knuth, D. The art of computer programming. Vol. 3. Addison Wesley, Mass. 1973, 478.

[7]. Morrison, D. PATRICIA-Practical Algorithm To Retrieve Information Coded In Alphanumeric, J. ACM 15 (1968), 514-534.

CR Categories and Subject Descriptors: D.4.3 [Operating Systems Management]: File Systems Management--file organization; E.5 [Data]: Files--sorting/searching; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval--search process

General Terms: Algorithms, Performance

Additional Key Words and Phrases: B-Tree, Bit-Tree, branch, distinction bit, index, information retrieval, ISAM, leaf, mode, random access, root, trie

About the Author

DAVID E. FERGUSON is chairman and CEO at Amalgamated Software of North America, Inc., in Big Bear Lake, Calif. His research concentrates on operating systems, principally sorting, searching, compilers and assemblers, and on number theory. Author's Present Address: Amalgamated Software of North America, Inc., P.O. Box 1668, 611 Spruce Road, Big Bear Lake, CA 92315

Printer friendly Cite/link Email Feedback | |

Author: | Ferguson, David E. |
---|---|

Publication: | Communications of the ACM |

Article Type: | Technical |

Date: | Jun 1, 1992 |

Words: | 5146 |

Previous Article: | An object-oriented methodology for knowledge base/database coupling. |

Next Article: | Debunking the software patent myths. |

Topics: |