Hierarchical Compression
In hierarchical compression, groups of bits are combined together into a tree-like structure of bit vectors. A '0' (zero) at a top level indicates that all descendants are also 0. A '1' indicates that the descendants contain some non-zero bits. In the first case we don't need to keep the memory for lower levels, which results in significant decrease of memory consumption. If this sounds a little confusing, the following illustration will make it clear.
Fig.1.
The BitMagic Library uses a tree of pointers, which is a particularly efficient technique of implementing hierarchical compression.
A branching ratio of three was used, and instead of using bit vectors for top-level coding, pointer values were used. NULL pointers naturally stand for '0' in a bit vector, and non-NULL stand for '1'. The depth of the compression tree is low, due to the fact we need a particularly low number of comparisons to find a leaf and inspect a bit in it. In cases where the vector is sparse, only the root node need be inspected to determine whether the object is represented in the set.
Leaf nodes can be short blocks of bits dynamically allocated on the heap. If a block consists of only 1 bit, we can replace it with a block of static global memory (which is one for all instances). In this case we have particulary efficient implementation, as illustrated below.
Fig. 2
As you can see, all the memory for bit vectors is fragmented by blocks of equal size. Each block is completely independent from another. We therefore have the ability to manipulate the blocks, create special implementations of bit vectors for parallel computing, compress blocks using different methods, swap non-frequently used blocks out of the memory, and create pools of blocks to improve memory reallocation operations.
Hierarchical Compression and Memory
Hierarchical, block segmented organization of bit vectors makes it easy to extend addressable space up to 64 bits, which can be very useful for building large sparse bit matrices. Large bit matrices are an interesting topic because different compression and block-forming schemes can be used there.
Summary
Hierarchical compression is an efficient and flexible way to manipulate bit vectors, making the performance/space trade-off worthwhile.
(c) Anatoliy Kuznetsov. 2002. [ anatoliy_kuznetsov at yahoo.com ]
|