|
| 1 | +/* |
| 2 | +Package hamt provides a reference implementation of the IPLD HAMT used in the |
| 3 | +Filecoin blockchain. It includes some optional flexibility such that it may be |
| 4 | +used for other purposes outside of Filecoin. |
| 5 | +
|
| 6 | +HAMT is a "hash array mapped trie" |
| 7 | +https://en.wikipedia.org/wiki/Hash_array_mapped_trie. This implementation |
| 8 | +extends the standard form by including buckets for the key/value pairs at |
| 9 | +storage leaves and CHAMP mutation semantics |
| 10 | +https://michael.steindorfer.name/publications/oopsla15.pdf. The CHAMP invariant |
| 11 | +and mutation rules provide us with the ability to maintain canonical forms |
| 12 | +given any set of keys and their values, regardless of insertion order and |
| 13 | +intermediate data insertion and deletion. Therefore, for any given set of keys |
| 14 | +and their values, the root node of a HAMT using the same parameters and CHAMP |
| 15 | +semantics should always produce the same content identifier (CID). |
| 16 | +
|
| 17 | +Algorithm Overview |
| 18 | +
|
| 19 | +The HAMT algorithm hashes incoming keys and uses incrementing subsections of |
| 20 | +that hash digest at each level of its tree structure to determine the placement |
| 21 | +of either the entry or a link to a child node of the tree. A `bitWidth` |
| 22 | +determines the number of bits of the hash to use for index calculation at each |
| 23 | +level of the tree such that the root node takes the first `bitWidth` bits of |
| 24 | +the hash to calculate an index and as we move lower in the tree, we move along |
| 25 | +the hash by `depth x bitWidth` bits. In this way, a sufficiently randomizing |
| 26 | +hash function will generate a hash that provides a new index at each level of |
| 27 | +the data structure. An index comprising `bitWidth` bits will generate index |
| 28 | +values of `[ 0, 2^bitWidth )`. So a `bitWidth` of 8 will generate indexes of 0 |
| 29 | +to 255 inclusive. |
| 30 | +
|
| 31 | +Each node in the tree can therefore hold up to `2^bitWidth` elements of data, |
| 32 | +which we store in an array. In this HAMT and the IPLD HashMap we store |
| 33 | +entries in buckets. For a `Set(key, value)` mutation where the index generated |
| 34 | +at the root node for the hash of key denotes an array index that does not yet |
| 35 | +contain an entry, we create a new bucket and insert the key / value pair entry. |
| 36 | +In this way, a single node can theoretically hold up to |
| 37 | +`2^bitWidth x bucketSize` entries, where `bucketSize` is the maximum number of |
| 38 | +elements a bucket is allowed to contain ("collisions"). In practice, indexes do |
| 39 | +not distribute with perfect randomness so this maximum is theoretical. Entries |
| 40 | +stored in the node's buckets are stored in key-sorted order. |
| 41 | +
|
| 42 | +Parameters |
| 43 | +
|
| 44 | +This HAMT implementation: |
| 45 | +
|
| 46 | +• Fixes the `bucketSize` to 3. |
| 47 | +
|
| 48 | +• Defaults the `bitWidth` to 8; within Filecoin, however, it uses 5. |
| 49 | +
|
| 50 | +• Defaults the hash algorithm to the 64-bit variant of Murmur3-x64. |
| 51 | +
|
| 52 | +Further Reading |
| 53 | +
|
| 54 | +The algorithm used here is identical to the IPLD HashMap algorithm |
| 55 | +specified at |
| 56 | +https://github.com/ipld/specs/blob/master/data-structures/hashmap.md. The |
| 57 | +specific parameters used by Filecoin and the DAG-CBOR block layout differ from |
| 58 | +the specification and are defined at |
| 59 | +https://github.com/ipld/specs/blob/master/data-structures/hashmap.md#Appendix-Filecoin-hamt-variant. |
| 60 | +*/ |
| 61 | +package hamt |
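
The index calculation described in the Algorithm Overview can be sketched as follows. This is a minimal illustration, not the package's actual code: `indexAtDepth` and `maxEntriesPerNode` are hypothetical helper names, the digest bytes are made up, and reading bits most-significant-first is an assumption the doc comment does not state. The sketch also limits `bitWidth` to 8 for simplicity.

```go
package main

import (
	"errors"
	"fmt"
)

// indexAtDepth returns the bitWidth-bit index taken from digest at the given
// tree depth, i.e. bits [depth*bitWidth, (depth+1)*bitWidth).
// Assumption: bits are counted from the most significant bit of digest[0].
// Hypothetical helper for illustration; not part of the hamt package API.
func indexAtDepth(digest []byte, bitWidth, depth int) (int, error) {
	if bitWidth < 1 || bitWidth > 8 {
		return 0, errors.New("bitWidth must be between 1 and 8 in this sketch")
	}
	start := depth * bitWidth
	if depth < 0 || start+bitWidth > len(digest)*8 {
		return 0, errors.New("hash digest exhausted at this depth")
	}
	idx := 0
	for i := start; i < start+bitWidth; i++ {
		// Extract bit i of the digest, most significant bit first.
		bit := (digest[i/8] >> (7 - uint(i%8))) & 1
		idx = idx<<1 | int(bit)
	}
	return idx, nil
}

// maxEntriesPerNode is the theoretical ceiling 2^bitWidth x bucketSize.
func maxEntriesPerNode(bitWidth, bucketSize int) int {
	return (1 << uint(bitWidth)) * bucketSize
}

func main() {
	// Made-up 64-bit digest standing in for the hash of a key.
	digest := []byte{0xAB, 0xCD, 0xEF, 0x01, 0x23, 0x45, 0x67, 0x89}
	for depth := 0; depth < 3; depth++ {
		i8, _ := indexAtDepth(digest, 8, depth)
		i5, _ := indexAtDepth(digest, 5, depth)
		fmt.Printf("depth %d: bitWidth 8 -> %d, bitWidth 5 -> %d\n", depth, i8, i5)
	}
	fmt.Println("max entries per node:", maxEntriesPerNode(8, 3), maxEntriesPerNode(5, 3))
}
```

With a `bitWidth` of 8 each level consumes exactly one byte of the digest, while a `bitWidth` of 5 straddles byte boundaries, which is why the sketch works bit by bit. A 64-bit digest is exhausted after 8 levels at `bitWidth` 8 and after 12 levels at `bitWidth` 5; the sketch returns an error at that point rather than guessing how the real implementation handles it.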