Commit 58f187c

Merge pull request #52 from rvagg/rvagg/docs
Documentation
2 parents 21886a1 + 3d5b360 commit 58f187c

5 files changed

+412
-73
lines changed

README.md

Lines changed: 16 additions & 22 deletions
@@ -6,28 +6,22 @@ go-hamt-ipld
 [![](https://img.shields.io/badge/freenode-%23ipfs-blue.svg?style=flat-square)](http://webchat.freenode.net/?channels=%23ipfs)
 [![Travis CI](https://travis-ci.org/ipfs/go-hamt-ipld.svg?branch=master)](https://travis-ci.org/ipfs/go-hamt-ipld)
 
-> A CHAMP HAMT implemented using ipld
-
-
-## Table of Contents
-
-- [Usage](#usage)
-- [API](#api)
-- [Contribute](#contribute)
-- [License](#license)
-
-
-## Examples
-
-```go
-// TODO
-```
-
-## Contribute
-
-PRs are welcome!
-
-Small note: If editing the Readme, please conform to the [standard-readme](https://github.com/RichardLitt/standard-readme) specification.
+**This package is a reference implementation of the IPLD HAMT used in the
+Filecoin blockchain.** It includes some optional flexibility such that it may
+be used for other purposes outside of Filecoin.
+
+HAMT is a ["hash array mapped trie"](https://en.wikipedia.org/wiki/Hash_array_mapped_trie).
+This implementation extends the standard form by including buckets for the
+key/value pairs at storage leaves and [CHAMP mutation semantics](https://michael.steindorfer.name/publications/oopsla15.pdf).
+The CHAMP invariant and mutation rules provide us with the ability to maintain
+canonical forms given any set of keys and their values, regardless of insertion
+order and intermediate data insertion and deletion. Therefore, for any given
+set of keys and their values, the root node of a HAMT using the same
+parameters and CHAMP semantics should always produce the same content
+identifier (CID).
+
+**See https://godoc.org/github.com/ipfs/go-hamt-ipld for more information and
+API details.**
 
 ## License
 
doc.go

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+/*
+Package hamt provides a reference implementation of the IPLD HAMT used in the
+Filecoin blockchain. It includes some optional flexibility such that it may be
+used for other purposes outside of Filecoin.
+
+HAMT is a "hash array mapped trie"
+https://en.wikipedia.org/wiki/Hash_array_mapped_trie. This implementation
+extends the standard form by including buckets for the key/value pairs at
+storage leaves and CHAMP mutation semantics
+https://michael.steindorfer.name/publications/oopsla15.pdf. The CHAMP invariant
+and mutation rules provide us with the ability to maintain canonical forms
+given any set of keys and their values, regardless of insertion order and
+intermediate data insertion and deletion. Therefore, for any given set of keys
+and their values, the root node of a HAMT using the same parameters and CHAMP
+semantics should always produce the same content identifier (CID).
+
+Algorithm Overview
+
+The HAMT algorithm hashes incoming keys and uses incrementing subsections of
+that hash digest at each level of its tree structure to determine the placement
+of either the entry or a link to a child node of the tree. A `bitWidth`
+determines the number of bits of the hash to use for index calculation at each
+level of the tree such that the root node takes the first `bitWidth` bits of
+the hash to calculate an index and as we move lower in the tree, we move along
+the hash by `depth x bitWidth` bits. In this way, a sufficiently randomizing
+hash function will generate a hash that provides a new index at each level of
+the data structure. An index comprising `bitWidth` bits will generate index
+values of `[ 0, 2^bitWidth )`. So a `bitWidth` of 8 will generate indexes of 0
+to 255 inclusive.
+
+Each node in the tree can therefore hold up to `2^bitWidth` elements of data,
+which we store in an array. In this HAMT and the IPLD HashMap we store
+entries in buckets. For a `Set(key, value)` mutation where the index generated
+at the root node for the hash of key denotes an array index that does not yet
+contain an entry, we create a new bucket and insert the key / value pair entry.
+In this way, a single node can theoretically hold up to
+`2^bitWidth x bucketSize` entries, where `bucketSize` is the maximum number of
+elements a bucket is allowed to contain ("collisions"). In practice, indexes do
+not distribute with perfect randomness so this maximum is theoretical. Entries
+stored in the node's buckets are stored in key-sorted order.
+
+Parameters
+
+This HAMT implementation:
+
+• Fixes the `bucketSize` to 3.
+
+• Defaults the `bitWidth` to 8, however within Filecoin it uses 5
+
+• Defaults the hash algorithm to the 64-bit variant of Murmur3-x64
+
+Further Reading
+
+The algorithm used here is identical to that of the IPLD HashMap algorithm
+specified at
+https://github.com/ipld/specs/blob/master/data-structures/hashmap.md. The
+specific parameters used by Filecoin and the DAG-CBOR block layout differ from
+the specification and are defined at
+https://github.com/ipld/specs/blob/master/data-structures/hashmap.md#Appendix-Filecoin-hamt-variant.
+*/
+package hamt
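
The index calculation described under "Algorithm Overview" can be illustrated with a minimal sketch. This is not code from this commit or from the package's API: `indexAt` and `maxEntries` are hypothetical helpers, the sample digest bytes are made up, and reading the digest most-significant-bit first is an assumption made for illustration.

```go
package main

import "fmt"

// indexAt is a hypothetical helper (not part of the hamt package). It returns
// the bitWidth-bit index used at the given depth, skipping depth x bitWidth
// bits of the digest. Assumption: bits are read most-significant-bit first.
func indexAt(digest []byte, bitWidth, depth int) (int, error) {
	if bitWidth < 1 || bitWidth > 8 {
		return 0, fmt.Errorf("unsupported bitWidth %d", bitWidth)
	}
	start := depth * bitWidth
	if start+bitWidth > len(digest)*8 {
		return 0, fmt.Errorf("digest exhausted at depth %d", depth)
	}
	idx := 0
	for i := start; i < start+bitWidth; i++ {
		// Pick bit i of the digest, counting from the top bit of byte 0.
		bit := (digest[i/8] >> uint(7-i%8)) & 1
		idx = idx<<1 | int(bit)
	}
	return idx, nil
}

// maxEntries is a hypothetical helper returning the theoretical per-node
// capacity described in the overview: 2^bitWidth x bucketSize.
func maxEntries(bitWidth, bucketSize int) int {
	return (1 << uint(bitWidth)) * bucketSize
}

func main() {
	// Made-up digest bytes standing in for the hash of a key.
	digest := []byte{0xAB, 0xCD}

	// With bitWidth 5 the indexes at depths 0, 1 and 2 are 21, 15 and 6.
	for depth := 0; depth < 3; depth++ {
		idx, err := indexAt(digest, 5, depth)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("depth %d -> index %d\n", depth, idx)
	}

	// Theoretical capacity for the default (8) and Filecoin (5) bit widths
	// with the fixed bucketSize of 3.
	fmt.Println(maxEntries(8, 3), maxEntries(5, 3))
}
```

Each level consumes a fresh slice of the digest, so a lookup can descend only while unread bits remain; the sketch reports an error at that point rather than guessing how the real implementation handles exhaustion.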
