README.md: 18 additions & 9 deletions (+18 -9)
@@ -22,10 +22,10 @@ QmRJzsvyCQyizr73Gmms8ZRtvNxmgqumxc2KUp71dfEmoj # sha256 in base58
 EiAsJrRraP/Gj/mbRTwdMEE0E0ItcGSDv6D5il6IYmbnrg== # sha256 in base64
 ```

-## format
+## Format

 ```
-<1-byte hash function code><1-byte digest size in bytes><hash function output>
+<varint hash function code><varint digest size in bytes><hash function output>
 ```
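As a rough illustration of the layout above, the following Python sketch wraps a SHA-256 digest in a multihash. It assumes the code `0x12` for sha2-256 from the multihash table and relies on the fact that varint values below 128 occupy a single byte. The names `encode_multihash` and `decode_multihash` are hypothetical and not taken from any multihash library.

```python
import hashlib
from typing import Tuple

SHA2_256 = 0x12  # code for sha2-256 in the multihash table


def encode_multihash(code: int, digest: bytes) -> bytes:
    """Prefix a digest with its function code and length (one-byte varints only)."""
    if code >= 0x80 or len(digest) >= 0x80:
        raise ValueError("this sketch only handles values that fit in a one-byte varint")
    return bytes([code, len(digest)]) + digest


def decode_multihash(buf: bytes) -> Tuple[int, bytes]:
    """Split a multihash into (function code, digest), checking the declared length."""
    if len(buf) < 2:
        raise ValueError("multihash too short")
    code, size = buf[0], buf[1]
    if code >= 0x80 or size >= 0x80:
        raise ValueError("multi-byte varints are not handled in this sketch")
    digest = buf[2:]
    if len(digest) != size:
        raise ValueError("digest length does not match the declared size")
    return code, digest


digest = hashlib.sha256(b"multihash").digest()
mh = encode_multihash(SHA2_256, digest)
print(mh[:2].hex())  # 1220: code 0x12, digest size 0x20 (32 bytes)
```

Codes or sizes of 128 and above would need a real varint encoder in place of the single-byte shortcut.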

 Binary example (only 4 bytes for simplicity):
@@ -37,18 +37,26 @@ fn code dig size hash digest
 sha1 4 bytes 4 byte sha1 digest
 ```

-> Why have digest size as a separate byte?
+> Why have digest size as a separate number?

-Because you end up with a function code really meaning "function-and-digest-size-code". Makes using custom digest sizes annoying, and is less flexible.
-
-> What if we need more?
-
-Let's decide that when we have 128 hash functions or digest sizes.
+Because otherwise you end up with a function code really meaning "function-and-digest-size-code". Makes using custom digest sizes annoying, and is less flexible.

 > Why isn't the size first?

 Because aesthetically I prefer the code first. You already have to write your stream parsing code to understand that a single byte already means "a length in bytes more to skip". Reversing these doesn't buy you much.

+> Why varints?
+
+So that we have no limitation on functions or lengths. Implementation note: you do not need to implement varints until the standard multihash table has more than 127 functions.
+
+> What kind of varints?
+
+A Most Significant Bit unsigned varint, as defined by [multiformats/unsigned-varint](https://github.com/multiformats/unsigned-varint).
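To make the encoding concrete, here is a minimal Python sketch of an MSB-continuation unsigned varint: each byte carries 7 bits of the value, least significant group first, and a set high bit means another byte follows. This reflects my reading of the linked spec. The function names `encode_uvarint` and `decode_uvarint` are hypothetical, and the sketch does not enforce any limits the spec may place on length or minimal encoding.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant group first."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # MSB set: more bytes follow
        else:
            out.append(group)  # MSB clear: this is the last byte
            return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, number of bytes consumed) for the varint starting at offset."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf[offset:]):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("buffer ended before the varint terminated")


print(encode_uvarint(0x12).hex())  # 12: values below 128 stay a single byte
print(encode_uvarint(300).hex())   # ac02
```

Because values below 128 encode to the same single byte as before, existing multihashes keep their byte representation under this scheme.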
+
+> Don't we have to agree on a table of functions?
+
+Yes, but we already have to agree on functions, so this is not hard. The table even leaves some room for custom function codes.
+
 ## Implementations:

 - [go-multihash](//github.com/jbenet/go-multihash)
@@ -72,6 +80,7 @@ The current multihash table is [here](hashtable.csv):

 ```
 code name
+0x00 identity
 0x11 sha1
 0x12 sha2-256
 0x13 sha2-512
@@ -100,6 +109,6 @@ They disagree. :(

 ## Disclaimers

-Warning: **obviously multihash values bias the first two bytes**. Do not expect them to be uniformly distributed. The entropy size is `len(multihash) - 2`. Skip the first two bytes when using them with bloom filters, etc. Why not _ap_pend instead of _pre_pend? Because when reading a stream of hashes, you can know the length of the hash (from the table).
+Warning: **obviously multihash values bias the first two bytes**. Do not expect them to be uniformly distributed. The entropy size is `len(multihash) - 2`. Skip the first two bytes when using them with bloom filters, etc. Why not _ap_pend instead of _pre_pend? Because when reading a stream of hashes, you can know the length of the whole value, and allocate the right amount of memory, skip it, or discard it.
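The point about prepending can be sketched in Python: a reader walks a stream of multihashes using only the prefix, without knowing anything about the hash functions involved. This is a minimal sketch under the varint layout from the Format section. The helpers `read_uvarint` and `iter_multihashes` are hypothetical names, and the stream here is an in-memory `io.BytesIO`.

```python
import io
from typing import BinaryIO, Iterator, Optional, Tuple


def read_uvarint(stream: BinaryIO) -> Optional[int]:
    """Read one unsigned varint; return None if the stream is already at its end."""
    value = 0
    shift = 0
    first = True
    while True:
        byte = stream.read(1)
        if not byte:
            if first:
                return None  # clean end of stream, nothing was consumed
            raise EOFError("stream ended inside a varint")
        first = False
        value |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return value
        shift += 7


def iter_multihashes(stream: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (function code, digest) pairs until the stream is exhausted."""
    while True:
        code = read_uvarint(stream)
        if code is None:
            return
        size = read_uvarint(stream)
        if size is None:
            raise EOFError("stream ended after the function code")
        digest = stream.read(size)  # the prefix says exactly how much to read
        if len(digest) != size:
            raise EOFError("stream ended inside a digest")
        yield code, digest


stream = io.BytesIO(bytes([0x11, 4, 1, 2, 3, 4]) + bytes([0x12, 2, 9, 9]))
for code, digest in iter_multihashes(stream):
    print(hex(code), digest.hex())
# 0x11 01020304
# 0x12 0909
```

The yielded digest excludes the prefix, so it is the part to feed to a bloom filter. Note that the prefix is exactly two bytes only while both the code and the size fit in a one-byte varint.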