Commit 38d6f61

Merge pull request #40 from multiformats/varints
Add varints officially
2 parents 01bfaf3 + 8b01688 commit 38d6f61

1 file changed (+18 -9 lines)
README.md

Lines changed: 18 additions & 9 deletions
@@ -22,10 +22,10 @@ QmRJzsvyCQyizr73Gmms8ZRtvNxmgqumxc2KUp71dfEmoj # sha256 in base58
 EiAsJrRraP/Gj/mbRTwdMEE0E0ItcGSDv6D5il6IYmbnrg== # sha256 in base64
 ```

-## format
+## Format

 ```
-<1-byte hash function code><1-byte digest size in bytes><hash function output>
+<varint hash function code><varint digest size in bytes><hash function output>
 ```

 Binary example (only 4 bytes for simplicity):
@@ -37,18 +37,26 @@ fn code dig size hash digest
 sha1 4 bytes 4 byte sha1 digest
 ```

-> Why have digest size as a separate byte?
+> Why have digest size as a separate number?

-Because you end up with a function code really meaning "function-and-digest-size-code". Makes using custom digest sizes annoying, and is less flexible.
-
-> What if we need more?
-
-Let's decide that when we have 128 hash functions or digest sizes.
+Because otherwise you end up with a function code really meaning "function-and-digest-size-code". Makes using custom digest sizes annoying, and is less flexible.

 > Why isn't the size first?

 Because aesthetically I prefer the code first. You already have to write your stream parsing code to understand that a single byte already means "a length in bytes more to skip". Reversing these doesn't buy you much.
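
The stream-parsing point can be illustrated with a minimal Python sketch. It assumes the `<varint code><varint size><digest>` layout from the Format section and walks a buffer of concatenated multihashes, using each size field to find where the next value starts. The names `read_uvarint` and `split_multihashes` are hypothetical helpers, not part of any multihash library, and the digests in the sample buffer are made-up, truncated bytes.

```python
from typing import List, Tuple


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned varint starting at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # the low 7 bits carry data
        if (byte & 0x80) == 0:           # high bit clear: this was the last byte
            return value, pos
        shift += 7


def split_multihashes(buf: bytes) -> List[Tuple[int, bytes]]:
    """Split concatenated multihashes into (function code, digest) pairs."""
    entries = []
    pos = 0
    while pos < len(buf):
        code, pos = read_uvarint(buf, pos)  # function code comes first
        size, pos = read_uvarint(buf, pos)  # then "this many bytes follow"
        end = pos + size
        if end > len(buf):
            raise ValueError("truncated digest")
        entries.append((code, buf[pos:end]))
        pos = end                           # jump straight to the next value
    return entries


# Two illustrative values: a 4-byte "sha1" digest and a 2-byte "sha2-256" digest.
stream = bytes([0x11, 4, 0xDE, 0xAD, 0xBE, 0xEF, 0x12, 2, 0x01, 0x02])
entries = split_multihashes(stream)
```

The parser never needs to know what a function code means in order to skip a value, which is the same whichever of the two header fields comes first.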

+> Why varints?
+
+So that we have no limitation on functions or lengths. Implementation note: you do not need to implement varints until the standard multihash table has more than 127 functions.
+
+> What kind of varints?
+
+A Most Significant Bit unsigned varint, as defined by [multiformats/unsigned-varint](https://github.com/multiformats/unsigned-varint).
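
As a rough illustration of the varint answer, here is a minimal Python sketch of an MSB-continuation unsigned varint: each byte carries 7 bits of the value, least significant group first, and the high bit signals that more bytes follow. The function names are hypothetical, and the sketch omits limits such as a maximum encoded length or a check for minimal encodings, so treat it as an assumption-laden example rather than a reference implementation.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(low7)         # final byte: high bit stays clear
            return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint at buf[offset]; return (value, offset after the varint)."""
    value = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


single = encode_uvarint(0x12)  # values up to 127 fit in one byte
double = encode_uvarint(128)   # 128 is the first value that needs two bytes
```

Because every value up to 127 encodes to the same single byte as before, a header written under the old 1-byte format is already a valid varint header, which is why the implementation note can defer varint support.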
+
+> Don't we have to agree on a table of functions?
+
+Yes, but we already have to agree on functions, so this is not hard. The table even leaves some room for custom function codes.
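
To make the role of the shared table concrete, the following Python sketch builds a multihash from a function name. The codes are copied from the table excerpt in this diff (`0x00`, `0x11`, `0x12`, `0x13`); the `FUNCTIONS` mapping, the `multihash` helper, and the choice to treat `identity` as "digest equals input" are assumptions made for illustration, not an existing library API.

```python
import hashlib

# name -> (function code from the table excerpt, function producing the digest)
FUNCTIONS = {
    "identity": (0x00, lambda data: data),  # assumed: digest is the input itself
    "sha1": (0x11, lambda data: hashlib.sha1(data).digest()),
    "sha2-256": (0x12, lambda data: hashlib.sha256(data).digest()),
    "sha2-512": (0x13, lambda data: hashlib.sha512(data).digest()),
}


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an MSB-continuation unsigned varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def multihash(name: str, data: bytes) -> bytes:
    """Return <varint code><varint digest size><digest> for the named function."""
    code, hash_fn = FUNCTIONS[name]
    digest = hash_fn(data)
    return encode_uvarint(code) + encode_uvarint(len(digest)) + digest


mh = multihash("sha2-256", b"multihash")
header = mh[:2]  # code 0x12 followed by length 0x20 (32 bytes)
```

Both sides only need to agree on the name-to-code mapping; the digest length travels with each value.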
+
 ## Implementations:

 - [go-multihash](//github.com/jbenet/go-multihash)
@@ -72,6 +80,7 @@ The current multihash table is [here](hashtable.csv):

 ```
 code name
+0x00 identity
 0x11 sha1
 0x12 sha2-256
 0x13 sha2-512
@@ -100,6 +109,6 @@ They disagree. :(

 ## Disclaimers

-Warning: **obviously multihash values bias the first two bytes**. Do not expect them to be uniformly distributed. The entropy size is `len(multihash) - 2`. Skip the first two bytes when using them with bloom filters, etc. Why not _ap_pend instead of _pre_pend? Because when reading a stream of hashes, you can know the length of the hash (from the table).
+Warning: **obviously multihash values bias the first two bytes**. Do not expect them to be uniformly distributed. The entropy size is `len(multihash) - 2`. Skip the first two bytes when using them with bloom filters, etc. Why not _ap_pend instead of _pre_pend? Because when reading a stream of hashes, you can know the length of the whole value, and allocate the right amount of memory, skip it, or discard it.

 License: MIT