@@ -65,11 +65,10 @@ For each block, compact filters are derived containing sets of items associated
with the block (eg. addresses sent to, outpoints spent, etc.). A set of such
data objects is compressed into a probabilistic structure called a
''Golomb-coded set'' (GCS), which matches all items in the set with probability
- 1, and matches other items with probability <code>2^(-P) </code> for some
- integer parameter <code>P </code>. We also introduce parameter <code>M </code>
- which allows filter to uniquely tune the range that items are hashed onto
- before compressing. Each defined filter also selects distinct parameters for P
- and M.
+ 1, and matches other items with probability <code>1/M </code> for some
+ integer parameter <code>M </code>. The encoding is also parameterized by
+ <code>P </code>, the bit length of the remainder code. Each defined filter
+ specifies values for <code>P </code> and <code>M </code>.

At a high level, a GCS is constructed from a set of <code>N </code> items by:
# hashing all items to 64-bit integers in the range <code>[0, N * M) </code>
@@ -88,8 +87,8 @@ one is able to select both Parameters independently, then more optimal values
can be
selected<ref >https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref >.
Set membership queries against the hash outputs will have a false positive rate
- of <code>2^(-P) </code>. To avoid integer overflow, the
- number of items <code> N </code> MUST be <2^32 and <code>M </code> MUST be <2^32.
+ of <code>1/M </code>. To avoid integer overflow, the number of items <code> N </code>
+ MUST be <2^32 and <code>M </code> MUST be <2^32.

The items are first passed through the pseudorandom function ''SipHash'', which
takes a 128-bit key <code>k </code> and a variable-sized byte vector and produces
@@ -189,9 +188,10 @@ golomb_decode(stream, P: uint) -> uint64:

==== Set Construction ====

- A GCS is constructed from three parameters:
+ A GCS is constructed from four parameters:
* <code>L </code>, a vector of <code>N </code> raw items
- * <code>P </code>, which determines the false positive rate
+ * <code>P </code>, the bit parameter of the Golomb-Rice coding
+ * <code>M </code>, the inverse of the target false positive rate
* <code>k </code>, the 128-bit key used to randomize the SipHash outputs
The result is a byte vector with a minimum size of <code>N * (P + 1) </code>
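
As a rough illustration of how the four parameters in this hunk interact, the construction can be sketched in Python. This is a minimal sketch under stated assumptions, not the reference implementation: keyed BLAKE2b from <code>hashlib</code> stands in for SipHash (the Python standard library exposes no public SipHash), the output is a string of "0"/"1" characters rather than a packed byte vector, and all function names are hypothetical. Items are hashed onto <code>[0, N * M)</code>, sorted, and the successive differences are Golomb-Rice coded with a <code>P</code>-bit remainder.

```python
import hashlib


def hash_to_range(item: bytes, key: bytes, f: int) -> int:
    """Map an item onto [0, f). ASSUMPTION: keyed BLAKE2b replaces SipHash."""
    digest = hashlib.blake2b(item, digest_size=8, key=key).digest()
    h = int.from_bytes(digest, "little")  # 64-bit hash value
    return (h * f) >> 64                  # multiply-and-shift range reduction


def golomb_rice_encode(value: int, p: int) -> str:
    """Unary quotient, a terminating 0, then the p-bit remainder."""
    q, r = value >> p, value & ((1 << p) - 1)
    return "1" * q + "0" + (format(r, f"0{p}b") if p else "")


def golomb_rice_decode(bits: str, pos: int, p: int):
    """Decode one value starting at pos; return (value, next position)."""
    q = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1  # skip the terminating 0
    r = int(bits[pos:pos + p], 2) if p else 0
    return (q << p) | r, pos + p


def build_gcs(items, key: bytes, p: int, m: int) -> str:
    """Hash items onto [0, N * M), sort, and encode successive differences."""
    f = len(items) * m
    values = sorted(hash_to_range(item, key, f) for item in items)
    out, last = [], 0
    for v in values:
        out.append(golomb_rice_encode(v - last, p))
        last = v
    return "".join(out)


def decode_gcs(bits: str, n: int, p: int):
    """Recover the sorted hashed values from the encoded bit string."""
    values, pos, last = [], 0, 0
    for _ in range(n):
        delta, pos = golomb_rice_decode(bits, pos, p)
        last += delta
        values.append(last)
    return values


if __name__ == "__main__":
    key = bytes(16)                    # illustrative all-zero 128-bit key
    items = [b"a", b"b", b"c", b"d"]
    P, M = 19, 784931                  # example values, chosen for illustration
    bits = build_gcs(items, key, P, M)
    print(len(bits), decode_gcs(bits, len(items), P))
```

Every codeword carries at least one terminating bit plus <code>P</code> remainder bits, which is where the <code>N * (P + 1)</code> lower bound on the encoded size comes from. A query item is hashed with the same key and range and matches if its value appears among the decoded values, so a non-member matches with probability of roughly <code>1/M</code>.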