@@ -65,11 +65,14 @@ For each block, compact filters are derived containing sets of items associated
with the block (e.g. addresses sent to, outpoints spent, etc.). A set of such
data objects is compressed into a probabilistic structure called a
''Golomb-coded set'' (GCS), which matches all items in the set with probability
- 1, and matches other items with probability <code>2^(-P)</code> for some integer
- parameter <code>P</code>.
+ 1, and matches other items with probability <code>2^(-P)</code> for some
+ integer parameter <code>P</code>. We also introduce a parameter <code>M</code>,
+ which allows each filter to tune the range that items are hashed onto
+ before compression; in general the false positive probability is then
+ approximately <code>1/M</code>, which equals <code>2^(-P)</code> when
+ <code>M = 2^P</code>. Each defined filter type selects its own values for
+ <code>P</code> and <code>M</code>.

At a high level, a GCS is constructed from a set of <code>N</code> items by:
- # hashing all items to 64-bit integers in the range <code>[0, N * 2^P)</code>
+ # hashing all items to 64-bit integers in the range <code>[0, N * M)</code>
# sorting the hashed values in ascending order
# computing the differences between each value and the previous one
# writing the differences sequentially, compressed with Golomb-Rice coding
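
The sorting and differencing steps in the list above can be sketched in Python as follows. This is a minimal illustration rather than the reference implementation: the name <code>sorted_deltas</code> is hypothetical, the input is assumed to be already hashed into the target range, and the first difference is assumed to be measured from 0.

```python
def sorted_deltas(hashed_values):
    """Sort hashed values and return the gaps between consecutive values.

    The first gap is measured from 0, so a running sum of the result
    reconstructs the sorted values.
    """
    deltas = []
    previous = 0
    for value in sorted(hashed_values):
        deltas.append(value - previous)
        previous = value
    return deltas
```

Because the hashed values are spread roughly uniformly over the range, the gaps tend to be small numbers of similar magnitude, which is what makes them a good fit for Golomb-Rice coding.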
@@ -80,9 +83,13 @@ The following sections describe each step in greater detail.
The first step in the filter construction is hashing the variable-sized raw
items in the set to the range <code>[0, F)</code>, where <code>F = N *
- 2^P</code>. Set membership queries against the hash outputs will have a false
- positive rate of <code>2^(-P)</code>. To avoid integer overflow, the number of
- items <code>N</code> MUST be <2^32 and <code>P</code> MUST be <=32.
+ M</code>. Customarily, <code>M</code> is set to <code>2^P</code>. However, if
+ one is able to select both parameters independently, then more optimal values
+ can be
+ selected<ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
+ Set membership queries against the hash outputs will have a false positive rate
+ of approximately <code>1/M</code> (that is, <code>2^(-P)</code> when
+ <code>M = 2^P</code>). To avoid integer overflow, the
+ number of items <code>N</code> MUST be <2^32 and <code>M</code> MUST be <2^32.
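
A hedged Python sketch of the range mapping and the overflow bounds described above. Python's standard library has no SipHash binding, so keyed BLAKE2b is used here purely as a stand-in 64-bit hash; that substitution, the constant names, and the helper <code>range_size</code> are assumptions made for illustration and are not part of the specification.

```python
import hashlib

MAX_N = 2**32  # exclusive bound on the item count N
MAX_M = 2**32  # exclusive bound on the parameter M


def stand_in_hash64(key: bytes, item: bytes) -> int:
    """Keyed 64-bit hash. ASSUMPTION: BLAKE2b stands in for SipHash here."""
    digest = hashlib.blake2b(item, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


def range_size(n: int, m: int) -> int:
    """Compute F = N * M, rejecting inputs outside the stated bounds."""
    if not (0 <= n < MAX_N and 0 < m < MAX_M):
        raise ValueError("N and M must both be below 2^32")
    return n * m  # always below 2^64 given the checks above


def hash_to_range(item: bytes, f: int, key: bytes) -> int:
    """Map a 64-bit hash onto [0, f) by multiply-and-shift instead of modulo."""
    return (stand_in_hash64(key, item) * f) >> 64
```

With both bounds enforced, <code>F</code> fits in 64 bits and the intermediate product of the hash and <code>F</code> fits in 128 bits, which is what a fixed-width implementation needs.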

The items are first passed through the pseudorandom function ''SipHash'', which
takes a 128-bit key <code>k</code> and a variable-sized byte vector and produces
@@ -104,9 +111,9 @@ result.
hash_to_range(item: []byte, F: uint64, k: [16]byte) -> uint64:
    return (siphash(k, item) * F) >> 64

- hashed_set_construct(raw_items: [][]byte, P: uint, k: [16]byte) -> []uint64:
+ hashed_set_construct(raw_items: [][]byte, k: [16]byte, M: uint) -> []uint64:
    let N = len(raw_items)
-     let F = N << P
+     let F = N * M

    let set_items = []
@@ -197,8 +204,8 @@ with Golomb-Rice coding. Finally, the bit stream is padded with 0's to the
nearest byte boundary and serialized to the output byte vector.
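
As a rough Python illustration of Golomb-Rice coding followed by zero padding to a byte boundary: the sketch assumes the conventional layout of a unary quotient (a run of 1 bits ended by a 0) followed by the low <code>P</code> bits of the value, most significant bit first, and it assumes bits are packed into bytes starting from the most significant bit. The function names are hypothetical.

```python
def golomb_rice_bits(value: int, p: int) -> list:
    """Return the Golomb-Rice code of value as a list of bits."""
    quotient = value >> p
    bits = [1] * quotient + [0]  # unary quotient, terminated by a 0 bit
    for shift in range(p - 1, -1, -1):
        bits.append((value >> shift) & 1)  # low P bits, big-endian
    return bits


def pack_bits(bits: list) -> bytes:
    """Pad the bit list with 0s to a byte boundary and pack it MSB-first."""
    padded = bits + [0] * (-len(bits) % 8)
    out = bytearray()
    for start in range(0, len(padded), 8):
        byte = 0
        for bit in padded[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def encode_deltas(deltas: list, p: int) -> bytes:
    """Encode a sequence of differences into a padded byte vector."""
    bits = []
    for delta in deltas:
        bits.extend(golomb_rice_bits(delta, p))
    return pack_bits(bits)
```

For instance, with <code>P = 2</code> the value 9 has quotient 2 and remainder 1, so its code is the five bits <code>11001</code>.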

<pre>
- construct_gcs(L: [][]byte, P: uint, k: [16]byte) -> []byte:
-     let set_items = hashed_set_construct(L, P, k)
+ construct_gcs(L: [][]byte, P: uint, k: [16]byte, M: uint) -> []byte:
+     let set_items = hashed_set_construct(L, k, M)

    set_items.sort()
@@ -224,8 +231,8 @@ against the reconstructed values. Note that querying does not require the entire
decompressed set be held in memory at once.
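
The point about memory can be illustrated with a small Python generator that decodes one value at a time and stops as soon as the answer is known. This is only a sketch: it works on a plain list of bits rather than a byte stream, assumes the unary-quotient-then-<code>P</code>-bit-remainder layout, and the names <code>iter_values</code> and <code>streaming_match</code> are hypothetical.

```python
def iter_values(bits, p, n):
    """Yield the n set values in ascending order, decoding lazily."""
    pos = 0
    value = 0
    for _ in range(n):
        quotient = 0
        while bits[pos] == 1:  # unary-coded quotient
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating 0 bit
        remainder = 0
        for _bit in range(p):  # P-bit remainder, big-endian
            remainder = (remainder << 1) | bits[pos]
            pos += 1
        value += (quotient << p) | remainder  # undo the differencing
        yield value


def streaming_match(bits, p, n, target_hash):
    """Return True if target_hash is in the set, holding one value at a time."""
    for value in iter_values(bits, p, n):
        if value == target_hash:
            return True
        if value > target_hash:
            return False  # values only grow, so the target cannot appear later
    return False
```

Because the values are reconstructed in ascending order, the loop can return early once it passes the target, and at no point is more than the running value kept in memory.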

<pre>
- gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint) -> bool:
-     let F = N << P
+ gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint, M: uint) -> bool:
+     let F = N * M
    let target_hash = hash_to_range(target, F, key)

    stream = new_bit_stream(compressed_set)
@@ -258,49 +265,54 @@ against the decompressed GCS contents. See
=== Block Filters ===

- This BIP defines two initial filter types:
+ This BIP defines one initial filter type:
* Basic (<code>0x00</code>)
- * Extended (<code>0x01</code>)
+ ** <code>M = 784931</code>
+ ** <code>P = 19</code>

==== Contents ====

The basic filter is designed to contain everything that a light client needs to
- sync a regular Bitcoin wallet. A basic filter MUST contain exactly the following
- items for each transaction in a block:
- * The outpoint of each input, except for the coinbase transaction
- * The scriptPubKey of each output
- * The <code>txid</code> of the transaction itself
-
- The extended filter contains extra data that is meant to enable applications
- with more advanced smart contracts. An extended filter MUST contain exactly the
- following items for each transaction in a block ''except the coinbase'':
- * Each item within the witness stack of each input (if the input has a witness)
- * Each data push in the scriptSig of each input
-
- Note that neither filter type interprets P2SH scripts or witness scripts to
- extract data pushes from them. If necessary, future filter types may be designed
- to do so.
+ sync a regular Bitcoin wallet. A basic filter MUST contain exactly the
+ following items for each transaction in a block:
+ * The previous output script (the script being spent) for each input, except
+ for the coinbase transaction.
+ * The scriptPubKey of each output, aside from all <code>OP_RETURN</code> output
+ scripts.
+
+ Any "nil" items MUST NOT be included in the final set of filter elements.
+
+ We exclude all <code>OP_RETURN</code> outputs in order to allow filters to
+ easily be committed to in the future via a soft-fork. A likely area for future
+ commitments is an additional <code>OP_RETURN</code> output in the coinbase
+ transaction similar to the current witness commitment
+ <ref>https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki</ref>. By
+ excluding all <code>OP_RETURN</code> outputs we avoid a circular dependency
+ between the commitment and the item being committed to.
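
The selection rules for the basic filter can be sketched in Python as below. The <code>Tx</code> class is a hypothetical, heavily simplified view of a transaction invented for this example. Two further assumptions are made: an "OP_RETURN output script" is taken to mean a script whose first byte is the <code>OP_RETURN</code> opcode (<code>0x6a</code>), and a "nil" item is taken to mean an empty script.

```python
from dataclasses import dataclass, field

OP_RETURN = 0x6A  # opcode byte that begins an OP_RETURN output script


@dataclass
class Tx:
    """Hypothetical minimal transaction view used only for this sketch."""
    prev_output_scripts: list = field(default_factory=list)  # scripts being spent
    output_scripts: list = field(default_factory=list)       # scriptPubKeys created
    is_coinbase: bool = False


def basic_filter_elements(transactions):
    """Collect the raw items for a basic filter from a block's transactions."""
    elements = set()
    for tx in transactions:
        if not tx.is_coinbase:
            # The coinbase spends no previous outputs, so it is skipped here.
            elements.update(tx.prev_output_scripts)
        for script in tx.output_scripts:
            if script[:1] != bytes([OP_RETURN]):
                elements.add(script)
    elements.discard(b"")  # ASSUMPTION: "nil" means an empty script
    return elements
```

Using a set also removes duplicate scripts, so a script that appears in several transactions contributes a single element.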

==== Construction ====

- Both the basic and extended filter types are constructed as Golomb-coded sets
- with the following parameters.
+ The basic filter type is constructed as a Golomb-coded set with the following
+ parameters.

- The parameter <code>P</code> MUST be set to <code>20</code>. This value was
- chosen as simulations show that it minimizes the bandwidth utilized, considering
- both the expected number of blocks downloaded due to false positives and the
- size of the filters themselves. The code along with a demo used for the
- parameter tuning can be found
- [https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe936dd0c9b99ecb9/gcs_light_client/gentestvectors.go here].
+ The parameter <code>P</code> MUST be set to <code>19</code>, and the parameter
+ <code>M</code> MUST be set to <code>784931</code>. Analysis has shown that if
+ one is able to select <code>P</code> and <code>M</code> independently, then
+ setting <code>M=1.497137 * 2^P</code> is close to optimal
+ <ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
+
+ Empirical analysis also shows that these parameters minimize the
+ bandwidth utilized, considering both the expected number of blocks downloaded
+ due to false positives and the size of the filters themselves.
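
As a quick arithmetic check, the stated relationship between <code>M</code> and <code>P</code> can be evaluated directly. The helper name <code>near_optimal_m</code> is hypothetical, and rounding to the nearest integer is an assumption about how the constant was derived.

```python
def near_optimal_m(p: int, ratio: float = 1.497137) -> int:
    """Return ratio * 2^P rounded to the nearest integer."""
    return round(ratio * (1 << p))
```

For <code>P = 19</code>, <code>2^P</code> is 524288 and the product is about 784930.96, which rounds to the required value of 784931.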

The parameter <code>k</code> MUST be set to the first 16 bytes of the hash of
the block for which the filter is constructed. This ensures the key is
deterministic while still varying from block to block.

Since the value <code>N</code> is required to decode a GCS, a serialized GCS
- includes it as a prefix, written as a CompactSize. Thus, the complete
- serialization of a filter is:
- * <code>N</code>, encoded as a CompactSize
+ includes it as a prefix, written as a <code>CompactSize</code>. Thus, the
+ complete serialization of a filter is:
+ * <code>N</code>, encoded as a <code>CompactSize</code>
* The bytes of the compressed filter itself
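
A minimal Python sketch of this serialization, using Bitcoin's usual variable-length <code>CompactSize</code> integer encoding. The function names are hypothetical, and the compressed filter bytes are treated as an opaque input.

```python
import struct


def compact_size(n: int) -> bytes:
    """Encode a non-negative integer as a Bitcoin CompactSize."""
    if n < 0:
        raise ValueError("CompactSize cannot encode negative values")
    if n < 0xFD:
        return struct.pack("<B", n)            # single byte
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)  # marker + uint16 little-endian
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)  # marker + uint32 little-endian
    return b"\xff" + struct.pack("<Q", n)      # marker + uint64 little-endian


def serialize_filter(n: int, compressed: bytes) -> bytes:
    """Prefix the compressed filter bytes with the item count N."""
    return compact_size(n) + compressed
```

A filter with fewer than 253 items therefore carries a one-byte prefix, while larger item counts use a three-byte or longer prefix.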

==== Signaling ====
@@ -323,7 +335,8 @@ though it requires implementation of the new filters.
We would like to thank bfd (from the bitcoin-dev mailing list) for bringing the
basis of this BIP to our attention, Greg Maxwell for pointing us in the
- direction of Golomb-Rice coding and fast range optimization, and Pedro
+ direction of Golomb-Rice coding and fast range optimization, Pieter Wuille for
+ his analysis of optimal GCS parameters, and Pedro
Martelletto for writing the initial indexing code for <code>btcd</code>.

We would also like to thank Dave Collins, JJ Jeffrey, and Eric Lombrozo for
@@ -375,8 +388,8 @@ easier to understand.
=== Golomb-Coded Set Multi-Match ===

<pre>
- gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint) -> bool:
-     let F = N << P
+ gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint, M: uint) -> bool:
+     let F = N * M

    // Map targets to the same range as the set hashes.
    let target_hashes = []