Commit 91bd69d

Merge pull request bitcoin#687 from Roasbeef/bip158-updates
BIP-0158: remove extended filter, remove txid from regular filter, reparameterize gcs params
2 parents 7158648 + ac76644 commit 91bd69d

File tree

3 files changed

+221
-273
lines changed

bip-0158.mediawiki

Lines changed: 56 additions & 43 deletions
@@ -65,11 +65,14 @@ For each block, compact filters are derived containing sets of items associated
6565
with the block (eg. addresses sent to, outpoints spent, etc.). A set of such
6666
data objects is compressed into a probabilistic structure called a
6767
''Golomb-coded set'' (GCS), which matches all items in the set with probability
68-
1, and matches other items with probability <code>2^(-P)</code> for some integer
69-
parameter <code>P</code>.
68+
1, and matches other items with probability <code>2^(-P)</code> for some
69+
integer parameter <code>P</code>. We also introduce parameter <code>M</code>
70+
which allows each filter to tune the range that items are hashed onto
71+
before compressing. Each defined filter also selects distinct parameters for P
72+
and M.
7073

7174
At a high level, a GCS is constructed from a set of <code>N</code> items by:
72-
# hashing all items to 64-bit integers in the range <code>[0, N * 2^P)</code>
75+
# hashing all items to 64-bit integers in the range <code>[0, N * M)</code>
7376
# sorting the hashed values in ascending order
7477
# computing the differences between each value and the previous one
7578
# writing the differences sequentially, compressed with Golomb-Rice coding
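
Steps 2 to 4 of the list above can be sketched as follows. This is a minimal illustration, not the reference implementation: it takes already-hashed integers as input, keeps the output as a Python list of bits rather than a padded byte vector, and the function names are hypothetical.

```python
def golomb_rice_encode(deltas, p):
    """Encode each delta as a unary quotient (q ones, then a zero)
    followed by a p-bit big-endian remainder."""
    bits = []
    for d in deltas:
        q, r = d >> p, d & ((1 << p) - 1)
        bits.extend([1] * q + [0])
        bits.extend((r >> i) & 1 for i in reversed(range(p)))
    return bits


def golomb_rice_decode(bits, p, n):
    """Inverse of golomb_rice_encode for n encoded deltas."""
    out, pos = [], 0
    for _ in range(n):
        q = 0
        while bits[pos] == 1:  # count the unary quotient
            q += 1
            pos += 1
        pos += 1  # skip the terminating zero
        r = 0
        for _ in range(p):  # read the p-bit remainder
            r = (r << 1) | bits[pos]
            pos += 1
        out.append((q << p) | r)
    return out


def compress(hashed_values, p):
    """Sort, take successive differences, then Golomb-Rice code them."""
    values = sorted(hashed_values)
    deltas = [v - prev for prev, v in zip([0] + values, values)]
    return golomb_rice_encode(deltas, p)


def decompress(bits, p, n):
    """Rebuild the sorted values by summing the decoded deltas."""
    values, last = [], 0
    for d in golomb_rice_decode(bits, p, n):
        last += d
        values.append(last)
    return values
```

Because the values are sorted before differencing, the deltas are small on average, which is what makes the unary quotient short.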
@@ -80,9 +83,13 @@ The following sections describe each step in greater detail.
8083

8184
The first step in the filter construction is hashing the variable-sized raw
8285
items in the set to the range <code>[0, F)</code>, where <code>F = N *
83-
2^P</code>. Set membership queries against the hash outputs will have a false
84-
positive rate of <code>2^(-P)</code>. To avoid integer overflow, the number of
85-
items <code>N</code> MUST be <2^32 and <code>P</code> MUST be <=32.
86+
M</code>. Customarily, <code>M</code> is set to <code>2^P</code>. However, if
87+
one is able to select both parameters independently, then more optimal values
88+
can be
89+
selected<ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
90+
Set membership queries against the hash outputs will have a false positive rate
91+
of <code>1/M</code> (which equals <code>2^(-P)</code> when <code>M = 2^P</code>). To avoid integer overflow, the
92+
number of items <code>N</code> MUST be <2^32 and <code>M</code> MUST be <2^32.
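
The range mapping and the overflow bounds can be sketched as below. Note the assumption: Python's standard library has no SipHash, so `stand_in_hash64` (a hypothetical helper) uses keyed BLAKE2b truncated to 64 bits purely so that the sketch runs; a real filter must use SipHash as the text specifies.

```python
import hashlib


def stand_in_hash64(key: bytes, item: bytes) -> int:
    # NOT SipHash: keyed BLAKE2b truncated to 64 bits, used only as a
    # stand-in 64-bit keyed hash so this sketch is self-contained.
    digest = hashlib.blake2b(item, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def hash_to_range(item: bytes, f: int, key: bytes) -> int:
    # Multiply the 64-bit hash by F and keep the upper 64 bits of the
    # 128-bit product, mapping the hash uniformly onto [0, F).
    return (stand_in_hash64(key, item) * f) >> 64


def hashed_set_construct(raw_items, key: bytes, m: int):
    n = len(raw_items)
    # With N < 2^32 and M < 2^32, F = N * M always fits in 64 bits.
    if not (n < 2**32 and m < 2**32):
        raise ValueError("N and M must both be below 2^32")
    f = n * m
    return [hash_to_range(item, f, key) for item in raw_items]
```

Python integers do not overflow, so the explicit bounds check stands in for the constraint that a fixed-width implementation has to respect.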
8693

8794
The items are first passed through the pseudorandom function ''SipHash'', which
8895
takes a 128-bit key <code>k</code> and a variable-sized byte vector and produces
@@ -104,9 +111,9 @@ result.
104111
hash_to_range(item: []byte, F: uint64, k: [16]byte) -> uint64:
105112
return (siphash(k, item) * F) >> 64
106113

107-
hashed_set_construct(raw_items: [][]byte, P: uint, k: [16]byte) -> []uint64:
114+
hashed_set_construct(raw_items: [][]byte, k: [16]byte, M: uint) -> []uint64:
108115
let N = len(raw_items)
109-
let F = N << P
116+
let F = N * M
110117

111118
let set_items = []
112119
@@ -197,8 +204,8 @@ with Golomb-Rice coding. Finally, the bit stream is padded with 0's to the
197204
nearest byte boundary and serialized to the output byte vector.
198205

199206
<pre>
200-
construct_gcs(L: [][]byte, P: uint, k: [16]byte) -> []byte:
201-
let set_items = hashed_set_construct(L, P, k)
207+
construct_gcs(L: [][]byte, P: uint, k: [16]byte, M: uint) -> []byte:
208+
let set_items = hashed_set_construct(L, k, M)
202209

203210
set_items.sort()
204211
@@ -224,8 +231,8 @@ against the reconstructed values. Note that querying does not require the entire
224231
decompressed set be held in memory at once.
225232

226233
<pre>
227-
gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint) -> bool:
228-
let F = N << P
234+
gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint, M: uint) -> bool:
235+
let F = N * M
229236
let target_hash = hash_to_range(target, F, key)
230237

231238
stream = new_bit_stream(compressed_set)
@@ -258,49 +265,54 @@ against the decompressed GCS contents. See
258265

259266
=== Block Filters ===
260267

261-
This BIP defines two initial filter types:
268+
This BIP defines one initial filter type:
262269
* Basic (<code>0x00</code>)
263-
* Extended (<code>0x01</code>)
270+
** <code>M = 784931</code>
271+
** <code>P = 19</code>
264272
265273
==== Contents ====
266274

267275
The basic filter is designed to contain everything that a light client needs to
268-
sync a regular Bitcoin wallet. A basic filter MUST contain exactly the following
269-
items for each transaction in a block:
270-
* The outpoint of each input, except for the coinbase transaction
271-
* The scriptPubKey of each output
272-
* The <code>txid</code> of the transaction itself
273-
274-
The extended filter contains extra data that is meant to enable applications
275-
with more advanced smart contracts. An extended filter MUST contain exactly the
276-
following items for each transaction in a block ''except the coinbase'':
277-
* Each item within the witness stack of each input (if the input has a witness)
278-
* Each data push in the scriptSig of each input
279-
280-
Note that neither filter type interprets P2SH scripts or witness scripts to
281-
extract data pushes from them. If necessary, future filter types may be designed
282-
to do so.
276+
sync a regular Bitcoin wallet. A basic filter MUST contain exactly the
277+
following items for each transaction in a block:
278+
* The previous output script (the script being spent) for each input, except
279+
for the coinbase transaction.
280+
* The scriptPubKey of each output, aside from all <code>OP_RETURN</code> output
281+
scripts.
282+
283+
Any "nil" items MUST NOT be included in the final set of filter elements.
284+
285+
We exclude all <code>OP_RETURN</code> outputs in order to allow filters to
286+
easily be committed to in the future via a soft-fork. A likely area for future
287+
commitments is an additional <code>OP_RETURN</code> output in the coinbase
288+
transaction similar to the current witness commitment
289+
<ref>https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki</ref>. By
290+
excluding all <code>OP_RETURN</code> outputs we avoid a circular dependency
291+
between the commitment and the item being committed to.
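
The selection rules above can be sketched as a small function. Two assumptions are made here that the text does not spell out: a "nil" item is taken to mean an empty script, and an <code>OP_RETURN</code> output script is taken to mean one whose first byte is the <code>OP_RETURN</code> opcode (<code>0x6a</code>). The function name and its inputs (raw script bytes gathered by the caller) are hypothetical.

```python
OP_RETURN = 0x6A  # opcode value of OP_RETURN


def basic_filter_items(prev_output_scripts, output_scripts):
    """Collect the raw items for a basic filter from one block.

    prev_output_scripts: scripts being spent by each non-coinbase input.
    output_scripts: scriptPubKeys of every output in the block.
    """
    items = set()  # a set, so duplicate scripts are stored once
    for script in prev_output_scripts:
        if script:  # assumption: "nil" means an empty script
            items.add(script)
    for script in output_scripts:
        # assumption: an OP_RETURN output starts with the OP_RETURN opcode
        if script and script[0] != OP_RETURN:
            items.add(script)
    return items
```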
283292

284293
==== Construction ====
285294

286-
Both the basic and extended filter types are constructed as Golomb-coded sets
287-
with the following parameters.
295+
The basic filter type is constructed as a Golomb-coded set with the following
296+
parameters.
288297

289-
The parameter <code>P</code> MUST be set to <code>20</code>. This value was
290-
chosen as simulations show that it minimizes the bandwidth utilized, considering
291-
both the expected number of blocks downloaded due to false positives and the
292-
size of the filters themselves. The code along with a demo used for the
293-
parameter tuning can be found
294-
[https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe936dd0c9b99ecb9/gcs_light_client/gentestvectors.go here].
298+
The parameter <code>P</code> MUST be set to <code>19</code>, and the parameter
299+
<code>M</code> MUST be set to <code>784931</code>. Analysis has shown that if
300+
one is able to select <code>P</code> and <code>M</code> independently, then
301+
setting <code>M=1.497137 * 2^P</code> is close to optimal
302+
<ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
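
As a quick arithmetic check, the stated value of <code>M</code> follows from the ratio quoted above once <code>P = 19</code> is fixed. The variable names below are illustrative only.

```python
P = 19
RATIO = 1.497137  # near-optimal M / 2^P ratio quoted in the text

# 1.497137 * 524288 is about 784930.96, which rounds to 784931.
M = round(RATIO * 2**P)

# M sits between 2^19 and 2^20, i.e. between the ranges that
# P = 19 and P = 20 would give under the customary M = 2^P.
assert 2**19 < M < 2**20
```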
303+
304+
Empirical analysis also shows that these parameters minimize the
305+
bandwidth utilized, considering both the expected number of blocks downloaded
306+
due to false positives and the size of the filters themselves.
295307

296308
The parameter <code>k</code> MUST be set to the first 16 bytes of the hash of
297309
the block for which the filter is constructed. This ensures the key is
298310
deterministic while still varying from block to block.
299311

300312
Since the value <code>N</code> is required to decode a GCS, a serialized GCS
301-
includes it as a prefix, written as a CompactSize. Thus, the complete
302-
serialization of a filter is:
303-
* <code>N</code>, encoded as a CompactSize
313+
includes it as a prefix, written as a <code>CompactSize</code>. Thus, the
314+
complete serialization of a filter is:
315+
* <code>N</code>, encoded as a <code>CompactSize</code>
304316
* The bytes of the compressed filter itself
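
The serialization above can be sketched as follows, assuming the usual Bitcoin <code>CompactSize</code> encoding (a single byte for values below <code>0xfd</code>, otherwise a marker byte followed by a little-endian integer of 2, 4 or 8 bytes). The function names are hypothetical.

```python
def encode_compact_size(n: int) -> bytes:
    """Encode a non-negative integer as a Bitcoin CompactSize."""
    if n < 0:
        raise ValueError("CompactSize cannot encode negative values")
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    if n <= 0xFFFFFFFFFFFFFFFF:
        return b"\xff" + n.to_bytes(8, "little")
    raise ValueError("value does not fit in 64 bits")


def serialize_filter(n: int, compressed_filter: bytes) -> bytes:
    """Prefix the compressed GCS bytes with the item count N."""
    return encode_compact_size(n) + compressed_filter
```

A decoder reads <code>N</code> first, which tells it how many Golomb-Rice coded deltas to expect in the remaining bytes.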
305317

306318
==== Signaling ====
@@ -323,7 +335,8 @@ though it requires implementation of the new filters.
323335

324336
We would like to thank bfd (from the bitcoin-dev mailing list) for bringing the
325337
basis of this BIP to our attention, Greg Maxwell for pointing us in the
326-
direction of Golomb-Rice coding and fast range optimization, and Pedro
338+
direction of Golomb-Rice coding and fast range optimization, Pieter Wuille for
339+
his analysis of optimal GCS parameters, and Pedro
327340
Martelletto for writing the initial indexing code for <code>btcd</code>.
328341

329342
We would also like to thank Dave Collins, JJ Jeffrey, and Eric Lombrozo for
@@ -375,8 +388,8 @@ easier to understand.
375388
=== Golomb-Coded Set Multi-Match ===
376389

377390
<pre>
378-
gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint) -> bool:
379-
let F = N << P
391+
gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint, M: uint) -> bool:
392+
let F = N * M
380393

381394
// Map targets to the same range as the set hashes.
382395
let target_hashes = []
