
Commit c775dc4

Merge #12254: BIP 158: Compact Block Filters for Light Clients
254c85b bench: Benchmark GCS filter creation and matching. (Jim Posen)
f33b717 blockfilter: Optimization on compilers with int128 support. (Jim Posen)
97b64d6 blockfilter: Unit test against BIP 158 test vectors. (Jim Posen)
a4afb9c blockfilter: Additional helper methods to compute hash and header. (Jim Posen)
cd09c79 blockfilter: Serialization methods on BlockFilter. (Jim Posen)
c1855f6 blockfilter: Construction of basic block filters. (Jim Posen)
53e7874 blockfilter: Simple test for GCSFilter construction and Match. (Jim Posen)
558c536 blockfilter: Implement GCSFilter Match methods. (Jim Posen)
cf70b55 blockfilter: Implement GCSFilter constructors. (Jim Posen)
c454f0a blockfilter: Declare GCSFilter class for BIP 158 impl. (Jim Posen)
9b622dc streams: Unit tests for BitStreamReader and BitStreamWriter. (Jim Posen)
fe943f9 streams: Implement BitStreamReader/Writer classes. (Jim Posen)
87f2d9e streams: Unit test for VectorReader class. (Jim Posen)
947133d streams: Create VectorReader stream interface for vectors. (Jim Posen)

Pull request description:

This implements the compact block filter construction in [BIP 158](https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki). The code is not used anywhere in the Bitcoin Core code base yet. The next step towards [BIP 157](https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki) support would be to create an indexing module, similar to `TxIndex`, that constructs the basic and extended filters for each validated block.

### Filter Sizes

[Here](https://gateway.ipfs.io/ipfs/QmRqaAAQZ5ZX5eqxP7J2R1MzFrc2WDdKSWJEKtQzyawqog) is a CSV of filter sizes for blocks in the main chain. As you can see below, the ratio of filter size to block size drops after the first ~150,000 blocks:

![filter_sizes](https://user-images.githubusercontent.com/881253/42900589-299772d4-8a7e-11e8-886d-0d4f3f4fbe44.png)

The ratio is relatively large for early blocks because Golomb-coded sets only achieve good compression with a sufficient number of elements, and those blocks contain few. Empirically, the average element size with 100 elements is 14% larger than with 10,000 elements. For basic filters, the block size used in the ratio excludes witness data.

Here is a summary of filter statistics *for blocks after height 150,000*:

| Stat | Basic filter |
|------|--------------|
| Weighted Size Ratio Mean | 0.0198 |
| Size Ratio Mean | 0.0224 |
| Size Ratio Std Deviation | 0.0202 |
| Mean Element Size (bits) | 21.145 |
| Approx Theoretical Min Element Size (bits) | 21.025 |

Tree-SHA512: 2d045fbfc3fc45490ecb9b08d2f7e4dbbe7cd8c1c939f06bbdb8e8aacfe4c495cdb67c820e52520baebbf8a8305a0efd8e59d3fa8e367574a4b830509a39223f
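The last row of the table can be cross-checked with a short calculation. This is a sketch under assumptions: it takes BIP 158's basic-filter parameters to be P = 19 and M = 784931 (neither constant appears in this diff), and it reads "theoretical minimum" as the entropy of a sorted set of N hashes drawn uniformly from [0, N·M), which is about log2(C(N·M, N)) ≈ N·log2(e·M) bits for large N and M. The helper name `ApproxMinElementBits` is ours, not Bitcoin Core's.

```cpp
#include <cassert>
#include <cmath>

// Approximate bits per element needed to store a sorted set of N hashes
// drawn uniformly from [0, N * M): log2(C(N*M, N)) / N tends to log2(e * M)
// for large N and M. M is assumed to be the BIP 158 basic-filter parameter.
double ApproxMinElementBits(double M)
{
    return std::log2(M) + std::log2(std::exp(1.0));
}
```

With M = 784931 this evaluates to roughly 21.025 bits, and the measured mean of 21.145 bits in the table sits about 0.12 bits above it.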
2 parents 1117283 + 254c85b commit c775dc4

11 files changed (+852, -0 lines)

src/Makefile.am

Lines changed: 2 additions & 0 deletions
@@ -96,6 +96,7 @@ BITCOIN_CORE_H = \
   bech32.h \
   bloom.h \
   blockencodings.h \
+  blockfilter.h \
   chain.h \
   chainparams.h \
   chainparamsbase.h \
@@ -219,6 +220,7 @@ libbitcoin_server_a_SOURCES = \
   addrman.cpp \
   bloom.cpp \
   blockencodings.cpp \
+  blockfilter.cpp \
   chain.cpp \
   checkpoints.cpp \
   consensus/tx_verify.cpp \

src/Makefile.bench.include

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ bench_bench_bitcoin_SOURCES = \
   bench/rollingbloom.cpp \
   bench/crypto_hash.cpp \
   bench/ccoins_caching.cpp \
+  bench/gcs_filter.cpp \
   bench/merkle_root.cpp \
   bench/mempool_eviction.cpp \
   bench/verify_script.cpp \

src/Makefile.test.include

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,7 @@ TEST_BINARY=test/test_bitcoin$(EXEEXT)
 
 JSON_TEST_FILES = \
   test/data/base58_encode_decode.json \
+  test/data/blockfilters.json \
   test/data/key_io_valid.json \
   test/data/key_io_invalid.json \
   test/data/script_tests.json \
@@ -39,6 +40,7 @@ BITCOIN_TESTS =\
   test/bip32_tests.cpp \
   test/blockchain_tests.cpp \
   test/blockencodings_tests.cpp \
+  test/blockfilter_tests.cpp \
   test/bloom_tests.cpp \
   test/bswap_tests.cpp \
   test/checkqueue_tests.cpp \

src/bench/gcs_filter.cpp

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
// Copyright (c) 2018 The Bitcoin Core developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.

#include <bench/bench.h>
#include <blockfilter.h>

static void ConstructGCSFilter(benchmark::State& state)
{
    GCSFilter::ElementSet elements;
    for (int i = 0; i < 10000; ++i) {
        GCSFilter::Element element(32);
        element[0] = static_cast<unsigned char>(i);
        element[1] = static_cast<unsigned char>(i >> 8);
        elements.insert(std::move(element));
    }

    uint64_t siphash_k0 = 0;
    while (state.KeepRunning()) {
        GCSFilter filter(siphash_k0, 0, 20, 1 << 20, elements);

        siphash_k0++;
    }
}

static void MatchGCSFilter(benchmark::State& state)
{
    GCSFilter::ElementSet elements;
    for (int i = 0; i < 10000; ++i) {
        GCSFilter::Element element(32);
        element[0] = static_cast<unsigned char>(i);
        element[1] = static_cast<unsigned char>(i >> 8);
        elements.insert(std::move(element));
    }
    GCSFilter filter(0, 0, 20, 1 << 20, elements);

    while (state.KeepRunning()) {
        filter.Match(GCSFilter::Element());
    }
}

BENCHMARK(ConstructGCSFilter, 1000);
BENCHMARK(MatchGCSFilter, 50 * 1000);

src/blockfilter.cpp

Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
// Copyright (c) 2018 The Bitcoin Core developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.

#include <blockfilter.h>
#include <hash.h>
#include <primitives/transaction.h>
#include <script/script.h>
#include <streams.h>

/// SerType used to serialize parameters in GCS filter encoding.
static constexpr int GCS_SER_TYPE = SER_NETWORK;

/// Protocol version used to serialize parameters in GCS filter encoding.
static constexpr int GCS_SER_VERSION = 0;

template <typename OStream>
static void GolombRiceEncode(BitStreamWriter<OStream>& bitwriter, uint8_t P, uint64_t x)
{
    // Write quotient as unary-encoded: q 1's followed by one 0.
    uint64_t q = x >> P;
    while (q > 0) {
        int nbits = q <= 64 ? static_cast<int>(q) : 64;
        bitwriter.Write(~0ULL, nbits);
        q -= nbits;
    }
    bitwriter.Write(0, 1);

    // Write the remainder in P bits. Since the remainder is just the bottom
    // P bits of x, there is no need to mask first.
    bitwriter.Write(x, P);
}

template <typename IStream>
static uint64_t GolombRiceDecode(BitStreamReader<IStream>& bitreader, uint8_t P)
{
    // Read unary-encoded quotient: q 1's followed by one 0.
    uint64_t q = 0;
    while (bitreader.Read(1) == 1) {
        ++q;
    }

    uint64_t r = bitreader.Read(P);

    return (q << P) + r;
}

// Map a value x that is uniformly distributed in the range [0, 2^64) to a
// value uniformly distributed in [0, n) by returning the upper 64 bits of
// x * n.
//
// See: https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
static uint64_t MapIntoRange(uint64_t x, uint64_t n)
{
#ifdef __SIZEOF_INT128__
    return (static_cast<unsigned __int128>(x) * static_cast<unsigned __int128>(n)) >> 64;
#else
    // To perform the calculation on 64-bit numbers without losing the
    // result to overflow, split the numbers into the most significant and
    // least significant 32 bits and perform multiplication piece-wise.
    //
    // See: https://stackoverflow.com/a/26855440
    uint64_t x_hi = x >> 32;
    uint64_t x_lo = x & 0xFFFFFFFF;
    uint64_t n_hi = n >> 32;
    uint64_t n_lo = n & 0xFFFFFFFF;

    uint64_t ac = x_hi * n_hi;
    uint64_t ad = x_hi * n_lo;
    uint64_t bc = x_lo * n_hi;
    uint64_t bd = x_lo * n_lo;

    uint64_t mid34 = (bd >> 32) + (bc & 0xFFFFFFFF) + (ad & 0xFFFFFFFF);
    uint64_t upper64 = ac + (bc >> 32) + (ad >> 32) + (mid34 >> 32);
    return upper64;
#endif
}

uint64_t GCSFilter::HashToRange(const Element& element) const
{
    uint64_t hash = CSipHasher(m_siphash_k0, m_siphash_k1)
        .Write(element.data(), element.size())
        .Finalize();
    return MapIntoRange(hash, m_F);
}

std::vector<uint64_t> GCSFilter::BuildHashedSet(const ElementSet& elements) const
{
    std::vector<uint64_t> hashed_elements;
    hashed_elements.reserve(elements.size());
    for (const Element& element : elements) {
        hashed_elements.push_back(HashToRange(element));
    }
    std::sort(hashed_elements.begin(), hashed_elements.end());
    return hashed_elements;
}

GCSFilter::GCSFilter(uint64_t siphash_k0, uint64_t siphash_k1, uint8_t P, uint32_t M)
    : m_siphash_k0(siphash_k0), m_siphash_k1(siphash_k1), m_P(P), m_M(M), m_N(0), m_F(0)
{}

GCSFilter::GCSFilter(uint64_t siphash_k0, uint64_t siphash_k1, uint8_t P, uint32_t M,
                     std::vector<unsigned char> encoded_filter)
    : GCSFilter(siphash_k0, siphash_k1, P, M)
{
    m_encoded = std::move(encoded_filter);

    VectorReader stream(GCS_SER_TYPE, GCS_SER_VERSION, m_encoded, 0);

    uint64_t N = ReadCompactSize(stream);
    m_N = static_cast<uint32_t>(N);
    if (m_N != N) {
        throw std::ios_base::failure("N must be <2^32");
    }
    m_F = static_cast<uint64_t>(m_N) * static_cast<uint64_t>(m_M);

    // Verify that the encoded filter contains exactly N elements. If it has too much or too little
    // data, a std::ios_base::failure exception will be raised.
    BitStreamReader<VectorReader> bitreader(stream);
    for (uint64_t i = 0; i < m_N; ++i) {
        GolombRiceDecode(bitreader, m_P);
    }
    if (!stream.empty()) {
        throw std::ios_base::failure("encoded_filter contains excess data");
    }
}

GCSFilter::GCSFilter(uint64_t siphash_k0, uint64_t siphash_k1, uint8_t P, uint32_t M,
                     const ElementSet& elements)
    : GCSFilter(siphash_k0, siphash_k1, P, M)
{
    size_t N = elements.size();
    m_N = static_cast<uint32_t>(N);
    if (m_N != N) {
        throw std::invalid_argument("N must be <2^32");
    }
    m_F = static_cast<uint64_t>(m_N) * static_cast<uint64_t>(m_M);

    CVectorWriter stream(GCS_SER_TYPE, GCS_SER_VERSION, m_encoded, 0);

    WriteCompactSize(stream, m_N);

    if (elements.empty()) {
        return;
    }

    BitStreamWriter<CVectorWriter> bitwriter(stream);

    uint64_t last_value = 0;
    for (uint64_t value : BuildHashedSet(elements)) {
        uint64_t delta = value - last_value;
        GolombRiceEncode(bitwriter, m_P, delta);
        last_value = value;
    }

    bitwriter.Flush();
}

bool GCSFilter::MatchInternal(const uint64_t* element_hashes, size_t size) const
{
    VectorReader stream(GCS_SER_TYPE, GCS_SER_VERSION, m_encoded, 0);

    // Seek forward by size of N
    uint64_t N = ReadCompactSize(stream);
    assert(N == m_N);

    BitStreamReader<VectorReader> bitreader(stream);

    uint64_t value = 0;
    size_t hashes_index = 0;
    for (uint32_t i = 0; i < m_N; ++i) {
        uint64_t delta = GolombRiceDecode(bitreader, m_P);
        value += delta;

        while (true) {
            if (hashes_index == size) {
                return false;
            } else if (element_hashes[hashes_index] == value) {
                return true;
            } else if (element_hashes[hashes_index] > value) {
                break;
            }

            hashes_index++;
        }
    }

    return false;
}

bool GCSFilter::Match(const Element& element) const
{
    uint64_t query = HashToRange(element);
    return MatchInternal(&query, 1);
}

bool GCSFilter::MatchAny(const ElementSet& elements) const
{
    const std::vector<uint64_t> queries = BuildHashedSet(elements);
    return MatchInternal(queries.data(), queries.size());
}

static GCSFilter::ElementSet BasicFilterElements(const CBlock& block,
                                                 const CBlockUndo& block_undo)
{
    GCSFilter::ElementSet elements;

    for (const CTransactionRef& tx : block.vtx) {
        for (const CTxOut& txout : tx->vout) {
            const CScript& script = txout.scriptPubKey;
            if (script[0] == OP_RETURN) continue;
            elements.emplace(script.begin(), script.end());
        }
    }

    for (const CTxUndo& tx_undo : block_undo.vtxundo) {
        for (const Coin& prevout : tx_undo.vprevout) {
            const CScript& script = prevout.out.scriptPubKey;
            elements.emplace(script.begin(), script.end());
        }
    }

    return elements;
}

BlockFilter::BlockFilter(BlockFilterType filter_type, const CBlock& block, const CBlockUndo& block_undo)
    : m_filter_type(filter_type), m_block_hash(block.GetHash())
{
    switch (m_filter_type) {
    case BlockFilterType::BASIC:
        m_filter = GCSFilter(m_block_hash.GetUint64(0), m_block_hash.GetUint64(1),
                             BASIC_FILTER_P, BASIC_FILTER_M,
                             BasicFilterElements(block, block_undo));
        break;

    default:
        throw std::invalid_argument("unknown filter_type");
    }
}

uint256 BlockFilter::GetHash() const
{
    const std::vector<unsigned char>& data = GetEncodedFilter();

    uint256 result;
    CHash256().Write(data.data(), data.size()).Finalize(result.begin());
    return result;
}

uint256 BlockFilter::ComputeHeader(const uint256& prev_header) const
{
    const uint256& filter_hash = GetHash();

    uint256 result;
    CHash256()
        .Write(filter_hash.begin(), filter_hash.size())
        .Write(prev_header.begin(), prev_header.size())
        .Finalize(result.begin());
    return result;
}
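To see the encoding in `GolombRiceEncode`/`GolombRiceDecode` and the delta coding in the `ElementSet` constructor outside of Bitcoin Core, here is a minimal standalone sketch. `BitWriter` and `BitReader` are hypothetical stand-ins for `BitStreamWriter`/`BitStreamReader` and are assumed to pack bits most-significant-first; the CompactSize prefix for N and the SipHash step are left out, so the input is an already sorted list of values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical MSB-first bit writer standing in for BitStreamWriter.
class BitWriter {
public:
    // Append the low `nbits` bits of `data` (nbits <= 64), most significant first.
    void Write(uint64_t data, int nbits)
    {
        for (int i = nbits - 1; i >= 0; --i) {
            m_cur = static_cast<uint8_t>((m_cur << 1) | ((data >> i) & 1));
            if (++m_used == 8) Flush();
        }
    }
    // Emit any partial byte, padding the unused low bits with zeros.
    void Flush()
    {
        if (m_used == 0) return;
        m_bytes.push_back(static_cast<uint8_t>(m_cur << (8 - m_used)));
        m_cur = 0;
        m_used = 0;
    }
    const std::vector<uint8_t>& Bytes() const { return m_bytes; }

private:
    std::vector<uint8_t> m_bytes;
    uint8_t m_cur = 0;
    int m_used = 0;
};

// Hypothetical MSB-first bit reader standing in for BitStreamReader.
// `bytes` must outlive the reader.
class BitReader {
public:
    explicit BitReader(const std::vector<uint8_t>& bytes) : m_bytes(bytes) {}
    uint64_t Read(int nbits)
    {
        uint64_t out = 0;
        for (int i = 0; i < nbits; ++i) {
            if (m_pos / 8 >= m_bytes.size()) throw std::out_of_range("read past end of data");
            uint8_t byte = m_bytes[m_pos / 8];
            out = (out << 1) | ((byte >> (7 - m_pos % 8)) & 1);
            ++m_pos;
        }
        return out;
    }

private:
    const std::vector<uint8_t>& m_bytes;
    std::size_t m_pos = 0;
};

// Same logic as GolombRiceEncode in the diff: unary quotient, then P remainder bits.
void GolombRiceEncode(BitWriter& w, uint8_t P, uint64_t x)
{
    uint64_t q = x >> P;
    while (q > 0) {
        int nbits = q <= 64 ? static_cast<int>(q) : 64;
        w.Write(~0ULL, nbits);
        q -= nbits;
    }
    w.Write(0, 1);
    w.Write(x, P); // Write() keeps only the low P bits, i.e. the remainder.
}

uint64_t GolombRiceDecode(BitReader& r, uint8_t P)
{
    uint64_t q = 0;
    while (r.Read(1) == 1) ++q;
    return (q << P) + r.Read(P);
}

// Delta-code an ascending list of values, as the ElementSet constructor does.
std::vector<uint8_t> EncodeSorted(const std::vector<uint64_t>& sorted, uint8_t P)
{
    BitWriter w;
    uint64_t last = 0;
    for (uint64_t v : sorted) {
        GolombRiceEncode(w, P, v - last);
        last = v;
    }
    w.Flush();
    return w.Bytes();
}

// Inverse of EncodeSorted; `n` plays the role of the CompactSize N prefix.
std::vector<uint64_t> DecodeSorted(const std::vector<uint8_t>& bytes, std::size_t n, uint8_t P)
{
    BitReader r(bytes);
    std::vector<uint64_t> out;
    uint64_t value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        value += GolombRiceDecode(r, P);
        out.push_back(value);
    }
    return out;
}
```

For example, encoding the single value 5 with P = 2 writes quotient 1 as `10` and remainder 1 as `01`, giving the padded byte 0x90. Under the assumed BIP 158 parameters (P = 19, M = 784931), a gap equal to M has quotient 1 and so costs 1 + 1 + 19 = 21 bits.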

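`MapIntoRange` has two branches that must return identical results, since the mapped values end up in a serialized filter that every node has to reproduce. The sketch below copies both branches into standalone functions so they can be compared directly; the names `MapIntoRange128` and `MapIntoRangePortable` are ours, and the first one assumes a compiler that provides `unsigned __int128` (as GCC and Clang do on 64-bit targets).

```cpp
#include <cassert>
#include <cstdint>

// Reference branch: upper 64 bits of the full 128-bit product x * n.
uint64_t MapIntoRange128(uint64_t x, uint64_t n)
{
    return static_cast<uint64_t>((static_cast<unsigned __int128>(x) * n) >> 64);
}

// Portable branch: the same value assembled from four 32x32->64-bit products.
uint64_t MapIntoRangePortable(uint64_t x, uint64_t n)
{
    uint64_t x_hi = x >> 32, x_lo = x & 0xFFFFFFFF;
    uint64_t n_hi = n >> 32, n_lo = n & 0xFFFFFFFF;

    uint64_t ac = x_hi * n_hi; // contributes to bits 64..127
    uint64_t ad = x_hi * n_lo; // contributes to bits 32..95
    uint64_t bc = x_lo * n_hi; // contributes to bits 32..95
    uint64_t bd = x_lo * n_lo; // contributes to bits 0..63

    // Everything that lands in bits 32..63 of the product; the part of this
    // sum above bit 31 is the carry into the upper 64 bits.
    uint64_t mid34 = (bd >> 32) + (bc & 0xFFFFFFFF) + (ad & 0xFFFFFFFF);
    return ac + (bc >> 32) + (ad >> 32) + (mid34 >> 32);
}
```

In `HashToRange`, x is the SipHash output and n is F = N·M, so the result lands in [0, F) with one multiplication and a shift instead of a division.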
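`MatchInternal` steps through the decoded filter values and the sorted query hashes together, in the style of a merge, so `MatchAny` makes one pass over the filter no matter how many queries it checks. A standalone sketch of that loop, with Golomb-Rice decoding replaced by an already-decoded vector (the function name `SortedIntersects` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if any value in `queries` also appears in `filter_values`.
// Both inputs must be sorted ascending, mirroring the decoded GCS values and
// the output of BuildHashedSet.
bool SortedIntersects(const std::vector<uint64_t>& filter_values,
                      const std::vector<uint64_t>& queries)
{
    std::size_t qi = 0;
    for (uint64_t value : filter_values) {
        // Advance past queries smaller than the current filter value; they
        // cannot match any later (larger) filter value either.
        while (true) {
            if (qi == queries.size()) return false;
            if (queries[qi] == value) return true;
            if (queries[qi] > value) break;
            ++qi;
        }
    }
    return false;
}
```

Because both sequences are sorted, the query index never moves backwards, so the cost is O(N + Q) for N filter elements and Q queries.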