Commit 78c312c

Replace current benchmarking framework with nanobench
This replaces the current benchmarking framework with nanobench [1], an MIT-licensed single-header benchmarking library, of which I am the author. In my opinion this has several advantages, especially on Linux:

* fast: Running all benchmarks takes ~6 seconds instead of 4m13s on an Intel i7-8700 CPU @ 3.20GHz.
* accurate: I ran, e.g., the benchmark for SipHash_32b 10 times and calculated standard deviation / mean = coefficient of variation:
  * 0.57% CV for the old benchmarking framework
  * 0.20% CV for nanobench

  So the benchmark results with nanobench seem to vary less than with the old framework.
* It automatically determines runtime based on clock precision, so there is no need to specify the number of evaluations.
* measures instructions, cycles, branches, instructions per cycle, and branch misses (only on Linux, when performance counters are available)
* output in markdown table format.
* Warns about an unstable environment (frequency scaling, turbo, ...)
* For better profiling, it is possible to set the environment variable NANOBENCH_ENDLESS to force endless running of a particular benchmark without the need to recompile. This makes it possible to, e.g., run "perf top" and look at hotspots.
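The accuracy comparison above relies on the coefficient of variation (standard deviation divided by mean) over repeated runs. A minimal sketch of that calculation follows. The function name `coefficientOfVariation` is hypothetical, and the sketch assumes the population standard deviation (dividing by N), since the commit message does not say which form was used.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of nanobench or Bitcoin Core):
// coefficient of variation = standard deviation / mean.
// Assumption: population standard deviation, i.e. divide by N.
double coefficientOfVariation(const std::vector<double>& samples)
{
    assert(!samples.empty());

    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / static_cast<double>(samples.size());
    assert(mean != 0.0); // CV is undefined for a zero mean

    double squaredDeviations = 0.0;
    for (double s : samples) squaredDeviations += (s - mean) * (s - mean);
    const double stddev = std::sqrt(squaredDeviations / static_cast<double>(samples.size()));

    return stddev / mean;
}
```

Feeding it the ten per-run results of one benchmark would give a figure comparable to the CV percentages quoted above, once multiplied by 100.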
Here is an example copied and pasted from the terminal output:

| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 2.52 | 396,529,415.94 | 0.6% | 25.42 | 8.02 | 3.169 | 0.06 | 0.0% | 0.03 | `bench/crypto_hash.cpp RIPEMD160`
| 1.87 | 535,161,444.83 | 0.3% | 21.36 | 5.95 | 3.589 | 0.06 | 0.0% | 0.02 | `bench/crypto_hash.cpp SHA1`
| 3.22 | 310,344,174.79 | 1.1% | 36.80 | 10.22 | 3.601 | 0.09 | 0.0% | 0.04 | `bench/crypto_hash.cpp SHA256`
| 2.01 | 496,375,796.23 | 0.0% | 18.72 | 6.43 | 2.911 | 0.01 | 1.0% | 0.00 | `bench/crypto_hash.cpp SHA256D64_1024`
| 7.23 | 138,263,519.35 | 0.1% | 82.66 | 23.11 | 3.577 | 1.63 | 0.1% | 0.00 | `bench/crypto_hash.cpp SHA256_32b`
| 3.04 | 328,780,166.40 | 0.3% | 35.82 | 9.69 | 3.696 | 0.03 | 0.0% | 0.03 | `bench/crypto_hash.cpp SHA512`

[1] https://github.com/martinus/nanobench

* Adds support for asymptotes

This adds support for calculating the asymptotic complexity of a benchmark. This is similar to #17375, but currently only one asymptote is supported, and I have added support in the benchmark `ComplexMemPool` as an example.

Usage is e.g. like this:

```
./bench_bitcoin -filter=ComplexMemPool -asymptote=25,50,100,200,400,600,800
```

This runs the benchmark `ComplexMemPool` several times, each with a different complexityN setting. The benchmark can extract that number and use it accordingly. Here, it's used for `childTxs`.
The output is this:

| complexityN | ns/op | op/s | err% | ins/op | cyc/op | IPC | total | benchmark
|------------:|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|----------:|:----------
| 25 | 1,064,241.00 | 939.64 | 1.4% | 3,960,279.00 | 2,829,708.00 | 1.400 | 0.01 | `ComplexMemPool`
| 50 | 1,579,530.00 | 633.10 | 1.0% | 6,231,810.00 | 4,412,674.00 | 1.412 | 0.02 | `ComplexMemPool`
| 100 | 4,022,774.00 | 248.58 | 0.6% | 16,544,406.00 | 11,889,535.00 | 1.392 | 0.04 | `ComplexMemPool`
| 200 | 15,390,986.00 | 64.97 | 0.2% | 63,904,254.00 | 47,731,705.00 | 1.339 | 0.17 | `ComplexMemPool`
| 400 | 69,394,711.00 | 14.41 | 0.1% | 272,602,461.00 | 219,014,691.00 | 1.245 | 0.76 | `ComplexMemPool`
| 600 | 168,977,165.00 | 5.92 | 0.1% | 639,108,082.00 | 535,316,887.00 | 1.194 | 1.86 | `ComplexMemPool`
| 800 | 310,109,077.00 | 3.22 | 0.1% | 1,149,134,246.00 | 984,620,812.00 | 1.167 | 3.41 | `ComplexMemPool`

| coefficient | err% | complexity
|--------------:|-------:|------------
| 4.78486e-07 | 4.5% | O(n^2)
| 6.38557e-10 | 21.7% | O(n^3)
| 3.42338e-05 | 38.0% | O(n log n)
| 0.000313914 | 46.9% | O(n)
| 0.0129823 | 114.4% | O(log n)
| 0.0815055 | 133.8% | O(1)

The best-fitting curve is O(n^2), so the algorithm seems to scale quadratically with `childTxs` in the range 25 to 800.
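The commit message does not describe how the coefficient and err% columns are computed. One plausible reading, sketched below, fits `t ≈ c * f(n)` by least squares through the origin and reports the root-mean-square residual divided by the mean runtime. The names `Fit`, `fitComplexity`, `fitQuadratic` and `fitLinear` are hypothetical, and the procedure is an assumption rather than a description of nanobench internals. On the ns/op values above, converted to seconds, it gives a coefficient of about 4.78e-07 and an error of roughly 4.5% for n^2, in line with the reported row.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Result of fitting t ~ coefficient * f(n) to measured runtimes.
struct Fit {
    double coefficient;        // least-squares scale factor c
    double normalizedRmsError; // RMS of residuals divided by mean runtime
};

// Hypothetical helper: fit one candidate complexity function f by
// least squares through the origin (assumed procedure, see lead-in).
template <typename F>
Fit fitComplexity(const std::vector<double>& n, const std::vector<double>& seconds, F f)
{
    assert(!n.empty() && n.size() == seconds.size());

    double sumTf = 0.0, sumFf = 0.0, sumT = 0.0;
    for (std::size_t i = 0; i < n.size(); ++i) {
        const double fn = f(n[i]);
        sumTf += seconds[i] * fn;
        sumFf += fn * fn;
        sumT += seconds[i];
    }
    assert(sumFf > 0.0);
    const double c = sumTf / sumFf; // minimizes sum of (c*f(n) - t)^2

    double sumSquares = 0.0;
    for (std::size_t i = 0; i < n.size(); ++i) {
        const double residual = c * f(n[i]) - seconds[i];
        sumSquares += residual * residual;
    }
    const double count = static_cast<double>(n.size());
    return {c, std::sqrt(sumSquares / count) / (sumT / count)};
}

// complexityN and ns/op from the ComplexMemPool table, in seconds.
const std::vector<double> kN = {25.0, 50.0, 100.0, 200.0, 400.0, 600.0, 800.0};
const std::vector<double> kSeconds = {1064241e-9, 1579530e-9, 4022774e-9, 15390986e-9,
                                      69394711e-9, 168977165e-9, 310109077e-9};

Fit fitQuadratic() { return fitComplexity(kN, kSeconds, [](double x) { return x * x; }); }
Fit fitLinear() { return fitComplexity(kN, kSeconds, [](double x) { return x; }); }
```

Ranking the candidates by `normalizedRmsError` then picks the quadratic model over the linear one, matching the ordering in the second table.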
1 parent 19e9192 commit 78c312c

38 files changed: +3644 -573 lines changed

.appveyor.yml

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ after_build:
 #- 7z a bitcoin-%APPVEYOR_BUILD_VERSION%.zip %APPVEYOR_BUILD_FOLDER%\build_msvc\%platform%\%configuration%\*.exe
 test_script:
 - cmd: src\test_bitcoin.exe -l test_suite
-- cmd: src\bench_bitcoin.exe -evals=1 -scaling=0 > NUL
+- cmd: src\bench_bitcoin.exe > NUL
 - ps: python test\util\bitcoin-util-test.py
 - cmd: python test\util\rpcauth-test.py
 # Fee estimation test failing on appveyor with: WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted.

contrib/devtools/copyright_header.py

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@
     'src/reverse_iterator.h',
     'src/test/fuzz/FuzzedDataProvider.h',
     'src/tinyformat.h',
+    'src/bench/nanobench.h',
     'test/functional/test_framework/bignum.py',
     # python init:
     '*__init__.py',

doc/benchmarking.md

Lines changed: 4 additions & 2 deletions
@@ -19,8 +19,10 @@ After compiling bitcoin-core, the benchmarks can be run with:
 
 The output will look similar to:
 ```
-# Benchmark, evals, iterations, total, min, max, median
-AssembleBlock, 5, 700, 1.79954, 0.000510913, 0.000517018, 0.000514497
+| ns/byte | byte/s | error % | benchmark
+|--------------------:|--------------------:|--------:|:----------------------------------------------
+| 64.13 | 15,592,356.01 | 0.1% | `Base58CheckEncode`
+| 24.56 | 40,722,672.68 | 0.2% | `Base58Decode`
 ...
 ```
 

src/Makefile.bench.include

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,8 @@ bench_bench_bitcoin_SOURCES = \
   bench/merkle_root.cpp \
   bench/mempool_eviction.cpp \
   bench/mempool_stress.cpp \
+  bench/nanobench.h \
+  bench/nanobench.cpp \
   bench/rpc_blockchain.cpp \
   bench/rpc_mempool.cpp \
   bench/util_time.cpp \

src/Makefile.test.include

Lines changed: 2 additions & 2 deletions
@@ -1151,8 +1151,8 @@ endif
 if TARGET_WINDOWS
 else
 if ENABLE_BENCH
-	@echo "Running bench/bench_bitcoin -evals=1 -scaling=0..."
-	$(BENCH_BINARY) -evals=1 -scaling=0 > /dev/null
+	@echo "Running bench/bench_bitcoin ..."
+	$(BENCH_BINARY) > /dev/null
 endif
 endif
 	$(AM_V_at)$(MAKE) $(AM_MAKEFLAGS) -C secp256k1 check

src/bench/addrman.cpp

Lines changed: 18 additions & 18 deletions
@@ -67,52 +67,52 @@ static void FillAddrMan(CAddrMan& addrman)
 
 /* Benchmarks */
 
-static void AddrManAdd(benchmark::State& state)
+static void AddrManAdd(benchmark::Bench& bench)
 {
     CreateAddresses();
 
     CAddrMan addrman;
 
-    while (state.KeepRunning()) {
+    bench.run([&] {
         AddAddressesToAddrMan(addrman);
         addrman.Clear();
-    }
+    });
 }
 
-static void AddrManSelect(benchmark::State& state)
+static void AddrManSelect(benchmark::Bench& bench)
 {
     CAddrMan addrman;
 
     FillAddrMan(addrman);
 
-    while (state.KeepRunning()) {
+    bench.run([&] {
         const auto& address = addrman.Select();
         assert(address.GetPort() > 0);
-    }
+    });
 }
 
-static void AddrManGetAddr(benchmark::State& state)
+static void AddrManGetAddr(benchmark::Bench& bench)
 {
     CAddrMan addrman;
 
     FillAddrMan(addrman);
 
-    while (state.KeepRunning()) {
+    bench.run([&] {
         const auto& addresses = addrman.GetAddr();
         assert(addresses.size() > 0);
-    }
+    });
 }
 
-static void AddrManGood(benchmark::State& state)
+static void AddrManGood(benchmark::Bench& bench)
 {
     /* Create many CAddrMan objects - one to be modified at each loop iteration.
      * This is necessary because the CAddrMan::Good() method modifies the
      * object, affecting the timing of subsequent calls to the same method and
      * we want to do the same amount of work in every loop iteration. */
 
-    const uint64_t numLoops = state.m_num_iters * state.m_num_evals;
+    bench.epochs(5).epochIterations(1);
 
-    std::vector<CAddrMan> addrmans(numLoops);
+    std::vector<CAddrMan> addrmans(bench.epochs() * bench.epochIterations());
     for (auto& addrman : addrmans) {
         FillAddrMan(addrman);
     }
@@ -128,13 +128,13 @@ static void AddrManGood(benchmark::State& state)
     };
 
     uint64_t i = 0;
-    while (state.KeepRunning()) {
+    bench.run([&] {
         markSomeAsGood(addrmans.at(i));
         ++i;
-    }
+    });
 }
 
-BENCHMARK(AddrManAdd, 5);
-BENCHMARK(AddrManSelect, 1000000);
-BENCHMARK(AddrManGetAddr, 500);
-BENCHMARK(AddrManGood, 2);
+BENCHMARK(AddrManAdd);
+BENCHMARK(AddrManSelect);
+BENCHMARK(AddrManGetAddr);
+BENCHMARK(AddrManGood);
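The `AddrManGood` change pins the workload with `epochs(5).epochIterations(1)` so that the vector of pre-filled objects can be sized to the number of times the lambda will run. The sketch below illustrates that pattern with a hypothetical stand-in class, `FixedRunBench`. It is not nanobench, and it assumes the callable is invoked exactly `epochs * epochIterations` times, with timing omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a benchmark runner with a fixed workload.
// Assumption: run() calls the callable exactly epochs * epochIterations
// times; a real runner would also measure each epoch.
class FixedRunBench
{
public:
    FixedRunBench& epochs(std::size_t n) { m_epochs = n; return *this; }
    FixedRunBench& epochIterations(std::uint64_t n) { m_iters = n; return *this; }
    std::size_t epochs() const { return m_epochs; }
    std::uint64_t epochIterations() const { return m_iters; }

    template <typename Op>
    void run(Op&& op)
    {
        for (std::size_t e = 0; e < m_epochs; ++e) {
            for (std::uint64_t i = 0; i < m_iters; ++i) op();
        }
    }

private:
    std::size_t m_epochs{1};
    std::uint64_t m_iters{1};
};

// Mirrors the shape of AddrManGood: one pre-built object per invocation,
// so every invocation does the same amount of work on fresh state.
std::size_t runWithFreshObjects()
{
    FixedRunBench bench;
    bench.epochs(5).epochIterations(1);

    std::vector<int> objects(bench.epochs() * bench.epochIterations(), 0);
    std::size_t i = 0;
    bench.run([&] {
        objects.at(i) += 1; // .at() would throw if the vector were undersized
        ++i;
    });
    assert(i == objects.size()); // every pre-built object was used once
    return i;
}
```

If the runner were left to choose its own iteration count, the vector could not be sized in advance, which is presumably why the diff fixes both values.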

src/bench/base58.cpp

Lines changed: 12 additions & 12 deletions
@@ -10,7 +10,7 @@
 #include <vector>
 
 
-static void Base58Encode(benchmark::State& state)
+static void Base58Encode(benchmark::Bench& bench)
 {
     static const std::array<unsigned char, 32> buff = {
         {
@@ -19,13 +19,13 @@ static void Base58Encode(benchmark::State& state)
             200, 24
         }
     };
-    while (state.KeepRunning()) {
+    bench.batch(buff.size()).unit("byte").run([&] {
         EncodeBase58(buff.data(), buff.data() + buff.size());
-    }
+    });
 }
 
 
-static void Base58CheckEncode(benchmark::State& state)
+static void Base58CheckEncode(benchmark::Bench& bench)
 {
     static const std::array<unsigned char, 32> buff = {
         {
@@ -36,22 +36,22 @@ static void Base58CheckEncode(benchmark::State& state)
     };
     std::vector<unsigned char> vch;
     vch.assign(buff.begin(), buff.end());
-    while (state.KeepRunning()) {
+    bench.batch(buff.size()).unit("byte").run([&] {
         EncodeBase58Check(vch);
-    }
+    });
 }
 
 
-static void Base58Decode(benchmark::State& state)
+static void Base58Decode(benchmark::Bench& bench)
 {
     const char* addr = "17VZNX1SN5NtKa8UQFxwQbFeFc3iqRYhem";
     std::vector<unsigned char> vch;
-    while (state.KeepRunning()) {
+    bench.batch(strlen(addr)).unit("byte").run([&] {
         (void) DecodeBase58(addr, vch, 64);
-    }
+    });
 }
 
 
-BENCHMARK(Base58Encode, 470 * 1000);
-BENCHMARK(Base58CheckEncode, 320 * 1000);
-BENCHMARK(Base58Decode, 800 * 1000);
+BENCHMARK(Base58Encode);
+BENCHMARK(Base58CheckEncode);
+BENCHMARK(Base58Decode);
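In the diff above, `batch(n)` declares how many units one invocation processes and `unit("byte")` names that unit, which is why the results appear as ns/byte and byte/s rather than per call. The sketch below shows the arithmetic this implies. `PerUnit` and `normalize` are hypothetical names for illustration only and are not nanobench code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper illustrating per-unit reporting; not nanobench code.
struct PerUnit {
    double nsPerUnit;      // e.g. the "ns/byte" column
    double unitsPerSecond; // e.g. the "byte/s" column
};

// elapsedNs:  total measured time for all iterations, in nanoseconds
// iterations: how many times the benchmarked callable ran
// batch:      units (here: bytes) processed by one invocation
PerUnit normalize(double elapsedNs, std::uint64_t iterations, double batch)
{
    assert(elapsedNs > 0.0 && iterations > 0 && batch > 0.0);
    const double units = static_cast<double>(iterations) * batch;
    const double nsPerUnit = elapsedNs / units;
    return {nsPerUnit, 1e9 / nsPerUnit};
}
```

For example, 10 invocations over a 32-byte buffer taking 6400 ns in total come out at 20 ns/byte, or 50 million bytes per second.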

src/bench/bech32.cpp

Lines changed: 9 additions & 8 deletions
@@ -3,6 +3,7 @@
 // file COPYING or http://www.opensource.org/licenses/mit-license.php.
 
 #include <bench/bench.h>
+#include <bench/nanobench.h>
 
 #include <bech32.h>
 #include <util/strencodings.h>
@@ -11,26 +12,26 @@
 #include <vector>
 
 
-static void Bech32Encode(benchmark::State& state)
+static void Bech32Encode(benchmark::Bench& bench)
 {
     std::vector<uint8_t> v = ParseHex("c97f5a67ec381b760aeaf67573bc164845ff39a3bb26a1cee401ac67243b48db");
     std::vector<unsigned char> tmp = {0};
     tmp.reserve(1 + 32 * 8 / 5);
     ConvertBits<8, 5, true>([&](unsigned char c) { tmp.push_back(c); }, v.begin(), v.end());
-    while (state.KeepRunning()) {
+    bench.batch(v.size()).unit("byte").run([&] {
         bech32::Encode("bc", tmp);
-    }
+    });
 }
 
 
-static void Bech32Decode(benchmark::State& state)
+static void Bech32Decode(benchmark::Bench& bench)
 {
     std::string addr = "bc1qkallence7tjawwvy0dwt4twc62qjgaw8f4vlhyd006d99f09";
-    while (state.KeepRunning()) {
+    bench.batch(addr.size()).unit("byte").run([&] {
         bech32::Decode(addr);
-    }
+    });
 }
 
 
-BENCHMARK(Bech32Encode, 800 * 1000);
-BENCHMARK(Bech32Decode, 800 * 1000);
+BENCHMARK(Bech32Encode);
+BENCHMARK(Bech32Decode);
