
Commit 3219847

Merge bitcoin/bitcoin#32279: [IBD] prevector: store P2WSH/P2TR/P2PK scripts inline
d5104cf prevector: store `P2WSH`/`P2TR`/`P2PK` scripts inline (Lőrinc)
5212150 test: assert `CScript` allocation characteristics (Lőrinc)
65ac7f6 refactor: modernize `CScriptBase` definition (Lőrinc)
756da2a refactor: extract `STATIC_SIZE` constant to prevector (Lőrinc)

Pull request description:

This change is part of [[IBD] - Tracking PR for speeding up Initial Block Download](bitcoin/bitcoin#32043)

### Summary

The current `prevector` size of 28 bytes (chosen to fill the aligned `sizeof(CScript)`) was introduced in 2015 (bitcoin/bitcoin#6914), before `SegWit` and `Taproot`. However, the increasingly common `P2WSH` and `P2TR` scripts are both 34 bytes, so they are forced into heap (re)allocation rather than efficient inline storage.

The core trade-off of this change is to eliminate heap allocations for common 34-36 byte scripts at the cost of increasing the base memory footprint of every `CScript` object by 8 bytes (while still respecting the peak memory usage defined by `-dbcache`).

### Context

Increasing the `prevector` size allows these scripts to be stored inline, avoiding heap allocations, reducing potential memory fragmentation, and improving performance during cache flushes. Massif analysis confirms lower stable memory usage after flushing, suggesting that eliminating the heap allocations outweighs the larger base size for common workloads.

Due to memory alignment, increasing the `prevector` size to 36 bytes doesn't change the overall `sizeof(CScript)` compared to an increase to 34 bytes, which lets compressed `P2PK` scripts (35 bytes) be stored inline as well at no additional memory cost.
<details>
<summary>Massif measurements</summary>

> dbcache=440

Massif before, with a heap threshold of `28`: [ASCII heap-profile plot, memory (MB) over time (h); peak 744.1 MB, run ends at 1.805 h]

and after, with a heap threshold of `36`: [ASCII heap-profile plot, memory (MB) over time (h); peak 744.2 MB, run ends at 1.618 h]

---

> for `dbcache=4500`:

Massif before, with a heap threshold of `28`: [ASCII heap-profile plot, memory (GB) over time (h); peak 4.565 GB, run ends at 1.500 h]

and after, with a heap threshold of `36`: [ASCII heap-profile plot, memory (GB) over time (h); peak 4.640 GB, run ends at 1.360 h]

</details>

### Benchmarks and Memory

Performance benchmarks for `AssumeUTXO` load and flush show:

- Small dbcache (450MB): ~1-3% performance improvement (despite more frequent flushes)
- Large dbcache (4500MB): ~6-8% performance improvement due to fewer heap allocations (and basically the number of flushes)
- Very large dbcache (4500MB): ~5-6% performance improvement due to fewer heap allocations (and the memory limit not being reached, so there's no memory penalty)

Full IBD and `-reindex-chainstate` runs also show an overall ~3-4% speedup (for both smaller and larger dbcache values).

We haven't investigated using different `prevector` sizes based on script type, though this could be explored in the future if needed.

### Historical explanation for the speedup (by [Anthony Towns](bitcoin/bitcoin#32279 (comment)))

> I think the tradeoff is something like:
>
> * spends of p2pk, p2sh, p2pkh coins -- these cost 8 more bytes
> * spends of p2wpkh -- these cost 16 more bytes (sPK and scriptSig didn't need an allocation)
> * spends of p2wsh and p2tr -- these cost ~48 fewer bytes (save 64 byte allocation on 64bit system, lose 8 bytes for both scriptSig and sPK)
> * spends of nested p2wsh -- presumably save ~96 bytes, since the scriptSig would save an allocation, but I'm bundling it in the previous section
>
> Based on mainnet.observer stats for 2025-05-08, p2wpkh is about 55% of txs, p2tr is about 28%, p2pkh about 13%, p2wsh about 4% and the rest is noise, maybe? Those numbers net out to a saving of ~5.5 bytes per input. If p2wpkh rose from 55% to 80% and p2tr dropped to 20%, that would net to wasting ~3.2 bytes per input.
ACKs for top commit:
- maflcko: review ACK d5104cf 🐺
- achow101: reACK d5104cf
- jonatack: Review ACK d5104cf
- andrewtoth: ACK d5104cf

Tree-SHA512: 7c5271ebaf4f6d91dc4b679ecbde4b7d0467579f072289f30da988a17c38a552d0b8cdf0e9c001739975dd019894c35e541908571527916cec56e04a8e214ae2
2 parents: 2a97ff4 + d5104cf (commit 3219847)


6 files changed, +131 -28 lines changed


src/bench/checkqueue.cpp

Lines changed: 3 additions & 3 deletions
@@ -8,6 +8,7 @@
 #include <key.h>
 #include <prevector.h>
 #include <random.h>
+#include <script/script.h>
 
 #include <cstddef>
 #include <cstdint>
@@ -16,7 +17,6 @@
 
 static const size_t BATCHES = 101;
 static const size_t BATCH_SIZE = 30;
-static const int PREVECTOR_SIZE = 28;
 static const unsigned int QUEUE_BATCH_SIZE = 128;
 
 // This Benchmark tests the CheckQueue with a slightly realistic workload,
@@ -30,9 +30,9 @@ static void CCheckQueueSpeedPrevectorJob(benchmark::Bench& bench)
     ECC_Context ecc_context{};
 
     struct PrevectorJob {
-        prevector<PREVECTOR_SIZE, uint8_t> p;
+        prevector<CScriptBase::STATIC_SIZE, uint8_t> p;
         explicit PrevectorJob(FastRandomContext& insecure_rand){
-            p.resize(insecure_rand.randrange(PREVECTOR_SIZE*2));
+            p.resize(insecure_rand.randrange(CScriptBase::STATIC_SIZE * 2));
         }
         std::optional<int> operator()()
         {

src/bench/prevector.cpp

Lines changed: 22 additions & 19 deletions
@@ -5,17 +5,20 @@
 #include <prevector.h>
 
 #include <bench/bench.h>
+#include <script/script.h>
 #include <serialize.h>
 #include <streams.h>
 
 #include <type_traits>
 #include <vector>
 
-struct nontrivial_t {
+struct nontrivial_t
+{
     int x{-1};
     nontrivial_t() = default;
     SERIALIZE_METHODS(nontrivial_t, obj) { READWRITE(obj.x); }
 };
+
 static_assert(!std::is_trivially_default_constructible_v<nontrivial_t>,
               "expected nontrivial_t to not be trivially constructible");
 
@@ -27,35 +30,35 @@ template <typename T>
 static void PrevectorDestructor(benchmark::Bench& bench)
 {
     bench.batch(2).run([&] {
-        prevector<28, T> t0;
-        prevector<28, T> t1;
-        t0.resize(28);
-        t1.resize(29);
+        prevector<CScriptBase::STATIC_SIZE, T> t0;
+        prevector<CScriptBase::STATIC_SIZE, T> t1;
+        t0.resize(CScriptBase::STATIC_SIZE);
+        t1.resize(CScriptBase::STATIC_SIZE + 1);
     });
 }
 
 template <typename T>
 static void PrevectorClear(benchmark::Bench& bench)
 {
-    prevector<28, T> t0;
-    prevector<28, T> t1;
+    prevector<CScriptBase::STATIC_SIZE, T> t0;
+    prevector<CScriptBase::STATIC_SIZE, T> t1;
     bench.batch(2).run([&] {
-        t0.resize(28);
+        t0.resize(CScriptBase::STATIC_SIZE);
         t0.clear();
-        t1.resize(29);
+        t1.resize(CScriptBase::STATIC_SIZE + 1);
         t1.clear();
     });
 }
 
 template <typename T>
 static void PrevectorResize(benchmark::Bench& bench)
 {
-    prevector<28, T> t0;
-    prevector<28, T> t1;
+    prevector<CScriptBase::STATIC_SIZE, T> t0;
+    prevector<CScriptBase::STATIC_SIZE, T> t1;
     bench.batch(4).run([&] {
-        t0.resize(28);
+        t0.resize(CScriptBase::STATIC_SIZE);
         t0.resize(0);
-        t1.resize(29);
+        t1.resize(CScriptBase::STATIC_SIZE + 1);
         t1.resize(0);
     });
 }
@@ -64,8 +67,8 @@ template <typename T>
 static void PrevectorDeserialize(benchmark::Bench& bench)
 {
     DataStream s0{};
-    prevector<28, T> t0;
-    t0.resize(28);
+    prevector<CScriptBase::STATIC_SIZE, T> t0;
+    t0.resize(CScriptBase::STATIC_SIZE);
     for (auto x = 0; x < 900; ++x) {
         s0 << t0;
     }
@@ -74,7 +77,7 @@ static void PrevectorDeserialize(benchmark::Bench& bench)
         s0 << t0;
     }
     bench.batch(1000).run([&] {
-        prevector<28, T> t1;
+        prevector<CScriptBase::STATIC_SIZE, T> t1;
         for (auto x = 0; x < 1000; ++x) {
             s0 >> t1;
         }
@@ -86,7 +89,7 @@ template <typename T>
 static void PrevectorFillVectorDirect(benchmark::Bench& bench)
 {
     bench.run([&] {
-        std::vector<prevector<28, T>> vec;
+        std::vector<prevector<CScriptBase::STATIC_SIZE, T>> vec;
         vec.reserve(260);
         for (size_t i = 0; i < 260; ++i) {
             vec.emplace_back();
@@ -99,11 +102,11 @@ template <typename T>
 static void PrevectorFillVectorIndirect(benchmark::Bench& bench)
 {
     bench.run([&] {
-        std::vector<prevector<28, T>> vec;
+        std::vector<prevector<CScriptBase::STATIC_SIZE, T>> vec;
         vec.reserve(260);
         for (size_t i = 0; i < 260; ++i) {
             // force allocation
-            vec.emplace_back(29, T{});
+            vec.emplace_back(CScriptBase::STATIC_SIZE + 1, T{});
         }
     });
 }

src/prevector.h

Lines changed: 2 additions & 0 deletions
@@ -38,6 +38,8 @@ class prevector {
     static_assert(std::is_trivially_copyable_v<T>);
 
 public:
+    static constexpr unsigned int STATIC_SIZE{N};
+
     typedef Size size_type;
     typedef Diff difference_type;
     typedef T value_type;

src/script/script.h

Lines changed: 1 addition & 3 deletions
@@ -403,10 +403,8 @@ class CScriptNum
 /**
  * We use a prevector for the script to reduce the considerable memory overhead
  * of vectors in cases where they normally contain a small number of small elements.
- * Tests in October 2015 showed use of this reduced dbcache memory usage by 23%
- * and made an initial sync 13% faster.
  */
-typedef prevector<28, unsigned char> CScriptBase;
+using CScriptBase = prevector<36, uint8_t>;
 
 bool GetScriptOp(CScriptBase::const_iterator& pc, CScriptBase::const_iterator end, opcodetype& opcodeRet, std::vector<unsigned char>* pvchRet);
 

src/test/script_tests.cpp

Lines changed: 101 additions & 0 deletions
@@ -1151,6 +1151,107 @@ BOOST_AUTO_TEST_CASE(script_CHECKMULTISIG23)
     BOOST_CHECK_MESSAGE(err == SCRIPT_ERR_INVALID_STACK_OPERATION, ScriptErrorString(err));
 }
 
+/** Return the TxoutType of a script without exposing Solver details. */
+static TxoutType GetTxoutType(const CScript& output_script)
+{
+    std::vector<std::vector<uint8_t>> unused;
+    return Solver(output_script, unused);
+}
+
+#define CHECK_SCRIPT_STATIC_SIZE(script, expected_size) \
+    do { \
+        BOOST_CHECK_EQUAL((script).size(), (expected_size)); \
+        BOOST_CHECK_EQUAL((script).capacity(), CScriptBase::STATIC_SIZE); \
+        BOOST_CHECK_EQUAL((script).allocated_memory(), 0); \
+    } while (0)
+
+#define CHECK_SCRIPT_DYNAMIC_SIZE(script, expected_size, expected_extra) \
+    do { \
+        BOOST_CHECK_EQUAL((script).size(), (expected_size)); \
+        BOOST_CHECK_EQUAL((script).capacity(), (expected_extra)); \
+        BOOST_CHECK_EQUAL((script).allocated_memory(), (expected_extra)); \
+    } while (0)
+
+BOOST_AUTO_TEST_CASE(script_size_and_capacity_test)
+{
+    BOOST_CHECK_EQUAL(sizeof(CompressedScript), 40);
+    BOOST_CHECK_EQUAL(sizeof(CScriptBase), 40);
+    BOOST_CHECK_NE(sizeof(CScriptBase), sizeof(prevector<CScriptBase::STATIC_SIZE + 1, uint8_t>)); // CScriptBase size should be set to avoid wasting space in padding
+    BOOST_CHECK_EQUAL(sizeof(CScript), 40);
+    BOOST_CHECK_EQUAL(sizeof(CTxOut), 48);
+
+    CKey dummy_key;
+    dummy_key.MakeNewKey(/*fCompressed=*/true);
+    const CPubKey dummy_pubkey{dummy_key.GetPubKey()};
+
+    // Small OP_RETURN has direct allocation
+    {
+        const auto script{CScript() << OP_RETURN << std::vector<uint8_t>(10, 0xaa)};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::NULL_DATA);
+        CHECK_SCRIPT_STATIC_SIZE(script, 12);
+    }
+
+    // P2WPKH has direct allocation
+    {
+        const auto script{GetScriptForDestination(WitnessV0KeyHash{PKHash{dummy_pubkey}})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::WITNESS_V0_KEYHASH);
+        CHECK_SCRIPT_STATIC_SIZE(script, 22);
+    }
+
+    // P2SH has direct allocation
+    {
+        const auto script{GetScriptForDestination(ScriptHash{CScript{} << OP_TRUE})};
+        BOOST_CHECK(script.IsPayToScriptHash());
+        CHECK_SCRIPT_STATIC_SIZE(script, 23);
+    }
+
+    // P2PKH has direct allocation
+    {
+        const auto script{GetScriptForDestination(PKHash{dummy_pubkey})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::PUBKEYHASH);
+        CHECK_SCRIPT_STATIC_SIZE(script, 25);
+    }
+
+    // P2WSH has direct allocation
+    {
+        const auto script{GetScriptForDestination(WitnessV0ScriptHash{CScript{} << OP_TRUE})};
+        BOOST_CHECK(script.IsPayToWitnessScriptHash());
+        CHECK_SCRIPT_STATIC_SIZE(script, 34);
+    }
+
+    // P2TR has direct allocation
+    {
+        const auto script{GetScriptForDestination(WitnessV1Taproot{XOnlyPubKey{dummy_pubkey}})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::WITNESS_V1_TAPROOT);
+        CHECK_SCRIPT_STATIC_SIZE(script, 34);
+    }
+
+    // Compressed P2PK has direct allocation
+    {
+        const auto script{GetScriptForRawPubKey(dummy_pubkey)};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::PUBKEY);
+        CHECK_SCRIPT_STATIC_SIZE(script, 35);
+    }
+
+    // Uncompressed P2PK needs extra allocation
+    {
+        CKey uncompressed_key;
+        uncompressed_key.MakeNewKey(/*fCompressed=*/false);
+        const CPubKey uncompressed_pubkey{uncompressed_key.GetPubKey()};
+
+        const auto script{GetScriptForRawPubKey(uncompressed_pubkey)};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::PUBKEY);
+        CHECK_SCRIPT_DYNAMIC_SIZE(script, 67, 67);
+    }
+
+    // Bare multisig needs extra allocation
+    {
+        const auto script{GetScriptForMultisig(1, std::vector{2, dummy_pubkey})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::MULTISIG);
+        CHECK_SCRIPT_DYNAMIC_SIZE(script, 71, 103);
+    }
+}
+
 /* Wrapper around ProduceSignature to combine two scriptsigs */
 SignatureData CombineSignatures(const CTxOut& txout, const CMutableTransaction& tx, const SignatureData& scriptSig1, const SignatureData& scriptSig2)
 {

src/test/validation_flush_tests.cpp

Lines changed: 2 additions & 3 deletions
@@ -26,9 +26,8 @@ BOOST_AUTO_TEST_CASE(getcoinscachesizestate)
     LOCK(::cs_main);
     auto& view = chainstate.CoinsTip();
 
-    // The number of bytes consumed by coin's heap data, i.e. CScript
-    // (prevector<28, unsigned char>) when assigned 56 bytes of data per above.
-    //
+    // The number of bytes consumed by coin's heap data, i.e.
+    // CScript (prevector<36, unsigned char>) when assigned 56 bytes of data per above.
     // See also: Coin::DynamicMemoryUsage().
     constexpr unsigned int COIN_SIZE = is_64_bit ? 80 : 64;
