Skip to content

Commit 34ae04d

Browse files
committed
Merge bitcoin/bitcoin#21726: Improve Indices on pruned nodes via prune blockers
71c3f03 move-only: Rename index + pruning functional test (Fabian Jahr) de08932 test: Update test for indices on pruned nodes (Fabian Jahr) 825d198 Index: Allow coinstatsindex with pruning enabled (Fabian Jahr) f08c9fb Index: Use prune locks for blockfilterindex (Fabian Jahr) 2561823 blockstorage: Add prune locks to BlockManager (Fabian Jahr) 231fc7b refactor: Introduce GetFirstStoredBlock helper function (Fabian Jahr) Pull request description: # Motivation The main motivation of this change and only behavior change noticeable by user is to allow running `coinstatsindex` on pruned nodes as has been requested [here for example](https://twitter.com/benthecarman/status/1388170854140452870?s=20). # Background `coinstatsindex` on pruned nodes can be enabled in a much simpler than it is done here but it comes with downside. The ability to run `blockfilterindex`on pruned nodes was added in #15946 but it also added the `blockfilterindex` as a dependency to `validation` and it introduced two new circular dependencies. Enabling `coinstatsindex` on pruned nodes in a similar way would add it as a dependency as well and introduce another circular dependency. Instead, this PR introduces a `m_prune_blockers` map to `BlockManager` as a flexible approach to block pruning. Entities like `blockfilterindex`, for example, can add a key and a height to block pruning over that height. These entities need to update that value to allow more pruning when they are ready. # Alternative approach Upon completing the first draft of this PR I found #19463 as an alternative that follows the same but follows a very different approach. I am listing the main differences here as I see them: - Usage of globals - Blocks pruning with a start and a stop height - Can persist blockers across restarts - Blockers can be set/unset via RPCs Personally, I don't think any of these are necessary to be added here but if the general approach or specific features are more appealing to reviewers I am happy to change to a solution based on that PR or port over specific parts of it here. ACKs for top commit: mzumsande: Code review ACK 71c3f03 ryanofsky: Code review ACK 71c3f03. Changes since last review: just tweaking comments and asserts, and rebasing w0xlt: tACK bitcoin/bitcoin@71c3f03 on signet. Tree-SHA512: de7efda08b44aa31013fbebc47a02cd2de32db170b570f9643e1f013fee0e8e7ca3068952d1acc6e5e74a70910735c5f263437981ad73df841ad945b52d36b71
2 parents 260ede1 + 71c3f03 commit 34ae04d

15 files changed

+265
-116
lines changed

src/index/base.cpp

Lines changed: 20 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -65,21 +65,17 @@ bool BaseIndex::Init()
6565
LOCK(cs_main);
6666
CChain& active_chain = m_chainstate->m_chain;
6767
if (locator.IsNull()) {
68-
m_best_block_index = nullptr;
68+
SetBestBlockIndex(nullptr);
6969
} else {
70-
m_best_block_index = m_chainstate->FindForkInGlobalIndex(locator);
70+
SetBestBlockIndex(m_chainstate->FindForkInGlobalIndex(locator));
7171
}
7272
m_synced = m_best_block_index.load() == active_chain.Tip();
7373
if (!m_synced) {
7474
bool prune_violation = false;
7575
if (!m_best_block_index) {
7676
// index is not built yet
7777
// make sure we have all block data back to the genesis
78-
const CBlockIndex* block = active_chain.Tip();
79-
while (block->pprev && (block->pprev->nStatus & BLOCK_HAVE_DATA)) {
80-
block = block->pprev;
81-
}
82-
prune_violation = block != active_chain.Genesis();
78+
prune_violation = node::GetFirstStoredBlock(active_chain.Tip()) != active_chain.Genesis();
8379
}
8480
// in case the index has a best block set and is not fully synced
8581
// check if we have the required blocks to continue building the index
@@ -138,7 +134,7 @@ void BaseIndex::ThreadSync()
138134
int64_t last_locator_write_time = 0;
139135
while (true) {
140136
if (m_interrupt) {
141-
m_best_block_index = pindex;
137+
SetBestBlockIndex(pindex);
142138
// No need to handle errors in Commit. If it fails, the error will be already be
143139
// logged. The best way to recover is to continue, as index cannot be corrupted by
144140
// a missed commit to disk for an advanced index state.
@@ -150,7 +146,7 @@ void BaseIndex::ThreadSync()
150146
LOCK(cs_main);
151147
const CBlockIndex* pindex_next = NextSyncBlock(pindex, m_chainstate->m_chain);
152148
if (!pindex_next) {
153-
m_best_block_index = pindex;
149+
SetBestBlockIndex(pindex);
154150
m_synced = true;
155151
// No need to handle errors in Commit. See rationale above.
156152
Commit();
@@ -172,7 +168,7 @@ void BaseIndex::ThreadSync()
172168
}
173169

174170
if (last_locator_write_time + SYNC_LOCATOR_WRITE_INTERVAL < current_time) {
175-
m_best_block_index = pindex;
171+
SetBestBlockIndex(pindex);
176172
last_locator_write_time = current_time;
177173
// No need to handle errors in Commit. See rationale above.
178174
Commit();
@@ -230,10 +226,10 @@ bool BaseIndex::Rewind(const CBlockIndex* current_tip, const CBlockIndex* new_ti
230226
// out of sync may be possible but a users fault.
231227
// In case we reorg beyond the pruned depth, ReadBlockFromDisk would
232228
// throw and lead to a graceful shutdown
233-
m_best_block_index = new_tip;
229+
SetBestBlockIndex(new_tip);
234230
if (!Commit()) {
235231
// If commit fails, revert the best block index to avoid corruption.
236-
m_best_block_index = current_tip;
232+
SetBestBlockIndex(current_tip);
237233
return false;
238234
}
239235

@@ -274,7 +270,7 @@ void BaseIndex::BlockConnected(const std::shared_ptr<const CBlock>& block, const
274270
}
275271

276272
if (WriteBlock(*block, pindex)) {
277-
m_best_block_index = pindex;
273+
SetBestBlockIndex(pindex);
278274
} else {
279275
FatalError("%s: Failed to write block %s to index",
280276
__func__, pindex->GetBlockHash().ToString());
@@ -381,3 +377,14 @@ IndexSummary BaseIndex::GetSummary() const
381377
summary.best_block_height = m_best_block_index ? m_best_block_index.load()->nHeight : 0;
382378
return summary;
383379
}
380+
381+
void BaseIndex::SetBestBlockIndex(const CBlockIndex* block) {
382+
assert(!node::fPruneMode || AllowPrune());
383+
384+
m_best_block_index = block;
385+
if (AllowPrune() && block) {
386+
node::PruneLockInfo prune_lock;
387+
prune_lock.height_first = block->nHeight;
388+
WITH_LOCK(::cs_main, m_chainstate->m_blockman.UpdatePruneLock(GetName(), prune_lock));
389+
}
390+
}

src/index/base.h

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,9 @@ class BaseIndex : public CValidationInterface
7575
/// to a chain reorganization), the index must halt until Commit succeeds or else it could end up
7676
/// getting corrupted.
7777
bool Commit();
78+
79+
virtual bool AllowPrune() const = 0;
80+
7881
protected:
7982
CChainState* m_chainstate{nullptr};
8083

@@ -103,6 +106,9 @@ class BaseIndex : public CValidationInterface
103106
/// Get the name of the index for display in logs.
104107
virtual const char* GetName() const = 0;
105108

109+
/// Update the internal best block index as well as the prune lock.
110+
void SetBestBlockIndex(const CBlockIndex* block);
111+
106112
public:
107113
/// Destructor interrupts sync thread if running and blocks until it exits.
108114
virtual ~BaseIndex();

src/index/blockfilterindex.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,8 @@ class BlockFilterIndex final : public BaseIndex
3838
/** cache of block hash to filter header, to avoid disk access when responding to getcfcheckpt. */
3939
std::unordered_map<uint256, uint256, FilterHeaderHasher> m_headers_cache GUARDED_BY(m_cs_headers_cache);
4040

41+
bool AllowPrune() const override { return true; }
42+
4143
protected:
4244
bool Init() override;
4345

src/index/coinstatsindex.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,8 @@ class CoinStatsIndex final : public BaseIndex
3636

3737
bool ReverseBlock(const CBlock& block, const CBlockIndex* pindex);
3838

39+
bool AllowPrune() const override { return true; }
40+
3941
protected:
4042
bool Init() override;
4143

src/index/txindex.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,8 @@ class TxIndex final : public BaseIndex
2020
private:
2121
const std::unique_ptr<DB> m_db;
2222

23+
bool AllowPrune() const override { return false; }
24+
2325
protected:
2426
bool WriteBlock(const CBlock& block, const CBlockIndex* pindex) override;
2527

src/init.cpp

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -421,7 +421,7 @@ void SetupServerArgs(ArgsManager& argsman)
421421
-GetNumCores(), MAX_SCRIPTCHECK_THREADS, DEFAULT_SCRIPTCHECK_THREADS), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
422422
argsman.AddArg("-persistmempool", strprintf("Whether to save the mempool on shutdown and load on restart (default: %u)", DEFAULT_PERSIST_MEMPOOL), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
423423
argsman.AddArg("-pid=<file>", strprintf("Specify pid file. Relative paths will be prefixed by a net-specific datadir location. (default: %s)", BITCOIN_PID_FILENAME), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
424-
argsman.AddArg("-prune=<n>", strprintf("Reduce storage requirements by enabling pruning (deleting) of old blocks. This allows the pruneblockchain RPC to be called to delete specific blocks, and enables automatic pruning of old blocks if a target size in MiB is provided. This mode is incompatible with -txindex and -coinstatsindex. "
424+
argsman.AddArg("-prune=<n>", strprintf("Reduce storage requirements by enabling pruning (deleting) of old blocks. This allows the pruneblockchain RPC to be called to delete specific blocks and enables automatic pruning of old blocks if a target size in MiB is provided. This mode is incompatible with -txindex. "
425425
"Warning: Reverting this setting requires re-downloading the entire blockchain. "
426426
"(default: 0 = disable pruning blocks, 1 = allow manual pruning via RPC, >=%u = automatically prune block files to stay under the specified target size in MiB)", MIN_DISK_SPACE_FOR_BLOCK_FILES / 1024 / 1024), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
427427
argsman.AddArg("-reindex", "Rebuild chain state and block index from the blk*.dat files on disk. This will also rebuild active optional indexes.", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
@@ -858,8 +858,6 @@ bool AppInitParameterInteraction(const ArgsManager& args, bool use_syscall_sandb
858858
if (args.GetIntArg("-prune", 0)) {
859859
if (args.GetBoolArg("-txindex", DEFAULT_TXINDEX))
860860
return InitError(_("Prune mode is incompatible with -txindex."));
861-
if (args.GetBoolArg("-coinstatsindex", DEFAULT_COINSTATSINDEX))
862-
return InitError(_("Prune mode is incompatible with -coinstatsindex."));
863861
if (args.GetBoolArg("-reindex-chainstate", false)) {
864862
return InitError(_("Prune mode is incompatible with -reindex-chainstate. Use full -reindex instead."));
865863
}

src/node/blockstorage.cpp

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,8 @@
2121
#include <util/system.h>
2222
#include <validation.h>
2323

24+
#include <unordered_map>
25+
2426
namespace node {
2527
std::atomic_bool fImporting(false);
2628
std::atomic_bool fReindex(false);
@@ -230,6 +232,11 @@ void BlockManager::FindFilesToPrune(std::set<int>& setFilesToPrune, uint64_t nPr
230232
nLastBlockWeCanPrune, count);
231233
}
232234

235+
void BlockManager::UpdatePruneLock(const std::string& name, const PruneLockInfo& lock_info) {
236+
AssertLockHeld(::cs_main);
237+
m_prune_locks[name] = lock_info;
238+
}
239+
233240
CBlockIndex* BlockManager::InsertBlockIndex(const uint256& hash)
234241
{
235242
AssertLockHeld(cs_main);
@@ -397,6 +404,16 @@ bool BlockManager::IsBlockPruned(const CBlockIndex* pblockindex)
397404
return (m_have_pruned && !(pblockindex->nStatus & BLOCK_HAVE_DATA) && pblockindex->nTx > 0);
398405
}
399406

407+
const CBlockIndex* GetFirstStoredBlock(const CBlockIndex* start_block) {
408+
AssertLockHeld(::cs_main);
409+
assert(start_block);
410+
const CBlockIndex* last_block = start_block;
411+
while (last_block->pprev && (last_block->pprev->nStatus & BLOCK_HAVE_DATA)) {
412+
last_block = last_block->pprev;
413+
}
414+
return last_block;
415+
}
416+
400417
// If we're using -prune with -reindex, then delete block files that will be ignored by the
401418
// reindex. Since reindexing works by starting at block file 0 and looping until a blockfile
402419
// is missing, do the same here to delete any later block files after a gap. Also delete all

src/node/blockstorage.h

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313

1414
#include <atomic>
1515
#include <cstdint>
16+
#include <unordered_map>
1617
#include <vector>
1718

1819
extern RecursiveMutex cs_main;
@@ -65,6 +66,10 @@ struct CBlockIndexHeightOnlyComparator {
6566
bool operator()(const CBlockIndex* pa, const CBlockIndex* pb) const;
6667
};
6768

69+
struct PruneLockInfo {
70+
int height_first{std::numeric_limits<int>::max()}; //! Height of earliest block that should be kept and not pruned
71+
};
72+
6873
/**
6974
* Maintains a tree of blocks (stored in `m_block_index`) which is consulted
7075
* to determine where the most-work tip is.
@@ -118,6 +123,14 @@ class BlockManager
118123
/** Dirty block file entries. */
119124
std::set<int> m_dirty_fileinfo;
120125

126+
/**
127+
* Map from external index name to oldest block that must not be pruned.
128+
*
129+
* @note Internally, only blocks at height (height_first - PRUNE_LOCK_BUFFER - 1) and
130+
* below will be pruned, but callers should avoid assuming any particular buffer size.
131+
*/
132+
std::unordered_map<std::string, PruneLockInfo> m_prune_locks GUARDED_BY(::cs_main);
133+
121134
public:
122135
BlockMap m_block_index GUARDED_BY(cs_main);
123136

@@ -175,12 +188,18 @@ class BlockManager
175188
//! Check whether the block associated with this index entry is pruned or not.
176189
bool IsBlockPruned(const CBlockIndex* pblockindex) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
177190

191+
//! Create or update a prune lock identified by its name
192+
void UpdatePruneLock(const std::string& name, const PruneLockInfo& lock_info) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
193+
178194
~BlockManager()
179195
{
180196
Unload();
181197
}
182198
};
183199

200+
//! Find the first block that is not pruned
201+
const CBlockIndex* GetFirstStoredBlock(const CBlockIndex* start_block) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
202+
184203
void CleanupBlockRevFiles();
185204

186205
/** Open a block file (blk?????.dat) */

src/rpc/blockchain.cpp

Lines changed: 4 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -787,10 +787,9 @@ static RPCHelpMan pruneblockchain()
787787

788788
PruneBlockFilesManual(active_chainstate, height);
789789
const CBlockIndex* block = CHECK_NONFATAL(active_chain.Tip());
790-
while (block->pprev && (block->pprev->nStatus & BLOCK_HAVE_DATA)) {
791-
block = block->pprev;
792-
}
793-
return uint64_t(block->nHeight);
790+
const CBlockIndex* last_block = node::GetFirstStoredBlock(block);
791+
792+
return static_cast<uint64_t>(last_block->nHeight);
794793
},
795794
};
796795
}
@@ -1217,11 +1216,7 @@ RPCHelpMan getblockchaininfo()
12171216
obj.pushKV("pruned", node::fPruneMode);
12181217
if (node::fPruneMode) {
12191218
const CBlockIndex* block = CHECK_NONFATAL(tip);
1220-
while (block->pprev && (block->pprev->nStatus & BLOCK_HAVE_DATA)) {
1221-
block = block->pprev;
1222-
}
1223-
1224-
obj.pushKV("pruneheight", block->nHeight);
1219+
obj.pushKV("pruneheight", node::GetFirstStoredBlock(block)->nHeight);
12251220

12261221
// if 0, execution bypasses the whole if block.
12271222
bool automatic_pruning{args.GetIntArg("-prune", 0) != 1};

src/validation.cpp

Lines changed: 35 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -19,7 +19,6 @@
1919
#include <deploymentstatus.h>
2020
#include <flatfile.h>
2121
#include <hash.h>
22-
#include <index/blockfilterindex.h>
2322
#include <logging.h>
2423
#include <logging/timer.h>
2524
#include <node/blockstorage.h>
@@ -106,6 +105,12 @@ const std::vector<std::string> CHECKLEVEL_DOC {
106105
"level 4 tries to reconnect the blocks",
107106
"each level includes the checks of the previous levels",
108107
};
108+
/** The number of blocks to keep below the deepest prune lock.
109+
* There is nothing special about this number. It is higher than what we
110+
* expect to see in regular mainnet reorgs, but not so high that it would
111+
* noticeably interfere with the pruning mechanism.
112+
* */
113+
static constexpr int PRUNE_LOCK_BUFFER{10};
109114

110115
/**
111116
* Mutex to guard access to validation specific variables, such as reading
@@ -2338,12 +2343,24 @@ bool CChainState::FlushStateToDisk(
23382343
CoinsCacheSizeState cache_state = GetCoinsCacheSizeState();
23392344
LOCK(m_blockman.cs_LastBlockFile);
23402345
if (fPruneMode && (m_blockman.m_check_for_pruning || nManualPruneHeight > 0) && !fReindex) {
2341-
// make sure we don't prune above the blockfilterindexes bestblocks
2346+
// make sure we don't prune above any of the prune locks bestblocks
23422347
// pruning is height-based
2343-
int last_prune = m_chain.Height(); // last height we can prune
2344-
ForEachBlockFilterIndex([&](BlockFilterIndex& index) {
2345-
last_prune = std::max(1, std::min(last_prune, index.GetSummary().best_block_height));
2346-
});
2348+
int last_prune{m_chain.Height()}; // last height we can prune
2349+
std::optional<std::string> limiting_lock; // prune lock that actually was the limiting factor, only used for logging
2350+
2351+
for (const auto& prune_lock : m_blockman.m_prune_locks) {
2352+
if (prune_lock.second.height_first == std::numeric_limits<int>::max()) continue;
2353+
// Remove the buffer and one additional block here to get actual height that is outside of the buffer
2354+
const int lock_height{prune_lock.second.height_first - PRUNE_LOCK_BUFFER - 1};
2355+
last_prune = std::max(1, std::min(last_prune, lock_height));
2356+
if (last_prune == lock_height) {
2357+
limiting_lock = prune_lock.first;
2358+
}
2359+
}
2360+
2361+
if (limiting_lock) {
2362+
LogPrint(BCLog::PRUNE, "%s limited pruning to height %d\n", limiting_lock.value(), last_prune);
2363+
}
23472364

23482365
if (nManualPruneHeight > 0) {
23492366
LOG_TIME_MILLIS_WITH_CATEGORY("find files to prune (manual)", BCLog::BENCH);
@@ -2581,6 +2598,18 @@ bool CChainState::DisconnectTip(BlockValidationState& state, DisconnectedBlockTr
25812598
assert(flushed);
25822599
}
25832600
LogPrint(BCLog::BENCH, "- Disconnect block: %.2fms\n", (GetTimeMicros() - nStart) * MILLI);
2601+
2602+
{
2603+
// Prune locks that began at or after the tip should be moved backward so they get a chance to reorg
2604+
const int max_height_first{pindexDelete->nHeight - 1};
2605+
for (auto& prune_lock : m_blockman.m_prune_locks) {
2606+
if (prune_lock.second.height_first <= max_height_first) continue;
2607+
2608+
prune_lock.second.height_first = max_height_first;
2609+
LogPrint(BCLog::PRUNE, "%s prune lock moved back to %d\n", prune_lock.first, max_height_first);
2610+
}
2611+
}
2612+
25842613
// Write the chain state to disk, if necessary.
25852614
if (!FlushStateToDisk(state, FlushStateMode::IF_NEEDED)) {
25862615
return false;

0 commit comments

Comments
 (0)