Skip to content

Commit cc12b89

Browse files
committed
Merge bitcoin/bitcoin#24858: incorrect blk file size calculation during reindex results in recoverable blk file corruption
bcb0cac reindex, log, test: fixes #21379 (mruddy) Pull request description: Fixes #21379. The blocks/blk?????.dat files are mutated and become increasingly malformed, or corrupt, as a result of running the re-indexing process. The mutations occur after the re-indexing process has finished, as new blocks are appended, but are a result of a re-indexing process miscalculation that lingers in the block manager's `m_blockfile_info` `nSize` data until node restart. These additions to the blk files are non-fatal, but also not desirable. That is, this is a form of data corruption that the reading code is lenient enough to process (it skips the extra bytes), but it adds some scary looking log messages as it encounters them. The summary of the problem is that the re-index process double counts the size of the serialization header (magic message start bytes [4 bytes] + length [4 bytes] = 8 bytes) while calculating the blk data file size (both values already account for the serialization header's size, hence why it is over accounted). This bug manifests itself in a few different ways, after re-indexing, when a new block from a peer is processed: 1. If the new block will not fit into the last blk file processed while re-indexing, while remaining under the 128MiB limit, then the blk file is flushed to disk and truncated to a size that is 8 greater than it should be. The truncation adds zero bytes (see `FlatFileSeq::Flush` and `TruncateFile`). 1. If the last blk file processed while re-indexing has logical space for the new block under the 128 MiB limit: 1. If the blk file was not already large enough to hold the new block, then the zeros are, in effect, added by `fseek` when the file is opened for writing. Eight zero bytes are added to the end of the last blk file just before the new block is written. This happens because the write offset is 8 too great due to the miscalculation. The result is 8 zero bytes between the end of the last block and the beginning of the next block's magic + length + block. 1. If the blk file was already large enough to hold the new block, then the current existing file contents remain in the 8 byte gap between the end of the last block and the beginning of the next block's magic + length + block. Commonly, when this occcurs, it is due to the blk file containing blocks that are not connected to the block tree during reindex and are thus left behind by the reindex process and later overwritten when new blocks are added. The orphaned blocks can be valid blocks, but due to the nature of concurrent block download, the parent may not have been retrieved and written by the time the node was previously shutdown. ACKs for top commit: LarryRuane: tested code-review ACK bcb0cac ryanofsky: Code review ACK bcb0cac. This is a disturbing bug with an easy fix which seems well-worth merging. mzumsande: ACK bcb0cac (reviewed code and did some testing, I agree that it fixes the bug). w0xlt: tACK bitcoin/bitcoin@bcb0cac Tree-SHA512: acc97927ea712916506772550451136b0f1e5404e92df24cc05e405bb09eb6fe7c3011af3dd34a7723c3db17fda657ae85fa314387e43833791e9169c0febe51
2 parents 1d277f4 + bcb0cac commit cc12b89

File tree

5 files changed

+68
-5
lines changed

5 files changed

+68
-5
lines changed

src/Makefile.test.include

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -77,6 +77,7 @@ BITCOIN_TESTS =\
7777
test/blockencodings_tests.cpp \
7878
test/blockfilter_index_tests.cpp \
7979
test/blockfilter_tests.cpp \
80+
test/blockmanager_tests.cpp \
8081
test/bloom_tests.cpp \
8182
test/bswap_tests.cpp \
8283
test/checkqueue_tests.cpp \

src/node/blockstorage.cpp

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -789,19 +789,24 @@ bool ReadRawBlockFromDisk(std::vector<uint8_t>& block, const FlatFilePos& pos, c
789789
return true;
790790
}
791791

792-
/** Store block on disk. If dbp is non-nullptr, the file is known to already reside on disk */
793792
FlatFilePos BlockManager::SaveBlockToDisk(const CBlock& block, int nHeight, CChain& active_chain, const CChainParams& chainparams, const FlatFilePos* dbp)
794793
{
795794
unsigned int nBlockSize = ::GetSerializeSize(block, CLIENT_VERSION);
796795
FlatFilePos blockPos;
797-
if (dbp != nullptr) {
796+
const auto position_known {dbp != nullptr};
797+
if (position_known) {
798798
blockPos = *dbp;
799+
} else {
800+
// when known, blockPos.nPos points at the offset of the block data in the blk file. that already accounts for
801+
// the serialization header present in the file (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE).
802+
// we add BLOCK_SERIALIZATION_HEADER_SIZE only for new blocks since they will have the serialization header added when written to disk.
803+
nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE);
799804
}
800-
if (!FindBlockPos(blockPos, nBlockSize + 8, nHeight, active_chain, block.GetBlockTime(), dbp != nullptr)) {
805+
if (!FindBlockPos(blockPos, nBlockSize, nHeight, active_chain, block.GetBlockTime(), position_known)) {
801806
error("%s: FindBlockPos failed", __func__);
802807
return FlatFilePos();
803808
}
804-
if (dbp == nullptr) {
809+
if (!position_known) {
805810
if (!WriteBlockToDisk(block, blockPos, chainparams.MessageStart())) {
806811
AbortNode("Failed to write block");
807812
return FlatFilePos();

src/node/blockstorage.h

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,9 @@ static const unsigned int UNDOFILE_CHUNK_SIZE = 0x100000; // 1 MiB
4444
/** The maximum size of a blk?????.dat file (since 0.8) */
4545
static const unsigned int MAX_BLOCKFILE_SIZE = 0x8000000; // 128 MiB
4646

47+
/** Size of header written by WriteBlockToDisk before a serialized CBlock */
48+
static constexpr size_t BLOCK_SERIALIZATION_HEADER_SIZE = CMessageHeader::MESSAGE_START_SIZE + sizeof(unsigned int);
49+
4750
extern std::atomic_bool fImporting;
4851
extern std::atomic_bool fReindex;
4952
/** Pruning-related variables and constants */
@@ -171,6 +174,7 @@ class BlockManager
171174
bool WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValidationState& state, CBlockIndex* pindex, const CChainParams& chainparams)
172175
EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
173176

177+
/** Store block on disk. If dbp is not nullptr, then it provides the known position of the block within a block file on disk. */
174178
FlatFilePos SaveBlockToDisk(const CBlock& block, int nHeight, CChain& active_chain, const CChainParams& chainparams, const FlatFilePos* dbp);
175179

176180
/** Calculate the amount of disk space the block & undo files currently use */

src/test/blockmanager_tests.cpp

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
// Copyright (c) 2022 The Bitcoin Core developers
2+
// Distributed under the MIT software license, see the accompanying
3+
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
4+
5+
#include <chainparams.h>
6+
#include <node/blockstorage.h>
7+
#include <node/context.h>
8+
#include <validation.h>
9+
10+
#include <boost/test/unit_test.hpp>
11+
#include <test/util/setup_common.h>
12+
13+
using node::BlockManager;
14+
using node::BLOCK_SERIALIZATION_HEADER_SIZE;
15+
16+
// use BasicTestingSetup here for the data directory configuration, setup, and cleanup
17+
BOOST_FIXTURE_TEST_SUITE(blockmanager_tests, BasicTestingSetup)
18+
19+
BOOST_AUTO_TEST_CASE(blockmanager_find_block_pos)
20+
{
21+
const auto params {CreateChainParams(ArgsManager{}, CBaseChainParams::MAIN)};
22+
BlockManager blockman {};
23+
CChain chain {};
24+
// simulate adding a genesis block normally
25+
BOOST_CHECK_EQUAL(blockman.SaveBlockToDisk(params->GenesisBlock(), 0, chain, *params, nullptr).nPos, BLOCK_SERIALIZATION_HEADER_SIZE);
26+
// simulate what happens during reindex
27+
// simulate a well-formed genesis block being found at offset 8 in the blk00000.dat file
28+
// the block is found at offset 8 because there is an 8 byte serialization header
29+
// consisting of 4 magic bytes + 4 length bytes before each block in a well-formed blk file.
30+
FlatFilePos pos{0, BLOCK_SERIALIZATION_HEADER_SIZE};
31+
BOOST_CHECK_EQUAL(blockman.SaveBlockToDisk(params->GenesisBlock(), 0, chain, *params, &pos).nPos, BLOCK_SERIALIZATION_HEADER_SIZE);
32+
// now simulate what happens after reindex for the first new block processed
33+
// the actual block contents don't matter, just that it's a block.
34+
// verify that the write position is at offset 0x12d.
35+
// this is a check to make sure that https://github.com/bitcoin/bitcoin/issues/21379 does not recur
36+
// 8 bytes (for serialization header) + 285 (for serialized genesis block) = 293
37+
// add another 8 bytes for the second block's serialization header and we get 293 + 8 = 301
38+
FlatFilePos actual{blockman.SaveBlockToDisk(params->GenesisBlock(), 1, chain, *params, nullptr)};
39+
BOOST_CHECK_EQUAL(actual.nPos, BLOCK_SERIALIZATION_HEADER_SIZE + ::GetSerializeSize(params->GenesisBlock(), CLIENT_VERSION) + BLOCK_SERIALIZATION_HEADER_SIZE);
40+
}
41+
42+
BOOST_AUTO_TEST_SUITE_END()

src/validation.cpp

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4476,7 +4476,18 @@ void Chainstate::LoadExternalBlockFile(
44764476
}
44774477
}
44784478
} catch (const std::exception& e) {
4479-
LogPrintf("%s: Deserialize or I/O error - %s\n", __func__, e.what());
4479+
// historical bugs added extra data to the block files that does not deserialize cleanly.
4480+
// commonly this data is between readable blocks, but it does not really matter. such data is not fatal to the import process.
4481+
// the code that reads the block files deals with invalid data by simply ignoring it.
4482+
// it continues to search for the next {4 byte magic message start bytes + 4 byte length + block} that does deserialize cleanly
4483+
// and passes all of the other block validation checks dealing with POW and the merkle root, etc...
4484+
// we merely note with this informational log message when unexpected data is encountered.
4485+
// we could also be experiencing a storage system read error, or a read of a previous bad write. these are possible, but
4486+
// less likely scenarios. we don't have enough information to tell a difference here.
4487+
// the reindex process is not the place to attempt to clean and/or compact the block files. if so desired, a studious node operator
4488+
// may use knowledge of the fact that the block files are not entirely pristine in order to prepare a set of pristine, and
4489+
// perhaps ordered, block files for later reindexing.
4490+
LogPrint(BCLog::REINDEX, "%s: unexpected data at file offset 0x%x - %s. continuing\n", __func__, (nRewind - 1), e.what());
44804491
}
44814492
}
44824493
} catch (const std::runtime_error& e) {

0 commit comments

Comments
 (0)