This repository was archived by the owner on Oct 10, 2025. It is now read-only.
Fix OverflowFile checkpoint corruption when no data is written #6046
Open
1amageek wants to merge 1 commit into kuzudb:master
Fixes a bug where OverflowFile::checkpoint() unconditionally allocated a header page even when no data had been written, causing PrimaryKeyIndexStorageInfo corruption and database reopen failures.

**Root cause:** When creating a VectorIndex without inserting data, OverflowFile::checkpoint() allocated a page unnecessarily, setting overflowHeaderPage to a valid page index instead of INVALID_PAGE_IDX.

**Fix:** Skip checkpoint when headerChanged == false, following the same design pattern as NodeTable and RelTable. The headerChanged flag is only set to true when actual string data (>12 bytes) is written via OverflowFileHandle::setStringOverflow().

**Test coverage:** Added overflow_file_checkpoint_test.cpp with 5 test cases documenting the bug scenario and verifying correct behavior.
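
The guard described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from this PR: `ToyOverflowFile`, `ToyPageAllocator`, `writeString`, and `INLINE_LIMIT` are hypothetical stand-ins, and only the `headerChanged` flag, the 12-byte threshold, and `INVALID_PAGE_IDX` are taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-ins for illustration; these are not the Kuzu types.
using page_idx_t = std::uint32_t;
constexpr page_idx_t INVALID_PAGE_IDX = std::numeric_limits<page_idx_t>::max();
constexpr std::size_t INLINE_LIMIT = 12; // strings of up to 12 bytes stay inline

// Hands out page indices sequentially, starting at 0.
struct ToyPageAllocator {
    page_idx_t nextPage = 0;
    page_idx_t allocate() { return nextPage++; }
};

class ToyOverflowFile {
public:
    explicit ToyOverflowFile(ToyPageAllocator& allocator) : allocator{allocator} {}

    // Only strings longer than the inline limit touch the overflow file,
    // so only they mark the header as changed.
    void writeString(const std::string& value) {
        if (value.size() > INLINE_LIMIT) {
            headerChanged = true;
        }
    }

    void checkpoint() {
        // The guard: with no overflow data written there is nothing to
        // persist, so no header page is allocated.
        if (!headerChanged) {
            return;
        }
        if (headerPageIdx == INVALID_PAGE_IDX) {
            headerPageIdx = allocator.allocate();
        }
        headerChanged = false;
    }

    page_idx_t getHeaderPageIdx() const { return headerPageIdx; }

private:
    ToyPageAllocator& allocator;
    page_idx_t headerPageIdx = INVALID_PAGE_IDX;
    bool headerChanged = false;
};
```

In this model, checkpointing a file that received no strings, or only strings of 12 bytes or fewer, leaves the header page index at `INVALID_PAGE_IDX`. A page is allocated only after a longer string has been written. Without the early return, the first checkpoint would allocate a page for an empty file, which is the situation the description identifies as the source of the corrupted metadata.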
Description
Fixes #6045
Fixes a critical bug where `OverflowFile::checkpoint()` unconditionally allocated a header page even when no data had been written, causing `PrimaryKeyIndexStorageInfo` corruption and database reopen failures.

Problem
When creating a VectorIndex without inserting any data, the database checkpoint completes successfully but corrupts the metadata. Reopening the database fails with an assertion error in `hash_index.cpp:487`.

Minimal Reproduction
Root Cause
In `src/storage/overflow_file.cpp:236`, `OverflowFile::checkpoint()` was unconditionally allocating a page even when no data had been written.

Sequence of events:

- `PrimaryKeyIndex` (for STRING primary key)
- `PrimaryKeyIndex` creates an `OverflowFile` (for strings >12 bytes) with `headerPageIdx = INVALID_PAGE_IDX`
- `OverflowFile::checkpoint()` allocates a page unnecessarily
- `PrimaryKeyIndexStorageInfo.overflowHeaderPage = 1` (should be `INVALID_PAGE_IDX`)

Solution
Skip checkpoint when `headerChanged == false`, following the same design pattern as `NodeTable::checkpoint()` and `RelTable::checkpoint()`.

The `headerChanged` flag is only set to `true` when actual string data (>12 bytes) is written via `OverflowFileHandle::setStringOverflow()`.

Benefits
Testing
Added comprehensive test suite in `test/storage/overflow_file_checkpoint_test.cpp` with 5 test cases:

- `InMemOverflowFileAlwaysAllocatesHeader` - Verifies in-memory behavior
- `ShortStringsDoNotTriggerOverflow` - Verifies strings ≤12 bytes are inlined
- `LongStringsDoTriggerOverflow` - Verifies strings >12 bytes use overflow
- `EmptyOverflowFileHeaderNotChanged` - Documents the core bug fix
- `VectorIndexCreationSequence` - Documents the bug scenario

All tests pass.
Files Changed
- `src/storage/overflow_file.cpp` - Added early return when `headerChanged == false`
- `test/storage/CMakeLists.txt` - Added new test target
- `test/storage/overflow_file_checkpoint_test.cpp` - New test file

Impact
This fix resolves crashes when: