-
-
Notifications
You must be signed in to change notification settings - Fork 47
feat: add verify_integrity() for full-file checksum verification #259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
polaz
wants to merge
116
commits into
fjall-rs:main
from
structured-world:feat/#187-verify-integrity
Closed
Changes from 3 commits
Commits
Show all changes
116 commits
Select commit
Hold shift + click to select a range
947828b
feat: add verify_integrity() for full-file checksum verification
polaz 3fa5f56
refactor(verify): stream file checksums and harden public API
polaz ddf8e9d
fix(verify): use u64 instead of private BlobFileId in public API
polaz 221ea21
feat(verify): implement std::error::Error for IntegrityError
polaz 19f4b66
refactor(verify): add #[must_use] to verify_integrity return value
polaz 48824b9
docs(verify): document public visibility of enum variant fields
polaz dc5406b
docs(verify): annotate non-obvious hasher and import choices
polaz cd948eb
refactor(verify): use Xxh3Default for consistency, simplify assertion
polaz 52f4d68
refactor(verify): use Xxh3Default::new() matching ChecksummedWriter
polaz dbb4ee4
feat: add optimized contains_prefix() method
polaz 453e729
refactor(contains_prefix): accurate doc wording and test corrections
polaz c25e693
refactor(blob_tree): accurate contains_prefix override note
polaz 273a801
feat: add multi_get() for batch point reads
polaz 1962eb5
perf: seqno-aware seek in data block point reads
polaz c52ec80
docs(test): clarify seqno snapshot visibility in test comment
polaz 236e4a2
fix: add default impl for multi_get and BlobTree test
polaz 0513f33
docs(data_block): precise seek_to_key_seqno guarantees
polaz 972ea55
feat: add SequenceNumberGenerator trait
polaz 42d2c64
perf(data_block): single cmp in seek_to_key_seqno predicate
polaz f9bc2f0
fix: clarify multi_get docs and use existing helpers
polaz cbf88d3
docs(test): describe restart_interval loop coverage
polaz 51ea6cf
refactor: extract BlobTree::resolve_key helper, add unsorted key test
polaz fdb55f7
fix: remove unused imports and invalid #[must_use] on trait impl
polaz cf3b0e2
fix: remove unused SequenceNumberGenerator imports from test files
polaz 837a7f3
feat: add inherent method wrappers on SequenceNumberCounter
polaz 0957a0b
docs(seqno): document UnwindSafe supertrait requirement
polaz a6eebcb
docs(seqno): document trait invariants and add #[must_use] to trait m…
polaz 69e9691
test: add smoke test for custom SequenceNumberGenerator
polaz 36f23e0
fix(seqno): enforce MSB boundary in SequenceNumberCounter
polaz 334edc2
fix(test): enforce MSB invariant in custom generator example
polaz bc16549
docs(seqno): explain fetch_add vs CAS loop design choice in next()
polaz b0248bc
refactor: polish docs and consolidate Config impl block
polaz 65ba5f3
docs(seqno): exempt set() from monotonicity invariant
polaz 0f57228
docs(seqno): clarify orphan rule compliance for From impl
polaz 2fc8fe8
docs(seqno): add Panics section to set() wrapper
polaz 3dd7b1c
fix(seqno): use fetch_update instead of fetch_add in next()
polaz e14a15d
fix(test): use fetch_update in OffsetGenerator::next()
polaz 8b7a67d
docs(config): reflow new_with_generators doc comment
polaz 787f183
fix(version): clamp visible_seqno to stay below reserved MSB range
polaz de7e24b
refactor(seqno): centralize MAX_SEQNO as public constant
polaz 3320180
fix(seqno): align OffsetGenerator boundaries with SequenceNumberCounter
polaz 1fddda0
perf(data_block): seqno-aware seek for iterator bounds
polaz 2b0b265
refactor(data_block): dedup seek predicate, harden seqno tests
polaz 95ae8ab
fix(docs): add backticks around identifiers in seek_to_key_seqno doc
polaz d72761d
fix(seqno): resolve clippy warnings in SequenceNumberCounter
polaz a03b0de
ci: add CoordiNode CI and upstream monitor workflows
polaz 2462f33
docs: add maintained fork notice and support section
polaz d456379
ci: add dependabot configuration for cargo and actions
polaz 68faa56
ci: add release-plz workflow for automated changelog and releases
polaz 9bf3cf8
ci: split PR checks from full matrix, reduce PR to lint + ubuntu test
polaz 7d275e3
Merge branch 'main' into feat/#187-verify-integrity
polaz 3c7368c
Merge branch 'main' into feat/#138-optimized-containsprefix
polaz 2f119dd
Merge branch 'main' into feat/#237-data-block-seqno-aware-seek
polaz 61c4701
Merge branch 'main' into feat/#96-multi-get
polaz d9ac1c2
Merge branch 'main' into feat/#174-sequencenumbergenerator-trait
polaz 994436c
fix: resolve all clippy warnings for strict -D warnings CI
polaz e16fce2
Merge remote-tracking branch 'origin/main' into fix/#2-clippy-warnings
polaz c21d272
fix(decompress): use runtime validation instead of debug_assert for b…
polaz cb85fd4
test(block): add corruption test for lz4 byte count validation
polaz a6a675a
test(vlog): add corruption test for lz4 blob reader byte count valida…
polaz 8f8a154
fix(filter,vlog): guard zero-key division and use checked cast
polaz 5607259
fix(test): use lz4_flex::compress instead of compress_prepend_size
polaz 0376989
docs: add Copilot review instructions with scope and issue-suggestion…
polaz e967130
Merge remote-tracking branch 'origin/main' into fix/#2-clippy-warnings
polaz b22f937
ci: add Copilot code review instructions with scope rules
polaz a677f03
Merge remote-tracking branch 'origin/main' into fix/#2-clippy-warnings
polaz dbb763a
refactor: upgrade #[allow] to #[expect] with reasons on all suppressions
polaz 5a0575e
docs(table): expand get_highest_seqno docstring, add mixed insert+ing…
polaz 84562fa
Merge remote-tracking branch 'upstream/main'
polaz 3f65399
refactor: compute add_size as usize, remove unreachable wildcard arms
polaz fc10b94
Merge branch 'main' into fix/#2-clippy-warnings
polaz 364f366
Merge branch 'fix/#2-clippy-warnings' of github.com:structured-world/…
polaz 1a7995a
fix(blob,block): use checked_add for read_len, document size cap scope
polaz 0cee933
Merge pull request #12 from structured-world/fix/#2-clippy-warnings
polaz d811d02
Merge branch 'main' into docs/#265-seqno-docstring-and-test
polaz f606019
Merge branch 'main' into feat/#174-sequencenumbergenerator-trait
polaz 0ea0654
Merge branch 'main' into feat/#237-data-block-seqno-aware-seek
polaz 80283a2
Merge branch 'main' into feat/#138-optimized-containsprefix
polaz 72e3ea1
Merge branch 'main' into feat/#187-verify-integrity
polaz cccff65
Merge pull request #14 from structured-world/docs/#265-seqno-docstrin…
polaz b90986a
Merge branch 'main' into feat/#174-sequencenumbergenerator-trait
polaz 9047beb
Merge branch 'main' into feat/#96-multi-get
polaz 31fdb57
Merge branch 'main' into feat/#237-data-block-seqno-aware-seek
polaz b374e6d
Merge branch 'main' into feat/#138-optimized-containsprefix
polaz c846de3
Merge branch 'main' into feat/#187-verify-integrity
polaz 5e4cbda
refactor(verify): harden IntegrityReport API and improve I/O
polaz 4d71fb1
fix: address review feedback on contains_prefix
polaz caa3bb0
docs(test): clarify snapshot seqno semantics in multi_get test
polaz 2590eb9
docs(data_block): document why reverse seeks accept but ignore seqno
polaz 9043459
fix(verify): resolve strict clippy warnings
polaz 7e102a3
fix(seqno): clarify docs and remove stale clippy suppression
polaz 4a7d0ae
Merge pull request #6 from structured-world/feat/#138-optimized-conta…
polaz 4c40606
Merge remote-tracking branch 'origin/main' into feat/#237-data-block-…
polaz 5f42fc9
Merge remote-tracking branch 'origin/main' into feat/#174-sequencenum…
polaz de92363
Merge remote-tracking branch 'origin/main' into feat/#187-verify-inte…
polaz f164533
chore(merge): merge main into feat/#96-multi-get
polaz f2fe67a
refactor(seqno): improve test names and use MAX_SEQNO constant
polaz 2e543c5
refactor(verify): remove redundant BufReader, document get() guard
polaz 1f3dd30
Merge pull request #8 from structured-world/feat/#237-data-block-seqn…
polaz eb8d5ed
chore(merge): merge main into feat/#96-multi-get
polaz 80d489d
Merge remote-tracking branch 'origin/main' into feat/#174-sequencenum…
polaz cc77932
test(verify): cover Display, Error::source, and blob IoError paths
polaz dafc12a
docs(seqno): clarify next() pre-increment and get() semantics
polaz c6fb576
Merge branch 'main' into feat/#174-sequencenumbergenerator-trait
polaz 1522007
Merge branch 'feat/#174-sequencenumbergenerator-trait' of github.com:…
polaz 66a36a6
Merge branch 'main' into feat/#187-verify-integrity
polaz d21b7ef
Merge pull request #9 from structured-world/feat/#96-multi-get
polaz 44cbc41
Merge remote-tracking branch 'origin/main' into feat/#187-verify-inte…
polaz 164df5d
Merge remote-tracking branch 'origin/feat/#187-verify-integrity' into…
polaz 1b51866
Merge branch 'main' into feat/#174-sequencenumbergenerator-trait
polaz 07956ec
fix(verify): handle EINTR in stream_checksum read loop
polaz 33690b5
Merge branch 'main' into feat/#187-verify-integrity
polaz 5bc4d19
fix(seqno): canonical should_panic syntax, precise trait docs, panic …
polaz d4f44ee
Merge branch 'feat/#174-sequencenumbergenerator-trait' of github.com:…
polaz 86c2171
Merge pull request #10 from structured-world/feat/#174-sequencenumber…
polaz 40279ed
Merge branch 'main' into feat/#187-verify-integrity
polaz File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,192 @@ | ||
| // Copyright (c) 2024-present, fjall-rs | ||
| // This source code is licensed under both the Apache 2.0 and MIT License | ||
| // (found in the LICENSE-* files in the repository) | ||
|
|
||
| use crate::{checksum::Checksum, table::TableId}; | ||
| use std::path::PathBuf; | ||
|
|
||
| /// Describes a single integrity error found during verification. | ||
| #[derive(Debug)] | ||
|
polaz marked this conversation as resolved.
|
||
| #[non_exhaustive] | ||
| pub enum IntegrityError { | ||
| /// Full-file checksum mismatch for an SST table. | ||
| SstFileCorrupted { | ||
| /// Table ID | ||
| table_id: TableId, | ||
| /// Path to the corrupted file | ||
| path: PathBuf, | ||
| /// Checksum stored in the manifest | ||
| expected: Checksum, | ||
| /// Checksum computed from disk | ||
| got: Checksum, | ||
| }, | ||
|
|
||
| /// Full-file checksum mismatch for a blob file. | ||
| BlobFileCorrupted { | ||
| /// Blob file ID | ||
| blob_file_id: u64, | ||
| /// Path to the corrupted file | ||
|
polaz marked this conversation as resolved.
|
||
| path: PathBuf, | ||
| /// Checksum stored in the manifest | ||
| expected: Checksum, | ||
| /// Checksum computed from disk | ||
| got: Checksum, | ||
| }, | ||
|
polaz marked this conversation as resolved.
polaz marked this conversation as resolved.
|
||
|
|
||
| /// I/O error while reading a file during verification. | ||
| IoError { | ||
| /// Path to the file that could not be read | ||
| path: PathBuf, | ||
| /// The underlying I/O error | ||
| error: std::io::Error, | ||
|
polaz marked this conversation as resolved.
|
||
| }, | ||
|
polaz marked this conversation as resolved.
|
||
| } | ||
|
coderabbitai[bot] marked this conversation as resolved.
|
||
|
|
||
| impl std::fmt::Display for IntegrityError { | ||
| fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
| match self { | ||
| Self::SstFileCorrupted { | ||
| table_id, | ||
| path, | ||
| expected, | ||
| got, | ||
| } => write!( | ||
| f, | ||
| "SST table {table_id} corrupted at {}: expected {expected}, got {got}", | ||
| path.display() | ||
| ), | ||
| Self::BlobFileCorrupted { | ||
| blob_file_id, | ||
| path, | ||
| expected, | ||
| got, | ||
| } => write!( | ||
| f, | ||
| "blob file {blob_file_id} corrupted at {}: expected {expected}, got {got}", | ||
| path.display() | ||
| ), | ||
| Self::IoError { path, error } => { | ||
| write!(f, "I/O error reading {}: {}", path.display(), error) | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Result of an integrity verification scan. | ||
| #[derive(Debug)] | ||
| pub struct IntegrityReport { | ||
| /// Number of SST table files checked. | ||
| pub sst_files_checked: usize, | ||
|
|
||
| /// Number of blob files checked. | ||
| pub blob_files_checked: usize, | ||
|
|
||
| /// Integrity errors found during verification. | ||
| pub errors: Vec<IntegrityError>, | ||
| } | ||
|
|
||
| impl IntegrityReport { | ||
| /// Returns `true` if no errors were found. | ||
| #[must_use] | ||
| pub fn is_ok(&self) -> bool { | ||
| self.errors.is_empty() | ||
| } | ||
|
|
||
| /// Total number of files checked (SST + blob). | ||
| #[must_use] | ||
| pub fn files_checked(&self) -> usize { | ||
| self.sst_files_checked + self.blob_files_checked | ||
| } | ||
| } | ||
|
|
||
| /// Computes a streaming XXH3 128-bit checksum for a file without loading it entirely into memory. | ||
| fn stream_checksum(path: &std::path::Path) -> std::io::Result<Checksum> { | ||
| use std::io::Read; | ||
|
|
||
| let mut reader = std::io::BufReader::new(std::fs::File::open(path)?); | ||
| let mut hasher = xxhash_rust::xxh3::Xxh3::default(); | ||
|
polaz marked this conversation as resolved.
Outdated
polaz marked this conversation as resolved.
Outdated
|
||
| let mut buf = [0u8; 8192]; | ||
|
|
||
| loop { | ||
| let n = reader.read(&mut buf)?; | ||
| if n == 0 { | ||
| break; | ||
| } | ||
| hasher.update(&buf[..n]); | ||
| } | ||
|
|
||
| Ok(Checksum::from_raw(hasher.digest128())) | ||
| } | ||
|
|
||
| /// Verifies full-file checksums for all SST and blob files in the given tree. | ||
| /// | ||
| /// Each file's content is read from disk and hashed with XXHash-3 128-bit, | ||
| /// then compared against the checksum stored in the version manifest. | ||
| /// | ||
| /// This detects silent bit-rot, partial writes, and other on-disk corruption. | ||
| /// | ||
| /// Per-file errors (e.g., unreadable files, checksum mismatches) are collected | ||
| /// into [`IntegrityReport::errors`] — the scan always runs to completion. | ||
| pub fn verify_integrity(tree: &impl crate::AbstractTree) -> IntegrityReport { | ||
|
coderabbitai[bot] marked this conversation as resolved.
|
||
| let version = tree.current_version(); | ||
|
|
||
| let mut report = IntegrityReport { | ||
| sst_files_checked: 0, | ||
| blob_files_checked: 0, | ||
| errors: Vec::new(), | ||
| }; | ||
|
|
||
| // Verify all SST table files | ||
| for table in version.iter_tables() { | ||
| let path = &*table.path; | ||
| let expected = table.checksum(); | ||
|
|
||
| match stream_checksum(path) { | ||
| Ok(got) if got != expected => { | ||
| report.errors.push(IntegrityError::SstFileCorrupted { | ||
| table_id: table.id(), | ||
| path: path.to_path_buf(), | ||
| expected, | ||
| got, | ||
| }); | ||
| } | ||
| Ok(_) => {} | ||
| Err(e) => { | ||
| report.errors.push(IntegrityError::IoError { | ||
| path: path.to_path_buf(), | ||
| error: e, | ||
| }); | ||
| } | ||
| } | ||
|
|
||
| report.sst_files_checked += 1; | ||
| } | ||
|
|
||
| // Verify all blob files | ||
| for blob_file in version.blob_files.iter() { | ||
| let path = blob_file.path(); | ||
| let expected = blob_file.checksum(); | ||
|
|
||
| match stream_checksum(path) { | ||
| Ok(got) if got != expected => { | ||
| report.errors.push(IntegrityError::BlobFileCorrupted { | ||
| blob_file_id: blob_file.id(), | ||
| path: path.to_path_buf(), | ||
| expected, | ||
| got, | ||
| }); | ||
| } | ||
| Ok(_) => {} | ||
| Err(e) => { | ||
| report.errors.push(IntegrityError::IoError { | ||
| path: path.to_path_buf(), | ||
| error: e, | ||
| }); | ||
| } | ||
| } | ||
|
|
||
| report.blob_files_checked += 1; | ||
| } | ||
|
coderabbitai[bot] marked this conversation as resolved.
|
||
|
|
||
| report | ||
| } | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.