feat: make sure mithril package files are stable #722
Conversation
Walkthrough
Adds in-memory tar.gz archive construction and archive metadata parsing, computes resume points from existing archives, and implements resumable block packaging with tail-archive replacement, duplicate avoidance, and enhanced logging and tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant System
    participant Archives
    participant BlockLoader
    participant Archiver
    participant Storage

    System->>Archives: list_existing_archives()
    Archives-->>System: archive metadata list
    System->>System: parse_archive_bounds() / resume_point_for_archives()

    rect rgba(100,150,200,0.5)
        Note over System,BlockLoader: Resume from computed point
        System->>BlockLoader: read_blocks(from=resume_point)
        BlockLoader-->>System: batch of blocks (BTreeMap<Point, Vec<u8>>)
    end

    System->>Archiver: prepare batch (skip resume marker if present)
    Archiver->>Archiver: group into BLOCKS_PER_ARCHIVE, archive_name_for_blocks()
    Archiver->>Archiver: build_archive_bytes() (tar.gz in-memory)

    rect rgba(150,200,100,0.5)
        Note over Archiver,Storage: Archive Management
        Archiver->>Storage: check archive exists
        alt tail archive should be replaced
            Storage-->>Archiver: exists (tail)
            Archiver->>Storage: replace archive
        else new archive
            Storage-->>Archiver: not found
            Archiver->>Storage: write new archive
        end
    end

    System->>System: update resume_point & log progress
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Force-pushed: facf993 to e5aad49
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/amaru/src/bin/ledger/cmd/mithril.rs`:
- Around line 201-207: archive_name_for_blocks currently derives ordering from
BTreeMap<String, &Vec<u8>> which sorts lexicographically and breaks numeric slot
ordering; change the map key to a structured ordering key (e.g. Point or a tuple
like (slot: u64, hash: String) used elsewhere) and use that for first/last
lookup in archive_name_for_blocks and the same related code at the other
occurrence (lines referenced 452-459), then only format the human-readable
filename (format!("{first}__{last}.tar.gz")) at the point of writing the archive
so ordering is based on numeric slot/hash rather than rendered filenames.
- Around line 164-173: The current package_blocks implementation writes the new
archive directly then removes the old tail, which can leave a missing or partial
archive if the process dies in between; change package_blocks (and the other
archive-writing sites in this file that use
archive_path_for_blocks/blocks_dir/build_archive_bytes) to a safe write pattern:
create a temporary file in the same directory (e.g., archive_path + ".tmp" or
use a unique suffix), write compressed bytes to that temp file, flush and sync
the file to disk, atomically rename (fs::rename) the temp file to the final
archive_path, and only after the rename remove any stale tail file; ensure you
use the same blocks_dir/ archive_path_for_blocks logic so rename is on the same
filesystem and replicate this pattern for the other places that write .tar.gz
archives.
- Around line 430-433: The calculation of from_chunk can underflow when
resume_point is Point::Origin or in chunk 0; change the subtraction logic in the
block around resume_point_for_archives, get_latest_chunk and
infer_chunk_from_slot so you never subtract 1 from 0 — use a saturating
subtraction (e.g., chunk.saturating_sub(1)) or a checked/conditional branch to
clamp at 0 when calling
infer_chunk_from_slot(resume_point.slot_or_default().into()) before assigning to
from_chunk; update the assignment of from_chunk to use that safe value instead
of the raw `- 1`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bfd8e5fe-5eae-4945-ad9b-a9798cffa51f
📒 Files selected for processing (1)
crates/amaru/src/bin/ledger/cmd/mithril.rs
Actionable comments posted: 2
♻️ Duplicate comments (3)
crates/amaru/src/bin/ledger/cmd/mithril.rs (3)
164-173: ⚠️ Potential issue | 🟠 Major: Make the tail swap crash-safe.
Lines 170-171 write straight to the final archive, Lines 220-224 trust any `*.tar.gz` filename, and Lines 474-476 delete the old tail before the replacement is durable. If the process cops it there, the next resume can happily treat a truncated file as valid. Write to a temp file in the same directory, `sync_all`, `rename`, then remove the stale tail only after the rename lands.

🛠️ Safer write path

```diff
 fn package_blocks(network: &NetworkName, blocks: &BTreeMap<String, &Vec<u8>>) -> io::Result<String> {
     let compressed = build_archive_bytes(blocks)?;
     let dir = blocks_dir(*network);
     fs::create_dir_all(&dir)?;
     let archive_path = archive_path_for_blocks(network, blocks)
         .expect("blocks map is non-empty here by construction");
-    let mut file = File::create(&archive_path)?;
-    file.write_all(&compressed)?;
+    let tmp_path = format!("{archive_path}.tmp");
+    let mut file = File::create(&tmp_path)?;
+    file.write_all(&compressed)?;
+    file.sync_all()?;
+    fs::rename(&tmp_path, &archive_path)?;
     Ok(archive_path)
 }
```

Then move the `remove_file()` block so it runs only after `package_blocks()` succeeds.

Also applies to: 214-224, 473-477
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/amaru/src/bin/ledger/cmd/mithril.rs` around lines 164 - 173, package_blocks currently writes the compressed archive directly to the final archive path which can lead to truncated files on crash; change it to write to a temporary file in the same directory (use blocks_dir and archive_path_for_blocks to derive path), flush and sync the temp file (call sync_all on the File or its parent dir), then atomically rename the temp into place before returning the final path; also move any remove_file(old_tail) logic so it executes only after package_blocks returns successfully (i.e., after the rename has landed) to ensure the stale tail is deleted only when the new archive is durably installed.
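The temp-write, fsync, rename pattern the prompt describes can be sketched as a standalone function. This is a minimal illustration, not the PR's actual `package_blocks`; the function name and path handling here are made up for the example, and only `std::fs`/`std::io` are used:

```rust
use std::fs::{self, File};
use std::io::Write;

/// Write `bytes` durably to `final_path`: temp file in the same
/// directory, write, fsync, then an atomic rename. A crash mid-write
/// leaves at worst a stray `.tmp` file, never a truncated archive
/// under the final name.
fn write_archive_atomically(final_path: &str, bytes: &[u8]) -> std::io::Result<()> {
    let tmp_path = format!("{final_path}.tmp");
    let mut file = File::create(&tmp_path)?;
    file.write_all(bytes)?;
    file.sync_all()?; // ensure the data hits disk before the rename
    fs::rename(&tmp_path, final_path)?; // atomic on the same filesystem
    Ok(())
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("atomic_write_demo");
    fs::create_dir_all(&dir)?;
    let path = dir.join("0001__0002.tar.gz");
    let path = path.to_str().expect("utf-8 path");
    write_archive_atomically(path, b"compressed bytes")?;
    assert_eq!(fs::read(path)?, b"compressed bytes");
    println!("ok");
    Ok(())
}
```

Because the temp file lives in the same directory as the final path, the `rename` stays on one filesystem, which is what makes it atomic on POSIX systems.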
201-207: ⚠️ Potential issue | 🟠 Major: Don't use rendered filenames as the ordering key.
Because this is a `BTreeMap<String, _>`, `99999...cbor` and `100000...cbor` sort like strings, not slots. That can skew archive bounds, tar member order, and the tail-match check once the slot width changes. `Point` already has an `Ord`, so keying by `Point` (or `(slot, hash)`) and formatting the filename only when writing keeps the ordering stable.

Also applies to: 452-459
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/amaru/src/bin/ledger/cmd/mithril.rs` around lines 201 - 207, The archive_name_for_blocks function currently uses rendered filename strings as BTreeMap keys which orders lexicographically (e.g., "99999.cbor" < "100000.cbor") and breaks slot ordering; change the map key to a typed ordering (use Point or a (slot, hash) tuple) instead of String (e.g., BTreeMap<Point, &Vec<u8>>), update any callers that build or iterate the map (including the related logic at the other mentioned block range) to insert and query by Point, and only format the archive member names with .cbor when creating the filename in archive_name_for_blocks (or equivalent writer) so ordering, tail-match checks, and tar member order use the natural Ord of Point. Ensure function signatures that reference archive_name_for_blocks and the map type are updated accordingly.
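To see why string keys break slot ordering, here is a small standalone demonstration of `BTreeMap` iteration order with `String` versus numeric keys; plain `u64` stands in for the crate's `Point`, which is assumed to order by slot:

```rust
use std::collections::BTreeMap;

fn main() {
    // String keys compare lexicographically, so "99999..." sorts AFTER "100000...".
    let mut by_name: BTreeMap<String, &str> = BTreeMap::new();
    by_name.insert("99999.cbor".to_string(), "older block");
    by_name.insert("100000.cbor".to_string(), "newer block");
    // The "last" key is the older block: bounds derived from it are wrong.
    assert_eq!(by_name.keys().next_back().unwrap(), "99999.cbor");

    // Numeric keys (u64 here; a `Point` with a slot-based `Ord` plays the
    // same role) iterate in true slot order.
    let mut by_slot: BTreeMap<u64, &str> = BTreeMap::new();
    by_slot.insert(99_999, "older block");
    by_slot.insert(100_000, "newer block");
    let (&last_slot, _) = by_slot.iter().next_back().unwrap();
    assert_eq!(last_slot, 100_000);

    // Render the human-readable filename only at write time.
    println!("{last_slot}.cbor"); // prints 100000.cbor
}
```

The fix the review asks for is exactly this move: keep the typed key for ordering and comparisons, and format the `.cbor`/`.tar.gz` names only at the point of writing.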
287-295: ⚠️ Potential issue | 🔴 Critical: Clamp both chunk rewinds at zero.
Line 294 and Line 433 both do a raw `- 1`. On chunk `0` or `Point::Origin`, that either panics with overflow checks or wraps to the moon, which is a proper boss fight for the bootstrap path. `saturating_sub(1)` in both places keeps the rewind sane.

🧯 Minimal fix

```diff
 if immutable_dir.try_exists()? {
     return Ok(fs::read_dir(immutable_dir)?
         .filter_map(Result::ok)
         .filter_map(|entry| entry.path().file_name()?.to_str().map(|s| s.to_owned()))
         .filter_map(|name| name.strip_suffix(".chunk").and_then(|id| id.parse::<u64>().ok()))
         .max()
-        .map(|n| n - 1)); // Last immutable might not be finalized (hint from JP from Mithril team)
+        .map(|n| n.saturating_sub(1))); // Last immutable might not be finalized (hint from JP from Mithril team)
 }
@@
-    let from_chunk = latest_chunk.unwrap_or(infer_chunk_from_slot(resume_point.slot_or_default().into()) - 1);
+    let from_chunk = latest_chunk.unwrap_or_else(|| {
+        infer_chunk_from_slot(resume_point.slot_or_default().into()).saturating_sub(1)
+    });
```

Also applies to: 430-433
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/amaru/src/bin/ledger/cmd/mithril.rs` around lines 287 - 295, Replace the raw subtraction that can underflow with saturating subtraction in both places: in get_latest_chunk() change the closure .map(|n| n - 1) to .map(|n| n.saturating_sub(1)), and make the same change where the code rewinds a Point/chunk near the use of Point::Origin (the other raw - 1 at the second occurrence) to use saturating_sub(1) so a zero value clamps to zero instead of underflowing.
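A quick standalone illustration of why `saturating_sub` (or `checked_sub`) is the safe rewind here:

```rust
fn main() {
    let chunk: u64 = 0;

    // A raw `chunk - 1` would panic with overflow checks on (debug builds)
    // or wrap to u64::MAX (release builds). `saturating_sub` clamps at zero.
    assert_eq!(chunk.saturating_sub(1), 0);
    assert_eq!(5u64.saturating_sub(1), 4);

    // `checked_sub` is the branching alternative: `None` signals "already at zero",
    // letting the caller pick an explicit fallback.
    assert_eq!(chunk.checked_sub(1), None);
    assert_eq!(chunk.checked_sub(1).unwrap_or(0), 0);

    println!("ok");
}
```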
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/amaru/src/bin/ledger/cmd/mithril.rs`:
- Around line 503-506: Remove the unused imports BLOCKS_PER_ARCHIVE,
build_archive_bytes, and parse_archive_metadata from the use super{...} list in
mithril.rs so the compiler warning is resolved; locate the use statement that
currently imports ArchiveMetadata, BLOCKS_PER_ARCHIVE, archive_name_for_blocks,
build_archive_bytes, latest_archive, latest_archive_end_point,
parse_archive_bounds, parse_archive_metadata, resume_point_for_archives and
delete only the three unused symbols (BLOCKS_PER_ARCHIVE, build_archive_bytes,
parse_archive_metadata).
- Around line 276-281: Change resume_point_for_archives so that when there are
zero or one parsed archives it returns Point::Origin instead of falling back to
tip: update the function resume_point_for_archives to use Point::Origin as the
default fallback (instead of tip) and ensure any code that reads from that point
and then calls skip(1) (the consumer around the resume logic) will have at least
the genesis available; add a regression test exercising 0-archive and 1-archive
cases to assert blocks are produced (no empty iterator/crash) and document that
if rebuilding from origin is undesirable a persisted boundary must be introduced
before the first archive.
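The suggested fallback behavior can be sketched with a hypothetical `Point` type. The real crate's `Point` and the actual shape of `resume_point_for_archives` will differ; this only illustrates the 0-archive and 1-archive cases clamping to `Origin` instead of tip:

```rust
/// Hypothetical stand-in for the crate's `Point` type; the real type
/// and function signature in mithril.rs will differ.
#[derive(Debug, Clone, PartialEq)]
enum Point {
    Origin,
    Specific { slot: u64 },
}

/// With zero or one parsed archives there is no safe interior boundary,
/// so fall back to `Origin` (rebuild from genesis) rather than tip.
/// With two or more, resume from the end of the second-to-last archive
/// so the (possibly partial) tail archive gets rebuilt.
fn resume_point_for_archives(parsed: &[Point]) -> Point {
    if parsed.len() >= 2 {
        parsed[parsed.len() - 2].clone()
    } else {
        Point::Origin
    }
}

fn main() {
    // The 0-archive and 1-archive cases the review wants regression tests for:
    assert_eq!(resume_point_for_archives(&[]), Point::Origin);
    assert_eq!(
        resume_point_for_archives(&[Point::Specific { slot: 10 }]),
        Point::Origin
    );
    assert_eq!(
        resume_point_for_archives(&[Point::Specific { slot: 10 }, Point::Specific { slot: 20 }]),
        Point::Specific { slot: 10 }
    );
    println!("ok");
}
```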
---
Duplicate comments:
In `@crates/amaru/src/bin/ledger/cmd/mithril.rs`:
- Around line 164-173: package_blocks currently writes the compressed archive
directly to the final archive path which can lead to truncated files on crash;
change it to write to a temporary file in the same directory (use blocks_dir and
archive_path_for_blocks to derive path), flush and sync the temp file (call
sync_all on the File or its parent dir), then atomically rename the temp into
place before returning the final path; also move any remove_file(old_tail) logic
so it executes only after package_blocks returns successfully (i.e., after the
rename has landed) to ensure the stale tail is deleted only when the new archive
is durably installed.
- Around line 201-207: The archive_name_for_blocks function currently uses
rendered filename strings as BTreeMap keys which orders lexicographically (e.g.,
"99999.cbor" < "100000.cbor") and breaks slot ordering; change the map key to a
typed ordering (use Point or a (slot, hash) tuple) instead of String (e.g.,
BTreeMap<Point, &Vec<u8>>), update any callers that build or iterate the map
(including the related logic at the other mentioned block range) to insert and
query by Point, and only format the archive member names with .cbor when
creating the filename in archive_name_for_blocks (or equivalent writer) so
ordering, tail-match checks, and tar member order use the natural Ord of Point.
Ensure function signatures that reference archive_name_for_blocks and the map
type are updated accordingly.
- Around line 287-295: Replace the raw subtraction that can underflow with
saturating subtraction in both places: in get_latest_chunk() change the closure
.map(|n| n - 1) to .map(|n| n.saturating_sub(1)), and make the same change where
the code rewinds a Point/chunk near the use of Point::Origin (the other raw - 1
at the second occurrence) to use saturating_sub(1) so a zero value clamps to
zero instead of underflowing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7b600536-f0c9-403a-969b-c53cd5e653f7
📒 Files selected for processing (1)
crates/amaru/src/bin/ledger/cmd/mithril.rs
Codecov Report: ❌ Patch coverage is … and 5 files with indirect coverage changes.
Signed-off-by: jeluard <jeluard@users.noreply.github.com>
Force-pushed: e5aad49 to b92f9f8
♻️ Duplicate comments (1)
crates/amaru/src/bin/ledger/cmd/mithril.rs (1)
164-173: ⚠️ Potential issue | 🟠 Major: Make the tail swap atomic.
Still a bit of a Dark Souls checkpoint, mate: the old tail gets deleted before the replacement is safely on disk, and `package_blocks` writes straight to the final path. If the process cops a crash in that window, the next resume can see either no tail or a half-written archive with the final filename. Write to a temp file in the same directory, `sync_all`, `rename`, and only then remove the stale tail.

Also applies to: 474-477
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/amaru/src/bin/ledger/cmd/mithril.rs` around lines 164 - 173, package_blocks currently writes the final archive directly and deletes the old tail before ensuring the new file is safely on disk; change it to write to a uniquely-named temp file in the same directory returned by blocks_dir(network), write the compressed bytes there, call file.sync_all() and then directory handle.sync_all() (open the dir with File::open(&dir) for syncing), atomically rename the temp file to the path returned by archive_path_for_blocks(network, blocks) using std::fs::rename, and only after a successful rename remove the old tail; apply the same temp-write + sync + rename pattern to any other call sites that write archives via archive_path_for_blocks to avoid half-written files.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/amaru/src/bin/ledger/cmd/mithril.rs`:
- Around line 164-173: package_blocks currently writes the final archive
directly and deletes the old tail before ensuring the new file is safely on
disk; change it to write to a uniquely-named temp file in the same directory
returned by blocks_dir(network), write the compressed bytes there, call
file.sync_all() and then directory handle.sync_all() (open the dir with
File::open(&dir) for syncing), atomically rename the temp file to the path
returned by archive_path_for_blocks(network, blocks) using std::fs::rename, and
only after a successful rename remove the old tail; apply the same temp-write +
sync + rename pattern to any other call sites that write archives via
archive_path_for_blocks to avoid half-written files.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3b9b597b-e8d7-43e0-a638-1f96bee0b666
📒 Files selected for processing (1)
crates/amaru/src/bin/ledger/cmd/mithril.rs
Improve mithril package creation so that packages are stable and no overlapping packages are created.
Summary by CodeRabbit
New Features
Bug Fixes
Tests