persist: replace file-per-entry with WAL and refactor into generic indexedWAL#3044
persist: replace file-per-entry with WAL and refactor into generic indexedWAL#3044wen-coding wants to merge 27 commits intomainfrom
Conversation
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3044 +/- ##
========================================
Coverage 58.36% 58.36%
========================================
Files 2079 2081 +2
Lines 171899 171768 -131
========================================
- Hits 100321 100248 -73
+ Misses 62637 62589 -48
+ Partials 8941 8931 -10
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| for lane, first := range laneFirsts { | ||
| lw, ok := bp.lanes[lane] | ||
| if !ok { | ||
| continue // no WAL yet; PersistBlock will create one lazily | ||
| } | ||
| lane, fileN, err := parseBlockFilename(entry.Name()) | ||
| if err != nil { | ||
| firstBN, ok := lw.firstBlockNum().Get() | ||
| if !ok || first <= firstBN { | ||
| continue | ||
| } | ||
| first, ok := laneFirsts[lane] | ||
| if ok && fileN >= first { | ||
| continue | ||
| walIdx := lw.firstIdx + uint64(first-firstBN) | ||
| if err := lw.TruncateBefore(walIdx); err != nil { | ||
| return fmt.Errorf("truncate lane %s WAL before block %d: %w", lane, first, err) | ||
| } | ||
| path := filepath.Join(bp.dir, entry.Name()) | ||
| if err := os.Remove(path); err != nil && !os.IsNotExist(err) { | ||
| logger.Warn("failed to delete block file", "path", path, "err", err) | ||
| } |
Check warning
Code scanning / CodeQL
Iteration over map
| dbwal.Config{ | ||
| WriteBufferSize: 0, // synchronous writes | ||
| WriteBatchSize: 1, // no batching | ||
| FsyncEnabled: true, |
There was a problem hiding this comment.
Be aware of the latency impact here, if putting write in critical path, it would introduce some noticeable latency. Would recommend synchronous write + nofsync for perf reason, fsync does provide stronger guarantees, but the chance of all validators hitting power off at the same time is pretty rare
There was a problem hiding this comment.
That's reasonable, changed
| // Used when all entries are stale (e.g. the prune anchor advanced past | ||
| // everything persisted). | ||
| // | ||
| // TODO: sei-db/wal doesn't expose tidwall/wal's AllowEmpty option, so there's |
There was a problem hiding this comment.
Is the plan to expose that in a separate PR? It should be pretty simple to add though?
There was a problem hiding this comment.
PR 3049 merged and switched to TruncateAll() here.
| if err := w.wal.Write(entry); err != nil { | ||
| return err | ||
| } | ||
| if w.firstIdx == 0 { |
There was a problem hiding this comment.
Recommend using Count() == 0 instead of relying on firstIdx == 0 as a sentinel for "WAL is empty" incase the assumption of wal starting index from 0 is not valid in the future if we switch the wal library
| return nil, nil | ||
| } | ||
| entries := make([]T, 0, w.Count()) | ||
| err := w.wal.Replay(w.firstIdx, w.nextIdx-1, func(_ uint64, entry T) error { |
There was a problem hiding this comment.
There's no validation that len(entries) == w.Count() after a successful replay. If Replay succeeds but returns fewer entries than expected (e.g., the underlying WAL silently truncated its tail on open due to corruption), ReadAll would return a short slice with no error.
| return nil | ||
| } | ||
| for _, lw := range bp.lanes { | ||
| if err := lw.Close(); err != nil { |
There was a problem hiding this comment.
If one lane's WAL fails to close, the remaining lanes are never closed. Use errors.Join to accumulate errors and close all lanes
Replace the file-per-block and file-per-commitqc persistence with sei-db/wal. Blocks use one WAL per lane so that truncation is independent (no stale-lane problem). CommitQCs use a single WAL with a linear RoadIndex-to-WAL-index mapping. Key changes: - BlockPersister: per-lane WAL in blocks/<hex_lane_id>/ subdirs, lazy lane creation, independent per-lane TruncateBefore. - CommitQCPersister: single WAL in commitqcs/, tracks firstWALIdx and nextWALIdx locally for correct truncation mapping. - Remove all file-per-entry code: filename construction/parsing, directory scanning, individual file read/write/delete, corrupt file skipping. - Rewrite tests for WAL semantics (append-only, truncation, replay). Made-with: Cursor
Extract common WAL mechanics (index tracking, typed write/replay, truncation) into a generic indexedWAL[T] backed by sei-db/wal, replacing the duplicated raw-bytes WAL setup in both blocks.go and commitqcs.go. Key changes: - Add indexedWAL[T] with codec[T] interface for typed serialization - laneWAL embeds indexedWAL; firstBlockNum() returns Option for safety - DeleteBefore now removes stale lane WALs (validators no longer in committee) and their directories - Add empty-WAL guard to CommitQCPersister.DeleteBefore - Add direct unit tests for indexedWAL (wal_test.go) - Add TODO for dynamic committee membership support Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
Instead of extracting the lane ID from the first replayed entry (and closing/skipping empty WALs), decode it from the hex directory name. This keeps the WAL open so the lane can receive blocks without reopening it. Made-with: Cursor
Made-with: Cursor
Validate the directory name (hex decode + PublicKeyFromBytes) before opening the WAL, avoiding a redundant hex.DecodeString call. Add tests for both skip paths: non-hex directory name and valid hex but invalid public key length. Made-with: Cursor
Replace callback-based Replay on indexedWAL with ReadAll that returns a slice. Remove the defensive sort in blocks loadAll since WAL entries are already in append order. Fix stale Replay reference in godoc. Made-with: Cursor
TruncateBefore now reads and verifies the entry at the target WAL index before truncating, catching index-mapping corruption before data loss. PersistCommitQC and PersistBlock enforce strict sequential order to prevent gaps that would break the linear domain-to-WAL-index mapping. Made-with: Cursor
The non-contiguous commitQC test now expects the gap to be caught
at PersistCommitQC time ("out of sequence") rather than at NewState
load time, matching the defense-in-depth guard added earlier.
Made-with: Cursor
Made-with: Cursor
Add contiguity checks during WAL replay to catch on-disk corruption that bypasses write-time guards. Includes tests that write directly to the WAL to simulate corrupted data. Made-with: Cursor
Reduces log noise on restart with many validators and blocks. Made-with: Cursor
Made-with: Cursor
BlockPersister uses Option[string] for dir, CommitQCPersister uses Option[*indexedWAL] for iw. None = no-op mode, Some = real persistence. The noop behavior is now structurally implied rather than a separate flag. Made-with: Cursor
Made-with: Cursor
Made-with: Cursor
When a node restarts after being offline for a long time, the prune anchor may have advanced past all locally persisted WAL entries. - Add indexedWAL.Reset() to close, remove, and reopen a fresh WAL. - DeleteBefore in both blocks.go and commitqcs.go now calls Reset() when the prune point is at or past the last persisted entry. - CommitQCPersister.DeleteBefore advances the write cursor (cp.next) in the reset branch, making ResetNext unnecessary. - PersistCommitQC now silently ignores duplicates (idx < next) so startup can idempotently re-persist in-memory entries after a reset. - Remove ResetNext; replace call sites with a re-persist loop at startup and rely on DeleteBefore's cursor management at runtime. - Reorder runPersist: prune before writes (WAL needs contiguous indices). - Update runPersist godoc to match new step ordering. - Add tests for Reset, DeleteBefore-past-all, duplicate no-op, and an integration test for NewState with anchor past all persisted QCs. Made-with: Cursor
Made-with: Cursor
When the WAL was reset (anchor past all entries) and the process crashed before writing new entries, restart would find an empty WAL with cp.next=0. The old guard order (count==0 early return before cursor check) prevented DeleteBefore from advancing cp.next, causing the subsequent PersistCommitQC to fail with "out of sequence". Fix: check idx >= cp.next before the count==0 guard so the cursor is always advanced, even on an already-empty WAL. Add TestCommitQCDeleteBeforePastAllCrashRecovery. Made-with: Cursor
The loop over all in-memory CommitQCs was unnecessary — entries loaded from the WAL are guaranteed to survive DeleteBefore (it only removes entries below the anchor). Only the anchor's CommitQC could be missing after a WAL reset, so persist just that one entry. Made-with: Cursor
- Disable WAL fsync; the prune anchor (A/B files with fsync) already provides the crash-recovery watermark. - Use Count() == 0 instead of firstIdx == 0 for emptiness checks in Write and ReadAll for robustness. - Add post-replay count check in ReadAll to detect silent data loss. - Use errors.Join in BlockPersister.Close to close all lane WALs. - Rename laneDir → lanePath to avoid shadowing the laneDir() function. - Add TestIndexedWAL_ReadAllDetectsStaleNextIdx. Made-with: Cursor
c0727e6 to
422aa8f
Compare
Now that sei-db/wal exposes AllowEmpty and TruncateAll (#3049), use them to clear a WAL in-place instead of the heavier close → remove directory → reopen pattern. - Enable AllowEmpty in WAL config. - Replace Reset() with TruncateAll() — single call, no dir removal. - Remove dir/codec fields from indexedWAL (only needed for reopen). - Eliminate firstIdx == 0 sentinel: Count() is now just nextIdx - firstIdx, empty when equal. Write() no longer needs the first-write bookkeeping branch. - Update openIndexedWAL to handle AllowEmpty's empty-log reporting (first > last) uniformly with the non-empty case. Made-with: Cursor
…runcates" Made-with: Cursor
Summary
Replace file-per-block and file-per-commitqc persistence in
blocks.goandcommitqcs.gowithsei-db/wal, and extract common WAL mechanics into a genericindexedWAL[T].wal.go(new): genericindexedWAL[T]withcodec[T]interface, providingWrite,ReadAll,TruncateBefore(with verify callback),TruncateAll,Count, andClose. Handles monotonic index tracking, typed serialization, and full WAL lifecycle. Opens the underlying WAL withAllowEmpty: trueso an empty log is valid; emptiness is defined byfirstIdx == nextIdx(no sentinel).TruncateAlldelegates towal.TruncateAll()and advancesfirstIdxtonextIdxsoCount() == 0while preserving the index counter. Fsync is disabled because the prune anchor (persisted via A/B files with fsync) is the crash-recovery watermark — on power loss we restart from the anchor and re-sync lost WAL entries from peers.ReadAllincludes a post-replay count check to detect silent data loss.blocks.go: one WAL per lane inblocks/<hex_lane_id>/subdirectories, with lazy lane creation, independent per-lane truncation, stale lane cleanup, andTruncateAll()when the prune anchor advances past all persisted blocks.PersistBlockenforces strict contiguous block numbers;DeleteBeforeverifies block numbers before truncating.loadAllchecks for gaps at replay time.Closeuseserrors.Jointo ensure all lane WALs are closed even if one fails.commitqcs.go: single WAL incommitqcs/, linear RoadIndex-to-WAL-index mapping.PersistCommitQCsilently ignores duplicates (idx < next) for idempotent startup, rejects gaps (idx > next).DeleteBeforeadvances the write cursor and truncates the WAL viaTruncateAllwhen the anchor advances past all entries — cursor advancement happens before the count-zero check so it works correctly even after a crash betweenTruncateAlland the first new write.loadAllchecks for gaps at replay time.state.go: startup prunes stale WAL entries viaDeleteBefore, then re-persists in-memory CommitQCs (no-op in normal case; writes anchor's QC after a WALTruncateAll). RuntimerunPersistreordered: anchor → prune → CommitQCs → blocks (pruning before writes ensures contiguous WAL indices after truncation).ResetNextmethod.Test plan
wal_test.go: empty start, write+read, reopen with data, reopen after truncate, truncate all but last, verify callback (accept + reject), write after truncate, TruncateAll, stale nextIdx detectionblocks_test.go: empty dir, persist+load, multiple lanes, delete-before (single/multi/empty lane/restart), noop, delete-then-persist, delete-past-all (TruncateAll), stale lane removal, empty WAL survives reopen, lazy lane creation, skip non-hex/invalid-key dirs, out-of-sequence rejection, gap detection at load timecommitqcs_test.go: empty dir, persist+load, delete-before, duplicate is no-op, gap rejected, noop, delete-then-persist, delete-past-all (TruncateAll + cursor advance), crash-recovery (TruncateAll then crash before write, restart with empty WAL), gap detection at load timestate_test.go: anchor past all persisted commitQCs truncates WAL and re-persists anchor's QC (integration test for "long offline" scenario)