persist: replace file-per-entry with WAL and refactor into generic indexedWAL#3044

Open
wen-coding wants to merge 27 commits into main from wen/use_wal_for_persistence

wen-coding (Contributor) commented Mar 9, 2026

Summary

Replace file-per-block and file-per-commitqc persistence in blocks.go and commitqcs.go with sei-db/wal, and extract common WAL mechanics into a generic indexedWAL[T].

  • wal.go (new): generic indexedWAL[T] with codec[T] interface, providing Write, ReadAll, TruncateBefore (with verify callback), TruncateAll, Count, and Close. Handles monotonic index tracking, typed serialization, and full WAL lifecycle. Opens the underlying WAL with AllowEmpty: true so an empty log is valid; emptiness is defined by firstIdx == nextIdx (no sentinel). TruncateAll delegates to wal.TruncateAll() and advances firstIdx to nextIdx so Count() == 0 while preserving the index counter. Fsync is disabled because the prune anchor (persisted via A/B files with fsync) is the crash-recovery watermark — on power loss we restart from the anchor and re-sync lost WAL entries from peers. ReadAll includes a post-replay count check to detect silent data loss.
  • blocks.go: one WAL per lane in blocks/<hex_lane_id>/ subdirectories, with lazy lane creation, independent per-lane truncation, stale lane cleanup, and TruncateAll() when the prune anchor advances past all persisted blocks. PersistBlock enforces strict contiguous block numbers; DeleteBefore verifies block numbers before truncating. loadAll checks for gaps at replay time. Close uses errors.Join to ensure all lane WALs are closed even if one fails.
  • commitqcs.go: single WAL in commitqcs/, linear RoadIndex-to-WAL-index mapping. PersistCommitQC silently ignores duplicates (idx < next) for idempotent startup, rejects gaps (idx > next). DeleteBefore advances the write cursor and truncates the WAL via TruncateAll when the anchor advances past all entries — cursor advancement happens before the count-zero check so it works correctly even after a crash between TruncateAll and the first new write. loadAll checks for gaps at replay time.
  • state.go: startup prunes stale WAL entries via DeleteBefore, then re-persists in-memory CommitQCs (no-op in normal case; writes anchor's QC after a WAL TruncateAll). Runtime runPersist reordered: anchor → prune → CommitQCs → blocks (pruning before writes ensures contiguous WAL indices after truncation).
  • Removed all file-per-entry code and the ResetNext method.
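
The index bookkeeping described for wal.go can be sketched as follows. This is a minimal in-memory illustration, not the PR's implementation: the `codec[T]` method names, the `stringCodec` helper, the `errMismatch` value, and the starting index of 1 are assumptions, and a byte-slice log stands in for sei-db/wal. It shows emptiness defined as `firstIdx == nextIdx`, a `TruncateBefore` that verifies the new head entry first, and a `TruncateAll` that preserves the index counter. Generics require Go 1.18 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// errMismatch is a hypothetical error used by verify callbacks in this sketch.
var errMismatch = errors.New("entry mismatch")

// codec converts entries to and from bytes. The method names are assumptions;
// the PR's codec[T] interface may look different.
type codec[T any] interface {
	Marshal(T) ([]byte, error)
	Unmarshal([]byte) (T, error)
}

// indexedWAL is an in-memory stand-in for the generic wrapper. Entries occupy
// WAL indices [firstIdx, nextIdx); the log is empty when the two are equal.
type indexedWAL[T any] struct {
	codec    codec[T]
	records  [][]byte // stand-in for the on-disk log
	firstIdx uint64
	nextIdx  uint64
}

func newIndexedWAL[T any](c codec[T]) *indexedWAL[T] {
	// Starting at index 1 is an assumption of this sketch.
	return &indexedWAL[T]{codec: c, firstIdx: 1, nextIdx: 1}
}

// Count needs no sentinel: it is simply the width of the index range.
func (w *indexedWAL[T]) Count() uint64 { return w.nextIdx - w.firstIdx }

func (w *indexedWAL[T]) Write(entry T) error {
	b, err := w.codec.Marshal(entry)
	if err != nil {
		return fmt.Errorf("marshal: %w", err)
	}
	w.records = append(w.records, b)
	w.nextIdx++
	return nil
}

// TruncateBefore drops entries below walIdx, but only after verify accepts
// the entry that would become the new head.
func (w *indexedWAL[T]) TruncateBefore(walIdx uint64, verify func(T) error) error {
	if walIdx <= w.firstIdx {
		return nil // nothing to drop
	}
	if walIdx >= w.nextIdx {
		return errors.New("truncate index out of range")
	}
	head, err := w.codec.Unmarshal(w.records[walIdx-w.firstIdx])
	if err != nil {
		return fmt.Errorf("unmarshal: %w", err)
	}
	if err := verify(head); err != nil {
		return fmt.Errorf("verify entry %d: %w", walIdx, err)
	}
	w.records = w.records[walIdx-w.firstIdx:]
	w.firstIdx = walIdx
	return nil
}

// TruncateAll empties the log but keeps the index counter, so the next
// write continues at nextIdx.
func (w *indexedWAL[T]) TruncateAll() {
	w.records = nil
	w.firstIdx = w.nextIdx
}

// stringCodec is a trivial codec used only to exercise the sketch.
type stringCodec struct{}

func (stringCodec) Marshal(s string) ([]byte, error)   { return []byte(s), nil }
func (stringCodec) Unmarshal(b []byte) (string, error) { return string(b), nil }

func main() {
	w := newIndexedWAL[string](stringCodec{})
	for _, s := range []string{"a", "b", "c"} {
		if err := w.Write(s); err != nil {
			panic(err)
		}
	}
	w.TruncateAll()
	fmt.Println("count:", w.Count(), "first:", w.firstIdx, "next:", w.nextIdx)
}
```

Keeping `nextIdx` across `TruncateAll` is what lets a caller map domain indices (block numbers, road indices) linearly onto WAL indices even after the log has been emptied.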

Test plan

  • All autobahn tests pass (avail, consensus, data, types)
  • wal_test.go: empty start, write+read, reopen with data, reopen after truncate, truncate all but last, verify callback (accept + reject), write after truncate, TruncateAll, stale nextIdx detection
  • blocks_test.go: empty dir, persist+load, multiple lanes, delete-before (single/multi/empty lane/restart), noop, delete-then-persist, delete-past-all (TruncateAll), stale lane removal, empty WAL survives reopen, lazy lane creation, skip non-hex/invalid-key dirs, out-of-sequence rejection, gap detection at load time
  • commitqcs_test.go: empty dir, persist+load, delete-before, duplicate is no-op, gap rejected, noop, delete-then-persist, delete-past-all (TruncateAll + cursor advance), crash-recovery (TruncateAll then crash before write, restart with empty WAL), gap detection at load time
  • state_test.go: anchor past all persisted commitQCs truncates WAL and re-persists anchor's QC (integration test for "long offline" scenario)

github-actions bot commented Mar 9, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
|---|---|---|---|---|
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Mar 11, 2026, 3:24 PM |

wen-coding changed the title from "persist: refactor block/commitqc WAL into generic indexedWAL" to "persist: replace file-per-entry with WAL and refactor into generic indexedWAL" Mar 9, 2026
codecov bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 72.47706% with 60 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.36%. Comparing base (30f8b0a) to head (1b3e3c1).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...mint/internal/autobahn/consensus/persist/blocks.go | 79.24% | 13 Missing and 9 partials ⚠️ |
| ...dermint/internal/autobahn/consensus/persist/wal.go | 64.91% | 11 Missing and 9 partials ⚠️ |
| ...t/internal/autobahn/consensus/persist/commitqcs.go | 77.77% | 5 Missing and 5 partials ⚠️ |
| sei-tendermint/internal/autobahn/avail/state.go | 20.00% | 5 Missing and 3 partials ⚠️ |

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #3044    +/-   ##
========================================
  Coverage   58.36%   58.36%            
========================================
  Files        2079     2081     +2     
  Lines      171899   171768   -131     
========================================
- Hits       100321   100248    -73     
+ Misses      62637    62589    -48     
+ Partials     8941     8931    -10     
| Flag | Coverage Δ |
|---|---|
| sei-chain-pr | 78.58% <72.47%> (?) |
| sei-db | 70.41% <ø> (ø) |

| Files with missing lines | Coverage Δ |
|---|---|
| sei-tendermint/internal/autobahn/avail/inner.go | 97.36% <ø> (ø) |
| sei-tendermint/internal/autobahn/avail/state.go | 75.37% <20.00%> (-0.32%) ⬇️ |
| ...t/internal/autobahn/consensus/persist/commitqcs.go | 80.88% <77.77%> (+2.15%) ⬆️ |
| ...dermint/internal/autobahn/consensus/persist/wal.go | 64.91% <64.91%> (ø) |
| ...mint/internal/autobahn/consensus/persist/blocks.go | 77.86% <79.24%> (+0.72%) ⬆️ |

... and 46 files with indirect coverage changes

Comment on lines +155 to +168
+   for lane, first := range laneFirsts {
+       lw, ok := bp.lanes[lane]
+       if !ok {
+           continue // no WAL yet; PersistBlock will create one lazily
+       }
-       lane, fileN, err := parseBlockFilename(entry.Name())
-       if err != nil {
+       firstBN, ok := lw.firstBlockNum().Get()
+       if !ok || first <= firstBN {
            continue
        }
-       first, ok := laneFirsts[lane]
-       if ok && fileN >= first {
-           continue
+       walIdx := lw.firstIdx + uint64(first-firstBN)
+       if err := lw.TruncateBefore(walIdx); err != nil {
+           return fmt.Errorf("truncate lane %s WAL before block %d: %w", lane, first, err)
+       }
-       path := filepath.Join(bp.dir, entry.Name())
-       if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
-           logger.Warn("failed to delete block file", "path", path, "err", err)
-       }

Check warning

Code scanning / CodeQL

Iteration over map

Iteration over map may be a possible source of non-determinism
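
The block-number-to-WAL-index arithmetic in the snippet above, together with one common way to address a map-iteration warning, can be sketched as below. The helper names `walIndexFor` and `sortedLanes` and the string lane keys are hypothetical and not taken from the PR. Whether a deterministic order matters here is a judgment call: per-lane truncation is independent, so ordering would mainly affect which error surfaces first.

```go
package main

import (
	"fmt"
	"sort"
)

// walIndexFor maps a block number to its WAL index, assuming the lane's WAL
// holds contiguous blocks and the block numbered firstBlock sits at firstIdx.
// ok is false when block is not past firstBlock, i.e. nothing to truncate.
func walIndexFor(firstIdx, firstBlock, block uint64) (idx uint64, ok bool) {
	if block <= firstBlock {
		return 0, false
	}
	return firstIdx + (block - firstBlock), true
}

// sortedLanes returns the map keys in ascending order so callers can iterate
// deterministically instead of relying on Go's randomized map order.
func sortedLanes(laneFirsts map[string]uint64) []string {
	lanes := make([]string, 0, len(laneFirsts))
	for lane := range laneFirsts {
		lanes = append(lanes, lane)
	}
	sort.Strings(lanes)
	return lanes
}

func main() {
	laneFirsts := map[string]uint64{"lane-b": 12, "lane-a": 7}
	for _, lane := range sortedLanes(laneFirsts) {
		// Assume each lane's WAL starts at index 1 holding block 5.
		idx, ok := walIndexFor(1, 5, laneFirsts[lane])
		fmt.Println(lane, idx, ok)
	}
}
```
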
@wen-coding wen-coding requested a review from pompon0 March 10, 2026 23:52
dbwal.Config{
WriteBufferSize: 0, // synchronous writes
WriteBatchSize: 1, // no batching
FsyncEnabled: true,
yzang2019 (Contributor) commented Mar 11, 2026:

Be aware of the latency impact here: if this write is on the critical path, it will add noticeable latency. I would recommend synchronous writes without fsync for performance reasons. fsync does provide stronger guarantees, but the chance of all validators losing power at the same time is pretty small.

wen-coding (Contributor, Author) replied:

That's reasonable, changed
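
The trade-off discussed in this thread can be illustrated with the standard library alone. The sketch below is not the sei-db/wal code path; `appendRecord` and `writeAndReadBack` are hypothetical helpers. With `fsync` off, each write still reaches the operating system synchronously, but it is not forced to stable storage, so it can be lost on power failure while costing much less latency.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// appendRecord writes data to f. When fsync is true it also calls Sync,
// which blocks until the OS reports the data is on stable storage.
func appendRecord(f *os.File, data []byte, fsync bool) error {
	if _, err := f.Write(data); err != nil {
		return fmt.Errorf("write: %w", err)
	}
	if fsync {
		if err := f.Sync(); err != nil {
			return fmt.Errorf("fsync: %w", err)
		}
	}
	return nil
}

// writeAndReadBack appends two records to a temporary log file and returns
// the file contents as seen by a subsequent read.
func writeAndReadBack(fsync bool) (string, error) {
	dir, err := os.MkdirTemp("", "wal-sketch-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "log")
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
	if err != nil {
		return "", err
	}
	defer f.Close()

	for _, rec := range []string{"block-1\n", "block-2\n"} {
		if err := appendRecord(f, []byte(rec), fsync); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return string(got), nil
}

func main() {
	for _, fsync := range []bool{true, false} {
		got, err := writeAndReadBack(fsync)
		fmt.Printf("fsync=%v err=%v bytes=%d\n", fsync, err, len(got))
	}
}
```

Both modes return the same bytes to a reader in the running system; the difference only shows up after a crash or power loss, which is why the PR relies on the separately fsynced prune anchor as the recovery watermark.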

// Used when all entries are stale (e.g. the prune anchor advanced past
// everything persisted).
//
// TODO: sei-db/wal doesn't expose tidwall/wal's AllowEmpty option, so there's
Contributor commented:

Is the plan to expose that in a separate PR? It should be pretty simple to add though?

wen-coding (Contributor, Author) replied:

PR 3049 merged and switched to TruncateAll() here.

if err := w.wal.Write(entry); err != nil {
return err
}
if w.firstIdx == 0 {
Contributor commented:

Recommend using Count() == 0 instead of relying on firstIdx == 0 as a sentinel for "WAL is empty", in case the assumption that the WAL's starting index is 0 stops being valid if we switch the WAL library in the future.

wen-coding (Contributor, Author) replied:

done

return nil, nil
}
entries := make([]T, 0, w.Count())
err := w.wal.Replay(w.firstIdx, w.nextIdx-1, func(_ uint64, entry T) error {
Contributor commented:

There's no validation that len(entries) == w.Count() after a successful replay. If Replay succeeds but returns fewer entries than expected (e.g., the underlying WAL silently truncated its tail on open due to corruption), ReadAll would return a short slice with no error.

wen-coding (Contributor, Author) replied:

done
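
The post-replay check requested in this thread can be sketched as follows. This is an illustration under assumptions, not the PR's code: `replayFn` stands in for the underlying WAL's replay call, entries are plain strings, and `sliceReplay` is a hypothetical in-memory log that silently stops at its tail to mimic a WAL that lost trailing entries.

```go
package main

import "fmt"

// replayFn stands in for the underlying WAL's replay call: it invokes fn for
// each stored entry with an index in [from, to].
type replayFn func(from, to uint64, fn func(idx uint64, entry string) error) error

// readAll replays [firstIdx, nextIdx) and fails if fewer entries come back
// than the index range implies.
func readAll(firstIdx, nextIdx uint64, replay replayFn) ([]string, error) {
	if firstIdx == nextIdx {
		return nil, nil // empty log
	}
	want := nextIdx - firstIdx
	entries := make([]string, 0, want)
	err := replay(firstIdx, nextIdx-1, func(_ uint64, entry string) error {
		entries = append(entries, entry)
		return nil
	})
	if err != nil {
		return nil, fmt.Errorf("replay: %w", err)
	}
	if uint64(len(entries)) != want {
		return nil, fmt.Errorf("replayed %d entries, expected %d", len(entries), want)
	}
	return entries, nil
}

// sliceReplay builds a replayFn over an in-memory log whose first entry has
// index 1. It stops without error at the end of the log, which models a
// silently shortened WAL.
func sliceReplay(log []string) replayFn {
	return func(from, to uint64, fn func(uint64, string) error) error {
		for idx := from; idx <= to && idx <= uint64(len(log)); idx++ {
			if err := fn(idx, log[idx-1]); err != nil {
				return err
			}
		}
		return nil
	}
}

func main() {
	full := sliceReplay([]string{"a", "b", "c"})
	short := sliceReplay([]string{"a", "b"}) // tail entry missing

	entries, err := readAll(1, 4, full)
	fmt.Println(entries, err)

	entries, err = readAll(1, 4, short)
	fmt.Println(entries, err)
}
```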

return nil
}
for _, lw := range bp.lanes {
if err := lw.Close(); err != nil {
Contributor commented:

If one lane's WAL fails to close, the remaining lanes are never closed. Use errors.Join to accumulate errors and close all lanes

wen-coding (Contributor, Author) replied:

changed

Replace the file-per-block and file-per-commitqc persistence with
sei-db/wal. Blocks use one WAL per lane so that truncation is
independent (no stale-lane problem). CommitQCs use a single WAL
with a linear RoadIndex-to-WAL-index mapping.

Key changes:
- BlockPersister: per-lane WAL in blocks/<hex_lane_id>/ subdirs,
  lazy lane creation, independent per-lane TruncateBefore.
- CommitQCPersister: single WAL in commitqcs/, tracks firstWALIdx
  and nextWALIdx locally for correct truncation mapping.
- Remove all file-per-entry code: filename construction/parsing,
  directory scanning, individual file read/write/delete, corrupt
  file skipping.
- Rewrite tests for WAL semantics (append-only, truncation, replay).

Made-with: Cursor
Extract common WAL mechanics (index tracking, typed write/replay,
truncation) into a generic indexedWAL[T] backed by sei-db/wal,
replacing the duplicated raw-bytes WAL setup in both blocks.go and
commitqcs.go.

Key changes:
- Add indexedWAL[T] with codec[T] interface for typed serialization
- laneWAL embeds indexedWAL; firstBlockNum() returns Option for safety
- DeleteBefore now removes stale lane WALs (validators no longer in
  committee) and their directories
- Add empty-WAL guard to CommitQCPersister.DeleteBefore
- Add direct unit tests for indexedWAL (wal_test.go)
- Add TODO for dynamic committee membership support

Made-with: Cursor
Instead of extracting the lane ID from the first replayed entry
(and closing/skipping empty WALs), decode it from the hex directory
name. This keeps the WAL open so the lane can receive blocks without
reopening it.

Made-with: Cursor
Validate the directory name (hex decode + PublicKeyFromBytes) before
opening the WAL, avoiding a redundant hex.DecodeString call. Add tests
for both skip paths: non-hex directory name and valid hex but invalid
public key length.

Made-with: Cursor
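
The two skip paths named in the commit message above (non-hex name, valid hex but wrong key length) can be sketched like this. The sketch assumes lane IDs are 32-byte ed25519 public keys, which the source does not state; the PR calls `PublicKeyFromBytes`, whose exact rules are not shown here. `laneKeyFromDir` and `sampleLaneDir` are hypothetical names.

```go
package main

import (
	"crypto/ed25519"
	"encoding/hex"
	"fmt"
)

// sampleLaneDir is a well-formed directory name used to exercise the sketch:
// the hex encoding of an all-zero key of the assumed length.
var sampleLaneDir = hex.EncodeToString(make([]byte, ed25519.PublicKeySize))

// laneKeyFromDir decodes a lane directory name into raw key bytes. It rejects
// names that are not hex and names whose decoded length is not the assumed
// key size, so the caller can skip them before opening any WAL.
func laneKeyFromDir(name string) ([]byte, error) {
	raw, err := hex.DecodeString(name)
	if err != nil {
		return nil, fmt.Errorf("directory %q is not hex: %w", name, err)
	}
	if len(raw) != ed25519.PublicKeySize {
		return nil, fmt.Errorf("directory %q decodes to %d bytes, want %d",
			name, len(raw), ed25519.PublicKeySize)
	}
	return raw, nil
}

func main() {
	for _, name := range []string{sampleLaneDir, "not-hex", "abcd"} {
		_, err := laneKeyFromDir(name)
		fmt.Printf("%q skip=%v\n", name, err != nil)
	}
}
```
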
Replace callback-based Replay on indexedWAL with ReadAll that returns
a slice. Remove the defensive sort in blocks loadAll since WAL entries
are already in append order. Fix stale Replay reference in godoc.

Made-with: Cursor
TruncateBefore now reads and verifies the entry at the target WAL index
before truncating, catching index-mapping corruption before data loss.
PersistCommitQC and PersistBlock enforce strict sequential order to
prevent gaps that would break the linear domain-to-WAL-index mapping.

Made-with: Cursor
The non-contiguous commitQC test now expects the gap to be caught
at PersistCommitQC time ("out of sequence") rather than at NewState
load time, matching the defense-in-depth guard added earlier.

Made-with: Cursor
Add contiguity checks during WAL replay to catch on-disk corruption
that bypasses write-time guards. Includes tests that write directly
to the WAL to simulate corrupted data.

Made-with: Cursor
Reduces log noise on restart with many validators and blocks.

Made-with: Cursor
BlockPersister uses Option[string] for dir, CommitQCPersister uses
Option[*indexedWAL] for iw. None = no-op mode, Some = real persistence.
The noop behavior is now structurally implied rather than a separate flag.

Made-with: Cursor
When a node restarts after being offline for a long time, the prune
anchor may have advanced past all locally persisted WAL entries.

- Add indexedWAL.Reset() to close, remove, and reopen a fresh WAL.
- DeleteBefore in both blocks.go and commitqcs.go now calls Reset()
  when the prune point is at or past the last persisted entry.
- CommitQCPersister.DeleteBefore advances the write cursor (cp.next)
  in the reset branch, making ResetNext unnecessary.
- PersistCommitQC now silently ignores duplicates (idx < next) so
  startup can idempotently re-persist in-memory entries after a reset.
- Remove ResetNext; replace call sites with a re-persist loop at
  startup and rely on DeleteBefore's cursor management at runtime.
- Reorder runPersist: prune before writes (WAL needs contiguous indices).
- Update runPersist godoc to match new step ordering.
- Add tests for Reset, DeleteBefore-past-all, duplicate no-op, and an
  integration test for NewState with anchor past all persisted QCs.

Made-with: Cursor
When the WAL was reset (anchor past all entries) and the process crashed
before writing new entries, restart would find an empty WAL with cp.next=0.
The old guard order (count==0 early return before cursor check) prevented
DeleteBefore from advancing cp.next, causing the subsequent PersistCommitQC
to fail with "out of sequence".

Fix: check idx >= cp.next before the count==0 guard so the cursor is always
advanced, even on an already-empty WAL. Add TestCommitQCDeleteBeforePastAllCrashRecovery.

Made-with: Cursor
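
The guard ordering described in the commit message above can be sketched with a stripped-down model. This is a simplification under assumptions: `qcLog` tracks only road indices and a count, with no real WAL behind it, and the type and method names are hypothetical. The point it illustrates is that the cursor check runs before the empty check, so a restart with an empty log still advances the cursor.

```go
package main

import "fmt"

// qcLog models only the index bookkeeping of a commit-QC persister.
type qcLog struct {
	first uint64 // road index of the oldest persisted entry
	next  uint64 // road index expected by the next Persist call
}

func (l *qcLog) count() uint64 { return l.next - l.first }

// Persist ignores duplicates (idx < next) and rejects gaps (idx > next).
func (l *qcLog) Persist(idx uint64) error {
	if idx < l.next {
		return nil // already persisted; idempotent no-op
	}
	if idx > l.next {
		return fmt.Errorf("out of sequence: got %d, want %d", idx, l.next)
	}
	l.next++
	return nil
}

// DeleteBefore drops entries below idx. The cursor check comes first so that
// it also runs when the log is already empty, e.g. after a crash that
// happened between truncating everything and the first new write.
func (l *qcLog) DeleteBefore(idx uint64) {
	if idx >= l.next {
		// Anchor is at or past everything persisted: drop all entries and
		// move the write cursor to the anchor.
		l.first, l.next = idx, idx
		return
	}
	if l.count() == 0 || idx <= l.first {
		return
	}
	l.first = idx
}

func main() {
	// Restart after a crash: the log is empty and the cursor is back at 0.
	l := &qcLog{}
	l.DeleteBefore(10)
	fmt.Println("persist anchor QC:", l.Persist(10), "count:", l.count())
}
```

Had the empty check come first, `DeleteBefore(10)` would return early with the cursor still at 0, and the following `Persist(10)` would fail as out of sequence.
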
The loop over all in-memory CommitQCs was unnecessary — entries loaded
from the WAL are guaranteed to survive DeleteBefore (it only removes
entries below the anchor). Only the anchor's CommitQC could be missing
after a WAL reset, so persist just that one entry.

Made-with: Cursor
- Disable WAL fsync; the prune anchor (A/B files with fsync) already
  provides the crash-recovery watermark.
- Use Count() == 0 instead of firstIdx == 0 for emptiness checks in
  Write and ReadAll for robustness.
- Add post-replay count check in ReadAll to detect silent data loss.
- Use errors.Join in BlockPersister.Close to close all lane WALs.
- Rename laneDir → lanePath to avoid shadowing the laneDir() function.
- Add TestIndexedWAL_ReadAllDetectsStaleNextIdx.

Made-with: Cursor
@wen-coding wen-coding force-pushed the wen/use_wal_for_persistence branch from c0727e6 to 422aa8f Compare March 11, 2026 04:29
wen-coding and others added 3 commits March 11, 2026 08:01
Now that sei-db/wal exposes AllowEmpty and TruncateAll (#3049), use
them to clear a WAL in-place instead of the heavier close → remove
directory → reopen pattern.

- Enable AllowEmpty in WAL config.
- Replace Reset() with TruncateAll() — single call, no dir removal.
- Remove dir/codec fields from indexedWAL (only needed for reopen).
- Eliminate firstIdx == 0 sentinel: Count() is now just nextIdx -
  firstIdx, empty when equal. Write() no longer needs the first-write
  bookkeeping branch.
- Update openIndexedWAL to handle AllowEmpty's empty-log reporting
  (first > last) uniformly with the non-empty case.

Made-with: Cursor

2 participants