Skip to content
This repository was archived by the owner on Nov 25, 2025. It is now read-only.

Conversation

@powerslider
Copy link
Contributor

Why this should be merged

Check ava-labs/avalanchego#4603

How this works

Add a Finalizer interface to provide explicit cleanup operations for syncers. This ensures cleanup (like flushing batches to disk) is performed reliably even on cancellation or early returns.

  • Add Finalizer interface to sync/types.go for explicit cleanup.
  • Attach Finalize() in CodeQueue that finalizes code fetching to this new interface.
  • Gather finalization logic in a Finalize() for StateSyncer to flush in-progress trie batches.
  • Implement Finalize() for AtomicSyncer to commit pending database changes.
  • Add FinalizeAll() to SyncerRegistry with defer to ensure cleanup runs.
  • Remove OnFailure callback mechanism (replaced by Finalizer).

How this was tested

existing UT

Need to be documented?

no

Need to update RELEASES.md?

no

resolves ava-labs/avalanchego#4603

Signed-off-by: Tsvetan Dimitrov ([email protected])

… shutdown

During graceful shutdown, syncers cancelled via context cancellation were being
logged as ERROR level. This is misleading since cancellation during shutdown is
expected behavior, not an error condition.

- Use `errors.Is()` to detect `context.Canceled` and `context.DeadlineExceeded`
  (handles wrapped errors) and log as INFO instead of ERROR
- Separate `RunSyncerTasks()` logic into a synchronous wrapper and `StartAsync()`
  method for async execution to gain more flexibility and handle more use cases.
- Add early return optimization when context is already cancelled.

Test improvements:
- Add tests for cancellation scenarios (`Canceled`, `DeadlineExceeded`, wrapped
  errors, early return).
- Fix flakiness by adding WaitGroup synchronization and replacing channel-based
  coordination.
- Refactor tests to use `t.Context()` and extract common helpers.

resolves #1410
During graceful shutdown, the State Syncer was hanging because multiple
blocking operations did not check context cancellation. When shutdown
occurred, these operations would block indefinitely, preventing syncers
from detecting cancellation and exiting gracefully.

- Add context.Context parameter to LeafSyncTask.OnLeafs() interface
  to enable context propagation through the leaf processing call chain.

- Update CodeQueue.AddCode() to accept context and check ctx.Done()
  before blocking on channel sends, preventing indefinite blocking
  when Code Syncer stops consuming during shutdown.

- Update all OnLeafs implementations (mainTrieTask, storageTrieTask,
  trieSegment, atomic syncer) to accept and pass context through
  the call chain.

- Add context parameter to startSyncing() and createSegments()
  methods, checking cancellation before blocking channel sends to
  the segments work queue.

- Add context cancellation check in BlockSyncer before checking
  blocks on disk, ensuring it responds during the initial scan phase.

- Update sync/client/leaf_syncer.go to pass context to OnLeafs()
  callbacks.

This ensures all syncers detect cancellation immediately and exit
gracefully instead of hanging until timeout.
… tracking

- Remove canceledWG from NewCancelAwareSyncer() and related test assertions.
- Unify and simplify function comments.
- Make startedWG nil-safe in all helpers.
- Remove TestSyncerRegistry_WrappedContextCanceledError.
@powerslider powerslider self-assigned this Nov 21, 2025
@powerslider powerslider requested a review from a team as a code owner November 21, 2025 14:39
…eafs

- Add context cancellation check in `storageTrieTask.OnLeafs` before
  processing each account to allow early exit during shutdown.
- Add comment explaining `context.Background()` usage in `CodeQueue.init()`
  since it runs during construction before sync starts.
Add a `Finalizer` interface to provide explicit cleanup operations for
syncers. This ensures cleanup (like flushing batches to disk) is
performed reliably even on cancellation or early returns.

- Add `Finalizer` interface to `sync/types.go` for explicit cleanup.
- Attach `Finalize()` in `CodeQueue` that finalizes code fetching to this new interface.
- Gather finalization logic in a `Finalize()` for StateSyncer to flush in-progress trie batches.
- Implement `Finalize()` for AtomicSyncer to commit pending database changes.
- Add `FinalizeAll()` to SyncerRegistry with defer to ensure cleanup runs.
- Remove `OnFailure` callback mechanism (replaced by `Finalizer`).

resolves #1089

Signed-off-by: Tsvetan Dimitrov ([email protected])
@powerslider powerslider force-pushed the powerslider/1089-finalize-syncers branch from a977649 to 5131d0c Compare November 21, 2025 14:51
Base automatically changed from powerslider/1410-context-cancel-misleading-logs to master November 24, 2025 14:16
Comment on lines +161 to +165
if err := eg.Wait(); err != nil {
return err
}
return err

return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err := eg.Wait(); err != nil {
return err
}
return err
return nil
return eg.Wait()

func (t *stateSync) onSyncFailure() {
// Finalize checks if there are any in-progress tries and flushes their batches to disk
// to preserve progress. This is called by the syncer registry on sync failure or cancellation.
func (t *stateSync) Finalize() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based off the name alone, it seems like this shouldn't occur on success

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok, but you should update the comments


// Finalizer provides a mechanism to perform cleanup operations after a sync operation.
// This is useful for handling inflight requests, flushing to disk, or other cleanup tasks.
type Finalizer interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting you made this a separate interface. I totally agree that we shouldn't force all syncers to adhere to this, but should we instead make it something like:

```suggestion
type DynamicSyncer interface {
     Syncer
     UpdateSyncTarget(???) error

I'm honestly not sure, what do you think? My only concern with the current approach is that it seems unintuitive to check the types all the time, and rather we would expect the caller to either dynamic sync everything or static sync everything, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually now agree we this. This seems necessary for the state syncer, for example. Should it take in an error? (e.g. handle the cleanup different in the error vs no error case?)

// This ensures that even if the sync is cancelled or fails, we preserve
// the progress up to the last fully synced height.
func (s *Syncer) Finalize() error {
if s.db == nil {
Copy link
Contributor

@alarso16 alarso16 Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would db be nil?


// Finalizer provides a mechanism to perform cleanup operations after a sync operation.
// This is useful for handling inflight requests, flushing to disk, or other cleanup tasks.
type Finalizer interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think you should add this sort of comment on the syncer registry, since it doesn't say it will call finalize

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement a cleanup/finalize procedure preventing corruption of synced state due to sync process errors

3 participants