Skip to content

Checkpoint sync state on all cancellation types#694

Open
arreyder wants to merge 1 commit intomainfrom
crr/fix-shutdown-c1z-race
Open

Checkpoint sync state on all cancellation types#694
arreyder wants to merge 1 commit intomainfrom
crr/fix-shutdown-c1z-race

Conversation

@arreyder
Copy link
Contributor

@arreyder arreyder commented Feb 21, 2026

Summary

  • Sync loop now checkpoints in-memory progress on all cancellation types (e.g. worker shutdown SIGTERM), not just DeadlineExceeded (run duration timeout)
  • Uses context.WithTimeout(context.Background(), 30s) for the checkpoint since the caller's context may already be cancelled — same pattern as c1file.Close()
  • Without this fix, up to 10s of sync progress (minCheckpointInterval) is silently lost on every rolling deploy

Context

Combined with the c1-side WorkerStopTimeout fix (c1 PR #14204), this ensures the full graceful shutdown chain works:

  1. Worker shutdown → Sync() checkpoints state before returning (this PR)
  2. syncer.Close() → WAL checkpoint + local saveC1z
  3. c1's manager.SaveC1Z() → uploads to S3 within the 25s grace period

Test plan

  • New TestCancellationCheckpoints — cancels context mid-sync, verifies checkpoint persists, resumes successfully
  • All existing pkg/sync/... tests pass

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved sync cancellation handling to save progress through checkpointing before exiting
  • Tests

    • Added test coverage for cancellation and checkpoint resume workflows

…eeded

When a Temporal worker shuts down (SIGTERM during rolling deploy), Sync()
returned immediately without checkpointing in-memory sync progress. The sync
loop only checkpointed on DeadlineExceeded (run duration timeout), silently
losing up to 10s of progress on other cancellation causes like worker shutdown.

Use a background context with 30s timeout for the checkpoint since the caller's
context may already be cancelled, matching the pattern in c1file.Close().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Feb 21, 2026

No actionable comments were generated in the recent review. 🎉


Walkthrough

The sync package's cancellation handling now ensures checkpoints persist before exiting when context cancellation occurs. A new test validates that syncs can resume from checkpoints after mid-operation cancellation.

Changes

Cohort / File(s) Summary
Cancellation Checkpoint Logic
pkg/sync/syncer.go
Modified context cancellation handler to perform a checkpoint operation with a 30-second timeout using a background context before returning. Logs informational message on cancellation and errors on checkpoint failure, while still returning the original cancellation error.
Cancellation Resume Test
pkg/sync/syncer_test.go
Added TestCancellationCheckpoints test with supporting cancelOnResourcesMockConnector mock type. Validates that syncs can be interrupted, checkpointed, and resumed from that checkpoint in subsequent sync invocations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 When cancellation calls so loud,
We checkpoint with determined cloud,
A timeout guards our graceful flight,
Resume from where we paused mid-night,
Sync flows on, ever bright! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly matches the main objective: checkpointing sync state on cancellation types instead of only on timeout.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch crr/fix-shutdown-c1z-race

Comment @coderabbitai help to get the list of available commands and usage tips.

@arreyder arreyder enabled auto-merge (squash) February 21, 2026 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant