@chaptersix commented Feb 9, 2026

What changed

  • Add MigrateSchedule RPC to admin handler and CHASM scheduler service
  • V1 scheduler workflow receives a migrate signal, runs a local activity that calls MigrateSchedule to create the CHASM schedule from V1 state
  • CreateSchedulerFromMigration initializes a full CHASM scheduler tree (generator, invoker, backfillers, visibility) from the migrated V1 state, preserving the conflict token for client compatibility
  • LegacyToMigrateScheduleRequest converts V1 InternalState + ScheduleInfo into the migration request format, including running/completed workflows as buffered starts and ongoing backfills
  • On success, the V1 workflow terminates. On failure, it logs and continues running normally
  • If a V2 schedule already exists, migration treats it as success (idempotent) and terminates the V1 workflow

Why

Support migrating from workflow-backed (V1) schedulers to CHASM (V2) schedulers. The admin API (MigrateSchedule) signals the V1 workflow, which snapshots its state and creates the V2 schedule in a single local activity.
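
To make that concrete, here is a minimal sketch of the workflow-side flow against the Go SDK. Only SignalNameMigrate and the terminate-on-success / continue-on-failure behavior come from this PR; the function, parameter, and helper names are placeholders, not the actual implementation.

package scheduler

import "go.temporal.io/sdk/workflow"

// handleMigrateSignal is a sketch, not the PR's code. It waits for the migrate
// signal, snapshots V1 state into the migration request, and runs the
// MigrateSchedule call in a single local activity.
func handleMigrateSignal(
	ctx workflow.Context,
	signalName string, // SignalNameMigrate in the real workflow
	buildRequest func() any, // placeholder for the V1-state-to-request conversion
	migrateActivity any, // placeholder for the MigrateSchedule activity
	opts workflow.LocalActivityOptions, // the restricted options described below
) (migrated bool) {
	workflow.GetSignalChannel(ctx, signalName).Receive(ctx, nil)

	lctx := workflow.WithLocalActivityOptions(ctx, opts)
	err := workflow.ExecuteLocalActivity(lctx, migrateActivity, buildRequest()).Get(lctx, nil)
	if err != nil {
		// Failure: log and let the V1 workflow keep running normally.
		workflow.GetLogger(ctx).Error("schedule migration failed", "error", err)
		return false
	}
	// Success (including "V2 schedule already exists"): the caller returns from
	// the workflow function, so V1 terminates; signals buffered in the meantime
	// are dropped.
	return true
}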

Signals during migration

Signals received while the migration local activity is executing are dropped if the migration succeeds (the workflow terminates without consuming them).

Migration activity retry policy

The migration local activity uses a restricted retry policy (1 attempt, 60s schedule-to-close) rather than the default (unlimited retries, 1h). A persistent failure should fail fast and let the workflow continue, rather than blocking for up to an hour.
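
As a sketch, using the Go SDK's local activity options (the variable name is a placeholder; the 1-attempt / 60-second values are the ones described above):

// Uses go.temporal.io/sdk/workflow, go.temporal.io/sdk/temporal, and time.
// Restricted options for the migration local activity: one attempt, 60s
// schedule-to-close, instead of the default unlimited retries over 1h.
var migrateLocalActivityOptions = workflow.LocalActivityOptions{
	ScheduleToCloseTimeout: 60 * time.Second,
	RetryPolicy:            &temporal.RetryPolicy{MaximumAttempts: 1},
}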

Follow-up PRs

  • V2 to V1 conversion (rollback)
  • Sentinel key handling
  • Automatic migration (triggered when the workflow wakes up)
  • tdbg command for triggering migration
  • Handle the case where both V1 and V2 schedules exist
  • Callbacks attached to workflows (should we handle that here?)

Add the infrastructure for migrating schedules from the workflow-backed
scheduler (V1) to the CHASM-backed scheduler (V2):

- Add MigrateSchedule RPC to CHASM scheduler service proto
- Add MigrateScheduleRequest/Response messages with migration state
- Implement AdminHandler.MigrateSchedule to signal V1 workflow
- Add migrate signal handler in V1 scheduler workflow
- Add MigrateSchedule activity to call CHASM scheduler service
- Update migration function to accept proto types directly
- Wire up SchedulerServiceClient in worker service fx module

Add handler and logic in chasm/lib/scheduler to create a CHASM
scheduler from migrated V1 state:
- CreateSchedulerFromMigration initializes scheduler with migrated state
- MigrateSchedule handler uses StartExecution with reject duplicate policy
- Tests for migration functionality

if errors.As(err, &alreadyStartedErr) {
	return nil, serviceerror.NewWorkflowExecutionAlreadyStarted(
		"CHASM schedule already exists",
		"",
@chaptersix (PR author) commented:
Should we include this info in the request?

@chaptersix (PR author) commented:
will circle back to this.

…migration test

LegacyToSchedulerMigrationState was returning *SchedulerMigrationState
but the MigrateSchedule activity expects *MigrateScheduleRequest.
Rename to LegacyToMigrateScheduleRequest and return the full request
with NamespaceId populated.

Also fix the migrate signal channel (was incorrectly using
SignalNameForceCAN instead of SignalNameMigrate), add
TestScheduleMigrationV1ToV2 integration test, expose SchedulerClient
from test cluster, and fix staticcheck SA4006 lint errors in
scheduler_test.go.
Comment on lines 1005 to 1008
// Increment the sequence number to invalidate signals sent to this workflow
// after the migration has started; they should target the CHASM scheduler
// after this point.
s.incSeqNo()
@chaptersix (PR author) commented:
The conflict token is not checked until a signal is processed, so this likely has no benefit.

"namespace", s.State.Namespace,
"schedule-id", s.State.ScheduleId,
)
return nil
@chaptersix (PR author) commented:
anything else that should be done before closing the workflow?

@chaptersix (PR author) commented:
we will drop any signals received during the migration.

Another contributor replied:
> we will drop any signals received during the migration.

yep, regrettably true, though updates are already accepted (or rejected) asynchronously, so it's not new behavior.

Add activity-level tests for MigrateSchedule covering success,
already-exists (idempotent), and error paths. Add workflow-level tests
for migrate signal handling: success terminates workflow, failure
continues, and signals are still processed after a failed migration.

Cap migration local activity to 1 attempt with 60s schedule-to-close
timeout instead of inheriting the default 1h with unlimited retries.

Remove unnecessary incSeqNo() before migration -- the conflict token
change is never visible externally since it's in-memory only, and
queued signals are dropped on workflow termination regardless.
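
For illustration, a self-contained sketch of the already-exists case from those activity-level tests. The fake client and the local migrateSchedule helper are stand-ins invented for this sketch; only the asserted behavior (an already-started error from the CHASM service counts as a successful migration) comes from this PR.

package scheduler_test

import (
	"context"
	"errors"
	"testing"

	"github.com/stretchr/testify/require"
	"go.temporal.io/api/serviceerror"
)

// fakeMigrator stands in for the CHASM scheduler service client used by the
// activity; it is a test double invented for this sketch.
type fakeMigrator struct{ err error }

func (f *fakeMigrator) MigrateSchedule(ctx context.Context) error { return f.err }

// migrateSchedule mirrors the activity's error handling as described in this
// PR: an already-started error means the V2 schedule exists, so the migration
// is treated as a success.
func migrateSchedule(ctx context.Context, m *fakeMigrator) error {
	err := m.MigrateSchedule(ctx)
	var alreadyStarted *serviceerror.WorkflowExecutionAlreadyStarted
	if errors.As(err, &alreadyStarted) {
		return nil
	}
	return err
}

func TestMigrateSchedule_AlreadyExistsIsSuccess(t *testing.T) {
	m := &fakeMigrator{
		err: serviceerror.NewWorkflowExecutionAlreadyStarted("CHASM schedule already exists", "", ""),
	}
	require.NoError(t, migrateSchedule(context.Background(), m))
}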
@chaptersix marked this pull request as ready for review February 11, 2026 15:03
@chaptersix requested review from a team as code owners February 11, 2026 15:03