Skip to content

fix(vitess): recover late migration context#232

Open
aparajon wants to merge 3 commits into
mainfrom
armand/planetscale-progress-context
Open

fix(vitess): recover late migration context#232
aparajon wants to merge 3 commits into
mainfrom
armand/planetscale-progress-context

Conversation

@aparajon
Copy link
Copy Markdown
Collaborator

@aparajon aparajon commented Jun 4, 2026

Why

SchemaBot can sometimes create/start a PlanetScale deploy request before Vitess exposes the deploy's migration_context in SHOW VITESS_MIGRATIONS.

When that happens, SchemaBot still knows the deploy request ID, but it cannot show the richer Vitess progress view: per-table/per-shard progress, row counts, ETA, and related migration details. Without a recovery path, progress can stay limited to coarse deploy-request status even after the Vitess rows appear.

What

  • Save the set of Vitess migration contexts that existed before the deploy starts.
  • Persist that baseline on vitess_apply_data so it survives normal progress polling and restarts.
  • If progress polling later sees deploy metadata but no migration_context, query SHOW VITESS_MIGRATIONS again and recover a context from current rows.
  • Select recovered contexts deterministically: ignore baseline contexts, prefer the earliest requested_timestamp at or after the deploy request creation time, and avoid guessing when multiple untimed new contexts exist.
  • Persist a recovered context from both the active apply worker loop and read/API progress path so recovery survives worker restarts and scheduler owner handoff.
  • Store Vitess row timestamps with each baseline context for operator debugging.

Risk Assessment

Low. This only improves PlanetScale progress enrichment when resume state is missing migration_context; deploy-request state remains the source of truth for apply state.

The storage change is additive: a nullable JSON column on vitess_apply_data.

Generated with Amp

Copilot AI review requested due to automatic review settings June 4, 2026 19:42
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves PlanetScale/Vitess progress enrichment by re-discovering a missing Vitess migration_context during Progress() polling (when the persisted resume state lacks it), using deploy/request timestamps to select the most relevant context and adding regression tests for the selection logic.

Changes:

  • Add fallback migration context discovery in Engine.Progress() when ResumeState.MigrationContext is empty.
  • Introduce timestamp-based context selection helpers (migrationContextDiscoveryAfter, latestMigrationContextAfter) and parse requested_timestamp from SHOW VITESS_MIGRATIONS.
  • Add tests covering context cutoff selection and discovery “after” timestamp behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
pkg/engine/planetscale/progress.go Adds “late” migration context discovery during polling and implements timestamp-based context selection.
pkg/engine/planetscale/progress_test.go Adds regression tests for context cutoff selection and discovery-after timestamp computation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/engine/planetscale/progress.go Outdated
Comment thread pkg/engine/planetscale/progress_test.go Outdated
@aparajon aparajon force-pushed the armand/planetscale-progress-context branch 5 times, most recently from 317afd5 to fd31453 Compare June 4, 2026 22:01
Co-authored-by: Amp <amp@ampcode.com>
@aparajon aparajon force-pushed the armand/planetscale-progress-context branch from fd31453 to ca91b28 Compare June 4, 2026 22:31
@aparajon aparajon marked this pull request as ready for review June 4, 2026 22:37
@aparajon aparajon requested review from Kiran01bm and morgo as code owners June 4, 2026 22:37
@aparajon aparajon changed the title fix(planetscale): recover late migration context fix(vitess): recover late migration context Jun 4, 2026
Comment thread pkg/engine/planetscale/progress.go Outdated
Comment on lines +86 to +89
req.ResumeState = &engine.ResumeState{
MigrationContext: migrationContext,
Metadata: req.ResumeState.Metadata,
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we better persist the recovered number somewhere - ex: in a column that exists (migration_context) so that it turns the per-poll 9-query sweep into a one-time cost ?

example impact summary from my agent session:

Vitess DB with 8 keyspaces; Apply Z hit the late-context bug → stored migration_context = ""

Every CLI `schemabot status` / UI poll for Apply Z:
   BuildResumeState(vad)  →  migration_context still ""  (never persisted)
   → discoverMigrationContext:
        ListKeyspaces            (1 PlanetScale API call)
        SHOW VITESS_MIGRATIONS   × 8 keyspaces  (8 vtgate queries)
   → result thrown away, not saved
   → next poll repeats the same 9 queries

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — this is resolved now in 7db975b2. LocalClient.Progress persists a recovered ResumeState.MigrationContext back into vitess_apply_data.migration_context when the stored value is missing/different, so the expensive context discovery sweep becomes a one-time recovery path instead of repeating on every poll. I also added a regression test for that path.

Co-authored-by: Amp <amp@ampcode.com>
@aparajon aparajon force-pushed the armand/planetscale-progress-context branch from 384004d to 7db975b Compare June 5, 2026 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants