
cloud: Premium supports data migration #22821

Open
alastori wants to merge 9 commits into pingcap:release-8.5 from alastori:premium-data-migration-8.5

@alastori
Collaborator

What is changed, added or deleted?

This PR adds documentation for the Data Migration feature on TiDB Cloud Premium, which is launching in Public Preview, and extends the canonical Cloud DM docs to render correctly for Premium readers.

New file:

  • tidb-cloud/premium/premium-data-migration.md: Premium-tier overview that mirrors the structure of premium-export.md. Covers the 4-step wizard (Configure source and target connection, Choose objects to be migrated, Pre-check, Review and start migration), Public Preview limitations, the Logical / Physical existing-data-migration mode choice (with PITR / changefeed and concurrent-job caveats for physical mode), supported source databases, prerequisites, privileges (including PROCESS), and post-creation job management (Pause / Resume / Delete).

Modified files:

  • TOC-tidb-cloud-premium.md: adds the new Premium-tier overview, plus the canonical Cloud DM doc and the incremental-only Cloud DM doc as siblings, so Premium customers can reach the detailed reference content from the Premium navigation.
  • tidb-cloud/migrate-from-mysql-using-data-migration.md: adds Premium tier rendering — 16 inline tier-name placeholders gain a Premium variant, plus three new Premium-variant content blocks (Public Preview note, supported sources matrix, and the Physical / Logical existing-data-migration mode discussion including the physical-mode caveats).
  • tidb-cloud/migrate-incremental-data-from-mysql-using-data-migration.md: adds Premium tier-name substitutions so the incremental-only guide renders cleanly when accessed from the Premium TOC.

The Dedicated and Essential renderings of all three docs are unchanged; the Premium additions are purely additive <CustomContent plan="premium"> blocks alongside the existing Dedicated and Essential blocks.

Verification

The Premium-tier overview was verified end-to-end against the dev environment with a real MySQL source connection, walking all four wizard steps (Configure → Choose objects → Pre-check → Review), confirming wizard text labels, dropdown options, and behavioral details (such as the 11-item pre-check, the Pre-check warnings dialog, and the Logical-default existing-data-migration mode). Wizard-text drift between Premium and the canonical doc was reconciled in this PR (for example: "Pre-check" hyphenated, "Check Again" instead of "Recheck", "Incremental only" instead of "Incremental Data Only").

Which TiDB version(s) do your changes apply to?

  • release-8.5 (current TiDB Cloud release)

What is the related PR or file when changing an API or RFC?

N/A — documentation only.

Do your changes match any of the following descriptions?

  • Addition of new document
  • Addition of <CustomContent plan="premium"> blocks in existing docs
  • Update of structure (TOC entries)

Add a new Public Preview guide for using the Data Migration feature
on TiDB Cloud Premium, plus the corresponding entry in the Premium
TOC. Mirrors the structure of premium-export.md.

- Update wizard structure to 4 steps (add Precheck as Step 3)
- Tighten Job Name constraints language to match wizard helper text
- Note that Private Link is in development and not yet generally available

Verified against the Premium DM proto enums and the dev wizard text;
prod release tag does not yet include the Private Link backend
support, so the doc deliberately documents Public-only connectivity.

The 60-second safe-mode behavior is implemented in the legacy DM
stack (used by Dedicated and Essential) and does not apply to the
Premium DM service. Verified via dataflow-service-ng/app/models/
premium_dm/ which contains no safe-mode references.

Verified the complete wizard flow against the dev environment with a
real MySQL source connection. Several corrections:

- Step 2 has two controls under Migration Type: "Migration process"
  (Full + Incremental / Incremental only) and "Existing data migration
  mode" (Logical default / Physical). Document both.
- Object selection is an All / Customize toggle, with Customize
  revealing a transfer-list pattern between source and selected.
- Step 3 is named "Pre-check" (hyphenated) in the UI; "Check Again"
  re-runs; warnings can be ignored via a confirmation dialog.
- Mode label is "Incremental only", not "Incremental Data Only".
- Step 4 review shows three sections: Job Configuration, Source
  Connection Profile, Target Connection Profile.
- PROCESS privilege is also recommended; pre-check warns when missing.

Safe mode is implemented in the tiflow DM kernel (used by Premium DM
via the agent layer), not in the cloud control plane. The earlier
removal was based on a search of the dataflow-service repo only,
which is incomplete. Restoring the 60-second safe-mode note so the
Premium doc matches the underlying replication engine behavior.

Customers reading the new Premium DM guide cross-reference the
canonical Cloud DM doc for binary-log setup, privileges, and
limitations. Without Premium variants in the canonical doc, those
links would either render Dedicated-default content or leave tier
placeholders blank.

Changes:

- TOC-tidb-cloud-premium.md: add the canonical and incremental-only
  Cloud DM docs as siblings of premium-data-migration.md so Premium
  customers can navigate to them.
- tidb-cloud/migrate-from-mysql-using-data-migration.md: add Premium
  tier to all inline tier-name placeholders, plus three new Premium
  variant blocks: Public Preview note, supported sources matrix, and
  the Physical / Logical mode discussion (including PITR /
  changefeed and concurrent-job caveats for physical mode).
- tidb-cloud/migrate-incremental-data-from-mysql-using-data-migration.md:
  add Premium tier to all inline tier-name placeholders.
- tidb-cloud/premium/premium-data-migration.md: add the two
  physical-mode caveats (PITR / changefeed; concurrent-job limit)
  inline so they are visible in the Premium-tier overview without
  requiring readers to click through.

The Dedicated and Essential renderings of all three docs are
unchanged.

@ti-chi-bot

ti-chi-bot Bot commented Apr 28, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hfxsd for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added missing-translation-status This PR does not have translation status info. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 28, 2026
@alastori
Collaborator Author

cc @Oreoxmt

The canonical Cloud DM doc anchors are:
- "grant-required-privileges-to-the-migration-user-in-the-source-mysql-database"
  (note "source-mysql", not just "source")
- "grant-required-privileges-for-migration" (parent ### section; the
  target-side #### heading uses CustomContent variants and the
  rendered anchor is not stable, so link to the parent instead)

Detected by the internal-links-anchors CI job on PR pingcap#22821.
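
For anyone checking the link targets by hand, the anchors above can be approximated from the heading text. This is a minimal sketch that assumes GitHub-style slug rules (lowercase, punctuation dropped, spaces turned into hyphens); the docs site's actual slugger may differ, and the heading text used here is inferred from the anchor rather than quoted from the doc. `heading_to_anchor` is a hypothetical helper.

```python
import re


def heading_to_anchor(heading: str) -> str:
    """Approximate a GitHub-style heading slug (hypothetical helper)."""
    slug = heading.strip().lower()
    # Drop everything except lowercase letters, digits, spaces, hyphens, and underscores.
    slug = re.sub(r"[^a-z0-9 _-]", "", slug)
    # Each space becomes a hyphen.
    return slug.replace(" ", "-")


# Heading text is an assumption, inferred from the anchor in the commit message.
anchor = heading_to_anchor(
    "Grant required privileges to the migration user in the source MySQL database"
)
print(anchor)
```

The sketch shows why the "source-mysql" segment matters: the slug carries every word of the heading, so a link that shortens it to "source" cannot resolve.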
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces documentation for migrating data to TiDB Cloud Premium using the Data Migration feature, including a new guide and updates to existing migration docs to incorporate Premium-specific details like logical and physical migration modes. The review feedback focuses on style guide adherence, specifically recommending the removal of passive voice, ensuring consistent terminology, using backticks for SQL keywords, and correcting minor grammatical and tense issues.

<CustomContent plan="premium">

- For {{{ .premium }}}, both logical mode (default) and physical mode are supported. Logical mode exports rows as SQL statements and replays them on the target instance, consuming Request Capacity Units (RCUs) on the target during the load. Physical mode uses `IMPORT INTO` on the target instance and is recommended for large datasets where load throughput and cost are priorities.
- When you use physical mode and the migration job has started, do **NOT** enable PITR (Point-in-time Recovery) or have any changefeed on the {{{ .premium }}} instance. Otherwise, the migration job will be stuck. If you need to enable PITR or have any changefeed, use logical mode instead to migrate data.
Contributor


low

Avoid using passive voice. State the subject clearly.

Suggested change
- When you use physical mode and the migration job has started, do **NOT** enable PITR (Point-in-time Recovery) or have any changefeed on the {{{ .premium }}} instance. Otherwise, the migration job will be stuck. If you need to enable PITR or have any changefeed, use logical mode instead to migrate data.
- When you use physical mode and the migration job has started, do **NOT** enable PITR (Point-in-time Recovery) or have any changefeed on the {{{ .premium }}} instance. Otherwise, the migration job stops. If you need to enable PITR or have any changefeed, use logical mode instead to migrate data.
References
  1. Avoid passive voice overuse. (link)


> **Note:**
>
> The Data Migration feature for {{{ .premium }}} is currently in Public Preview. During Public Preview, the source database must be reachable over a public network endpoint, and the source connection cannot be reused across migration jobs. For details, see [Limitations](#limitations).
Contributor


low

Avoid using passive voice. State the subject clearly.

Suggested change
> The Data Migration feature for {{{ .premium }}} is currently in Public Preview. During Public Preview, the source database must be reachable over a public network endpoint, and the source connection cannot be reused across migration jobs. For details, see [Limitations](#limitations).
The Data Migration feature for {{{ .premium }}} is currently in Public Preview. During Public Preview, the source database must be reachable over a public network endpoint, and you cannot reuse the source connection across migration jobs. For details, see [Limitations](#limitations).
References
  1. Avoid passive voice overuse. (link)


When you use physical mode, the following limitations apply:

- After the migration job has started, do **NOT** enable PITR (Point-in-time Recovery) or have any changefeed on the {{{ .premium }}} instance. Otherwise, the migration job will be stuck. If you need to enable PITR or have any changefeed, use logical mode instead.
Contributor


low

Avoid using passive voice. State the subject clearly.

Suggested change
- After the migration job has started, do **NOT** enable PITR (Point-in-time Recovery) or have any changefeed on the {{{ .premium }}} instance. Otherwise, the migration job will be stuck. If you need to enable PITR or have any changefeed, use logical mode instead.
- After the migration job has started, do **NOT** enable PITR (Point-in-time Recovery) or have any changefeed on the {{{ .premium }}} instance. Otherwise, the migration job stops. If you need to enable PITR or have any changefeed, use logical mode instead.
References
  1. Avoid passive voice overuse. (link)

### General limitations

- The system databases `mysql`, `information_schema`, `performance_schema`, and `sys` are filtered out and not migrated, even if you select all databases.
- During existing data migration, if the target database already contains the table to be migrated and there are duplicate keys, the rows with duplicate keys are replaced.
Contributor


low

Avoid using passive voice. State the subject clearly.

Suggested change
- During existing data migration, if the target database already contains the table to be migrated and there are duplicate keys, the rows with duplicate keys are replaced.
- During existing data migration, if the target database already contains the table to be migrated and there are duplicate keys, TiDB Cloud replaces the rows with duplicate keys.
References
  1. Avoid passive voice overuse. (link)


- The system databases `mysql`, `information_schema`, `performance_schema`, and `sys` are filtered out and not migrated, even if you select all databases.
- During existing data migration, if the target database already contains the table to be migrated and there are duplicate keys, the rows with duplicate keys are replaced.
- During incremental data migration, if a migration job recovers from an abrupt error, it might enter safe mode for 60 seconds. During safe mode, `INSERT` statements are migrated as `REPLACE`, and `UPDATE` statements as `DELETE` and `REPLACE`. For source tables without primary keys or non-null unique indexes, this can result in duplicated rows in the target instance.
Contributor


low

Use backticks for SQL keywords and avoid passive voice.

Suggested change
- During incremental data migration, if a migration job recovers from an abrupt error, it might enter safe mode for 60 seconds. During safe mode, `INSERT` statements are migrated as `REPLACE`, and `UPDATE` statements as `DELETE` and `REPLACE`. For source tables without primary keys or non-null unique indexes, this can result in duplicated rows in the target instance.
- During incremental data migration, if a migration job recovers from an abrupt error, it might enter safe mode for 60 seconds. During safe mode, TiDB Cloud migrates `INSERT` statements as `REPLACE`, and `UPDATE` statements as `DELETE` and `REPLACE`. For source tables without primary keys or non-null unique indexes, this can result in duplicated rows in the target instance.
References
  1. Code snippets, command names, options, and paths should be in backticks. (link)
  2. Avoid passive voice overuse. (link)
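
The duplicate-row risk described in the safe-mode bullet can be reproduced in miniature. The sketch below is not the DM implementation: it uses Python's built-in `sqlite3` as a stand-in for the target instance, and the table names `with_pk` and `without_pk` are hypothetical. It replays the same `INSERT` twice as `REPLACE`, which is what a re-applied event looks like during safe mode.

```python
import sqlite3


def replay_insert_in_safe_mode(conn, table, row):
    """Apply a source INSERT as REPLACE, as the safe-mode note describes."""
    conn.execute(f"REPLACE INTO {table} (id, name) VALUES (?, ?)", row)


conn = sqlite3.connect(":memory:")  # SQLite stands in for the target (assumption)
conn.execute("CREATE TABLE with_pk (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE without_pk (id INTEGER, name TEXT)")

# Simulate the same event being applied twice after an abrupt recovery.
for table in ("with_pk", "without_pk"):
    for _ in range(2):
        replay_insert_in_safe_mode(conn, table, (1, "alice"))

rows_with_pk = conn.execute("SELECT COUNT(*) FROM with_pk").fetchone()[0]
rows_without_pk = conn.execute("SELECT COUNT(*) FROM without_pk").fetchone()[0]
print(rows_with_pk, rows_without_pk)
```

With a primary key, the second `REPLACE` overwrites the first row, so the replay is idempotent. Without a key there is nothing to match on, so the second `REPLACE` behaves like a plain insert and leaves a duplicate.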


4. On the **Configure source and target connection** step, enter the following information:

- **Job Name**: a name for the migration job. The default value is `migration_job_{timestamp}`. The name must start with a letter, can contain letters, numbers, underscores (`_`), and hyphens (`-`), and must be less than 60 characters.
Contributor


low

Use 'fewer' for countable items and prefer present tense.

Suggested change
- **Job Name**: a name for the migration job. The default value is `migration_job_{timestamp}`. The name must start with a letter, can contain letters, numbers, underscores (`_`), and hyphens (`-`), and must be less than 60 characters.
- **Job Name**: a name for the migration job. The default value is `migration_job_{timestamp}`. The name must start with a letter, contains letters, numbers, underscores (`_`), and hyphens (`-`), and must be fewer than 60 characters.
References
  1. Prefer present tense unless describing historical behavior. (link)
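
The Job Name rule quoted above is precise enough to express as a pattern. The following is a sketch only, assuming "letters" means ASCII letters and reading "less than 60 characters" as at most 59; `is_valid_job_name` is a hypothetical helper, not the console's validation code.

```python
import re

# A letter first, then letters, digits, "_" or "-"; at most 59 characters in total.
JOB_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,58}")


def is_valid_job_name(name: str) -> bool:
    """Check a candidate job name against the rule as worded in the doc."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None


# The timestamp in the default name is a made-up example value.
for candidate in ("migration_job_1714280000", "1job", "job name", "a" * 60):
    print(candidate[:24], is_valid_job_name(candidate))
```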


In the **Select Objects to Migrate** section, choose:

- **All** (default): migrate every database and table on the source. The system databases (`mysql`, `information_schema`, `performance_schema`, `sys`) are excluded automatically.
Contributor


low

Avoid using passive voice. State the subject clearly.

Suggested change
- **All** (default): migrate every database and table on the source. The system databases (`mysql`, `information_schema`, `performance_schema`, `sys`) are excluded automatically.
- **All** (default): migrate every database and table on the source. TiDB Cloud automatically excludes the system databases (`mysql`, `information_schema`, `performance_schema`, `sys`).
References
  1. Avoid passive voice overuse. (link)
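
The effect of the **All** option on the source's database list can be sketched as a simple filter. This illustrates the documented behavior and is not the service's code; `databases_to_migrate` and the sample database names are hypothetical, and exact-name matching is an assumption.

```python
# System databases that the doc says are always excluded.
SYSTEM_DATABASES = frozenset(
    {"mysql", "information_schema", "performance_schema", "sys"}
)


def databases_to_migrate(source_databases):
    """Return the source databases that remain after the system ones are dropped."""
    return [db for db in source_databases if db not in SYSTEM_DATABASES]


selected = databases_to_migrate(["mysql", "shop", "sys", "analytics"])
print(selected)
```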


### Step 3: Pre-check

The console runs the pre-check against the source database, network connectivity, and the target {{{ .premium }}} instance. The progress bar shows **Running {percentage}%** while checks execute, and **Finished 100%** when complete. The summary line reports total items, completed, passed, with warning, and failed.
Contributor


low

Grammar correction: 'with warnings' instead of 'with warning'.

Suggested change
The console runs the pre-check against the source database, network connectivity, and the target {{{ .premium }}} instance. The progress bar shows **Running {percentage}%** while checks execute, and **Finished 100%** when complete. The summary line reports total items, completed, passed, with warning, and failed.
The console runs the pre-check against the source database, network connectivity, and the target {{{ .premium }}} instance. The progress bar shows **Running {percentage}%** while checks execute, and **Finished 100%** when complete. The summary line reports the total number of items, including those that are completed, passed, with warnings, or failed.
References
  1. Correct English grammar, spelling, and punctuation mistakes, if any. (link)
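
To make the summary line concrete, here is a small tally over per-item results. It is a sketch under stated assumptions: the status names (`passed`, `warning`, `failed`, `running`) are hypothetical, and "completed" is taken to mean any item that has reached one of the three final statuses.

```python
from collections import Counter


def summarize_precheck(item_statuses):
    """Tally pre-check items by status (status names are hypothetical)."""
    counts = Counter(item_statuses)
    finished = counts["passed"] + counts["warning"] + counts["failed"]
    return {
        "total": len(item_statuses),
        "completed": finished,
        "passed": counts["passed"],
        "with_warnings": counts["warning"],
        "failed": counts["failed"],
    }


# Eleven items, matching the 11-item pre-check mentioned in the PR description.
statuses = ["passed"] * 9 + ["warning", "failed"]
summary = summarize_precheck(statuses)
print(summary)
```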

The review page shows three sections summarizing the migration job:

- **Job Configuration**: job name and migration type.
- **Source Connection Profile**: data source, host, port, connectivity method, username, SSL/TLS status, selected objects, and import mode.
Contributor


low

Terminology consistency: use 'existing data migration mode' as defined earlier in the document.

Suggested change
- **Source Connection Profile**: data source, host, port, connectivity method, username, SSL/TLS status, selected objects, and import mode.
- **Source Connection Profile**: data source, host, port, connectivity method, username, SSL/TLS status, selected objects, and existing data migration mode.
References
  1. Use consistent terminology. (link)

Apply 7 of 9 Gemini suggestions on PR pingcap#22821, all marked low
priority and aligned with pingcap/docs styleguide:

- Active voice: replace "the source connection cannot be reused"
  with "you cannot reuse the source connection".
- Active voice: replace "rows ... are replaced" with "TiDB Cloud
  replaces the rows" in existing-data limitation.
- Active voice + subject clarity: replace "INSERT statements are
  migrated as ..." with "TiDB Cloud migrates INSERT statements
  as ...".
- Active voice: replace "the migration job will be stuck" with
  "the migration job stops" (Premium DM doc + canonical Cloud DM
  doc).
- Active voice + subject clarity: replace "system databases ... are
  excluded automatically" with "TiDB Cloud automatically excludes
  the system databases".
- Grammar: "with warning" -> "with warnings"; rephrase pre-check
  summary line for clarity.
- Terminology consistency: in Step 4 review section, replace
  "import mode" with "the existing data migration mode (shown as
  Import Mode on the review page)" to bridge the wizard's two
  labels for the same concept.

Skipped: the suggestion to use "fewer than 60 characters" /
"contains letters" instead of "less than 60 characters" / "can
contain letters" is intentionally rejected; the current wording
mirrors the wizard's helper text verbatim.

End-to-end wizard verification on the dev cluster created a real
migration job (id dmtskc3frek3p5fhy7ixu6wpj7cy2r4) and inspected
the post-creation experience:

- The Job Detail page does not expose action buttons (just Summary
  and Progress panels).
- The list-page actions menu (the "..." button at the end of each
  row) shows different items based on job status. While the job is
  in Creating state, only View and Delete are visible. Pause and
  Resume become available once the job reaches a running or paused
  state.

Doc previously implied Pause/Resume/Delete were always available
from the detail page or the list. Replaced with status-aware
phrasing and noted the Creating-state subset explicitly.

The dev cluster job remained in Creating for 9+ minutes without
transitioning, matching the March AS-IS report KI-5 (dev
infrastructure issue, not a feature gap), so Pause/Resume
behavior was confirmed via API surface (PausePremiumMigration /
ResumePremiumMigration RPCs in proto) rather than the UI.
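
The status-aware menu described above can be summarized as a lookup from job status to visible actions. This is a sketch, not the console's logic: the status strings are hypothetical, and showing Pause only for a running job and Resume only for a paused one is an assumption, since the UI confirmed only the Creating-state subset.

```python
def list_page_actions(status: str) -> list[str]:
    """Return the actions menu items assumed to be visible for a job status."""
    actions = ["View"]
    if status == "running":
        actions.append("Pause")   # assumption: Pause applies to running jobs
    elif status == "paused":
        actions.append("Resume")  # assumption: Resume applies to paused jobs
    actions.append("Delete")
    return actions


for status in ("creating", "running", "paused"):
    print(status, list_page_actions(status))
```
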
@Oreoxmt
Collaborator

Oreoxmt commented Apr 28, 2026

/cc @Oreoxmt

@ti-chi-bot ti-chi-bot Bot requested a review from Oreoxmt April 28, 2026 05:57
@Oreoxmt
Collaborator

Oreoxmt commented Apr 28, 2026

/assign

@Oreoxmt Oreoxmt added translation/no-need No need to translate this PR. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. area/tidb-cloud This PR relates to the area of TiDB Cloud. for-cloud-release This PR is related to TiDB Cloud release. labels Apr 28, 2026
@ti-chi-bot ti-chi-bot Bot removed the missing-translation-status This PR does not have translation status info. label Apr 28, 2026
Collaborator

@Oreoxmt Oreoxmt Apr 28, 2026


@alastori I suggest removing tidb-cloud/premium/premium-data-migration.md, or reducing it to a short overview page only. Detailed supported source databases, prerequisites, and migration steps are already covered in tidb-cloud/migrate-from-mysql-using-data-migration.md and tidb-cloud/migrate-incremental-data-from-mysql-using-data-migration.md.


Labels

area/tidb-cloud This PR relates to the area of TiDB Cloud. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. for-cloud-release This PR is related to TiDB Cloud release. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. translation/no-need No need to translate this PR.
