online-ddl: manual cleanup of gh-ost tables incorrectly triggers ALTER TABLE replay #12564

@Takashi-kun

Description

Background

When a gh-ost migration fails, it leaves behind tables such as _t_gho, _t_ghc, and _t_del. Operators have to clean these up manually, because dropping a large table directly can put excessive load on MySQL (see, e.g., https://www.percona.com/blog/speed-up-your-large-table-drops-in-mysql/).

Problem

DM's online-ddl handler identifies gh-ost shadow/trash tables purely by table-name regex (e.g. ^_(.+)_gho$, ^_(.+)_del$). This means any DDL involving a matching table name is treated as an online DDL operation, regardless of who issued it.
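The name-only classification described above can be sketched in Go roughly as follows. This is an illustration, not DM's actual code: the type `TableKind`, the function `classify`, and the choice to treat `_ghc` as a trash table are assumptions made for this example; only the `_gho` and `_del` patterns are quoted from the issue text.

```go
package main

import (
	"fmt"
	"regexp"
)

// TableKind is a hypothetical classification used only in this sketch.
type TableKind int

const (
	RealTable TableKind = iota
	GhostTable
	TrashTable
)

func (k TableKind) String() string {
	switch k {
	case GhostTable:
		return "ghost"
	case TrashTable:
		return "trash"
	default:
		return "real"
	}
}

var (
	// Pattern quoted in the issue for gh-ost shadow tables.
	ghostPattern = regexp.MustCompile(`^_(.+)_gho$`)
	// The _del pattern is quoted in the issue; the _ghc pattern is an assumption.
	trashPatterns = []*regexp.Regexp{
		regexp.MustCompile(`^_(.+)_del$`),
		regexp.MustCompile(`^_(.+)_ghc$`),
	}
)

// classify looks only at the table name. It has no way of knowing
// whether gh-ost or a human operator issued the DDL.
func classify(table string) TableKind {
	if ghostPattern.MatchString(table) {
		return GhostTable
	}
	for _, p := range trashPatterns {
		if p.MatchString(table) {
			return TrashTable
		}
	}
	return RealTable
}

func main() {
	for _, name := range []string{"_t_gho", "_t_del", "t", "will_be_deleted_t_gho"} {
		fmt.Printf("%s -> %s\n", name, classify(name))
	}
}
```

Because `will_be_deleted_t_gho` does not start with an underscore, a name-only check treats it as a real table, which is why a manual rename away from `_t_gho` looks like a ghost-to-real rename.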

When an operator manually renames the leftover ghost table for cleanup:

RENAME TABLE _t_gho TO will_be_deleted_t_gho;

DM classifies this statement as a ghost-to-real rename (GhostTable → RealTable) and tries to replay the previously recorded ALTER TABLE statements against will_be_deleted_t_gho. Since that table doesn't exist in TiDB, the apply fails and blocks replication.

Fix

gh-ost embeds a /* gh-ost */ comment in all SQL statements it issues — CREATE TABLE, ALTER TABLE, RENAME TABLE, DROP TABLE, etc.:

create /* gh-ost */ table `schema`.`_t_gho` like `schema`.`t`
alter /* gh-ost */ table `schema`.`_t_gho` add column `n` int
rename /* gh-ost */ table `schema`.`t` to `schema`.`_t_del`, `schema`.`_t_gho` to `schema`.`t`

This comment is preserved verbatim in the MySQL binlog QUERY event.

Ghost/trash tables (_gho, _ghc, _del) never exist in TiDB, because DM does not replicate their creation downstream. Any DDL event that touches one of these tables and lacks the /* gh-ost */ comment is therefore, by definition, a manual operation and should be silently ignored. Concretely:

  • processOneDDL skips calling Apply() for any DDL whose first source table is a ghost or trash table and whose raw binlog SQL does not contain the /* gh-ost */ marker.
  • In that case it returns nil (no downstream DDL is emitted), so replication continues unblocked.
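The decision rule in the two bullets above can be sketched as a small standalone Go function. This is a simplified illustration under stated assumptions, not the code from the PR: `shouldIgnoreManualDDL`, its parameters, and the combined regex are hypothetical names and shapes, and the marker check is a plain substring match on the raw SQL.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ghostMarker is the comment gh-ost embeds in the statements it issues.
const ghostMarker = "/* gh-ost */"

// ghostOrTrash is an assumed combined pattern for _gho, _ghc, and _del tables.
var ghostOrTrash = regexp.MustCompile(`^_(.+)_(gho|ghc|del)$`)

// shouldIgnoreManualDDL (hypothetical helper) reports whether a DDL event
// should be dropped without emitting anything downstream:
//   - online-ddl handling is enabled,
//   - the first source table is a ghost or trash table, and
//   - the raw binlog SQL does not carry the gh-ost marker.
func shouldIgnoreManualDDL(onlineDDL bool, rawSQL, firstSourceTable string) bool {
	if !onlineDDL {
		return false
	}
	if !ghostOrTrash.MatchString(firstSourceTable) {
		return false
	}
	return !strings.Contains(rawSQL, ghostMarker)
}

func main() {
	manual := "RENAME TABLE _t_gho TO will_be_deleted_t_gho"
	byGhost := "alter /* gh-ost */ table `schema`.`_t_gho` add column `n` int"

	fmt.Println(shouldIgnoreManualDDL(true, manual, "_t_gho"))
	fmt.Println(shouldIgnoreManualDDL(true, byGhost, "_t_gho"))
}
```

A substring check keeps the sketch short; a real implementation might prefer to inspect the comment position or normalize whitespace, which is a design choice not covered by the issue text.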

Note on the initial fix (PR #12565): The first approach only guarded the RENAME TABLE ghost → real case and returned the statement as-is. However, since the ghost table never exists in TiDB, attempting to execute that RENAME downstream would also fail. The revised approach ignores the event entirely, which is the correct behavior.

Scope

This change is guarded by if s.cfg.OnlineDDL, so it has no effect when online-ddl is disabled.

Limitations / Notes

  • This only applies to gh-ost. pt-osc does not embed a comparable comment in its statements, so pt-osc cleanup scenarios need a separate approach.

Steps to Reproduce

  1. Start a gh-ost migration that fails, leaving behind _t_gho with stored DDL metadata in DM.
  2. Manually run RENAME TABLE _t_gho TO will_be_deleted_t_gho upstream.
  3. Observe DM attempting (and failing) to apply the stored ALTER TABLE against will_be_deleted_t_gho in TiDB, blocking replication.

Metadata


    Labels

    area/dm: Issues or PRs related to DM.
    contribution: This PR is from a community contributor.
    first-time-contributor: Indicates that the PR was contributed by an external member and is a first-time contributor.
