@abaowhy
Contributor

@abaowhy abaowhy commented Apr 21, 2025

Description

If another session accesses the ghost table before the rename succeeds, and gh-ost has already unlocked the original table,
there is a very small window in which DML from other sessions can still be applied to the original table.
Those changes stay in the original table after the rename, resulting in data loss.
https://cloud.tencent.com/developer/article/2303777

This PR avoids data loss in this situation by checking performance_schema.metadata_locks to confirm that the rename session is waiting for the lock on the original table. Only then is the lock on the original table released, so that the rename can finally execute. If the rename session is not waiting for the lock, gh-ost exits.
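
For readers following along, the check described above could look roughly like the Go sketch below. It is not the code from this PR: the query text, `dbWaitChecker`, `unlockIfRenameWaiting`, and `ErrRenameNotWaiting` are hypothetical names chosen for illustration. Only the `performance_schema.metadata_locks` and `performance_schema.threads` tables and their columns come from MySQL itself.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
)

// pendingRenameLockQuery counts metadata locks that one connection is still
// waiting for on one table. Column names follow the MySQL 8.0
// performance_schema tables; the query gh-ost actually runs may differ.
const pendingRenameLockQuery = `
SELECT COUNT(*)
  FROM performance_schema.metadata_locks ml
  JOIN performance_schema.threads t ON t.THREAD_ID = ml.OWNER_THREAD_ID
 WHERE t.PROCESSLIST_ID = ?
   AND ml.OBJECT_TYPE = 'TABLE'
   AND ml.OBJECT_SCHEMA = ?
   AND ml.OBJECT_NAME = ?
   AND ml.LOCK_STATUS = 'PENDING'`

// ErrRenameNotWaiting (hypothetical) signals that the cut-over must abort.
var ErrRenameNotWaiting = errors.New("rename session is not waiting for the original table's metadata lock")

// waitChecker reports whether the rename session is blocked on the table lock.
type waitChecker func() (bool, error)

// dbWaitChecker (hypothetical helper) builds a waitChecker backed by a real
// connection. renameSessionID is the connection ID of the RENAME session.
func dbWaitChecker(ctx context.Context, db *sql.DB, renameSessionID int64, schema, table string) waitChecker {
	return func() (bool, error) {
		var pending int
		err := db.QueryRowContext(ctx, pendingRenameLockQuery, renameSessionID, schema, table).Scan(&pending)
		return pending > 0, err
	}
}

// unlockIfRenameWaiting releases the original table only after the check
// confirms that the rename is queued behind the lock; otherwise it returns
// an error and leaves the table locked.
func unlockIfRenameWaiting(isWaiting waitChecker, unlock func() error) error {
	waiting, err := isWaiting()
	if err != nil {
		return err
	}
	if !waiting {
		return ErrRenameNotWaiting
	}
	return unlock()
}
```

The decision logic is kept separate from the database access so it can be exercised without a server. No MySQL driver is imported here, so a real program would need to register one before calling `dbWaitChecker`.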

@arthurschreiber
Contributor

@abaowhy Hey, thanks for the bug report and the code changes! 🙇‍♂️ ❤️

Is there a straightforward way to reproduce this issue? It'd be good to have a test case that could show that a) this issue exists and b) the proposed code changes fix that issue. 🤔

@abaowhy
Contributor Author

abaowhy commented Apr 24, 2025

> @abaowhy Hey, thanks for the bug report and the code changes! 🙇‍♂️ ❤️
>
> Is there a straightforward way to reproduce this issue? It'd be good to have a test case that could show that a) this issue exists and b) the proposed code changes fix that issue. 🤔

Testing the cut-over in isolation is quite difficult, so for now I have provided a picture to illustrate the problem.
image

@cenkore

cenkore commented Apr 29, 2025

It looks similar to this PR #1269 .

@abaowhy
Contributor Author

abaowhy commented Apr 30, 2025

> @abaowhy Hey, thanks for the bug report and the code changes! 🙇‍♂️ ❤️
>
> Is there a straightforward way to reproduce this issue? It'd be good to have a test case that could show that a) this issue exists and b) the proposed code changes fix that issue. 🤔

@arthurschreiber I have added test cases: the issue reproduces reliably on master and is fixed by this PR.

@abaowhy
Contributor Author

abaowhy commented May 8, 2025

@meiji163 I have provided test cases, please take some time to review, and if there are any issues, I will promptly correct them.

@abaowhy
Contributor Author

abaowhy commented May 12, 2025

@meiji163 I have fixed three issues found during the inspection. Could you please take some time to check again?

@abaowhy
Contributor Author

abaowhy commented May 16, 2025

@meiji163 @timvaillancourt @rashiq I have added test cases: the issue reproduces reliably on master and is fixed by this PR. Please take some time to review, and if there are any issues, I will promptly correct them.

@meiji163
Contributor

@abaowhy I was able to reproduce the bug with your test case. Thank you!

@meiji163 meiji163 merged commit 7c3b9a1 into github:master Jun 27, 2025
8 checks passed
@smartinec

This change now requires "performance_schema" to be enabled; otherwise the script fails with the error below, which comes from the query at the beginning of StateMetadataLockInstrument():

2025-11-21 17:48:22 ERROR query performance_schema.setup_instruments with name wait/lock/metadata/sql/mdl error: sql: no rows in result set
2025-11-21 17:48:22 ERROR Unable to enable metadata lock instrument, see further error details. Bailing out
2025-11-21 17:48:22 INFO Tearing down inspector
2025-11-21 17:48:22 INFO Tearing down applier
2025-11-21 17:48:22 INFO Tearing down streamer
2025-11-21 17:48:22 FATAL 2025-11-21 17:48:22 ERROR query performance_schema.setup_instruments with name wait/lock/metadata/sql/mdl error: sql: no rows in result set

Any way to bypass this check?
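
For context on the log above: "sql: no rows in result set" is the message of Go's `sql.ErrNoRows`, which `Scan` returns when a single-row query matches nothing. The sketch below shows one way such a lookup could separate "instrument row missing" (probably because performance_schema is disabled) from "instrument present but not enabled". The function names and error wording are hypothetical and are not taken from gh-ost. The table, its `NAME` and `ENABLED` columns, and the instrument name are as they appear in MySQL and in the log.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// mdlInstrumentName is the instrument named in the error log.
const mdlInstrumentName = "wait/lock/metadata/sql/mdl"

// describeInstrument (hypothetical) turns the lookup result into a verdict:
// nil when the instrument is usable, a descriptive error otherwise.
func describeInstrument(found bool, enabled string) error {
	switch {
	case !found:
		return fmt.Errorf("instrument %s not found: performance_schema is probably disabled", mdlInstrumentName)
	case enabled != "YES":
		return fmt.Errorf("instrument %s is present but not enabled", mdlInstrumentName)
	default:
		return nil
	}
}

// checkMDLInstrument (hypothetical) looks up the instrument row and maps
// sql.ErrNoRows to the "not found" case instead of surfacing the raw error.
func checkMDLInstrument(ctx context.Context, db *sql.DB) error {
	var enabled string
	err := db.QueryRowContext(ctx,
		"SELECT ENABLED FROM performance_schema.setup_instruments WHERE NAME = ?",
		mdlInstrumentName).Scan(&enabled)
	if errors.Is(err, sql.ErrNoRows) {
		return describeInstrument(false, "")
	}
	if err != nil {
		return err
	}
	return describeInstrument(true, enabled)
}
```

A caller would run `checkMDLInstrument` once at startup; a MySQL driver still has to be registered separately, since none is imported in this sketch.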

@smartinec smartinec mentioned this pull request Nov 24, 2025
2 tasks
@thenam153

I’m planning to use gh-ost to run migrations on my system, but I’m hitting an error because Performance Schema is not enabled. Enabling Performance Schema requires a reboot, and since I’m running on Amazon Aurora (RDS), I’m trying to avoid a restart on the writer instance. Are there any alternative solutions?

[2026/01/09 08:53:41] [info] binlogsyncer.go:443 begin to sync binlog from position (mysql-bin-changelog.650881, 696)
[2026/01/09 08:53:41] [info] binlogsyncer.go:409 Connected to mysql 8.0.42 server
[2026/01/09 08:53:41] [info] binlogsyncer.go:868 rotate to (mysql-bin-changelog.650881, 696)
2026-01-09 08:53:42 ERROR query performance_schema.setup_instruments with name wait/lock/metadata/sql/mdl error: sql: no rows in result set
2026-01-09 08:53:42 ERROR Unable to enable metadata lock instrument, see further error details. Bailing out
2026-01-09 08:53:42 FATAL 2026-01-09 08:53:42 ERROR query performance_schema.setup_instruments with name wait/lock/metadata/sql/mdl error: sql: no rows in result set

thenam153 pushed a commit to thenam153/gh-ost that referenced this pull request Jan 11, 2026
meiji163 added a commit that referenced this pull request Jan 16, 2026
Since #1536, performance_schema.metadata_locks is required to check that the rename session holds the metadata lock on the migrated table during cut-over. On some setups, such as Aurora RDS, performance_schema is not enabled by default and may be infeasible to enable.
This PR adds --skip-metadata-lock-check flag to skip the check in this case.
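
As a rough illustration of how such an opt-out can gate the check, here is a small Go sketch built on the standard `flag` package. Only the flag name `--skip-metadata-lock-check` comes from this thread. The function names, the help text, and the way the flag is wired in are assumptions, not the code from #1616.

```go
package main

import "flag"

// parseSkipMetadataLockCheck (hypothetical) reads the flag from args.
// The flag defaults to false, so the check stays on unless it is requested.
func parseSkipMetadataLockCheck(args []string) (bool, error) {
	fs := flag.NewFlagSet("gh-ost-sketch", flag.ContinueOnError)
	skip := fs.Bool("skip-metadata-lock-check", false,
		"skip the metadata lock check during cut-over (illustrative help text)")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *skip, nil
}

// runMetadataLockCheck (hypothetical) runs check unless skip is set.
func runMetadataLockCheck(skip bool, check func() error) error {
	if skip {
		return nil
	}
	return check()
}
```

Keeping the default at false means skipping the check is an explicit choice: setups that cannot enable performance_schema can still migrate, but without the protection the check provides.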
@meiji163
Contributor

@smartinec @thenam153 See --skip-metadata-lock-check from #1616

6 participants