
kafka-consumer(ticdc): tolerate replayed resolved and DDL events#12596

Open
wlwilliamx wants to merge 1 commit into pingcap:master from wlwilliamx:fix/consumer-deal-duplicate-msg

Conversation

@wlwilliamx
Contributor

What problem does this PR solve?

Issue Number: close #12595

What is changed and how it works?

  • treat replayed resolved/checkpoint fallback in cmd/kafka-consumer as duplicate delivery instead of a fatal error
  • deduplicate replayed DDL events by logical DDL identity instead of pointer identity
  • add regression tests covering replayed resolved/checkpoint handling and equivalent versus split DDL events
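The first bullet, tolerating a resolved/checkpoint fallback as duplicate delivery, can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the type and function names (`partitionProgress`, `onResolved`) are hypothetical stand-ins for whatever cmd/kafka-consumer uses internally.

```go
package main

import "fmt"

// partitionProgress is a hypothetical stand-in for the consumer's
// per-partition state; the real code in cmd/kafka-consumer differs.
type partitionProgress struct {
	watermark uint64 // highest resolved ts observed so far
}

// onResolved applies the behavior described in the PR: a resolved ts that
// does not advance the watermark is treated as a replayed (duplicate)
// message and skipped, rather than being reported as a fatal error.
// It returns true if the watermark advanced.
func (p *partitionProgress) onResolved(ts uint64) bool {
	if ts <= p.watermark {
		// Duplicate MQ delivery under at-least-once semantics: ignore.
		return false
	}
	p.watermark = ts
	return true
}

func main() {
	p := &partitionProgress{}
	fmt.Println(p.onResolved(100)) // advances
	fmt.Println(p.onResolved(100)) // exact replay, skipped
	fmt.Println(p.onResolved(90))  // fallback, skipped instead of fatal
	fmt.Println(p.onResolved(110)) // advances again
}
```

The point of the change is the `ts <= p.watermark` branch returning quietly instead of aborting, which matches Kafka's at-least-once delivery contract.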

Check List

Tests

  • Unit test
  • Manual test

Questions

Will it cause performance regression or break compatibility?

No. This only makes the standalone Kafka consumer tolerate duplicate MQ delivery in line with TiCDC's at-least-once behavior.

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

Fix `cdc_kafka_consumer` to tolerate replayed resolved/checkpoint and equivalent DDL messages under duplicate MQ delivery.

@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 8, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Apr 8, 2026
@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 8, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign benjamin2037 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 8, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request improves the Kafka consumer's resilience to replayed events by relaxing watermark fallback checks and introducing logical DDL deduplication. A critical feedback point highlights that the current deduplication logic for split DDLs (e.g., from RENAME TABLES) is insufficient when replaying sequences, as it only compares against the single most recent event. The reviewer suggests using CommitTs and Seq ordering to correctly identify and ignore replayed DDLs.

  // So to tell if a DDL is redundant or not, we must check the equivalence of
  // the current DDL and the DDL with max CommitTs.
- if ddl == w.ddlWithMaxCommitTs {
+ if isEquivalentDDLEvent(ddl, w.ddlWithMaxCommitTs) {
Copy link
Copy Markdown


high

The deduplication logic here only checks if the incoming DDL is equivalent to the last appended DDL. This is insufficient for replayed sequences of split DDLs (e.g., from a RENAME TABLES job).

If a sequence like [DDL_A(Seq:1), DDL_B(Seq:2)] is replayed, DDL_A will be compared against DDL_B. Since their Seq numbers differ, isEquivalentDDLEvent returns false, and DDL_A is appended again. This leads to redundant DDL execution and a consumer panic.

Since DDLs are strictly ordered by (CommitTs, Seq) on partition 0, any DDL with a Seq less than or equal to the current maximum for the same CommitTs should be ignored. It is also recommended to add a regression test for replayed sequences of split DDLs.

Suggested change
- if isEquivalentDDLEvent(ddl, w.ddlWithMaxCommitTs) {
+ if w.ddlWithMaxCommitTs != nil && ddl.CommitTs == w.ddlWithMaxCommitTs.CommitTs && ddl.Seq <= w.ddlWithMaxCommitTs.Seq {
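The reviewer's ordering argument can be sketched as a standalone predicate. This is an illustration under stated assumptions, not the PR's code: `ddlEvent` and `isReplayedDDL` are hypothetical names, and the `CommitTs`/`Seq` fields mirror the ones referenced in the review. The sketch also drops events with a strictly older `CommitTs`, assuming (per the review) that DDLs arrive strictly ordered by (CommitTs, Seq) on partition 0.

```go
package main

import "fmt"

// ddlEvent is a minimal stand-in carrying only the fields the
// reviewer's check relies on.
type ddlEvent struct {
	CommitTs uint64
	Seq      uint64
	Query    string
}

// isReplayedDDL reports whether ddl is a replay relative to the maximum
// (CommitTs, Seq) pair seen so far: anything at or below that maximum in
// the strict (CommitTs, Seq) order can be ignored.
func isReplayedDDL(ddl, maxSoFar *ddlEvent) bool {
	if maxSoFar == nil {
		return false // nothing seen yet: never a replay
	}
	if ddl.CommitTs < maxSoFar.CommitTs {
		return true // older CommitTs: must be a replay
	}
	return ddl.CommitTs == maxSoFar.CommitTs && ddl.Seq <= maxSoFar.Seq
}

func main() {
	// Split DDLs from one RENAME TABLES job share a CommitTs, differ in Seq.
	a := &ddlEvent{CommitTs: 100, Seq: 1, Query: "RENAME TABLE t1 TO t2"}
	b := &ddlEvent{CommitTs: 100, Seq: 2, Query: "RENAME TABLE t3 TO t4"}
	fmt.Println(isReplayedDDL(b, a)) // new: Seq 2 > Seq 1
	fmt.Println(isReplayedDDL(a, b)) // replay: DDL_A arriving after DDL_B
}
```

This handles exactly the failure mode in the review: when the sequence [DDL_A(Seq:1), DDL_B(Seq:2)] is replayed, DDL_A compares against the maximum (100, 2) and is dropped, whereas an equivalence check against only the last event would let it through.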

@wlwilliamx wlwilliamx marked this pull request as ready for review April 8, 2026 10:36
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 8, 2026
@wlwilliamx
Contributor Author

/retest


Labels

release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

cmd/kafka-consumer does not tolerate replayed resolved and DDL messages

1 participant