What did you do?
Investigated failing TiCDC Kafka consumer tests on current upstream/master and traced duplicate-delivery handling in cmd/kafka-consumer.
The bug can be reproduced logically as follows:
- Consume a resolved/checkpoint message that advances a partition watermark.
- Replay an older or equivalent MQ marker message under TiCDC's at-least-once delivery semantics.
- Observe that cmd/kafka-consumer either panics on resolved fallback or queues an equivalent DDL again.
The concrete problems are:
- partitionProgress.updateWatermark treats resolved/checkpoint fallback as fatal in the consumer path.
- appendDDL deduplicates DDL by pointer identity instead of logical DDL identity.
What did you expect to see?
cmd/kafka-consumer should tolerate replayed resolved/checkpoint markers and logically equivalent replayed DDL events, because duplicate MQ delivery is a normal scenario under TiCDC's at-least-once contract.
What did you see instead?
cmd/kafka-consumer can panic with a resolved fallback error when it sees a replayed resolved/checkpoint marker, and it can enqueue an equivalent DDL more than once because replayed DDL events are decoded into different objects.
This makes normal duplicate-delivery scenarios fail tests and can lead to repeated DDL execution attempts in the standalone Kafka consumer.
Versions of the cluster
Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):
Not collected. The issue was confirmed by code inspection and focused tests against current upstream/master.
Upstream TiKV version (execute tikv-server --version):
Not collected. The issue was confirmed by code inspection and focused tests against current upstream/master.
TiCDC version (execute cdc version):
Current upstream/master at investigation time. The fix was validated with:
go test ./cmd/kafka-consumer/...