[Feature][MySQL CDC] MySQL cdc support start by time #9144

@davidzollo

Search before asking

  • I have searched the existing feature requests and found no similar feature requirement.

Description

MySQL CDC should support starting from a given time.

Currently, the MySQL CDC source supports starting from a specific binlog position or GTID. However, in many real-world scenarios, users expect to start a synchronization job from a human-friendly timestamp, for example:

  • Resume from "2024-04-01 10:00:00" after failure
  • Backfill data since a specific time window

Adding support for start-time (e.g. 2024-04-10 08:00:00) will greatly simplify CDC task configuration and make SeaTunnel more user-friendly in operational scenarios.


source {
  MySQL-CDC {
    hostname = "xxx"
    port = 3306
    ...
    start-time = "2024-04-10 08:00:00"  # Suggested new feature
  }
}
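
To make the option concrete, here is a minimal sketch of how the proposed start-time string could be turned into an epoch timestamp before it is compared with binlog event times. StartTimeParser and toEpochMillis are hypothetical names for illustration only, not existing SeaTunnel APIs, and the "yyyy-MM-dd HH:mm:ss" format and the explicit time zone argument are assumptions taken from the example value above.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of SeaTunnel: parses the proposed start-time option.
final class StartTimeParser {
    // Assumed format, taken from the example value "2024-04-10 08:00:00".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private StartTimeParser() {}

    // Converts the configured string to epoch milliseconds in the given zone.
    // A malformed value is reported as IllegalArgumentException.
    static long toEpochMillis(String startTime, ZoneId zone) {
        try {
            return LocalDateTime.parse(startTime, FORMAT)
                    .atZone(zone)
                    .toInstant()
                    .toEpochMilli();
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException(
                    "Cannot parse start-time '" + startTime
                            + "', expected format yyyy-MM-dd HH:mm:ss", e);
        }
    }
}
```

Which time zone the value should be interpreted in (server time zone, job time zone, or UTC) is an open design question; the sketch leaves it to the caller.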

Error handling:

| Case | Behavior |
| --- | --- |
| start-time too old, binlog already purged | Fail fast with a clear error: Start time is earlier than binlog available. Earliest = 2024-04-08 11:00:00 |
| start-time too new (after current time) | Allowed; CDC will wait until a matching binlog event is produced |
| Time parsing failure | Job fails with IllegalArgumentException |
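
The first two cases could be handled roughly as sketched below. This is only an illustration under stated assumptions: BinlogStartLocator and BinlogFile are hypothetical names, the list of binlog files is assumed to be sorted oldest first, and the timestamp of the first event in each file is assumed to be already known (how it is obtained from the server is out of scope here). The exception type for the purged-binlog case is also an assumption.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical sketch, not SeaTunnel code: picks the binlog file to start reading from.
final class BinlogStartLocator {

    // One binlog file plus the timestamp of its first event (assumed to be known).
    static final class BinlogFile {
        final String name;
        final long firstEventMillis;

        BinlogFile(String name, long firstEventMillis) {
            this.name = name;
            this.firstEventMillis = firstEventMillis;
        }
    }

    private BinlogStartLocator() {}

    // Returns the last file whose first event is at or before startMillis.
    // files must be non-empty and sorted from oldest to newest.
    static BinlogFile locate(List<BinlogFile> files, long startMillis) {
        if (files.isEmpty()) {
            throw new IllegalStateException("No binlog files available");
        }
        long earliest = files.get(0).firstEventMillis;
        if (startMillis < earliest) {
            // Case 1: the requested time is older than anything still on the server.
            throw new IllegalStateException(
                    "Start time is earlier than binlog available. Earliest = "
                            + Instant.ofEpochMilli(earliest));
        }
        // Binary search; invariant: files[lo].firstEventMillis <= startMillis.
        int lo = 0;
        int hi = files.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (files.get(mid).firstEventMillis <= startMillis) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        // Case 2: a start time in the future simply resolves to the newest file.
        return files.get(lo);
    }
}
```

After the file is chosen, the reader would still need to skip events inside it until it reaches one with a timestamp at or after the start time; for a start time in the future that means waiting for new events.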

User Scenario:

In real-world CDC scenarios, users often face recovery requirements like:

“I want to resume this CDC pipeline from 2024-04-10 00:00:00”

“I want to only capture changes after yesterday 08:00”

“Binlog filename is not available, but timestamp is known”

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
