sync_diff_inspector checkpoint integration test flakes on resumed chunk selection #12553

@joechenrh

Description

Which jobs are flaking?

  • pingcap/tiflow/pull_syncdiff_integration_test

Which test(s) are flaking?

  • sync_diff_inspector/tests/sync_diff_inspector/checkpoint/run.sh
  • The flaky assertion is in the bucket-checkpoint resume path, where the script picks the first resumed chunk by sorting lowerBounds only.

Jenkins logs or GitHub Actions link

https://do.pingcap.net/jenkins/blue/organizations/jenkins/pingcap%2Ftiflow%2Fpull_syncdiff_integration_test/detail/pull_syncdiff_integration_test/815/pipeline

Observed failure from build 815:
- resumed candidate logged as `39 upperBounds= indexCode=0:1-250:0:1`
- old assertion expected a fixed resumed index pattern and failed on `first_chunk_index`

Anything else we need to know

  • Does this test exist for other branches as well?
    • Likely yes, if the same checkpoint test script is present.
  • Has there been a high frequency of failure lately?
    • The same symptom has been seen in other PRs.
  • Related parent tracking issue:
  • Root cause summary:
    • The checkpoint test assumes the first resumed chunk selected from logs is stable after sorting by lowerBounds only. In practice, multiple resumed chunks can share the same lower bound, so the script may pick a different valid resumed chunk and fail nondeterministically.
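The tie described in the root cause summary can be reproduced in isolation. The sketch below is illustrative only: the `Chunk` type, its field names, and the second chunk's values are hypothetical, and it does not parse real sync_diff_inspector logs. It shows that choosing the minimum by lower bound alone depends on log order when two chunks tie, while a composite key does not. The composite key is one possible way to stabilize the selection, not necessarily the fix the test will adopt.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    # Hypothetical fields for illustration; not the real log schema.
    lower_bound: str
    upper_bound: str
    index_code: str


def first_by_lower_bound(chunks: List[Chunk]) -> Chunk:
    """Mimics the flaky selection: only the lower bound is compared."""
    # min() returns the first minimal element, so ties fall back to input order.
    return min(chunks, key=lambda c: c.lower_bound)


def first_deterministic(chunks: List[Chunk]) -> Chunk:
    """Breaks ties with the remaining fields, so input order stops mattering."""
    return min(chunks, key=lambda c: (c.lower_bound, c.upper_bound, c.index_code))


# Two resumed chunks sharing a lower bound (values are made up for the demo).
a = Chunk("39", "", "0:1-250:0:1")
b = Chunk("39", "80", "0:0-250:0:1")

# The same chunks logged in a different order on two runs.
run_1 = [a, b]
run_2 = [b, a]

print(first_by_lower_bound(run_1) == first_by_lower_bound(run_2))
print(first_deterministic(run_1) == first_deterministic(run_2))
```

An alternative with the same effect would be for the assertion to accept any chunk from the tied set instead of expecting one fixed index.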

Metadata

  • Assignees: No one assigned
  • Labels:
    • component/test (Unit tests and integration tests component.)
    • severity/minor
    • type/bug (The issue is confirmed as a bug.)
