fix: prevent duplicate rows during MSSQL CDC backfill by waiting for the async capture agent to catch up #843

Open
vishalm0509 wants to merge 13 commits into staging from fix/mssql_cdc_cursor


@vishalm0509 vishalm0509 commented Feb 23, 2026

Description

  • MSSQL's CDC capture agent runs asynchronously, so sys.fn_cdc_get_max_lsn() can lag behind the
    transaction log at the start of a sync. This causes the backfill to read rows that later appear
    again in the CDC change stream, producing duplicates.
  • Before recording the initial CDC cursor, we now poll sys.dm_cdc_log_scan_sessions until the
    agent completes a non-throttled scan (tran_count < maxtrans), ensuring the CDC max LSN reflects
    all committed transactions.
  • Gracefully degrades: if VIEW DATABASE STATE / VIEW DATABASE PERFORMANCE STATE permission is
    missing, falls back to the previous behavior with a warning.
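
The polling step described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `ScanSession`, `SessionReader`, `WaitForCaptureCatchUp`, and `simulate` are hypothetical names, and the sketch assumes the caller already knows the capture job's `maxtrans` value. It only shows the "keep polling until a scan reports tran_count < maxtrans" loop, with the database query abstracted behind a function so the loop can be exercised without SQL Server.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// ScanSession is a hypothetical projection of one row of
// sys.dm_cdc_log_scan_sessions, keeping only the column this sketch needs.
type ScanSession struct {
	TranCount int64 // tran_count reported for the scan session
}

// SessionReader is an assumed abstraction over "fetch the latest completed
// log scan session"; in a real driver it would run a SQL query.
type SessionReader func(ctx context.Context) (ScanSession, error)

// caughtUp reports whether the scan was not throttled, i.e. it processed
// fewer transactions than the capture job's maxtrans limit.
func caughtUp(s ScanSession, maxTrans int64) bool {
	return s.TranCount < maxTrans
}

// WaitForCaptureCatchUp polls read until a non-throttled scan is observed,
// the reader fails, or ctx is cancelled.
func WaitForCaptureCatchUp(ctx context.Context, read SessionReader, maxTrans int64, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		s, err := read(ctx)
		if err != nil {
			return err
		}
		if caughtUp(s, maxTrans) {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			// Poll again on the next tick.
		}
	}
}

// simulate replays canned tran_count values through the wait loop and
// returns how many polls were made. It exists only to exercise the sketch.
func simulate(tranCounts []int64, maxTrans int64) (int, error) {
	polls := 0
	read := func(context.Context) (ScanSession, error) {
		if polls >= len(tranCounts) {
			return ScanSession{}, errors.New("no more canned sessions")
		}
		s := ScanSession{TranCount: tranCounts[polls]}
		polls++
		return s, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err := WaitForCaptureCatchUp(ctx, read, maxTrans, time.Millisecond)
	return polls, err
}
```

For example, `simulate([]int64{500, 500, 12}, 500)` stops on the third poll, because the first two scans hit the limit and the third does not. A production version would likely also need to confirm that the accepted scan started after the wait began; that check is left out of this sketch.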

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • Scenario A
  • Scenario B

Screenshots or Recordings

Documentation

Related PR's (If Any):

@vishalm0509 vishalm0509 marked this pull request as ready for review February 24, 2026 05:45

if !hasPermission {
	logger.Warnf("VIEW DATABASE STATE permission not granted; LSN may be lagging behind the transaction log")
	return m.currentMaxLSN(ctx)

Can we consolidate this split fallback logic in one place?
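
One possible shape for that, as a sketch only. Every name here (`ErrNoViewState`, `cursorSource`, `initialLSN`, `resolveWith`) is hypothetical and not taken from the PR; the idea is that the wait step reports the missing permission as a sentinel error, and a single caller decides whether to warn and fall back or to fail.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrNoViewState is a hypothetical sentinel for the missing-permission case.
var ErrNoViewState = errors.New("VIEW DATABASE STATE permission not granted")

// cursorSource bundles the operations involved. All three fields are
// placeholders for whatever the driver actually calls.
type cursorSource struct {
	waitForCatchUp func(ctx context.Context) error
	currentMaxLSN  func(ctx context.Context) (string, error)
	warnf          func(format string, args ...any)
}

// initialLSN is the only place that decides whether to fall back.
func (s cursorSource) initialLSN(ctx context.Context) (string, error) {
	err := s.waitForCatchUp(ctx)
	switch {
	case errors.Is(err, ErrNoViewState):
		// Fallback: keep the previous behavior, but warn once here.
		s.warnf("%v; LSN may be lagging behind the transaction log", err)
	case err != nil:
		return "", fmt.Errorf("waiting for CDC capture agent: %w", err)
	}
	return s.currentMaxLSN(ctx)
}

// resolveWith runs initialLSN against stubbed operations so the behavior can
// be checked without a database. It returns the LSN, the number of warnings
// emitted, and any error.
func resolveWith(waitErr error) (string, int, error) {
	warnings := 0
	src := cursorSource{
		waitForCatchUp: func(context.Context) error { return waitErr },
		currentMaxLSN: func(context.Context) (string, error) {
			return "0x0000002A000000F80003", nil // made-up LSN for the demo
		},
		warnf: func(string, ...any) { warnings++ },
	}
	lsn, err := src.initialLSN(context.Background())
	return lsn, warnings, err
}
```

With this layout the permission check, the warning, and the fallback read all live in `initialLSN`, so the wait helper never needs to call `currentMaxLSN` itself.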

Co-authored-by: vishal-datazip <vishal@datazip.io>
@vishalm0509 vishalm0509 temporarily deployed to integration_tests March 15, 2026 08:02 — with GitHub Actions Inactive
@vishalm0509 vishalm0509 deployed to integration_tests March 15, 2026 10:17 — with GitHub Actions Active
3 participants