Skip to content

PCSM-237: Fix intermittent test failures due to race condition in wait_for_current_optime()#156

Merged
inelpandzic merged 8 commits intomainfrom
PCSM-237-current-op-time-race-condition
Dec 2, 2025
Merged

PCSM-237: Fix intermittent test failures due to race condition in wait_for_current_optime()#156
inelpandzic merged 8 commits intomainfrom
PCSM-237-current-op-time-race-condition

Conversation

@inelpandzic
Copy link
Collaborator

@inelpandzic inelpandzic commented Dec 1, 2025

PCSM-237 Powered by Pull Request Badge

Problem

Tests would intermittently fail with assertions indicating that operations performed on the source were not replicated to the target before the test comparison ran.

Root cause: The wait_for_current_optime() method had two critical race conditions:

Race Condition 1: Clock Advancement with ping Command

The original implementation used the ping command to capture a cluster timestamp:

  1. ping doesn't ensure clusterTime returned is after all previous write operations

    • The ping command advances the cluster time view but doesn't guarantee that prior operations have been written to the oplog
    • The collection creation operation might still be in-flight when we capture the timestamp
  2. BSON Timestamp comparison causes premature wait completion

    • BSON Timestamps have two components: Timestamp(seconds, increment)
    • Comparison prioritizes seconds over increment: if seconds differ, the increment is ignored
    • When MongoDB's clock advances to the next second during the wait:
      • We're waiting for Timestamp(S, 10) where S is the current second
      • PCSM has only replicated up to Timestamp(S, 2)
      • Collection creation at Timestamp(S, 3-9) is pending
      • Clock advances: PCSM reaches Timestamp(S+1, 1) (new second)
      • Condition Timestamp(S, 10) <= Timestamp(S+1, 1) evaluates to TRUE
      • Wait exits successfully despite operations from second S never being replicated
  3. Result: Test comparison runs before the collection creation operation is replicated.

Race Condition 2: Two-Phase Replication Lag

Even after fixing the timestamp issue, tests still failed intermittently because PCSM's replication has two phases:

  1. Phase 1 (fast): Read operation from oplog → update lastReplicatedOpTime in status
  2. Phase 2 (slower): Write the operation to the target database

When wait_for_current_optime() returned immediately after detecting curr_optime <= last_applied_op, Phase 1 was complete but Phase 2 might still be in progress. The test comparison would run before operations were fully written to the target database.

This was discovered when adding debug print statements accidentally fixed the tests - the print statements added ~100-200ms of delay (HTTP calls + string formatting + I/O) which gave Phase 2 enough time to complete.

Solution

  1. Ensure clusterTime reflects all prior write ops:

    • Use appendOplogNote to get clusterTime in wait_for_current_optime()
  2. Add post-wait delay for Phase 2 completion:

    • After the wait condition succeeds, sleep for 300ms before returning
    • Ensures PCSM has finished writing operations to the target database
    • Makes the implicit timing dependency explicit and deterministic

This fix eliminates both race conditions by ensuring proper operation ordering in the oplog and allowing sufficient time for operations to be fully applied to the target database, making tests deterministic.

Copy link
Member

@boris-ilijic boris-ilijic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're waiting for Timestamp(S, 10) where S is the current second

should be Timestamp(S, 1)

For ts comparison it's possible to use: https://pkg.go.dev/go.mongodb.org/mongo-driver/bson/primitive#Timestamp

@inelpandzic inelpandzic merged commit 100da38 into main Dec 2, 2025
29 of 31 checks passed
@inelpandzic inelpandzic deleted the PCSM-237-current-op-time-race-condition branch December 2, 2025 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants