Skip to content

Conversation

@FGasper
Copy link
Collaborator

@FGasper FGasper commented Nov 20, 2024

This project’s test suite has been provisioning its own clusters via custom logic. That logic can’t provision sharded clusters, though.

REP-5299 (PR #47) documents a fix for sharding. To test that ticket’s fix we need to run the test suite against sharded clusters. Thus, this changeset:

  • Removes the suite’s custom provisioning logic.
  • Uses mtools and m to provision clusters.
  • Adds tests against 7.0 and 8.0, while removing tests against rapid releases like 6.2.

Additional notes/changes:

  • CI now runs the tests with race detection.
  • A handful of tests seem to break in sharded clusters. REP-5299 may resolve those, so I’m marking them as skipped for now.
  • macOS is removed from CI. There seems little reason to test it, and it’s reliably slower than Ubuntu.
  • CI now just uses Go’s stable version. There’s little reason to retain compatibility with prior Go releases.
  • The metadata cluster is always unsharded because there seems little reason to shard it. It’s also always MongoDB’s latest stable release because there seems little reason to test older releases.
  • The change stream thread now Close()s its change stream.
  • The logs about resume token timestamps now just say timestampTime rather than clockTime.
  • Since the tests now need buildInfo, this changeset refactors that to a struct and moves it to util.

@FGasper FGasper requested a review from tdq45gj November 20, 2024 15:31
@FGasper FGasper marked this pull request as draft November 20, 2024 15:42
@FGasper FGasper marked this pull request as ready for review November 20, 2024 17:24
Copy link
Collaborator

@tdq45gj tdq45gj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % one comment

task := VerificationTask{QueryFilter: qfilter}

verifier := NewVerifier(VerifierSettings{})
//verifier.SetStartClean(true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to be commented?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s the way it is upstream, curiously enough.

@FGasper FGasper merged commit 53ded13 into mongodb-labs:main Nov 20, 2024
33 checks passed
@FGasper FGasper deleted the REP-5306-tests_to_integration branch November 20, 2024 19:20
FGasper added a commit that referenced this pull request Nov 22, 2024
Previously to finalize its change stream the verifier would just TryNext() until no responses arrived. This works fine for unsharded clusters, but for sharded clusters it can potentially miss events.

This changeset adopts more “tried-and-true” logic: grab a timestamp via `appendOplogNote`, then iterate the change stream until the stream’s resume token’s timestamp meets or exceeds that timestamp. As verification of that change, this changeset removes the test skips that REP-5306 (PR #48) imposed for sharded clusters.

The timestamp-grabbing logic is unconditionally retryable, which requires an update to the retryer module to implement. This changeset refactors that module a bit to prevent further duplication.

Additional changes:
- go.sum is now part of the repo.
- Go 1.22 is now a minimum requirement.
- Tests no longer spew output when the worker delay is 0. The log message for that delay is also tweaked to show the delay.
- Mention in code of the defunct “refetch” collection is removed.
- The change stream no longer logs whenever a batch of events arrives. For busy migrations it would be excessive.
- Verifier instances in tests now wait 10ms rather than 15s between generations, which makes a few tests much quicker.
- A new CheckRunner improves the ergonomics of running CheckDriver in tests.
- TestGenerationalRechecking used to verify that an insert after WritesOff() gets handled as a recheck. This doesn’t make a lot of sense since WritesOff() means, by definition, that no more events should happen on the server. Thus, this changeset removes that test, which was proving rather race-prone with the rest of these changes.
- The tests now use majority read & write concern.
- CI now parallelizes a bit more of its setup.
- A new CheckRunner struct smooths over running Check from e2e tests.
- Tests now use a 10ms verificationStatusCheckInterval rather than 15-minute.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants