Collaborator

@FGasper FGasper commented Nov 29, 2024

We need retry logic around find-cursor iteration, but the retryer’s IterationSuccess() method doesn’t fit the way we compare partitions document by document. If, for example, the source hangs while we call IterationSuccess() on every destination success, the retries never time out. Yet we need IterationSuccess() to work; otherwise partitions that take a long time to read get no retries.

This changeset fixes that by rewriting the retryer to take multiple callbacks. Each callback now takes a context that the retryer uses to stop the function whenever another thread has failed.

Since migration-verifier doesn’t handle DDL events, there’s no point in handling collection UUIDs. The initial commits in this PR therefore remove them from the retryer, which resolves longstanding technical debt.

This also synchronizes migration-verifier’s internal list of transient errors with mongosync’s.

@FGasper FGasper force-pushed the REP-5329-simplify-retryer branch 2 times, most recently from 33d4342 to 948c454 November 29, 2024 19:18
@FGasper FGasper marked this pull request as ready for review December 2, 2024 13:00
Collaborator

@tdq45gj tdq45gj left a comment

LGTM % one small question

}

if cursor.Err() == nil {
	state.NoteSuccess()
Collaborator

Looks like the retryer refreshes lastResetTime once the error group finishes. Is it redundant to call NoteSuccess before the callback function returns?

Collaborator Author

Probably. Removed.

@FGasper FGasper removed the request for review from autarch December 3, 2024 16:34
Since DDL events are forbidden, collection UUIDs are useless to
migration-verifier. This changeset removes them from the retryer
in anticipation of further changes.

This rewrites much of the retryer so that it can run multiple
callbacks in parallel. The parallel retry logic is plugged into
the verifier’s document comparison logic.

This also synchronizes migration-verifier’s internal list of
retryable errors.
@FGasper FGasper force-pushed the REP-5329-simplify-retryer branch from 50e5938 to 5a4b35d December 3, 2024 16:37
@FGasper FGasper requested a review from autarch December 3, 2024 16:37
Collaborator

@autarch autarch left a comment

LGTM!

@edobranov edobranov left a comment

LGTM

@FGasper FGasper merged commit 31cc703 into mongodb-labs:main Dec 3, 2024
33 checks passed
@FGasper FGasper deleted the REP-5329-simplify-retryer branch December 3, 2024 17:20
4 participants