-
Notifications
You must be signed in to change notification settings - Fork 16
REP-5329 Teach retryer to run +1 callback in parallel. #66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
33d4342 to
948c454
Compare
tdq45gj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % one small question
internal/verifier/compare.go
Outdated
| } | ||
|
|
||
| if cursor.Err() == nil { | ||
| state.NoteSuccess() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the retryer refreshes lastResetTime once the error group finishes. Is it redundant to call NoteSuccess before the callback function returns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. Removed.
Since DDL events are forbidden, collection UUIDs are useless to migration-verifier. This changeset removes them from the retryer in anticipation of further changes.
This rewrites much of the retryer so that it can run multiple callbacks in parallel. The parallel retry logic is plugged into the verifier’s document comparison logic. This also synchronizes migration-verifier’s internal list of retryable errors.
50e5938 to
5a4b35d
Compare
autarch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
edobranov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We need retry logic around
findcursor iteration, but the retryer’s IterationSuccess() method won’t work properly with how we compare partitions document-by-document. If, for example, the source hangs but we call IterationSuccess() on every destination success, we’ll never time out the retries. But we need IterationSuccess() to work or else there are no retries for partitions that take a long time to read.This changeset fixes that by rewriting the retryer to take multiple callbacks. Each callback now takes a context that the retryer uses to stop the function whenever another thread has failed.
Since migration-verifier doesn’t handle DDL events, there’s no point to handling collection UUIDs. Thus, the initial commits in this PR resolve longstanding technical debt.
This also synchronizes migration-verifier’s internal list of transient errors with mongosync’s.