-
Notifications
You must be signed in to change notification settings - Fork 15
REP-5219 Make migration-verifier process change events in batch. #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FGasper
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally this looks good, but I wonder if we can’t simplify it some.
Also the dupe-key handling seems like a point of potential concern. How necessary is that?
internal/verifier/change_stream.go
Outdated
| // HandleChangeStreamEvent performs the necessary work for change stream events that occur during | ||
| // operation. | ||
| func (verifier *Verifier) HandleChangeStreamEvent(ctx context.Context, changeEvent *ParsedEvent) error { | ||
| func (verifier *Verifier) HandleChangeStreamEvent(changeEvent *ParsedEvent) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, what was the advantage of doing the buffering here versus closer to the read from the cursor? I was thinking we’d read all the events into a slice then pass that slice to this function (renamed HandleChangeStreamEvents).
We wouldn’t need separate handle-vs-flush methods in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I made the changes.
| } | ||
|
|
||
| eventsRead++ | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be error handling for cs.Err() here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to the if statement above.
internal/verifier/change_stream.go
Outdated
|
|
||
| // StartChangeStream starts the change stream. | ||
| func (verifier *Verifier) StartChangeStream(ctx context.Context) error { | ||
| func (verifier *Verifier) StartChangeStream(ctx context.Context, batchSize *int32) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be a setting? Aren’t we just taking whatever batch size we get from the source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this for a test to pass in a small batch size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was that necessary, though?
getMore always returns after a relatively short period of inactivity with just whatever events it happens to have seen. So the batch size should be naturally limited to whichever events you’ve triggered … ?
internal/verifier/change_stream.go
Outdated
| } | ||
|
|
||
| dbName, collName := SplitNamespace(namespace) | ||
| if err := verifier.insertRecheckDocs(ctx, dbName, collName, ids, dataSizes); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach will cause us to block between each namespace’s events, I think.
Could we instead persist all of the events for all namespaces at once?
internal/verifier/recheck.go
Outdated
| } | ||
|
|
||
| // Silence any duplicate key errors as recheck docs should have existed. | ||
| if mongo.IsDuplicateKeyError(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dupe key errors can exist alongside other errors.
We should probably instead look for non-simple dupe-key errors so that anything that isn’t a dupe-key will still be handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that there shouldn't be any duplicate key error here because it's actually replace rather than insert.
|
|
||
| pprofInterval time.Duration | ||
|
|
||
| changeEventRecheckBuf ChangeEventRecheckBuffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: We shouldn’t need this if we switch to reading all events from the cursor into a slice.
| "the verifier should flush a recheck doc after a batch", | ||
| ) | ||
| suite.Require().Empty(verifier.changeEventRecheckBuf["testDB.testColl1"]) | ||
| suite.Require().Empty(verifier.changeEventRecheckBuf["testDB.testColl2"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid testing verifier internals here? It’d be ideal not to depend on them.
autarch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % some small stuff. No need for another review.
internal/verifier/recheck.go
Outdated
| func (verifier *Verifier) insertRecheckDocsWhileLocked( | ||
| ctx context.Context, | ||
| dbName, collName string, documentIDs []interface{}, dataSizes []int) error { | ||
| dbNames []string, collNames []string, documentIDs []interface{}, dataSizes []int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This is a very weird way to format this. I think it'd more readable as one arg per line.
internal/verifier/change_stream.go
Outdated
| verifier.mux.Lock() | ||
| defer verifier.mux.Unlock() | ||
|
|
||
| return verifier.insertRecheckDocsWhileLocked(ctx, dbNames, collNames, docIDs, dataSizes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd make more sense to put the locking in the insertRecheckDocsWhileLocked method. Right now every caller has to remember to use the mutex properly, which seems like a recipe for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it this way just this week to accommodate stats. If we’re going to change this I’d rather it’d be in a separate PR since the changes don’t really seem germane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the change event stats from this function in this PR. I think it makes sense to make this change here as well.
internal/verifier/change_stream.go
Outdated
| } | ||
|
|
||
| if eventsRead > 0 { | ||
| verifier.logger.Debug().Msgf("Received a batch of %d events", eventsRead) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you also want to log the 0 events case? I don't have enough context to know if that would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer not to log if there's no event. We've recommended users to run the m-v with debug log level, so I don't want too many log lines even at debug level.
internal/verifier/change_stream.go
Outdated
| } | ||
|
|
||
| if eventsRead > 0 { | ||
| verifier.logger.Debug().Msgf("Received a batch of %d events", eventsRead) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| verifier.logger.Debug().Msgf("Received a batch of %d events", eventsRead) | |
| verifier.logger.Debug().Int("eventsCount", eventsRead).Msgf("Received a batch of events.") |
internal/verifier/change_stream.go
Outdated
| verifier.logger.Debug().Msgf("Received a batch of %d events", eventsRead) | ||
| } | ||
|
|
||
| err := verifier.HandleChangeStreamEvents(ctx, changeEventBatch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this handling logic also be under the >0 condition?
| } | ||
|
|
||
| return gotEvent, errors.Wrap(cs.Err(), "change stream iteration failed") | ||
| return eventsRead > 0, errors.Wrap(cs.Err(), "change stream iteration failed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems strange to defer handling cs.Err() until after HandleChangeStreamEvents() is called. It’d look a bit more idiomatic if the check were closer to the TryNext.
| dbNames []string, | ||
| collNames []string, | ||
| documentIDs []interface{}, | ||
| dataSizes []int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: If we’re doing a separate slice for each category it’d seem a bit cleaner if there were 1 slice with a struct that contains these data points.
FGasper
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
| suite.Require().Equal(VerificationStatus{TotalTasks: 1, FailedTasks: 1}, *status) | ||
|
|
||
| checkContinueChan <- struct{}{} | ||
| require.NoError(suite.T(), errGroup.Wait()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find!
This makes the change stream reader:
RemainingBatchSize == 0It also fixes the bug that the
eventRecordercounts each event twice.