@FGasper FGasper commented Nov 6, 2024

Worker threads look for status=added tasks when finding work to do. When a thread grabs a task, it sets status=processing so that no other thread will try to grab that task.
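To illustrate the claim step described above, here is a minimal in-memory sketch. It is not the verifier's actual code: `Task`, `Queue`, and `Claim` are hypothetical names, and a mutex stands in for whatever atomic update the real metadata store performs.

```go
package main

import (
	"fmt"
	"sync"
)

// Status values mirror the ones named in the PR description.
const (
	StatusAdded      = "added"
	StatusProcessing = "processing"
)

// Task is a hypothetical, simplified stand-in for a verifier task.
type Task struct {
	ID     int
	Status string
}

// Queue is an in-memory stand-in for the task collection (an assumption
// made for this sketch; the real tasks live in MongoDB).
type Queue struct {
	mu    sync.Mutex
	tasks []*Task
}

// Claim finds the first task with status "added", flips it to "processing"
// while holding the lock, and returns it. It returns nil if no task is free.
func (q *Queue) Claim() *Task {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, t := range q.tasks {
		if t.Status == StatusAdded {
			t.Status = StatusProcessing
			return t
		}
	}
	return nil
}

func main() {
	q := &Queue{tasks: []*Task{
		{ID: 1, Status: StatusAdded},
		{ID: 2, Status: StatusAdded},
	}}

	// Four workers race for two tasks.
	var wg sync.WaitGroup
	claimed := make(chan int, 4)
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if t := q.Claim(); t != nil {
				claimed <- t.ID
			}
		}()
	}
	wg.Wait()
	close(claimed)

	fmt.Println("tasks claimed:", len(claimed))
}
```

Because the status check and the status change happen under the same lock, a task can only be handed to one worker. The flip side is the problem this PR addresses: if the process dies after the flip, the task stays in processing and nothing ever claims it again.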

If the verifier crashes, though, any tasks that were status=processing need to be reset; otherwise no worker thread will ever pick them up, and the verifier will hang.

This changeset implements that reset of status=processing tasks so that the verifier will be able to finish after a restart.

Specifically:

  • If the primary task is found processing, this means that the initial listing of namespaces was interrupted. Thus, we now reset the primary task to added, and all other tasks are deleted.
  • If any collection-verification tasks are found processing, that means we never finished writing out the collection’s partitions and checking its metadata. Any existing partition tasks are unusable, so they get deleted, and the collection-verification task is reset to added.
  • Any remaining processing partition tasks are reset to added.
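The three rules above can be sketched as a pure function over an in-memory task list. This is only an illustration under stated assumptions: the `Task` fields, the type labels ("primary", "collection", "partition"), and the use of a namespace string to tie partition tasks to their collection task are all hypothetical, not the verifier's real schema.

```go
package main

import "fmt"

// Task is a hypothetical, simplified task record.
type Task struct {
	Type      string // "primary", "collection", or "partition" (assumed labels)
	Namespace string // e.g. "db.coll"; empty for the primary task
	Status    string // "added", "processing", "completed", ...
}

// resetInterrupted returns the task list as it should look after a restart.
func resetInterrupted(tasks []Task) []Task {
	// Rule 1: an interrupted primary task means the namespace listing never
	// finished, so keep only the primary task, reset to "added".
	for _, t := range tasks {
		if t.Type == "primary" && t.Status == "processing" {
			t.Status = "added"
			return []Task{t}
		}
	}

	// Rule 2 (first half): note which collections were interrupted.
	interrupted := map[string]bool{}
	for _, t := range tasks {
		if t.Type == "collection" && t.Status == "processing" {
			interrupted[t.Namespace] = true
		}
	}

	out := make([]Task, 0, len(tasks))
	for _, t := range tasks {
		switch {
		case t.Type == "partition" && interrupted[t.Namespace]:
			// Rule 2 (second half): partitions of an interrupted
			// collection task are unusable, so drop them.
			continue
		case t.Status == "processing":
			// Interrupted collection tasks (rule 2) and any remaining
			// processing partition tasks (rule 3) go back to "added".
			t.Status = "added"
		}
		out = append(out, t)
	}
	return out
}

func main() {
	tasks := []Task{
		{Type: "primary", Status: "completed"},
		{Type: "collection", Namespace: "db.a", Status: "processing"},
		{Type: "partition", Namespace: "db.a", Status: "added"},
		{Type: "collection", Namespace: "db.b", Status: "completed"},
		{Type: "partition", Namespace: "db.b", Status: "processing"},
	}
	for _, t := range resetInterrupted(tasks) {
		fmt.Printf("%+v\n", t)
	}
}
```

In this sketch the order of the rules matters: the primary check runs first because it supersedes everything else, and partition deletion is decided before the generic processing-to-added reset.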

These rollbacks are done in transactions. Because MongoDB transactions require a replica set or sharded cluster, the verifier’s metadata can no longer be stored on a standalone mongod.

@FGasper FGasper requested review from autarch and tdq45gj November 6, 2024 18:32

pmeredit commented Nov 6, 2024

Commenting here for posterity: this LGTM.

@autarch autarch left a comment

LGTM % I don't really have a ton of context on this code base.


func (verifier *Verifier) doInMetaTransaction(
	ctx context.Context,
	todo func(context.Context, mongo.SessionContext) error,

nit: I think todo is not the best name, given how we use TODO comments in our code base.

@tdq45gj tdq45gj left a comment

LGTM

@FGasper FGasper force-pushed the REP-5218-roll-back-tasks branch from b43dfb2 to 97f194c Compare November 8, 2024 15:54
@FGasper FGasper merged commit 0b9c7bd into mongodb-labs:main Nov 8, 2024
0 of 5 checks passed
@FGasper FGasper deleted the REP-5218-roll-back-tasks branch November 8, 2024 15:56
4 participants