@FGasper FGasper commented Nov 15, 2024

Previously, every change event’s document size was recorded “pessimistically” as 16 MiB. This helped to avoid OOMs, but it came at a cost: when the recheck queue is converted to recheck tasks, those tasks are sized to approximate the configured partition size. Thus, with a partition size of 400 MiB (the default), only 25 change events fit into a recheck task. If there are 250,000 pending rechecks (not infeasible for a large, busy data set after generation 0), that’s 10,000 tasks to create and perform, which is inefficient.
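To make the arithmetic above concrete, here is a minimal sketch. It is not code from this PR: `recheckTaskCount` is a hypothetical helper, and it ignores any other limits the verifier may apply when it builds tasks. The 2 KiB figure is an invented average used only for comparison.

```go
package main

import "fmt"

const mib = 1 << 20

// recheckTaskCount estimates how many recheck tasks are needed when every
// pending recheck is assumed to occupy docSizeBytes and each task may hold
// up to partitionBytes of documents. (Hypothetical helper, for illustration.)
func recheckTaskCount(pendingRechecks, docSizeBytes, partitionBytes int) int {
	docsPerTask := partitionBytes / docSizeBytes
	if docsPerTask < 1 {
		docsPerTask = 1 // a task always holds at least one document
	}
	// Round up so that a final, partially filled task is counted.
	return (pendingRechecks + docsPerTask - 1) / docsPerTask
}

func main() {
	// Pessimistic sizing: every document counted as 16 MiB.
	fmt.Println(recheckTaskCount(250000, 16*mib, 400*mib)) // prints 10000

	// Assumed 2 KiB average document size, for comparison.
	fmt.Println(recheckTaskCount(250000, 2048, 400*mib)) // prints 2
}
```

The point of the comparison is only that the task count scales with the recorded size, so a realistic size shrinks it by orders of magnitude.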

PR #34 all but eliminates the OOMs, which undercuts the benefit of that “pessimism”. It now makes more sense to allow for large recheck tasks in order to minimize the number of tasks. Moreover, we can get pretty good confidence about document sizes from change events anyway:

  • Insert & replace events always include the fullDocument.
  • Update events can be configured to include the current fullDocument.
  • Delete events refer to a document that probably no longer exists, so we can safely estimate its size to be “small”.

This changeset, then:

  1. configures the change stream to include fullDocument in update events, and
  2. records document sizes from the change event.
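The per-operation reasoning above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `docSizeFromEvent` and `assumedSmallDocSize` are hypothetical names, the 1 KiB value is invented, and the function takes the raw BSON bytes of `fullDocument` (whose length is the document's size) rather than using the MongoDB driver.

```go
package main

import "fmt"

// assumedSmallDocSize is a hypothetical stand-in for the "small" estimate
// used when no full document is available. The value is an assumption.
const assumedSmallDocSize = 1024

// docSizeFromEvent returns the size to record for a change event.
// fullDocument holds the raw BSON bytes of the event's fullDocument field,
// or nil when the event carried none.
func docSizeFromEvent(opType string, fullDocument []byte) (int, error) {
	switch opType {
	case "insert", "replace", "update":
		if fullDocument != nil {
			// A raw BSON document's byte length is its size.
			return len(fullDocument), nil
		}
		// No full document arrived with the event: treat it like a delete.
		return assumedSmallDocSize, nil
	case "delete":
		// The document probably no longer exists.
		return assumedSmallDocSize, nil
	default:
		return 0, fmt.Errorf("unexpected operation type %q", opType)
	}
}

func main() {
	emptyDoc := []byte{5, 0, 0, 0, 0} // BSON encoding of {}
	size, err := docSizeFromEvent("insert", emptyDoc)
	fmt.Println(size, err) // prints 5 <nil>
}
```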

@tdq45gj tdq45gj left a comment

This looks good overall. I wonder whether it's necessary to check for the case where fullDocument is missing.

OpType string `bson:"operationType"`
Ns *Namespace `bson:"ns,omitempty"`
DocKey DocKey `bson:"documentKey,omitempty"`
DocSize *int `bson:"fullDocument,omitempty"`

Collaborator

The BSON tag should probably be documentSize.

Collaborator Author

oops! Good catch.

bson.D{{"$type", "$fullDocument"}},
}},
}}, // fullDocument exists
{"then", bson.D{{"$bsonSize", "$fullDocument"}}},

Collaborator

$bsonSize was added in v4.4. Can we fall back to the old, slower logic for v4.2 and below?

Collaborator Author

Thank you for checking this. I checked as far back as 5.0 but not earlier.

That makes me think we should just do fullDocument. I don’t think this is where any bottlenecks are anyway.

Thoughts?
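
For reference, the version-gated fallback suggested in this thread could look roughly like the sketch below. It is an assumption-laden illustration, not what was merged: `supportsBsonSize` is a hypothetical helper, and it only inspects the major and minor components of a version string such as "4.2.24".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsBsonSize reports whether a server version string such as "4.2.24"
// is at least 4.4, the release in which $bsonSize was added.
// (Hypothetical helper; the name and parsing rules are assumptions.)
func supportsBsonSize(version string) (bool, error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("cannot parse server version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", version, err)
	}
	return major > 4 || (major == 4 && minor >= 4), nil
}

func main() {
	for _, v := range []string{"4.2.24", "4.4.0", "7.0.3"} {
		ok, err := supportsBsonSize(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: $bsonSize available = %v\n", v, ok)
	}
}
```

Always requesting fullDocument and measuring it client-side, as proposed above, avoids the need for such a gate at all.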

{"$cond", bson.D{
{"if", bson.D{
{"$ne", bson.A{
"missing",

Collaborator

Do we need to check the fullDocument missing case? $bsonSize returns 0 on null value.

Collaborator Author

IMO we should avoid “magic values” when possible. I realize that 0 often serves in that role, but if we can avoid that I think it’s a small code-quality win.

@FGasper FGasper requested a review from tdq45gj November 15, 2024 14:18

@mtrussotto mtrussotto left a comment

updateLookup is fairly expensive (not the network cost, but the query), but I can believe the savings from better partitioning outweigh that.

FGasper commented Nov 15, 2024

@mtrussotto I agree re the savings. Moreover, mongosync’s embedded verifier uses updateLookup in its change stream, so this is at least an expense that we’ve separately determined is tolerable.

@tdq45gj tdq45gj left a comment

LGTM. I was thinking that we don't need to make the migration-verifier run as fast on 4.2 as it does on supported versions. This could be as simple as setting the document size field to a large constant for 4.2. But I'll leave it up to you.

FGasper commented Nov 15, 2024

@tdq45gj Customers who are on 4.2 are likely still there because of the onerousness of upgrading, which suggests that their data sets could be large. So it’s probably better not to retain “pessimism” for 4.2 IMO.

@FGasper FGasper merged commit ab0ed50 into mongodb-labs:main Nov 15, 2024
5 checks passed
@FGasper FGasper deleted the REP-5283-sensible-change-event-doc-sizes branch November 15, 2024 14:43