REP-6492 Switch to $sampleRate-style partitioning when possible (#128)
$sample-based partitioning has proven problematic for some years now because it often creates highly imbalanced partitions.
This changeset switches partitioning to use $sampleRate instead. Because this entails a full index scan, it tends to be slower; we offset that by creating each partition task as soon as we receive a sampled partition boundary, rather than creating all of them at once at the end of the aggregation.
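The "create tasks as boundaries arrive" idea can be sketched as below. This is only an illustration, not the code in this changeset: `Partition`, `StreamPartitions`, and the use of integer keys are hypothetical, and the sketch assumes boundaries arrive in ascending order (as they would from an index-ordered scan).

```go
package main

import "fmt"

// Partition is a hypothetical half-open key range [Lower, Upper).
type Partition struct {
	Lower, Upper int
}

// StreamPartitions emits a partition each time a new boundary arrives,
// instead of waiting for the boundary stream to finish.
// It assumes boundaries are delivered in ascending order.
func StreamPartitions(min, max int, boundaries <-chan int, out chan<- Partition) {
	defer close(out)
	lower := min
	for b := range boundaries {
		if b <= lower || b >= max {
			continue // skip duplicate or out-of-range boundaries
		}
		out <- Partition{Lower: lower, Upper: b}
		lower = b
	}
	// The final partition covers whatever remains after the last boundary.
	out <- Partition{Lower: lower, Upper: max}
}

func main() {
	boundaries := make(chan int)
	out := make(chan Partition)

	// Stand-in for the sampling aggregation's cursor.
	go func() {
		defer close(boundaries)
		for _, b := range []int{10, 25, 40} {
			boundaries <- b
		}
	}()
	go StreamPartitions(0, 100, boundaries, out)

	for p := range out {
		fmt.Printf("[%d, %d)\n", p.Lower, p.Upper)
	}
}
```

With this shape, downstream workers can start on the first partition while the scan is still producing later boundaries.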
Because MongoDB 4.2 lacked $sampleRate (and $rand as well), the legacy partitioning logic remains for use with that server version.
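A minimal sketch of the version-based fallback follows. The names `PartitionStrategy` and `ChooseStrategy` are hypothetical, and the exact cutoff (4.4 and later) is an assumption; the commit only states that 4.2 lacks the operator.

```go
package main

import "fmt"

// PartitionStrategy is a hypothetical label for the two partitioners.
type PartitionStrategy string

const (
	StrategySampleRate PartitionStrategy = "sampleRate"
	StrategyLegacy     PartitionStrategy = "legacy"
)

// ChooseStrategy picks the partitioner from the server's major.minor version.
// Assumption: every server at 4.4 or later supports $sampleRate.
func ChooseStrategy(major, minor int) PartitionStrategy {
	if major > 4 || (major == 4 && minor >= 4) {
		return StrategySampleRate
	}
	return StrategyLegacy
}

func main() {
	fmt.Println(ChooseStrategy(4, 2))
	fmt.Println(ChooseStrategy(6, 0))
}
```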
Both legacy & $sampleRate partitioning are made to use `available` read concern and `secondaryPreferred` read preference. These aggregations don’t need consistency, but they benefit substantially from faster reads & from a reduced workload on the primary.
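For illustration, the same two settings can be expressed with the standard `readPreference` and `readConcernLevel` connection-string options. This is a connection-level approximation, not what the changeset does (it applies the settings to the partitioning aggregations specifically); `WithRelaxedReads` is a hypothetical helper, and the example uses a single-host URI.

```go
package main

import (
	"fmt"
	"net/url"
)

// WithRelaxedReads returns a copy of a MongoDB URI that requests
// secondaryPreferred reads at the "available" read concern level.
func WithRelaxedReads(rawURI string) (string, error) {
	u, err := url.Parse(rawURI)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("readPreference", "secondaryPreferred")
	q.Set("readConcernLevel", "available")
	u.RawQuery = q.Encode() // Encode sorts the keys alphabetically
	return u.String(), nil
}

func main() {
	uri, err := WithRelaxedReads("mongodb://localhost:27017/db")
	if err != nil {
		panic(err)
	}
	fmt.Println(uri)
}
```

A client built from such a URI would suit only a dedicated connection for partitioning reads, since the options apply to every operation on it.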
A few simplifications are made here as well. For example, `MongosyncID` is removed from the PartitionKey struct since it’s never actually relevant, and certain parameters to the legacy partitioner are made constant (since they were always used thus). The `util.Divide` function is renamed `util.DivideToF64` to clarify that its return is a float.
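As a rough idea of why the new name helps, a float-returning division helper might look like the sketch below. The signature and the `realNumber` constraint are assumptions; the actual `util.DivideToF64` may differ.

```go
package main

import "fmt"

// realNumber is a hypothetical constraint covering integer and float types.
type realNumber interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

// DivideToF64 converts both operands to float64 before dividing, so
// integer inputs do not silently truncate. The name states the return type.
func DivideToF64[N, D realNumber](numerator N, denominator D) float64 {
	return float64(numerator) / float64(denominator)
}

func main() {
	fmt.Println(DivideToF64(3, 4)) // prints 0.75, whereas 3/4 in int math is 0
}
```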