You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
execbuilder: add "average lookup ratio" parallelization heuristic for lookup joins
The DistSender API forces its users to make a choice between cross-range
parallelism (which is needed for performance) and setting memory limits
(which is needed for stability). Streamer was introduced to address this
limitation, but it comes with some requirements, one of which is that it
needs to have access to LeafTxns. However, mutation statements must run
in the RootTxn, so we never use the Streamer for those and fall back to
using the DistSender API directly (via `txnKVFetcher`). There, we
currently have the following heuristic:
- if we know that each input row results in at most one lookup row, then
we consider such a lookup to be "safe" parallelization, so we disable
usage of memory limits on the BatchHeader. This is the case for index
joins (when we always expect to get exactly one looked up row) as well
as lookup joins that have "equality columns are key" property (when we
expect at most one looked up row).
- otherwise, if we have a multi-key lookup join, we use the default
fetcher memory limits (TargetBytes of 10MiB), which disables cross-range
parallelism.
Most commonly this will affect mutation statements and will have a more
pronouanced effect on the multi-region tables, so this commit extends the
heuristic for when we consider it to be "safe" for parallelization.
Namely, we now calculate the average lookup ratio based on the lookup
equality columns and the available table / column statistics, and if the
ratio doesn't exceed the allowed limit, then we'll enable the
parallelism. To a certain degree, this heuristic resembles the "equality
columns are key" heuristic that we already utilize, but instead of
a guaranteed maximum on the lookup ratio we use the estimated average.
What we're trying to prevent with the existing and the new heuristics is
the case when we construct such a KV batch that the KV response will
overwhelm (read "will OOM") the node issuing the KV batch. In the
existing heuristic we say that "if lookup ratio is guaranteed to not
exceed one, then it should be safe".
I believe that the new heuristic should be safe in practice for most
deployments due to the following reasons:
- we already have an implicit limiting behavior in the join reader due
to its execution model (it first buffers some number of rows, up to
2MiB in size when not using the streamer, deduplicates the lookup spans,
and performs the lookup of all those spans in a single KV batch).
Empirical testing shows that we expect to have at most 25k lookups in
that single KV batch.
- this will have impact only when the streamer is not used, which most
commonly will mean we're executing a mutation, and in our docs we
advocate for not performing large mutations. (I'm stretching things
a bit here since even if we modify small amount of data, to compute that
we might read a lot, which could be destabilizing if we disable KV
limits. Yet a similar argument could be made that our current
"equality columns are key" heuristic is not safe - it's possible to
construct a scenario where we look up large amounts of data.)
In order to prevent this new heuristic from exploding in some edge
cases, two guardrails are added:
- in order to handle a scenario where the lookup ratio is not evenly
distributed (i.e. different input rows can result in vastly different
number of looked up rows), we'll disable the heuristic if the max lookup
ratio exceeds the allowed limit.
- in order to handle a scenario where looked rows are very large, we'll
disable the heuristic if the estimated average lookup row size exceeds
the allowed limit. (Note that we don't have this kind of protection in
the existing heuristics.)
I plan to do some more empirical runs to fine-tune the default values of
the newly added session variables, but the current defaults are:
- `parallelize_multi_key_lookup_joins_avg_lookup_ratio = 10`
- `parallelize_multi_key_lookup_joins_max_lookup_ratio = 10000`
- `parallelize_multi_key_lookup_joins_avg_lookup_row_size = 100 KiB`.
In order to de-risk rollout of this feature, we will initially apply the
new heuristic only to mutations of multi-region tables. New session
variable `parallelize_multi_key_lookup_joins_only_on_mr_mutations` can
be set to `false` to apply the heuristic to all statements, regardless
of the table being multi-region.
Release note (performance improvement): Mutation statements (UPDATEs and
DELETEs) that perform lookup joins into multi-region tables (perhaps as
part of a CASCADE) are now more likely to parallelize the lookups across
ranges which improves their performance.
0 commit comments