-
Notifications
You must be signed in to change notification settings - Fork 1.7k
perf: improve performance of vectorized_equal_to
for PrimitiveGroupValueBuilder
in multi group by aggregation
#17977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
perf: improve performance of vectorized_equal_to
for PrimitiveGroupValueBuilder
in multi group by aggregation
#17977
Conversation
let iter = izip!( | ||
lhs_rows.iter(), | ||
rhs_rows.iter(), | ||
equal_to_results.iter_mut(), | ||
); | ||
|
||
for (&lhs_row, &rhs_row, equal_to_result) in iter { | ||
// Has found not equal to in previous column, don't need to check | ||
if !*equal_to_result { | ||
continue; | ||
} | ||
|
||
// Perf: skip null check (by short circuit) if input is not nullable | ||
let exist_null = self.nulls.is_null(lhs_row); | ||
let input_null = array.is_null(rhs_row); | ||
if let Some(result) = nulls_equal_to(exist_null, input_null) { | ||
*equal_to_result = result; | ||
continue; | ||
} | ||
|
||
// Otherwise, we need to check their values | ||
*equal_to_result = self.group_values[lhs_row].is_eq(array.value(rhs_row)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved the code from vectorized_equal_to
and removed the if NULLABLE
as we will always get here if nullable
@alamb can you please run
|
self.group_values[lhs_row] | ||
} else { | ||
// SAFETY: indices are guaranteed to be in bounds | ||
unsafe { *self.group_values.get_unchecked(lhs_row) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As lhs_row
is not checked here te be in bounds, this method would need to be marked unsafe as well.
Which issue does this PR close?
N/A
Rationale for this change
Making multi column aggregation even faster
What changes are included in this PR?
In
PrimitiveGroupValueBuilder.vectorized_equal_to
always evaluate and use unchecked as both of these changes are what making the code compile to SIMD.Are these changes tested?
Existing tests
Are there any user-facing changes?
Nope
I tried a LOT of variations GodBolt
from splitting to fixed size chunks and trying to get auto-vectorization to use gather and creating bitmask to even testing portable SIMD (just to see what it will generate).
this version only optimize the non null path for the moment as it is the easiest.
once and if we change from
&mut [bool]
to mutable packed bits we could:64
items (I tried different variations to see what is the best - you can tweak in the godbolt above with different type and size to check for yourself), 64 is not necessarily the best but it will be the fastest I think for doing AND with theequal_to_results
boolean bufferequal_to_results
equal to0x00
(i.e. all false)