rluvaton
Member

@rluvaton rluvaton commented Oct 8, 2025

Which issue does this PR close?

N/A

Rationale for this change

Make multi-column aggregation even faster.

What changes are included in this PR?

In PrimitiveGroupValueBuilder.vectorized_equal_to, always evaluate the comparison and use unchecked indexing, as both of these changes are what make the code compile to SIMD.
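
A minimal sketch of the two changes, not the actual DataFusion code: the function names, the plain slices, and the u32 element type are assumptions made for illustration. The first function skips rows already marked unequal; the second always evaluates the comparison, ANDs it into the previous result, and reads with get_unchecked, which leaves a branch-free loop body that the compiler has a better chance of vectorizing. Whether it actually vectorizes depends on the target and compiler version.

```rust
/// Branchy version: skips rows that an earlier column already marked unequal.
fn equal_to_branchy(
    values: &[u32],
    lhs_rows: &[usize],
    rhs: &[u32],
    rhs_rows: &[usize],
    results: &mut [bool],
) {
    for ((&l, &r), res) in lhs_rows.iter().zip(rhs_rows).zip(results.iter_mut()) {
        if !*res {
            continue;
        }
        // Bounds-checked indexing: each access may branch to a panic.
        *res = values[l] == rhs[r];
    }
}

/// Branch-free version: always evaluates the comparison and ANDs it into
/// the previous result, using unchecked reads.
///
/// # Safety
/// Every index in `lhs_rows` must be less than `values.len()`, and every
/// index in `rhs_rows` must be less than `rhs.len()`.
unsafe fn equal_to_always_evaluate(
    values: &[u32],
    lhs_rows: &[usize],
    rhs: &[u32],
    rhs_rows: &[usize],
    results: &mut [bool],
) {
    for ((&l, &r), res) in lhs_rows.iter().zip(rhs_rows).zip(results.iter_mut()) {
        // SAFETY: the caller guarantees both indices are in bounds.
        let a = unsafe { *values.get_unchecked(l) };
        let b = unsafe { *rhs.get_unchecked(r) };
        // A row that was already false stays false; no early exit needed.
        *res &= a == b;
    }
}
```

Both functions produce the same results; the second trades the early exit for a uniform loop body.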

Are these changes tested?

Existing tests

Are there any user-facing changes?

Nope


I tried a LOT of variations in GodBolt, from splitting into fixed-size chunks, to trying to get auto-vectorization to use gather and create a bitmask, to even testing portable SIMD (just to see what it would generate).

This version only optimizes the non-null path for the moment, as it is the easiest.

If and when we change from &mut [bool] to mutable packed bits, we could:

  1. evaluate in chunks of 64 items (I tried different variations to see which is best; you can tweak the godbolt above with different types and sizes to check for yourself). 64 is not necessarily the best chunk size overall, but I think it will be the fastest for ANDing with the equal_to_results boolean buffer
  2. add an optimization for nullable as well, by doing bitwise operations on 64 items at a time and avoiding the cost of getting each bit manually
  3. skip 64 items right away if the equal_to_results word is equal to 0x00 (i.e. all false)
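
The three points above could be sketched roughly as follows. This is a hypothetical illustration, not code from this PR: it assumes equal_to_results is stored as one u64 word per 64 rows, uses already-aligned u32 slices instead of row indices, and the names eq_mask, and_equal_packed, and combine_nulls are made up. The null handling assumes that two nulls compare equal and that a null never equals a non-null value.

```rust
/// Compare up to 64 value pairs and pack the outcomes into one word
/// (bit i is set when lhs[i] == rhs[i]; unused high bits stay 0).
fn eq_mask(lhs: &[u32], rhs: &[u32]) -> u64 {
    debug_assert!(lhs.len() == rhs.len() && lhs.len() <= 64);
    let mut mask = 0u64;
    for (i, (a, b)) in lhs.iter().zip(rhs).enumerate() {
        mask |= ((a == b) as u64) << i;
    }
    mask
}

/// AND the per-row equality of `lhs` and `rhs` into packed results,
/// 64 rows per word.
fn and_equal_packed(lhs: &[u32], rhs: &[u32], equal_to_results: &mut [u64]) {
    assert_eq!(lhs.len(), rhs.len());
    assert!(equal_to_results.len() * 64 >= lhs.len());
    let chunks = lhs.chunks(64).zip(rhs.chunks(64));
    for ((l, r), word) in chunks.zip(equal_to_results.iter_mut()) {
        // Point 3: all 64 rows are already false, so skip the whole chunk.
        if *word == 0 {
            continue;
        }
        // Point 1: one AND updates 64 results at once.
        *word &= eq_mask(l, r);
    }
}

/// Point 2: fold validity bitmaps into the value comparison word by word.
/// A set bit in `lhs_valid` / `rhs_valid` means the row is non-null.
fn combine_nulls(lhs_valid: u64, rhs_valid: u64, values_equal: u64) -> u64 {
    // Equal when both are null, or when both are valid and the values match.
    (!lhs_valid & !rhs_valid) | (lhs_valid & rhs_valid & values_equal)
}
```

Note that combine_nulls treats bits beyond the real row count as "both null", so a caller would need to mask off the tail of the last word.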

@rluvaton rluvaton added the performance Make DataFusion faster label Oct 8, 2025
@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Oct 8, 2025
Comment on lines +112 to +135
        let iter = izip!(
            lhs_rows.iter(),
            rhs_rows.iter(),
            equal_to_results.iter_mut(),
        );

        for (&lhs_row, &rhs_row, equal_to_result) in iter {
            // Has found not equal to in previous column, don't need to check
            if !*equal_to_result {
                continue;
            }

            // Perf: skip null check (by short circuit) if input is not nullable
            let exist_null = self.nulls.is_null(lhs_row);
            let input_null = array.is_null(rhs_row);
            if let Some(result) = nulls_equal_to(exist_null, input_null) {
                *equal_to_result = result;
                continue;
            }

            // Otherwise, we need to check their values
            *equal_to_result = self.group_values[lhs_row].is_eq(array.value(rhs_row));
        }
    }
Member Author

Moved the code from vectorized_equal_to and removed the if NULLABLE check, as we will always get here if nullable.

@rluvaton
Member Author

rluvaton commented Oct 8, 2025

@alamb can you please run the aggregate_vectorized benchmark with these changes?

fn bench_vectorized_append(c: &mut Criterion) {

self.group_values[lhs_row]
} else {
// SAFETY: indices are guaranteed to be in bounds
unsafe { *self.group_values.get_unchecked(lhs_row) }
Contributor

As lhs_row is not checked here to be in bounds, this method would need to be marked unsafe as well.
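
One way to read this suggestion, as a hedged sketch only: the Builder type and both method names are hypothetical, and only group_values and lhs_row come from the snippet under review. The unchecked accessor is itself declared unsafe and documents its contract, so the obligation to prove the index is in bounds moves to the caller instead of being hidden behind a safe signature.

```rust
/// Hypothetical stand-in for the builder under review.
struct Builder {
    group_values: Vec<u32>,
}

impl Builder {
    /// Reads a group value without a bounds check.
    ///
    /// # Safety
    /// `lhs_row` must be less than `self.group_values.len()`.
    unsafe fn value_unchecked(&self, lhs_row: usize) -> u32 {
        // SAFETY: the caller guarantees lhs_row is in bounds.
        unsafe { *self.group_values.get_unchecked(lhs_row) }
    }

    /// Safe wrapper: performs the check once, then relies on it.
    fn value(&self, lhs_row: usize) -> Option<u32> {
        if lhs_row < self.group_values.len() {
            // SAFETY: the bounds check above just succeeded.
            Some(unsafe { self.value_unchecked(lhs_row) })
        } else {
            None
        }
    }
}
```

A safe function that calls get_unchecked on an unvalidated index would be unsound, since safe callers could trigger undefined behavior by passing an out-of-range row.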
