Skip to content

Conversation

@JonasKunz
Copy link
Contributor

@JonasKunz JonasKunz commented Oct 31, 2025

Adds support for the exponential_histogram type in the Evaluator, Aggregator and Grouping Aggregator code generation. MvEvaluator and ConvertEvaluator have not been touched yet, we'll get to those when we actually need them for exponential histograms.

The code generation is changed as follows:

  • The "scratch" concept which previously was only used for BytesRef has been generalized to also work with exponential histograms, as there the getter also requires a scratch
  • Added support for types never having vectors (and not even having a vector type).

In my understanding, ExponentialHistogramBlocks will never have a corresponding Vector , as they don't have a memory layout which directly benefits from e.g. SIMD. Vectorization still works by first "extracting" the sub-blocks, which might be vector-backed.

For example MIN(histogram) will be implemented as a surrogate MIN(HISTOGRAM_MIN(histogram)), where HISTOGRAM_MIN is a function which directly returns the min-subblock, which in turn might be a vector.

To see these code-generation changes in action, please have a look at my PoC PR:

Note that I'm not planning on exposing those two functions to end-users for now. Instead, they will be just used as surrogate for the existing PERCENTILE agg.

@JonasKunz JonasKunz added the :Analytics/ES|QL AKA ESQL label Oct 31, 2025
@elasticsearchmachine elasticsearchmachine added v9.3.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Oct 31, 2025
@JonasKunz JonasKunz added >non-issue and removed external-contributor Pull request authored by a developer outside the Elasticsearch team v9.3.0 labels Oct 31, 2025
if (intermediateState.stream().map(IntermediateStateDesc::elementType).anyMatch(n -> n.equals("BYTES_REF"))) {
builder.addStatement("$T scratch = new $T()", BYTES_REF, BYTES_REF);
for (IntermediateStateDesc interState : intermediateState) {
interState.addScratchDeclaration(builder);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was buggy prior to my change, but without causing a defect?

If there were multiple BYTES_REF state-members, they would have shared the same scratch, which seems incorrect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems possible. I imagine we don't have any intermediate states with two strings.

@JonasKunz JonasKunz marked this pull request as ready for review October 31, 2025 14:28
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 31, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@JonasKunz JonasKunz requested a review from nik9000 October 31, 2025 14:30
@nik9000
Copy link
Member

nik9000 commented Oct 31, 2025

In my understanding, ExponentialHistogramBlocks will never have a corresponding Vector , as they don't have a memory layout which directly benefits from e.g. SIMD. Vectorization still works by first "extracting" the sub-blocks, which might be vector-backed.

That's fine. Maybe they will one day we'll do it then.


this.hasOnlyBlockArguments = this.aggParams.stream().allMatch(a -> a instanceof BlockArgument);
this.tryToUseVectors = aggParams.stream().anyMatch(a -> (a instanceof BlockArgument) == false)
&& aggParams.stream().noneMatch(a -> a instanceof StandardArgument && a.dataType(false) == null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer making a method in Argument that's, like, hasVector or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 00d6d78.

if (intermediateState.stream().map(IntermediateStateDesc::elementType).anyMatch(n -> n.equals("BYTES_REF"))) {
builder.addStatement("$T scratch = new $T()", BYTES_REF, BYTES_REF);
for (IntermediateStateDesc interState : intermediateState) {
interState.addScratchDeclaration(builder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems possible. I imagine we don't have any intermediate states with two strings.

@JonasKunz JonasKunz requested a review from nik9000 November 3, 2025 09:13
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JonasKunz JonasKunz enabled auto-merge (squash) November 3, 2025 15:15
@JonasKunz JonasKunz merged commit 2047c9a into elastic:main Nov 3, 2025
33 of 34 checks passed
@JonasKunz JonasKunz deleted the exp-histo-codegen-support branch November 3, 2025 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants