Document aggregation code generation #121644
Merged
idegtiarenko
merged 9 commits into
elastic:main
from
idegtiarenko:document_agg_code_generation
Feb 11, 2025
Changes from 2 commits
b3a89d7 Document aggregation code generation (idegtiarenko)
f0e3938 update (idegtiarenko)
e9087c5 update docs (idegtiarenko)
4ec362f Merge branch 'main' into document_agg_code_generation (idegtiarenko)
10722e1 Merge branch 'main' into document_agg_code_generation (idegtiarenko)
71fa03b Merge branch 'main' into document_agg_code_generation (idegtiarenko)
52b7e2e update (idegtiarenko)
587c800 update (idegtiarenko)
c19125f Merge branch 'main' into document_agg_code_generation (idegtiarenko)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -105,8 +105,51 @@ | |
* | ||
* <h3>Creating aggregators for your function</h3> | ||
* <p> | ||
* Aggregators contain the core logic of your aggregation. That is, how to combine values, what to store, how to process data, etc. | ||
* Aggregators contain the core logic of how to combine values, what to store, how to process data, etc. | ||
 * Currently, we rely on code generation (per aggregation, per type) to implement this functionality. | ||
 * This approach was chosen for performance reasons (namely, to avoid virtual method calls and boxing of primitive types). | ||
 * As a result, we cannot rely on interface implementations or generics. | ||
* </p> | ||
* <p> | ||
 * To implement the aggregation logic, create a class (typically named "${FunctionName}${Type}Aggregator"). | ||
 * Annotate it with {@link org.elasticsearch.compute.ann.Aggregator} and {@link org.elasticsearch.compute.ann.GroupingAggregator}. | ||
 * The first handles aggregation over the entire data set, while the second handles aggregation grouped into buckets. | ||
* </p> | ||
* <h4>Before you start implementing it, please note that:</h4> | ||
* <ul> | ||
* <li>All methods must be public static</li> | ||
* <li> | ||
 * {@code init}, {@code initSingle} and {@code initGrouping} may declare optional {@code BigArrays} and {@code DriverContext} arguments, which are injected automatically. | ||
 * They may also declare any number of other arguments, which must be provided via the generated Supplier. | ||
* </li> | ||
* <li> | ||
 * The {@code combine}, {@code combineStates}, {@code combineIntermediate} and {@code evaluateFinal} methods (see below) may be omitted and generated automatically | ||
 * when both the input type I and the mutable accumulator states SS and GS are primitive (DOUBLE, INT). | ||
* </li> | ||
* <li>TBD explain {@code IntermediateState}</li> | ||
 * <li>TBD explain special internal state {@code seen}</li> | ||
* </ul> | ||
* <h4>Aggregation expects:</h4> | ||
* <ul> | ||
 * <li>type SS (a mutable state used to accumulate the result of the aggregation) to be public, not an inner class, and to implement {@link org.elasticsearch.compute.aggregation.AggregatorState}</li> | ||
 * <li>type I (the input to your aggregation function) to usually be a primitive type or {@link org.apache.lucene.util.BytesRef}</li> | ||
 * <li>{@code SS init()} or {@code SS initSingle()} returns an empty, initialized aggregation state</li> | ||
 * <li>{@code void combine(SS state, I input)} or {@code SS combine(SS state, I input)} adds an input entry to the aggregation state</li> | ||
 * <li>{@code void combineIntermediate(SS state, intermediate states)} adds a serialized aggregation state to the current aggregation state (used to combine results across different nodes)</li> | ||
* <li>{@code Block evaluateFinal(SS state, DriverContext)} converts the inner state of the aggregation to the result column</li> | ||
* </ul> | ||
* <h4>Grouping aggregation expects:</h4> | ||
* <ul> | ||
 * <li>type GS (a mutable state used to accumulate the result of the grouping aggregation) to be public, not an inner class, and to implement {@link org.elasticsearch.compute.aggregation.GroupingAggregatorState}</li> | ||
 * <li>type I (the input to your aggregation function) to usually be a primitive type or {@link org.apache.lucene.util.BytesRef}</li> | ||
 * <li>{@code GS init()} or {@code GS initGrouping()} returns an empty, initialized grouping aggregation state</li> | ||
 * <li>{@code void combine(GS state, int groupId, I input)} adds an input entry to the corresponding group (bucket) of the grouping aggregation state</li> | ||
 * <li>{@code void combineStates(GS targetState, int targetGroupId, GS otherState, int otherGroupId)} merges a group of the other grouping aggregation state into a group of the target state</li> | ||
 * <li>{@code void combineIntermediate(GS current, int groupId, intermediate states)} adds a serialized aggregation state to the current grouping aggregation state (used to combine results across different nodes)</li> | ||
 * <li>{@code Block evaluateFinal(GS state, IntVector selected, DriverContext)} converts the inner state of the grouping aggregation to the result column</li> | ||
* </ul> | ||
* <ol> | ||
* <li> | ||
* Copy an existing aggregator to use as a base. You'll usually make one per type. Check other classes to see the naming pattern. | ||
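The method contract listed in the Javadoc above can be sketched in plain Java. This is only an illustration under stated assumptions, not the generated code or the project's actual classes: `SumLongStateSketch`, `SumLongGroupingStateSketch` and `SumLongAggregatorSketch` are hypothetical names, the states do not implement the real `AggregatorState` / `GroupingAggregatorState` interfaces, the `@Aggregator` / `@GroupingAggregator` annotations are omitted so the file compiles without Elasticsearch on the classpath, and `combineIntermediate` / `evaluateFinal` use plain `long` values and arrays instead of `Block`, `IntVector` and `DriverContext`.

```java
import java.util.Arrays;

// Hypothetical stand-in for the non-grouping state (type SS in the Javadoc).
// In real code this would be a public top-level class implementing AggregatorState.
final class SumLongStateSketch {
    long sum;
}

// Hypothetical stand-in for the grouping state (type GS): one slot per group id.
// In real code this would be a public top-level class implementing GroupingAggregatorState.
final class SumLongGroupingStateSketch {
    private long[] sums = new long[0];

    void add(int groupId, long value) {
        if (groupId >= sums.length) {
            sums = Arrays.copyOf(sums, groupId + 1); // grow so that groupId is addressable
        }
        sums[groupId] += value;
    }

    long get(int groupId) {
        return groupId < sums.length ? sums[groupId] : 0L;
    }
}

// In real code this class would carry the @Aggregator and @GroupingAggregator annotations.
// All methods are public static, as the Javadoc requires.
final class SumLongAggregatorSketch {
    private SumLongAggregatorSketch() {}

    // --- non-grouping aggregation ---

    public static SumLongStateSketch initSingle() {
        return new SumLongStateSketch();
    }

    public static void combine(SumLongStateSketch state, long input) {
        state.sum += input;
    }

    // Simplified: takes a partial sum directly instead of serialized intermediate blocks.
    public static void combineIntermediate(SumLongStateSketch state, long partialSum) {
        state.sum += partialSum;
    }

    // Simplified: returns a long instead of a Block and takes no DriverContext.
    public static long evaluateFinal(SumLongStateSketch state) {
        return state.sum;
    }

    // --- grouping aggregation ---

    public static SumLongGroupingStateSketch initGrouping() {
        return new SumLongGroupingStateSketch();
    }

    public static void combine(SumLongGroupingStateSketch state, int groupId, long input) {
        state.add(groupId, input);
    }

    public static void combineStates(
        SumLongGroupingStateSketch targetState,
        int targetGroupId,
        SumLongGroupingStateSketch otherState,
        int otherGroupId
    ) {
        targetState.add(targetGroupId, otherState.get(otherGroupId));
    }

    // Simplified: an int[] of selected group ids stands in for IntVector,
    // and a long[] stands in for the result Block.
    public static long[] evaluateFinal(SumLongGroupingStateSketch state, int[] selected) {
        long[] result = new long[selected.length];
        for (int i = 0; i < selected.length; i++) {
            result[i] = state.get(selected[i]);
        }
        return result;
    }
}
```

Because every method is static and typed to one concrete primitive, a generator can call them directly without virtual dispatch or boxing, which is the performance motivation the Javadoc gives for code generation over interfaces and generics.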