Document aggregation code generation #121644
Changes from 5 commits
b3a89d7
f0e3938
e9087c5
4ec362f
10722e1
71fa03b
52b7e2e
587c800
c19125f
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -105,8 +105,74 @@ | |
* | ||
* <h3>Creating aggregators for your function</h3> | ||
* <p> | ||
* Aggregators contain the core logic of your aggregation. That is, how to combine values, what to store, how to process data, etc. | ||
* Aggregators contain the core logic of how to combine values, what to store, how to process data, etc. | ||
* Currently, we rely on code generation (per aggregation, per type) to implement this functionality. | |
* This approach was picked for performance reasons (namely, to avoid virtual method calls and boxing of primitive types). | |
* As a result, we cannot rely on interface implementations or generics. | |
* </p> | ||
* <p> | ||
* To implement the aggregation logic, create a class (typically named "${FunctionName}${Type}Aggregator"). | |
* It must be placed in `org.elasticsearch.compute.aggregation` in order to be picked up by code generation. | |
* Annotate it with {@link org.elasticsearch.compute.ann.Aggregator} and {@link org.elasticsearch.compute.ann.GroupingAggregator}. | |
* The first one aggregates an entire data set, while the second one aggregates within groups (buckets). | |
* </p> | ||
* <h4>Before you start implementing it, please note that:</h4> | ||
* <ul> | ||
* <li>All methods must be public static</li> | ||
* <li> | ||
* init/initSingle/initGrouping can declare optional BigArrays and DriverContext arguments, which are injected automatically. | |
* They can also declare any number of arbitrary arguments, which must then be provided via the generated Supplier. | |
* </li> | ||
* <li> | ||
* The combine, combineStates, combineIntermediate and evaluateFinal methods (see below) can be omitted and are then generated | |
* automatically when the input type I and the mutable accumulator states SS and GS are all primitive (DOUBLE, INT). | |
* </li> | ||
* <li> | ||
* Code generation expects at least one IntermediateState field, which is used to keep | |
* the serialized state of the aggregation (e.g. AggregatorState and GroupingAggregatorState). | |
* It must be defined even if you rely on the autogenerated implementation for the primitive types. | |
* </li> | ||
* </ul> | ||
* <h4>Aggregation expects:</h4> | ||
The text here and below is very detailed and makes mentions of specific methods, parameters etc. This seems to have overlap with the javadoc of the I think it's better to say
I was treating it as an entry point into aggregation function manual.
* <ul> | ||
* <li> | ||
* type SS (a mutable state used to accumulate the result of the aggregation), which must be public, must not be an inner class, | |
* and must implement {@link org.elasticsearch.compute.aggregation.AggregatorState} | |
* </li> | ||
* <li>type I (input to your aggregation function), usually a primitive type or {@link org.apache.lucene.util.BytesRef}</li> | |
* <li>{@code SS init()} or {@code SS initSingle()} returns an empty, initialized aggregation state</li> | |
* <li>{@code void combine(SS state, I input)} or {@code SS combine(SS state, I input)} adds an input entry to the aggregation state</li> | |
* <li> | |
* {@code void combineIntermediate(SS state, intermediate states)} adds a serialized aggregation state | |
* to the current aggregation state (used to combine results across different nodes) | |
* </li> | |
* <li>{@code Block evaluateFinal(SS state, DriverContext)} converts the inner state of the aggregation to the result column</li> | |
* </ul> | ||
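The method shapes listed above can be illustrated with a minimal, self-contained sketch. It is not code from this PR and it does not use the real Elasticsearch classes: `SketchAggregatorState`, `MaxLongState` and `MaxLongAggregator` are hypothetical names, the `@Aggregator` annotation and the IntermediateState declaration are omitted, the `DriverContext` argument is dropped, and a plain `long[]` stands in for the `Block` result. Only the static-method layout (init, combine, combineIntermediate, evaluateFinal) follows the list.

```java
// Hypothetical stand-in for org.elasticsearch.compute.aggregation.AggregatorState.
interface SketchAggregatorState {
    void close();
}

// Type SS: the mutable state that accumulates the result.
// In the real project this would be a public, top-level (non-inner) class.
final class MaxLongState implements SketchAggregatorState {
    long value = Long.MIN_VALUE;
    boolean seen = false; // false until at least one input was combined

    @Override
    public void close() {
        // nothing to release in this sketch
    }
}

// In the real project this class would carry the @Aggregator annotation
// and live in org.elasticsearch.compute.aggregation.
final class MaxLongAggregator {
    private MaxLongAggregator() {}

    // SS initSingle(): returns an empty, initialized state.
    public static MaxLongState initSingle() {
        return new MaxLongState();
    }

    // void combine(SS state, I input): adds one input entry to the state.
    public static void combine(MaxLongState state, long input) {
        state.value = Math.max(state.value, input);
        state.seen = true;
    }

    // void combineIntermediate(SS state, intermediate states):
    // merges a serialized partial result (here: value + seen flag) into the state.
    public static void combineIntermediate(MaxLongState state, long value, boolean seen) {
        if (seen) {
            combine(state, value);
        }
    }

    // Stand-in for Block evaluateFinal(SS state, DriverContext):
    // returns a one-element array, or an empty array if nothing was seen.
    public static long[] evaluateFinal(MaxLongState state) {
        return state.seen ? new long[] { state.value } : new long[0];
    }
}
```

A caller would obtain a state from `initSingle()`, feed values through `combine`, merge partial results from other nodes through `combineIntermediate`, and read the result with `evaluateFinal`. In the real project the generated code makes these calls, not hand-written code.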
* <h4>Grouping aggregation expects:</h4> | ||
* <ul> | ||
* <li> | ||
* type GS (a mutable state used to accumulate the result of the grouping aggregation), which must be public, must not be | |
* an inner class, and must implement {@link org.elasticsearch.compute.aggregation.GroupingAggregatorState} | |
* </li> | ||
* <li>type I (input to your aggregation function), usually a primitive type or {@link org.apache.lucene.util.BytesRef}</li> | |
* <li>{@code GS init()} or {@code GS initGrouping()} returns an empty, initialized grouping aggregation state</li> | |
* <li> | ||
* {@code void combine(GS state, int groupId, I input)} adds input entry to the corresponding group (bucket) | ||
* of the grouping aggregation state | ||
* </li> | ||
* <li> | ||
* {@code void combineStates(GS targetState, int targetGroupId, GS otherState, int otherGroupId)} merges other grouped | ||
* aggregation state into the first one | ||
* </li> | ||
* <li> | ||
* {@code void combineIntermediate(GS current, int groupId, intermediate states)} adds serialized aggregation state | ||
* to the current grouped aggregation state (used to combine results across different nodes) | ||
* </li> | ||
* <li> | ||
* {@code Block evaluateFinal(GS state, IntVector selected, DriverContext)} converts the inner state | |
* of the grouping aggregation to the result column | ||
* </li> | ||
* </ul> | ||
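The grouping variant can be sketched the same way. Again, this is an illustration under stated assumptions, not the project's implementation: `SketchGroupingState`, `SumLongGroupingState` and `SumLongGroupingAggregator` are hypothetical names, the `@GroupingAggregator` annotation is omitted, a growable `long[]` replaces BigArrays-backed storage, and `int[] selected` and `long[]` stand in for the selected-groups vector and the `Block` result.

```java
import java.util.Arrays;

// Hypothetical stand-in for org.elasticsearch.compute.aggregation.GroupingAggregatorState.
interface SketchGroupingState {
    void close();
}

// Type GS: one accumulator slot per group (bucket).
// In the real project this would be a public, top-level (non-inner) class.
final class SumLongGroupingState implements SketchGroupingState {
    private long[] sums = new long[0];

    void add(int groupId, long delta) {
        if (groupId >= sums.length) {
            // grow so that groupId becomes a valid index
            sums = Arrays.copyOf(sums, Math.max(groupId + 1, sums.length * 2));
        }
        sums[groupId] += delta;
    }

    long get(int groupId) {
        // groups that never received input report 0
        return groupId < sums.length ? sums[groupId] : 0L;
    }

    @Override
    public void close() {
        // nothing to release in this sketch
    }
}

final class SumLongGroupingAggregator {
    private SumLongGroupingAggregator() {}

    // GS initGrouping(): returns an empty, initialized grouping state.
    public static SumLongGroupingState initGrouping() {
        return new SumLongGroupingState();
    }

    // void combine(GS state, int groupId, I input)
    public static void combine(SumLongGroupingState state, int groupId, long input) {
        state.add(groupId, input);
    }

    // void combineStates(GS targetState, int targetGroupId, GS otherState, int otherGroupId)
    public static void combineStates(
        SumLongGroupingState targetState,
        int targetGroupId,
        SumLongGroupingState otherState,
        int otherGroupId
    ) {
        targetState.add(targetGroupId, otherState.get(otherGroupId));
    }

    // void combineIntermediate(GS current, int groupId, intermediate states)
    public static void combineIntermediate(SumLongGroupingState current, int groupId, long partialSum) {
        current.add(groupId, partialSum);
    }

    // Stand-in for evaluateFinal: one output value per selected group id, in the given order.
    public static long[] evaluateFinal(SumLongGroupingState state, int[] selected) {
        long[] out = new long[selected.length];
        for (int i = 0; i < selected.length; i++) {
            out[i] = state.get(selected[i]);
        }
        return out;
    }
}
```

The only structural difference from the non-grouping sketch is the extra `groupId` parameter on every method, which selects the bucket inside the shared state object.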
* <ol> | ||
* <li> | ||
* Copy an existing aggregator to use as a base. You'll usually make one per type. Check other classes to see the naming pattern. | ||
|
@@ -117,31 +183,8 @@ | |
* </p> | ||
* </li> | ||
* <li> | ||
* The methods in the aggregator will define how it will work: | ||
* <ul> | ||
* <li> | ||
* Adding the `type init()` method will autogenerate the code to manage the state, using your returned value | ||
* as the initial value for each group. | ||
* </li> | ||
* <li> | ||
* Adding the `type initSingle()` or `type initGrouping()` methods will use the state object you return there instead. | ||
* <p> | ||
* You will also have to provide `evaluateIntermediate()` and `evaluateFinal()` methods this way. | ||
* </p> | ||
* </li> | ||
* </ul> | ||
* Depending on the way you use, adapt your `combine*()` methods to receive one or other type as their first parameters. | ||
* </li> | ||
* <li> | ||
* If it's also a {@link org.elasticsearch.compute.ann.GroupingAggregator}, you should provide the same methods as commented before: | ||
* <ul> | ||
* <li> | ||
* Add an `initGrouping()`, unless you're using the `init()` method | ||
* </li> | ||
* <li> | ||
* Add all the other methods, with the state parameter of the type of your `initGrouping()`. | ||
* </li> | ||
* </ul> | ||
* Implement the methods (or create empty ones) according to the list above. | |
* Also check the {@link org.elasticsearch.compute.ann.Aggregator} JavaDoc, as it describes how the generated code uses these methods. | |
* </li> | ||
* <li> | ||
* Make a test for your aggregator. | ||
|
@@ -152,16 +195,8 @@ | |
* </p> | ||
* </li> | ||
* <li> | ||
* Check the Javadoc of the {@link org.elasticsearch.compute.ann.Aggregator} | ||
* and {@link org.elasticsearch.compute.ann.GroupingAggregator} annotations. | ||
* Add/Modify them on your aggregator. | ||
* </li> | ||
* <li> | ||
* The {@link org.elasticsearch.compute.ann.Aggregator} JavaDoc explains the static methods you should add. | ||
* </li> | ||
* <li> | ||
* After implementing the required methods (Even if they have a dummy implementation), | ||
* run the CsvTests to generate some extra required classes. | ||
* Code generation is triggered when running the tests. | ||
* Run the CsvTests to generate the code. Generated code should include: | ||
* <p> | ||
* One of them will be the {@code AggregatorFunctionSupplier} for your aggregator. | ||
* Find it by its name ({@code <Aggregation-name><Type>AggregatorFunctionSupplier}), | ||
|