Merged
Changes from 2 commits
@@ -105,8 +105,51 @@
*
* <h3>Creating aggregators for your function</h3>
* <p>
* Aggregators contain the core logic of your aggregation. That is, how to combine values, what to store, how to process data, etc.
* Aggregators contain the core logic of how to combine values, what to store, how to process data, etc.
* Currently, we rely on code generation (per aggregation, per type) to implement this functionality.
* This approach was chosen for performance reasons (namely, to avoid virtual method calls and boxing).
* As a result, we cannot rely on interface implementations or generics.
* </p>
* <p>
* To implement the aggregation logic, create a class (typically named "${FunctionName}${Type}Aggregator").
* Annotate it with {@link org.elasticsearch.compute.ann.Aggregator} and {@link org.elasticsearch.compute.ann.GroupingAggregator}.
* The first annotation is responsible for aggregating an entire data set, while the second is responsible for grouping within buckets.
* </p>
* <h4>Before you start implementing it, please note that:</h4>
* <ul>
* <li>All methods must be public static</li>
* <li>
* {@code init}, {@code initSingle}, and {@code initGrouping} may declare optional {@code BigArrays} and {@code DriverContext} arguments, which are injected automatically.
* They may also declare any number of arbitrary arguments, which must be provided via the generated {@code Supplier}.
* </li>
* <li>
* The {@code combine}, {@code combineStates}, {@code combineIntermediate}, and {@code evaluateFinal} methods (see below) can be omitted and generated automatically
[Review comment, Contributor] super nit: I think we usually format mentions of methods with {@code ...}. (Also applies to this javadoc in general.)

* when the input type I and the mutable accumulator states SS and GS are all primitive (DOUBLE, INT).
* </li>
* <li>TBD explain {@code IntermediateState}</li>
* <li>TBD explain special internal state {@code seen}</li>
* </ul>
* <h4>Aggregation expects:</h4>
* <ul>
* <li>type SS (a mutable state used to accumulate the result of the aggregation) to be public, not an inner class, and to implement {@link org.elasticsearch.compute.aggregation.AggregatorState}</li>
* <li>type I (the input to your aggregation function), usually a primitive type or {@link org.apache.lucene.util.BytesRef}</li>
* <li>{@code SS init()} or {@code SS initSingle()} returns an empty, initialized aggregation state</li>
* <li>{@code void combine(SS state, I input)} or {@code SS combine(SS state, I input)} adds an input entry to the aggregation state</li>
* <li>{@code void combineIntermediate(SS state, intermediate states)} adds a serialized aggregation state to the current aggregation state (used to combine results across different nodes)</li>
* <li>{@code Block evaluateFinal(SS state, DriverContext)} converts the inner state of the aggregation to the result column</li>
* </ul>
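To make the method shapes listed above concrete, here is a minimal standalone sketch of a non-grouping sum aggregator. Every name in it (`SumLongStateSketch`, `SumLongAggregatorSketch`, the `hasValue` field) is hypothetical and chosen only for illustration. It assumes nothing about the real Elasticsearch classes: the state does not implement `AggregatorState`, the annotations are omitted, `combineIntermediate` takes plain values instead of serialized intermediate states, and `evaluateFinal` returns a boxed `Long` rather than a `Block` and takes no `DriverContext`.

```java
// Hypothetical stand-in for type SS. In real code this would be a public,
// top-level class implementing AggregatorState; it is package-private here
// only so the sketch fits in a single file.
final class SumLongStateSketch {
    long sum;          // running total
    boolean hasValue;  // true once at least one input has been combined
}

// Hypothetical aggregator class. All methods are public static, matching the
// convention described in the Javadoc. The annotations are left out because
// this sketch does not depend on the Elasticsearch compute module.
final class SumLongAggregatorSketch {
    private SumLongAggregatorSketch() {}

    // Shape of "SS init()": returns an empty, initialized state.
    public static SumLongStateSketch init() {
        return new SumLongStateSketch();
    }

    // Shape of "void combine(SS state, I input)": folds one input into the state.
    public static void combine(SumLongStateSketch state, long input) {
        state.sum += input;
        state.hasValue = true;
    }

    // Simplified stand-in for combineIntermediate: merges a partial result
    // (for example, one computed on another node) into the current state.
    public static void combineIntermediate(SumLongStateSketch state, long partialSum, boolean partialHasValue) {
        if (partialHasValue) {
            state.sum += partialSum;
            state.hasValue = true;
        }
    }

    // Simplified stand-in for evaluateFinal: returns null when nothing was aggregated.
    public static Long evaluateFinal(SumLongStateSketch state) {
        return state.hasValue ? Long.valueOf(state.sum) : null;
    }
}
```

In this sketch the lifecycle is `init`, then any number of `combine` and `combineIntermediate` calls, then a single `evaluateFinal`. The generated code drives the real lifecycle, so treat the ordering here as an illustration only.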
* <h4>Grouping aggregation expects:</h4>
* <ul>
* <li>type GS (a mutable state used to accumulate the result of the grouping aggregation) to be public, not an inner class, and to implement {@link org.elasticsearch.compute.aggregation.GroupingAggregatorState}</li>
* <li>type I (the input to your aggregation function), usually a primitive type or {@link org.apache.lucene.util.BytesRef}</li>
* <li>{@code GS init()} or {@code GS initGrouping()} returns an empty, initialized grouping aggregation state</li>
* <li>{@code void combine(GS state, int groupId, I input)} adds an input entry to the corresponding group (bucket) of the grouping aggregation state</li>
* <li>{@code void combineStates(GS targetState, int targetGroupId, GS otherState, int otherGroupId)} merges the other grouped aggregation state into the target state</li>
* <li>{@code void combineIntermediate(GS current, int groupId, intermediate states)} adds a serialized aggregation state to the current grouped aggregation state (used to combine results across different nodes)</li>
* <li>{@code Block evaluateFinal(GS state, IntVector selected, DriverContext)} converts the inner state of the grouping aggregation to the result column</li>
* </ul>
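The grouping variant listed above can be sketched the same way. The following is a minimal standalone illustration of a per-group max aggregator. All names (`MaxIntGroupingStateSketch`, `MaxIntGroupingAggregatorSketch`) are hypothetical, the state does not implement `GroupingAggregatorState`, plain arrays stand in for whatever storage the real state uses, and `evaluateFinal` takes an `int[]` of selected group ids and returns an `Integer[]` instead of a `Block`.

```java
import java.util.Arrays;

// Hypothetical stand-in for type GS: one slot per group id, grown on demand.
// Real code would use a public, top-level class implementing GroupingAggregatorState.
final class MaxIntGroupingStateSketch {
    private int[] values = new int[0];
    private boolean[] hasValue = new boolean[0];

    boolean hasValue(int groupId) {
        return groupId < hasValue.length && hasValue[groupId];
    }

    // Callers must check hasValue(groupId) first.
    int get(int groupId) {
        return values[groupId];
    }

    void set(int groupId, int value) {
        if (groupId >= values.length) {
            int newSize = Math.max(groupId + 1, values.length * 2);
            values = Arrays.copyOf(values, newSize);
            hasValue = Arrays.copyOf(hasValue, newSize);
        }
        values[groupId] = value;
        hasValue[groupId] = true;
    }
}

// Hypothetical grouping aggregator; all methods are public static.
final class MaxIntGroupingAggregatorSketch {
    private MaxIntGroupingAggregatorSketch() {}

    // Shape of "GS initGrouping()": returns an empty grouping state.
    public static MaxIntGroupingStateSketch initGrouping() {
        return new MaxIntGroupingStateSketch();
    }

    // Shape of "void combine(GS state, int groupId, I input)".
    public static void combine(MaxIntGroupingStateSketch state, int groupId, int input) {
        if (!state.hasValue(groupId) || input > state.get(groupId)) {
            state.set(groupId, input);
        }
    }

    // Shape of "void combineStates(GS targetState, int targetGroupId, GS otherState, int otherGroupId)".
    public static void combineStates(MaxIntGroupingStateSketch target, int targetGroupId,
                                     MaxIntGroupingStateSketch other, int otherGroupId) {
        if (other.hasValue(otherGroupId)) {
            combine(target, targetGroupId, other.get(otherGroupId));
        }
    }

    // Simplified stand-in for evaluateFinal: one result per selected group,
    // null for groups that never received a value.
    public static Integer[] evaluateFinal(MaxIntGroupingStateSketch state, int[] selected) {
        Integer[] result = new Integer[selected.length];
        for (int i = 0; i < selected.length; i++) {
            int groupId = selected[i];
            result[i] = state.hasValue(groupId) ? Integer.valueOf(state.get(groupId)) : null;
        }
        return result;
    }
}
```

The main difference from the non-grouping sketch is that every operation is addressed by a group id, so the state has to track which groups have received a value.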
* <ol>
* <li>
* Copy an existing aggregator to use as a base. You'll usually make one per type. Check other classes to see the naming pattern.