Commit 58277e7

Document aggregation code generation (#121644)
1 parent a36b327 commit 58277e7

2 files changed

+84
-46
lines changed
  • x-pack/plugin/esql

x-pack/plugin/esql/compute/ann/src/main/java/org/elasticsearch/compute/ann/Aggregator.java

Lines changed: 2 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -39,11 +39,8 @@
3939
* <p>
4040
* The generation code also looks for the optional methods {@code combineIntermediate}
4141
* and {@code evaluateFinal} which are used to combine intermediate states and
42-
* produce the final output. If the first is missing then the generated code will
43-
* call the {@code combine} method to combine intermediate states. If the second
44-
* is missing the generated code will make a block containing the primitive from
45-
* the state. If either of those don't have sensible interpretations then the code
46-
* generation code will throw an error, aborting the compilation.
42+
* produce the final output. Note that both are auto-generated when aggregating
43+
* primitive types such as boolean, int, long, float, and double.
4744
* </p>
4845
*/
4946
@Target(ElementType.TYPE)
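
The static-method convention described in this Javadoc can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the generated code or the real API: `MaxLongState` and `MaxLongAggregatorSketch` are hypothetical names, the `@Aggregator` annotation is omitted so the snippet compiles on its own, and a boxed `Long` stands in for the result `Block`.

```java
// Hypothetical stand-in for an AggregatorState implementation (not an Elasticsearch class).
final class MaxLongState {
    long max = Long.MIN_VALUE; // running maximum
    boolean seen = false;      // true once at least one value has been combined
}

// Hypothetical aggregator following the "public static methods" convention.
// A real aggregator would be annotated with @Aggregator and return a Block from evaluateFinal.
final class MaxLongAggregatorSketch {
    private MaxLongAggregatorSketch() {}

    // Returns an empty, initialized aggregation state.
    public static MaxLongState initSingle() {
        return new MaxLongState();
    }

    // Adds one input value to the aggregation state.
    public static void combine(MaxLongState state, long input) {
        state.max = Math.max(state.max, input);
        state.seen = true;
    }

    // Merges a serialized partial result (for example from another node) into the state.
    // The (max, seen) pair is an assumed intermediate representation for this sketch.
    public static void combineIntermediate(MaxLongState state, long max, boolean seen) {
        if (seen) {
            combine(state, max);
        }
    }

    // Converts the state to the final result; null means no input was seen.
    public static Long evaluateFinal(MaxLongState state) {
        return state.seen ? state.max : null;
    }
}
```

For primitive states the text above says these two optional methods are generated automatically, so a hand-written version like this would only be needed for custom state classes.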

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/package-info.java

Lines changed: 82 additions & 41 deletions
Original file line number | Diff line number | Diff line change
@@ -85,10 +85,6 @@
8585
* To introduce your aggregation to the engine:
8686
* <ul>
8787
* <li>
88-
* Add it to {@code org.elasticsearch.xpack.esql.planner.AggregateMapper}.
89-
* Check all usages of other aggregations there, and replicate the logic.
90-
* </li>
91-
* <li>
9288
* Implement serialization for your aggregation by implementing
9389
* {@link org.elasticsearch.common.io.stream.NamedWriteable#getWriteableName},
9490
* {@link org.elasticsearch.common.io.stream.NamedWriteable#writeTo},
@@ -97,16 +93,92 @@
9793
* {@link org.elasticsearch.xpack.esql.expression.function.aggregate.AggregateWritables#getNamedWriteables}.
9894
* </li>
9995
* <li>
100-
* Do the same with {@link org.elasticsearch.xpack.esql.expression.function.EsqlFunctionRegistry}.
96+
* Add it to {@link org.elasticsearch.xpack.esql.expression.function.EsqlFunctionRegistry}.
10197
* </li>
10298
* </ul>
10399
* </li>
104100
* </ol>
105101
*
106102
* <h3>Creating aggregators for your function</h3>
107103
* <p>
108-
* Aggregators contain the core logic of your aggregation. That is, how to combine values, what to store, how to process data, etc.
104+
* Aggregators contain the core logic of how to combine values, what to store, how to process data, etc.
105+
* Currently, we rely on code generation (per aggregation per type) in order to implement such functionality.
106+
* This approach was picked for performance reasons (namely to avoid virtual method calls and boxing types).
107+
* As a result, we cannot rely on interface implementations or generics.
108+
* </p>
109+
* <p>
110+
* To implement the aggregation logic, create your class (typically named "${FunctionName}${Type}Aggregator").
111+
* It must be placed in {@code org.elasticsearch.compute.aggregation} in order to be picked up by code generation.
112+
* Annotate it with {@link org.elasticsearch.compute.ann.Aggregator} and {@link org.elasticsearch.compute.ann.GroupingAggregator}.
113+
* The first is responsible for aggregating an entire data set, while the second is responsible for aggregating within groups (buckets).
109114
* </p>
115+
* <h4>Before you start implementing it, please note that:</h4>
116+
* <ul>
117+
* <li>All methods must be public static</li>
118+
* <li>
119+
* {@code init/initSingle/initGrouping} may declare optional {@link org.elasticsearch.common.util.BigArrays} or
120+
* {@link org.elasticsearch.compute.operator.DriverContext} arguments, which are injected automatically.
121+
* They may also declare any number of arbitrary arguments, which must be provided via the generated Supplier.
122+
* </li>
123+
* <li>
124+
* {@code combine, combineStates, combineIntermediate, evaluateFinal} methods (see below) can be generated automatically
125+
* when both the input type I and the mutable accumulator states (AggregatorState and GroupingAggregatorState) are primitive (e.g. DOUBLE, INT).
126+
* </li>
127+
* <li>
128+
* Code generation expects at least one IntermediateState field, which is used to keep
129+
* the serialized state of the aggregation (e.g. AggregatorState and GroupingAggregatorState).
130+
* It must be defined even if you rely on the autogenerated implementation for primitive types.
131+
* </li>
132+
* </ul>
133+
* <h4>Aggregation expects:</h4>
134+
* <ul>
135+
* <li>
136+
* type AggregatorState (a mutable state used to accumulate the result of the aggregation) to be public, not an inner class, and to implement
137+
* {@link org.elasticsearch.compute.aggregation.AggregatorState}
138+
* </li>
139+
* <li>type I (input to your aggregation function), usually primitive types and {@link org.apache.lucene.util.BytesRef}</li>
140+
* <li>{@code AggregatorState init()} or {@code AggregatorState initSingle()} returns an empty, initialized aggregation state</li>
141+
* <li>
142+
* {@code void combine(AggregatorState state, I input)} or {@code AggregatorState combine(AggregatorState state, I input)}
143+
* adds an input entry to the aggregation state
144+
* </li>
145+
* <li>
146+
* {@code void combineIntermediate(AggregatorState state, intermediate states)} adds a serialized aggregation state
147+
* to the current aggregation state (used to combine results across different nodes)
148+
* </li>
149+
* <li>
150+
* {@code Block evaluateFinal(AggregatorState state, DriverContext)} converts the inner state of the aggregation to the result
151+
* column
152+
* </li>
153+
* </ul>
154+
* <h4>Grouping aggregation expects:</h4>
155+
* <ul>
156+
* <li>
157+
* type GroupingAggregatorState (a mutable state used to accumulate the result of the grouping aggregation) to be public,
158+
* not an inner class, and to implement {@link org.elasticsearch.compute.aggregation.GroupingAggregatorState}
159+
* </li>
160+
* <li>type I (input to your aggregation function), usually primitive types and {@link org.apache.lucene.util.BytesRef}</li>
161+
* <li>
162+
* {@code GroupingAggregatorState init()} or {@code GroupingAggregatorState initGrouping()} returns an empty, initialized grouping
163+
* aggregation state
164+
* </li>
165+
* <li>
166+
* {@code void combine(GroupingAggregatorState state, int groupId, I input)} adds an input entry to the corresponding group (bucket)
167+
* of the grouping aggregation state
168+
* </li>
169+
* <li>
170+
* {@code void combineStates(GroupingAggregatorState targetState, int targetGroupId, GroupingAggregatorState otherState, int otherGroupId)}
171+
* merges the given group of the other grouped aggregation state into the given group of the target state
172+
* </li>
173+
* <li>
174+
* {@code void combineIntermediate(GroupingAggregatorState current, int groupId, intermediate states)} adds a serialized
175+
* aggregation state to the current grouped aggregation state (used to combine results across different nodes)
176+
* </li>
177+
* <li>
178+
* {@code Block evaluateFinal(GroupingAggregatorState state, IntVector selected, DriverContext)} converts the inner state
179+
* of the grouping aggregation to the result column
180+
* </li>
181+
* </ul>
110182
* <ol>
111183
* <li>
112184
* Copy an existing aggregator to use as a base. You'll usually make one per type. Check other classes to see the naming pattern.
@@ -117,31 +189,8 @@
117189
* </p>
118190
* </li>
119191
* <li>
120-
* The methods in the aggregator will define how it will work:
121-
* <ul>
122-
* <li>
123-
* Adding the `type init()` method will autogenerate the code to manage the state, using your returned value
124-
* as the initial value for each group.
125-
* </li>
126-
* <li>
127-
* Adding the `type initSingle()` or `type initGrouping()` methods will use the state object you return there instead.
128-
* <p>
129-
* You will also have to provide `evaluateIntermediate()` and `evaluateFinal()` methods this way.
130-
* </p>
131-
* </li>
132-
* </ul>
133-
* Depending on the way you use, adapt your `combine*()` methods to receive one or other type as their first parameters.
134-
* </li>
135-
* <li>
136-
* If it's also a {@link org.elasticsearch.compute.ann.GroupingAggregator}, you should provide the same methods as commented before:
137-
* <ul>
138-
* <li>
139-
* Add an `initGrouping()`, unless you're using the `init()` method
140-
* </li>
141-
* <li>
142-
* Add all the other methods, with the state parameter of the type of your `initGrouping()`.
143-
* </li>
144-
* </ul>
192+
* Implement the methods from the lists above (or create empty ones).
193+
* Also check the {@link org.elasticsearch.compute.ann.Aggregator} Javadoc, which describes how the generated code uses these methods.
145194
* </li>
146195
* <li>
147196
* Make a test for your aggregator.
@@ -152,16 +201,8 @@
152201
* </p>
153202
* </li>
154203
* <li>
155-
* Check the Javadoc of the {@link org.elasticsearch.compute.ann.Aggregator}
156-
* and {@link org.elasticsearch.compute.ann.GroupingAggregator} annotations.
157-
* Add/Modify them on your aggregator.
158-
* </li>
159-
* <li>
160-
* The {@link org.elasticsearch.compute.ann.Aggregator} JavaDoc explains the static methods you should add.
161-
* </li>
162-
* <li>
163-
* After implementing the required methods (Even if they have a dummy implementation),
164-
* run the CsvTests to generate some extra required classes.
204+
* Code generation is triggered when running the tests.
205+
* Run the CsvTests to generate the code. Generated code should include:
165206
* <p>
166207
* One of them will be the {@code AggregatorFunctionSupplier} for your aggregator.
167208
* Find it by its name ({@code <Aggregation-name><Type>AggregatorFunctionSupplier}),
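
The grouping contract listed in this file ({@code initGrouping}, {@code combine}, {@code combineStates}, {@code combineIntermediate}, {@code evaluateFinal}) can be sketched the same way. This is an illustration under assumptions, not the generated code: `MaxLongGroupingState` and `MaxLongGroupingAggregatorSketch` are hypothetical names, the `@GroupingAggregator` annotation is omitted, an `int[]` stands in for the selected-groups vector, and a `Long[]` stands in for the result `Block`.

```java
import java.util.Arrays;

// Hypothetical stand-in for a GroupingAggregatorState: one slot per group id.
final class MaxLongGroupingState {
    private long[] max = new long[0];
    private boolean[] seen = new boolean[0];

    boolean hasValue(int groupId) {
        return groupId < seen.length && seen[groupId];
    }

    long get(int groupId) {
        return max[groupId];
    }

    void set(int groupId, long value) {
        if (groupId >= max.length) {
            // Grow both arrays so that groupId becomes a valid index.
            int newLength = Math.max(groupId + 1, max.length * 2);
            max = Arrays.copyOf(max, newLength);
            seen = Arrays.copyOf(seen, newLength);
        }
        max[groupId] = value;
        seen[groupId] = true;
    }
}

// Hypothetical grouping aggregator; all methods are public static, as the text requires.
final class MaxLongGroupingAggregatorSketch {
    private MaxLongGroupingAggregatorSketch() {}

    // Returns an empty, initialized grouping aggregation state.
    public static MaxLongGroupingState initGrouping() {
        return new MaxLongGroupingState();
    }

    // Adds one input value to the given group (bucket).
    public static void combine(MaxLongGroupingState state, int groupId, long input) {
        long current = state.hasValue(groupId) ? state.get(groupId) : Long.MIN_VALUE;
        state.set(groupId, Math.max(current, input));
    }

    // Merges one group of another state into one group of the target state.
    public static void combineStates(MaxLongGroupingState target, int targetGroupId,
                                     MaxLongGroupingState other, int otherGroupId) {
        if (other.hasValue(otherGroupId)) {
            combine(target, targetGroupId, other.get(otherGroupId));
        }
    }

    // Merges a serialized partial result into the given group.
    // The (max, seen) pair is an assumed intermediate representation for this sketch.
    public static void combineIntermediate(MaxLongGroupingState state, int groupId, long max, boolean seen) {
        if (seen) {
            combine(state, groupId, max);
        }
    }

    // Produces one result per selected group; null means the group saw no input.
    public static Long[] evaluateFinal(MaxLongGroupingState state, int[] selected) {
        Long[] result = new Long[selected.length];
        for (int i = 0; i < selected.length; i++) {
            result[i] = state.hasValue(selected[i]) ? state.get(selected[i]) : null;
        }
        return result;
    }
}
```

Keeping the state in primitive arrays indexed by group id mirrors the motivation given in the text: no boxing and no virtual calls on the per-value path.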
