update docs

idegtiarenko · idegtiarenko · commit e9087c517a45 · 2025-02-06T15:44:05.000+01:00
diff --git a/x-pack/plugin/esql/compute/ann/src/main/java/org/elasticsearch/compute/ann/Aggregator.java b/x-pack/plugin/esql/compute/ann/src/main/java/org/elasticsearch/compute/ann/Aggregator.java
@@ -37,18 +37,10 @@
  *     are ever collected.
  * </p>
  * <p>
- *     The generation code will also look for a method called {@code combineValueCount}
- *     which is called once per received block with a count of values. NOTE: We may
- *     not need this after we convert AVG into a composite operation.
- * </p>
- * <p>
  *     The generation code also looks for the optional methods {@code combineIntermediate}
  *     and {@code evaluateFinal} which are used to combine intermediate states and
- *     produce the final output. If the first is missing then the generated code will
- *     call the {@code combine} method to combine intermediate states. If the second
- *     is missing the generated code will make a block containing the primitive from
- *     the state. If either of those don't have sensible interpretations then the code
- *     generation code will throw an error, aborting the compilation.
+ *     produce the final output. Please note, those are auto-generated when aggregating
+ *     primitive types such as boolean, int, long, float, double.
  * </p>
  */
 @Target(ElementType.TYPE)
diff --git a/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/package-info.java b/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/package-info.java
@@ -112,43 +112,66 @@
  * </p>
  * <p>
  *     In order to implement aggregation logic create your class (typically named "${FunctionName}${Type}Aggregator").
+ *     It must be placed in `org.elasticsearch.compute.aggregation` in order to be picked up by code generation.
  *     Annotate it with {@link org.elasticsearch.compute.ann.Aggregator} and {@link org.elasticsearch.compute.ann.GroupingAggregator}
  *     The first one is responsible for an entire data set aggregation, while the second one is responsible for grouping within buckets.
  * </p>
  * <h4>Before you start implementing it, please note that:</h4>
  * <ul>
  *     <li>All methods must be public static</li>
  *     <li>
- *         init, initSingle, initGrouping could declare optional BigArrays, DriverContext arguments that are going to be injected automatically.
+ *         init/initSingle/initGrouping could have optional BigArrays, DriverContext arguments that are going to be injected automatically.
  *         It is also possible to declare any number of arbitrary arguments that must be provided via generated Supplier.
  *     </li>
  *     <li>
  *         combine, combineStates, combineIntermediate, evaluateFinal methods (see below) could be omitted and generated automatically
  *         when both input type I and mutable accumulator state SS and GS are primitive (DOUBLE, INT).
  *     </li>
  *     <li>
+ *         Code generation expects at least one IntermediateState field that is going to be used to keep
+ *         the serialized state of the aggregation (eg AggregatorState and GroupingAggregatorState).
+ *         It must be defined even if you rely on autogenerated implementation for the primitive types.
  *     </li>
- *     <li>TBD explain {@code IntermediateState}</li>
- *     <li>TBD explain special internal state `seen`</li>
  * </ul>
  * <h4>Aggregation expects:</h4>
  * <ul>
- *     <li>type SS (a mutable state used to accumulate result of the aggregation) to be public, not inner and implements {@link org.elasticsearch.compute.aggregation.AggregatorState}</li>
+ *     <li>
+ *         type SS (a mutable state used to accumulate result of the aggregation) to be public, not inner and implements
+ *         {@link org.elasticsearch.compute.aggregation.AggregatorState}
+ *     </li>
  *     <li>type I (input to your aggregation function), usually primitive types and {@link org.apache.lucene.util.BytesRef}</li>
  *     <li>{@code SS init()} or {@code SS initSingle()} returns empty initialized aggregation state</li>
  *     <li>{@code void combine(SS state, I input)} or {@code SS combine(SS state, I input)} adds input entry to the aggregation state</li>
- *     <li>{@code void combineIntermediate(SS state, intermediate states)} adds serialized aggregation state to the current aggregation state (used to combine results across different nodes)</li>
+ *     <li>
+ *         {@code void combineIntermediate(SS state, intermediate states)} adds serialized aggregation state
+ *         to the current aggregation state (used to combine results across different nodes)
+ *     </li>
  *     <li>{@code Block evaluateFinal(SS state, DriverContext)} converts the inner state of the aggregation to the result column</li>
  * </ul>
  * <h4>Grouping aggregation expects:</h4>
  * <ul>
- *     <li>type GS (a mutable state used to accumulate result of the grouping aggregation) to be public, not inner and implements {@link org.elasticsearch.compute.aggregation.GroupingAggregatorState}</li>
+ *     <li>
+ *         type GS (a mutable state used to accumulate result of the grouping aggregation) to be public, not inner and implements
+ *         {@link org.elasticsearch.compute.aggregation.GroupingAggregatorState}
+ *     </li>
  *     <li>type I (input to your aggregation function), usually primitive types and {@link org.apache.lucene.util.BytesRef}</li>
  *     <li>{@code GS init()} or {@code GS initGrouping()} returns empty initialized grouping aggregation state</li>
- *     <li>{@code void combine(GS state, int groupId, T input)} adds input entry to the corresponding group (bucket) of the grouping aggregation state</li>
- *     <li>{@code void combineStates(GS targetState, int targetGroupId, GS otherState, int otherGroupId)} merges other grouped aggregation state into the first one</li>
- *     <li>{@code void combineIntermediate(GS current, int groupId, intermediate states)} adds serialized aggregation state to the current grouped aggregation state (used to combine results across different nodes)</li>
- *     <li>{@code Block evaluateFinal(GS state, IntVectorSelected, DriverContext)} converts the inner state of the grouping aggregation to the result column</li>
+ *     <li>
+ *         {@code void combine(GS state, int groupId, I input)} adds input entry to the corresponding group (bucket)
+ *         of the grouping aggregation state
+ *     </li>
+ *     <li>
+ *         {@code void combineStates(GS targetState, int targetGroupId, GS otherState, int otherGroupId)} merges other grouped
+ *         aggregation state into the first one
+ *     </li>
+ *     <li>
+ *         {@code void combineIntermediate(GS current, int groupId, intermediate states)} adds serialized aggregation state
+ *         to the current grouped aggregation state (used to combine results across different nodes)
+ *     </li>
+ *     <li>
+ *         {@code Block evaluateFinal(GS state, IntVectorSelected, DriverContext)} converts the inner state
+ *         of the grouping aggregation to the result column
+ *     </li>
  * </ul>
  * <ol>
  *     <li>
@@ -160,31 +183,8 @@
  *         </p>
  *     </li>
  *     <li>
- *         The methods in the aggregator will define how it will work:
- *         <ul>
- *             <li>
- *                 Adding the `type init()` method will autogenerate the code to manage the state, using your returned value
- *                 as the initial value for each group.
- *             </li>
- *             <li>
- *                 Adding the `type initSingle()` or `type initGrouping()` methods will use the state object you return there instead.
- *                 <p>
- *                     You will also have to provide `evaluateIntermediate()` and `evaluateFinal()` methods this way.
- *                 </p>
- *             </li>
- *         </ul>
- *         Depending on the way you use, adapt your `combine*()` methods to receive one or other type as their first parameters.
- *     </li>
- *     <li>
- *         If it's also a {@link org.elasticsearch.compute.ann.GroupingAggregator}, you should provide the same methods as commented before:
- *         <ul>
- *             <li>
- *                 Add an `initGrouping()`, unless you're using the `init()` method
- *             </li>
- *             <li>
- *                 Add all the other methods, with the state parameter of the type of your `initGrouping()`.
- *             </li>
- *         </ul>
+ *         Implement (or create an empty) methods according to the above list.
+ *         Also check {@link org.elasticsearch.compute.ann.Aggregator} JavaDoc as it contains generated method usage.
  *     </li>
  *     <li>
  *         Make a test for your aggregator.
@@ -195,16 +195,8 @@
  *         </p>
  *     </li>
  *     <li>
- *         Check the Javadoc of the {@link org.elasticsearch.compute.ann.Aggregator}
- *         and {@link org.elasticsearch.compute.ann.GroupingAggregator} annotations.
- *         Add/Modify them on your aggregator.
- *     </li>
- *     <li>
- *         The {@link org.elasticsearch.compute.ann.Aggregator} JavaDoc explains the static methods you should add.
- *     </li>
- *     <li>
- *         After implementing the required methods (Even if they have a dummy implementation),
- *         run the CsvTests to generate some extra required classes.
+ *         Code generation is triggered when running the tests.
+ *         Run the CsvTests to generate the code. Generated code should include:
  *         <p>
  *             One of them will be the {@code AggregatorFunctionSupplier} for your aggregator.
  *             Find it by its name ({@code <Aggregation-name><Type>AggregatorFunctionSupplier}),