|
85 | 85 | * To introduce your aggregation to the engine: |
86 | 86 | * <ul> |
87 | 87 | * <li> |
88 | | - * Add it to {@code org.elasticsearch.xpack.esql.planner.AggregateMapper}. |
89 | | - * Check all usages of other aggregations there, and replicate the logic. |
90 | | - * </li> |
91 | | - * <li> |
92 | 88 | * Implement serialization for your aggregation by implementing |
93 | 89 | * {@link org.elasticsearch.common.io.stream.NamedWriteable#getWriteableName}, |
94 | 90 | * {@link org.elasticsearch.common.io.stream.NamedWriteable#writeTo}, |
|
97 | 93 | * {@link org.elasticsearch.xpack.esql.expression.function.aggregate.AggregateWritables#getNamedWriteables}. |
98 | 94 | * </li> |
99 | 95 | * <li> |
100 | | - * Do the same with {@link org.elasticsearch.xpack.esql.expression.function.EsqlFunctionRegistry}. |
| 96 | + * Add it to {@link org.elasticsearch.xpack.esql.expression.function.EsqlFunctionRegistry}. |
101 | 97 | * </li> |
102 | 98 | * </ul> |
103 | 99 | * </li> |
104 | 100 | * </ol> |
105 | 101 | * |
106 | 102 | * <h3>Creating aggregators for your function</h3> |
107 | 103 | * <p> |
108 | | - * Aggregators contain the core logic of your aggregation. That is, how to combine values, what to store, how to process data, etc. |
| 104 | + * Aggregators contain the core logic of how to combine values, what to store, how to process data, etc. |
| 105 | + * Currently, we rely on code generation (per aggregation, per type) to implement this functionality. |
| 106 | + * This approach was picked for performance reasons (namely to avoid virtual method calls and boxing). |
| 107 | + * As a result, we cannot rely on interface implementations or generics. |
| 108 | + * </p> |
| 109 | + * <p> |
| 110 | + * To implement the aggregation logic, create a class (typically named "${FunctionName}${Type}Aggregator"). |
| 111 | + * It must be placed in {@code org.elasticsearch.compute.aggregation} in order to be picked up by code generation. |
| 112 | + * Annotate it with {@link org.elasticsearch.compute.ann.Aggregator} and {@link org.elasticsearch.compute.ann.GroupingAggregator}. |
| 113 | + * The first one is responsible for aggregating an entire data set, while the second one is responsible for grouping within buckets. |
109 | 114 | * </p> |
| 115 | + * <h4>Before you start implementing it, please note that:</h4> |
| 116 | + * <ul> |
| 117 | + * <li>All methods must be public static</li> |
| 118 | + * <li> |
| 119 | + * {@code init/initSingle/initGrouping} may take optional {@link org.elasticsearch.common.util.BigArrays} or |
| 120 | + * {@link org.elasticsearch.compute.operator.DriverContext} arguments, which are injected automatically. |
| 121 | + * It is also possible to declare any number of arbitrary arguments, which must be provided via the generated Supplier. |
| 122 | + * </li> |
| 123 | + * <li> |
| 124 | + * {@code combine, combineStates, combineIntermediate, evaluateFinal} methods (see below) can be generated automatically |
| 125 | + * when both the input type I and the mutable accumulator states (AggregatorState and GroupingAggregatorState) are primitive (DOUBLE, INT). |
| 126 | + * </li> |
| 127 | + * <li> |
| 128 | + * Code generation expects at least one IntermediateState field, which is used to keep |
| 129 | + * the serialized state of the aggregation (e.g. AggregatorState and GroupingAggregatorState). |
| 130 | + * It must be defined even if you rely on the autogenerated implementation for the primitive types. |
| 131 | + * </li> |
| 132 | + * </ul> |
| 133 | + * <h4>Aggregation expects:</h4> |
| 134 | + * <ul> |
| 135 | + * <li> |
| 136 | + * type AggregatorState (a mutable state used to accumulate the result of the aggregation) to be public, not an inner class, and to implement |
| 137 | + * {@link org.elasticsearch.compute.aggregation.AggregatorState} |
| 138 | + * </li> |
| 139 | + * <li>type I (input to your aggregation function), usually a primitive type or {@link org.apache.lucene.util.BytesRef}</li> |
| 140 | + * <li>{@code AggregatorState init()} or {@code AggregatorState initSingle()} returns an empty, initialized aggregation state</li> |
| 141 | + * <li> |
| 142 | + * {@code void combine(AggregatorState state, I input)} or {@code AggregatorState combine(AggregatorState state, I input)} |
| 143 | + * adds an input entry to the aggregation state |
| 144 | + * </li> |
| 145 | + * <li> |
| 146 | + * {@code void combineIntermediate(AggregatorState state, intermediate states)} adds a serialized aggregation state |
| 147 | + * to the current aggregation state (used to combine results across different nodes) |
| 148 | + * </li> |
| 149 | + * <li> |
| 150 | + * {@code Block evaluateFinal(AggregatorState state, DriverContext)} converts the inner state of the aggregation to the result |
| 151 | + * column |
| 152 | + * </li> |
| 153 | + * </ul> |
| 154 | + * <h4>Grouping aggregation expects:</h4> |
| 155 | + * <ul> |
| 156 | + * <li> |
| 157 | + * type GroupingAggregatorState (a mutable state used to accumulate the result of the grouping aggregation) to be public, |
| 158 | + * not an inner class, and to implement {@link org.elasticsearch.compute.aggregation.GroupingAggregatorState} |
| 159 | + * </li> |
| 160 | + * <li>type I (input to your aggregation function), usually a primitive type or {@link org.apache.lucene.util.BytesRef}</li> |
| 161 | + * <li> |
| 162 | + * {@code GroupingAggregatorState init()} or {@code GroupingAggregatorState initGrouping()} returns an empty, initialized grouping |
| 163 | + * aggregation state |
| 164 | + * </li> |
| 165 | + * <li> |
| 166 | + * {@code void combine(GroupingAggregatorState state, int groupId, I input)} adds an input entry to the corresponding group (bucket) |
| 167 | + * of the grouping aggregation state |
| 168 | + * </li> |
| 169 | + * <li> |
| 170 | + * {@code void combineStates(GroupingAggregatorState targetState, int targetGroupId, GroupingAggregatorState otherState, int otherGroupId)} |
| 171 | + * merges the given group of the other grouped aggregation state into the given group of the target state |
| 172 | + * </li> |
| 173 | + * <li> |
| 174 | + * {@code void combineIntermediate(GroupingAggregatorState current, int groupId, intermediate states)} adds a serialized |
| 175 | + * aggregation state to the current grouped aggregation state (used to combine results across different nodes) |
| 176 | + * </li> |
| 177 | + * <li> |
| 178 | + * {@code Block evaluateFinal(GroupingAggregatorState state, IntVector selected, DriverContext)} converts the inner state |
| 179 | + * of the grouping aggregation to the result column |
| 180 | + * </li> |
| 181 | + * </ul> |
110 | 182 | * <ol> |
111 | 183 | * <li> |
112 | 184 | * Copy an existing aggregator to use as a base. You'll usually make one per type. Check other classes to see the naming pattern. |
|
117 | 189 | * </p> |
118 | 190 | * </li> |
119 | 191 | * <li> |
120 | | - * The methods in the aggregator will define how it will work: |
121 | | - * <ul> |
122 | | - * <li> |
123 | | - * Adding the `type init()` method will autogenerate the code to manage the state, using your returned value |
124 | | - * as the initial value for each group. |
125 | | - * </li> |
126 | | - * <li> |
127 | | - * Adding the `type initSingle()` or `type initGrouping()` methods will use the state object you return there instead. |
128 | | - * <p> |
129 | | - * You will also have to provide `evaluateIntermediate()` and `evaluateFinal()` methods this way. |
130 | | - * </p> |
131 | | - * </li> |
132 | | - * </ul> |
133 | | - * Depending on the way you use, adapt your `combine*()` methods to receive one or other type as their first parameters. |
134 | | - * </li> |
135 | | - * <li> |
136 | | - * If it's also a {@link org.elasticsearch.compute.ann.GroupingAggregator}, you should provide the same methods as commented before: |
137 | | - * <ul> |
138 | | - * <li> |
139 | | - * Add an `initGrouping()`, unless you're using the `init()` method |
140 | | - * </li> |
141 | | - * <li> |
142 | | - * Add all the other methods, with the state parameter of the type of your `initGrouping()`. |
143 | | - * </li> |
144 | | - * </ul> |
| 192 | + * Implement the methods from the lists above (or create empty placeholders for them). |
| 193 | + * Also check the {@link org.elasticsearch.compute.ann.Aggregator} Javadoc, as it describes how the generated code uses these methods. |
145 | 194 | * </li> |
146 | 195 | * <li> |
147 | 196 | * Make a test for your aggregator. |
|
152 | 201 | * </p> |
153 | 202 | * </li> |
154 | 203 | * <li> |
155 | | - * Check the Javadoc of the {@link org.elasticsearch.compute.ann.Aggregator} |
156 | | - * and {@link org.elasticsearch.compute.ann.GroupingAggregator} annotations. |
157 | | - * Add/Modify them on your aggregator. |
158 | | - * </li> |
159 | | - * <li> |
160 | | - * The {@link org.elasticsearch.compute.ann.Aggregator} JavaDoc explains the static methods you should add. |
161 | | - * </li> |
162 | | - * <li> |
163 | | - * After implementing the required methods (Even if they have a dummy implementation), |
164 | | - * run the CsvTests to generate some extra required classes. |
| 204 | + * Code generation is triggered when running the tests. |
| 205 | + * Run the CsvTests to generate the code. The generated code should include: |
165 | 206 | * <p> |
166 | 207 | * One of them will be the {@code AggregatorFunctionSupplier} for your aggregator. |
167 | 208 | * Find it by its name ({@code <Aggregation-name><Type>AggregatorFunctionSupplier}), |
|
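The method contract listed in the Javadoc above can be illustrated with a minimal, standalone sketch. It is not taken from the PR and does not use the real Elasticsearch types: `MaxLongAggregatorSketch`, `SingleState` and `GroupingState` are hypothetical names, the `@Aggregator` and `@GroupingAggregator` annotations are omitted, and plain `long`, `long[]` and `int[]` stand in for `Block`, `DriverContext` and the selected-groups vector. It only shows the shape of the public static `init*`, `combine*` and `evaluateFinal` methods for a "max of longs" aggregation.

```java
import java.util.Arrays;

/** Hypothetical sketch: mirrors the static-method shape only, with no Elasticsearch dependencies. */
final class MaxLongAggregatorSketch {

    /** Stand-in for an AggregatorState: accumulates one result for the whole data set. */
    static final class SingleState {
        long value = Long.MIN_VALUE;
        boolean seen = false;
    }

    /** Stand-in for a GroupingAggregatorState: accumulates one result per group id. */
    static final class GroupingState {
        long[] values = new long[0];
        boolean[] seen = new boolean[0];

        /** Grows the arrays so that groupId is a valid index; new slots start at Long.MIN_VALUE. */
        void ensureCapacity(int groupId) {
            if (groupId >= values.length) {
                int oldLength = values.length;
                values = Arrays.copyOf(values, groupId + 1);
                seen = Arrays.copyOf(seen, groupId + 1);
                Arrays.fill(values, oldLength, values.length, Long.MIN_VALUE);
            }
        }
    }

    // ---- Non-grouping aggregation ----

    public static SingleState initSingle() {
        return new SingleState();
    }

    /** Adds one input entry to the state. */
    public static void combine(SingleState state, long input) {
        state.value = Math.max(state.value, input);
        state.seen = true;
    }

    /** Adds a (hypothetically deserialized) intermediate state, e.g. one computed on another node. */
    public static void combineIntermediate(SingleState state, long value, boolean seen) {
        if (seen) {
            combine(state, value);
        }
    }

    /** Converts the state to the final result (the real method would build a Block). */
    public static long evaluateFinal(SingleState state) {
        return state.value;
    }

    // ---- Grouping aggregation ----

    public static GroupingState initGrouping() {
        return new GroupingState();
    }

    /** Adds one input entry to the bucket identified by groupId. */
    public static void combine(GroupingState state, int groupId, long input) {
        state.ensureCapacity(groupId);
        state.values[groupId] = Math.max(state.values[groupId], input);
        state.seen[groupId] = true;
    }

    /** Merges one group of another state into one group of the target state. */
    public static void combineStates(GroupingState target, int targetGroupId, GroupingState other, int otherGroupId) {
        if (otherGroupId < other.values.length && other.seen[otherGroupId]) {
            combine(target, targetGroupId, other.values[otherGroupId]);
        }
    }

    /** Returns one value per selected group; groups without input keep Long.MIN_VALUE in this sketch. */
    public static long[] evaluateFinal(GroupingState state, int[] selected) {
        long[] result = new long[selected.length];
        for (int i = 0; i < selected.length; i++) {
            int groupId = selected[i];
            boolean hasValue = groupId < state.values.length && state.seen[groupId];
            result[i] = hasValue ? state.values[groupId] : Long.MIN_VALUE;
        }
        return result;
    }

    public static void main(String[] args) {
        SingleState state = initSingle();
        combine(state, 3L);
        combine(state, 7L);
        System.out.println(evaluateFinal(state));
    }
}
```

In the real module the generated `AggregatorFunction` classes call these static methods, so the sketch should be read as an outline of the contract rather than as code that the annotation processor would accept as written.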