-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ES|QL: Add TBUCKET function #131449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL: Add TBUCKET function #131449
Changes from 1 commit
27941a7
fe6dd2b
a5fe0fa
45be0fe
e72467f
5bd6f36
e4a2e04
cc16a42
84717ba
b378d10
466e563
06cd95a
6626682
8cfca3c
ad7fba6
be418fc
b0d5599
f934369
28f0bea
d255687
7216d45
de606d1
a57929e
2eceb51
2fbd13e
0dff032
36063f0
8a90391
a28a651
bfb23c3
117eaac
03f45d9
cbe757b
bc23cdb
014fd83
8bf1c07
91d75a0
86b8d06
83bfd2a
ed92d11
92b02d1
2bf9111
24e8870
2e2ea88
ee5bfc0
3d52fbb
5603fae
e74f6ac
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
// TBUCKET-specific tests | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm unable to write such a test because BUCKET on a numeric field returns a numeric value, which can legally participate in arithmetic in the STATS select list, even if the same BUCKET expression is also used as the grouping key. That is what the test mentioned is testing. In case of TBUCKET, the error occurs because a grouping function that appears in the BY clause is treated as a grouping key and cannot simultaneously be used as an aggregate expression in the same STATS statement unless it’s referenced exactly the same way or via its alias from the BY list. ES|QL treats BUCKET as a grouping function; once it’s in BY, it can’t be re-used as an aggregate expression unless it’s just referenced as the exact same expression or via the alias. That triggers the verification exception about a grouping function being used as an aggregate after being declared in BY. To confirm this I rewrote the test with BUCKET which led to the same verification exception:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you try this query to see if
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was trying to test it missing the "year" after "1" 🤦🏻♂️ My bad. Your example works in both cases - tbucket and bucket. The test has been added to tbucket.csv-spec. Thanks for that! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It still didn't work due to the implicit nature of the @timestamp field and inability to resolve such fields. I found a workaround for that by creating a set of functions that require implicit timestamps, then recursively moving through the plan, checking if the functions require a timestamp and adding it to the nameList. See the changes in the class FieldNameUtils. Please, let me know what you think about this change and if there are better ways to implement this. |
||
|
||
docsGettingStartedBucketStatsByMedian#[skip:-8.13.99, reason:BUCKET renamed in 8.14] | ||
// tag::gs-bucket-stats-by-median[] | ||
FROM sample_data | ||
| KEEP @timestamp, event_duration | ||
| STATS median_duration = MEDIAN(event_duration) BY bucket = TBUCKET(@timestamp, "1 hour") | ||
// end::gs-bucket-stats-by-median[] | ||
| SORT bucket | ||
; | ||
|
||
median_duration:double | bucket:date | ||
3107561.0 |2023-10-23T12:00:00.000Z | ||
1756467.0 |2023-10-23T13:00:00.000Z | ||
; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we expect There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it is expected. I've added a test to the union_types.csv-spec. Thank you! |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,6 +14,6 @@ | |
public class GroupingWritables { | ||
|
||
public static List<NamedWriteableRegistry.Entry> getNamedWriteables() { | ||
return List.of(Bucket.ENTRY, Categorize.ENTRY); | ||
return List.of(Bucket.ENTRY, Categorize.ENTRY, TBucket.ENTRY); | ||
|
||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,171 @@ | ||
/* | ||
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
* or more contributor license agreements. Licensed under the Elastic License | ||
* 2.0; you may not use this file except in compliance with the Elastic License | ||
* 2.0. | ||
*/ | ||
|
||
package org.elasticsearch.xpack.esql.expression.function.grouping; | ||
|
||
import org.elasticsearch.common.Rounding; | ||
import org.elasticsearch.common.io.stream.NamedWriteableRegistry; | ||
import org.elasticsearch.common.io.stream.StreamInput; | ||
import org.elasticsearch.common.io.stream.StreamOutput; | ||
import org.elasticsearch.compute.operator.EvalOperator.ExpressionEvaluator; | ||
import org.elasticsearch.xpack.esql.capabilities.PostOptimizationVerificationAware; | ||
import org.elasticsearch.xpack.esql.common.Failures; | ||
import org.elasticsearch.xpack.esql.core.expression.Expression; | ||
import org.elasticsearch.xpack.esql.core.expression.FoldContext; | ||
import org.elasticsearch.xpack.esql.core.tree.NodeInfo; | ||
import org.elasticsearch.xpack.esql.core.tree.Source; | ||
import org.elasticsearch.xpack.esql.core.type.DataType; | ||
import org.elasticsearch.xpack.esql.expression.LocalSurrogateExpression; | ||
import org.elasticsearch.xpack.esql.expression.function.FunctionInfo; | ||
import org.elasticsearch.xpack.esql.expression.function.FunctionType; | ||
import org.elasticsearch.xpack.esql.expression.function.Param; | ||
import org.elasticsearch.xpack.esql.expression.function.TwoOptionalArguments; | ||
import org.elasticsearch.xpack.esql.expression.function.scalar.date.DateTrunc; | ||
import org.elasticsearch.xpack.esql.io.stream.PlanStreamInput; | ||
import org.elasticsearch.xpack.esql.stats.SearchStats; | ||
|
||
import java.io.IOException; | ||
import java.util.List; | ||
|
||
import static org.elasticsearch.xpack.esql.core.expression.TypeResolutions.ParamOrdinal.SECOND; | ||
import static org.elasticsearch.xpack.esql.core.expression.TypeResolutions.isType; | ||
import static org.elasticsearch.xpack.esql.expression.Validations.isFoldable; | ||
import static org.elasticsearch.xpack.esql.expression.function.scalar.date.DateTrunc.maybeSubstituteWithRoundTo; | ||
import static org.elasticsearch.xpack.esql.session.Configuration.DEFAULT_TZ; | ||
|
||
/** | ||
* Splits dates and numbers into a given number of buckets. There are two ways to invoke | ||
* this function: with a user-provided span (explicit invocation mode), or a span derived | ||
* from a number of desired buckets (as a hint) and a range (auto mode). | ||
* In the former case, two parameters will be provided, in the latter four. | ||
leontyevdv marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
*/ | ||
public class TBucket extends GroupingFunction.EvaluatableGroupingFunction | ||
implements | ||
PostOptimizationVerificationAware, | ||
TwoOptionalArguments, | ||
LocalSurrogateExpression { | ||
leontyevdv marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
||
public static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(Expression.class, "TBucket", TBucket::new); | ||
|
||
private final Expression field; | ||
private final Expression buckets; | ||
|
||
@FunctionInfo( | ||
returnType = { "double", "date", "date_nanos" }, | ||
description = """ | ||
Creates groups of values - buckets - out of a datetime or numeric input. | ||
The size of the buckets can either be provided directly, or chosen based on a recommended count and values range.""", | ||
examples = {}, | ||
type = FunctionType.GROUPING | ||
) | ||
public TBucket( | ||
Source source, | ||
@Param( | ||
name = "field", | ||
type = { "integer", "long", "double", "date", "date_nanos" }, | ||
description = "Numeric or date expression from which to derive buckets." | ||
) Expression field, | ||
@Param( | ||
name = "buckets", | ||
type = { "integer", "long", "double", "date_period", "time_duration" }, | ||
description = "Target number of buckets, or desired bucket size if `from` and `to` parameters are omitted." | ||
) Expression buckets | ||
) { | ||
super(source, List.of(field, buckets)); | ||
this.field = field; | ||
this.buckets = buckets; | ||
} | ||
|
||
private TBucket(StreamInput in) throws IOException { | ||
this(Source.readFrom((PlanStreamInput) in), in.readNamedWriteable(Expression.class), in.readNamedWriteable(Expression.class)); | ||
} | ||
|
||
|
||
@Override | ||
public void writeTo(StreamOutput out) throws IOException { | ||
source().writeTo(out); | ||
out.writeNamedWriteable(field); | ||
out.writeNamedWriteable(buckets); | ||
} | ||
|
||
@Override | ||
public String getWriteableName() { | ||
return ENTRY.name; | ||
} | ||
|
||
@Override | ||
public ExpressionEvaluator.Factory toEvaluator(ToEvaluator toEvaluator) { | ||
leontyevdv marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
Rounding.Prepared preparedRounding = getDateRounding(toEvaluator.foldCtx(), null, null); | ||
return DateTrunc.evaluator(field.dataType(), source(), toEvaluator.apply(field), preparedRounding); | ||
} | ||
|
||
/** | ||
* Returns the date rounding from this bucket function if the target field is a date type; otherwise, returns null. | ||
*/ | ||
public Rounding.Prepared getDateRoundingOrNull(FoldContext foldCtx) { | ||
return getDateRounding(foldCtx, null, null); | ||
} | ||
|
||
private Rounding.Prepared getDateRounding(FoldContext foldContext, Long min, Long max) { | ||
assert DataType.isTemporalAmount(buckets.dataType()) : "Unexpected span data type [" + buckets.dataType() + "]"; | ||
return DateTrunc.createRounding(buckets.fold(foldContext), DEFAULT_TZ, min, max); | ||
} | ||
|
||
@Override | ||
protected TypeResolution resolveType() { | ||
if (childrenResolved() == false) { | ||
return new TypeResolution("Unresolved children"); | ||
} | ||
return isType(buckets, DataType::isTemporalAmount, sourceText(), SECOND, "date_period", "time_duration"); | ||
} | ||
|
||
@Override | ||
public void postOptimizationVerification(Failures failures) { | ||
String operation = sourceText(); | ||
failures.add(isFoldable(buckets, operation, SECOND)); | ||
} | ||
|
||
@Override | ||
public DataType dataType() { | ||
return field.dataType(); | ||
} | ||
|
||
@Override | ||
public Expression replaceChildren(List<Expression> newChildren) { | ||
return new TBucket(source(), newChildren.get(0), newChildren.get(1)); | ||
} | ||
|
||
@Override | ||
protected NodeInfo<? extends Expression> info() { | ||
return NodeInfo.create(this, TBucket::new, field, buckets); | ||
} | ||
|
||
public Expression field() { | ||
return field; | ||
} | ||
|
||
public Expression buckets() { | ||
return buckets; | ||
} | ||
|
||
@Override | ||
public String toString() { | ||
return "Bucket{" + "field=" + field + ", buckets=" + buckets + "}"; | ||
} | ||
|
||
@Override | ||
public Expression surrogate(SearchStats searchStats) { | ||
// LocalSubstituteSurrogateExpressions should make sure this doesn't happen | ||
assert searchStats != null : "SearchStats cannot be null"; | ||
return maybeSubstituteWithRoundTo( | ||
source(), | ||
field(), | ||
buckets(), | ||
searchStats, | ||
(interval, minValue, maxValue) -> getDateRounding(FoldContext.small(), minValue, maxValue) | ||
); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to also include a test or two for
TBUCKET
in an eval; something like| EVAL key = TBUCKET(1 hour) | STATS minimum = MIN(whatever) BY key