-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Replace grouping by DateFormat with DateTrunc
#129277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ESQL: Replace grouping by DateFormat with DateTrunc
#129277
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
…tting # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/ReplaceAggregateNestedExpressionWithEval.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
|
Thanks for your review, @fang-xing-esql! I’ve updated the branch based on your comments. I revised the logic for inferring the minimal time interval corresponding to a given date format. Note that I’d appreciate it if you could take another look when you have time. |
|
buildkite test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding more tests and handle inlinestats @kanoshiou! I added some comment around the casting of the format to literal.
I'll need to look into the change related to inline stats more, as it seems a bit more complicated than I had expected. This change created an eval on top of aggregate, which makes inline stats a bit tricky.
And I'm wondering if you could elaborate a bit more about why this transformation does not apply to date_nanos? I tried some queries on date_nanos, and they seem work fine, the changes to the rule doesn't block date_nanos. Please find my experiments below(the 2nd and 4th query show equivalent results). Perhaps there is something that I missed here.
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from test1 | eval x= date_format(\"y-MM-dd\", nanos)"
}
'
millis | nanos | num | x
------------------------+---------------------------------------------------------------------------------------------+-------------------+---------------
2023-10-23T13:55:01.543Z|2023-10-23T13:55:01.543123456Z |1698069301543123456|2023-10-23
2023-10-23T13:55:01.543Z|2023-10-23T12:55:01.543123456Z |1698069301543123456|2023-10-23
1999-10-23T12:15:03.360Z|[2023-01-23T13:55:01.543123456Z, 2023-02-23T13:33:34.937193Z, 2023-03-23T12:15:03.360103847Z]|0 |null
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from test1 | stats count(*) by date_format(\"y-MM-dd\", nanos)"
}
'
count(*) |date_format("y-MM-dd", nanos)
---------------+-----------------------------
1 |null
2 |2023-10-23
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from test1 | eval x= date_trunc(1 day, nanos), y= date_format(\"y-MM-dd\", x)"
}
'
millis | nanos | num | x | y
------------------------+---------------------------------------------------------------------------------------------+-------------------+------------------------+---------------
2023-10-23T13:55:01.543Z|2023-10-23T13:55:01.543123456Z |1698069301543123456|2023-10-23T00:00:00.000Z|2023-10-23
2023-10-23T13:55:01.543Z|2023-10-23T12:55:01.543123456Z |1698069301543123456|2023-10-23T00:00:00.000Z|2023-10-23
1999-10-23T12:15:03.360Z|[2023-01-23T13:55:01.543123456Z, 2023-02-23T13:33:34.937193Z, 2023-03-23T12:15:03.360103847Z]|0 |null |null
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from test1 | stats count(*) by x = date_trunc(1 day, nanos) | eval y= date_format(\"y-MM-dd\", x)"
}
'
count(*) | x | y
---------------+------------------------+---------------
1 |null |null
2 |2023-10-23T00:00:00.000Z|2023-10-23
| evals.add(as); | ||
| if (asChild instanceof DateFormat df) { | ||
| // Extract the format pattern and field from DateFormat | ||
| Literal format = (Literal) df.format(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases when the format is not a constant value, this cast might not be safe here, for example this query below will fail. The concat function might need to be evaluated before this transformation from date_format to date_trunc.
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from sample_data | eval format = concat(\"yyyy\", \"mm\") | stats count(*) by date_format(format, @timestamp)"
}
'
{
"error" : {
"root_cause" : [
{
"type" : "class_cast_exception",
"reason" : "class org.elasticsearch.xpack.esql.core.expression.ReferenceAttribute cannot be cast to class org.elasticsearch.xpack.esql.core.expression.Literal (org.elasticsearch.xpack.esql.core.expression.ReferenceAttribute and org.elasticsearch.xpack.esql.core.expression.Literal are in unnamed module of loader java.net.URLClassLoader @de81be1)"
}
],
"type" : "class_cast_exception",
"reason" : "class org.elasticsearch.xpack.esql.core.expression.ReferenceAttribute cannot be cast to class org.elasticsearch.xpack.esql.core.expression.Literal (org.elasticsearch.xpack.esql.core.expression.ReferenceAttribute and org.elasticsearch.xpack.esql.core.expression.Literal are in unnamed module of loader java.net.URLClassLoader @de81be1)"
},
"status" : 500
}
…tting # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec # x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PropagateInlineEvals.java
…g-with-formatting
…g-with-formatting
|
Thanks for your review @fang-xing-esql! I thought you were referring to formatting nanosecond intervals using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing the previous round of comments and for being patient, @kanoshiou! I went through the code changes again, and for most queries, the PR does a great job transforming date_format to date_trunc correctly — really appreciate the effort.
I’ve added a few comments in the code suggesting minor updates to the tests and additional explanations in areas that are a bit tricky to follow. Since aggregation can get complex and tricky, having broader test coverage will help ensure that future changes don’t unintentionally alter expected behavior.
I have tried some normal queries, they work as expected. And I'm trying to push it harder with a bit more tricky queries, and came across these situations. In ES|QL , we allow the flexibility that grouping can be referenced in the aggregate explicitly, and sometime this flexibility adds extra complexity. These two queries below don't error out in main.
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from test1 | stats count(*), concat(x, \"01\") by x = date_format(\"y-MM-dd\", nanos)"
}
'
{
"error" : {
"root_cause" : [
{
"type" : "null_pointer_exception",
"reason" : "Cannot invoke \"org.elasticsearch.xpack.esql.planner.Layout$ChannelAndType.channel()\" because the return value of \"org.elasticsearch.xpack.esql.planner.Layout.get(org.elasticsearch.xpack.esql.core.expression.NameId)\" is null"
}
],
"type" : "null_pointer_exception",
"reason" : "Cannot invoke \"org.elasticsearch.xpack.esql.planner.Layout$ChannelAndType.channel()\" because the return value of \"org.elasticsearch.xpack.esql.planner.Layout.get(org.elasticsearch.xpack.esql.core.expression.NameId)\" is null"
},
"status" : 500
}
+ curl -u elastic:password -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from test1 | inline stats count(*), concat(x, \"01\") by x = date_format(\"y-MM-dd\", nanos)"
}
'
{
"error" : {
"root_cause" : [
{
"type" : "illegal_state_exception",
"reason" : "Found 1 problem\nline 1:14: Plan [Eval[[CONCAT(x{r}#106,01[KEYWORD]) AS concat(x, \"01\")#103]]] optimized incorrectly due to missing references [x{r}#106]"
}
],
"type" : "illegal_state_exception",
"reason" : "Found 1 problem\nline 1:14: Plan [Eval[[CONCAT(x{r}#106,01[KEYWORD]) AS concat(x, \"01\")#103]]] optimized incorrectly due to missing references [x{r}#106]"
},
"status" : 500
}
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Show resolved
Hide resolved
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Show resolved
Hide resolved
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PropagateInlineEvals.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PropagateInlineEvals.java
Outdated
Show resolved
Hide resolved
…tting # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
|
I sincerely appreciate the time and effort you put into reviewing this, @fang-xing-esql! Thanks for your review! Your edge cases helped me deepen my understanding of Regarding the case where the |
|
buildkite test this |
# Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec
|
buildkite test this |
|
buildkite test this |
|
@kanoshiou First of all, we truly appreciate the effort you put into this PR as always. From a functional standpoint, I couldn't find any more issues or bugs at this moment. However, this change is not a simple bug fix, it’s more of a feature that could have potential performance implications. After internal discussion within the team carefully, we would recommend pausing and not merging this PR for now, considering the following aspects:
Overall, the logic changes in these two rules make sense, we like it, and we’d like to revisit this PR in the near future once we have a clearer understanding of its interaction with timezone support and potential performance implications. Thank you very much for your understanding, and we hope it make sense to you as well, don't hesitate to reach out to us for any questions. |
…tting # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec
|
Thanks for the detailed feedback and I completely understand your concerns, @fang-xing-esql. I'm considering adding some benchmarks for this PR. However, after reviewing the existing code under the As for timezone handling, I believe this PR should be fine as long as both |
Optimize date grouping with formatting in ESQL
This PR optimizes the performance of date grouping operations in ESQL by automatically converting
DATE_FORMATinGROUP BYclauses to more efficientDATE_TRUNCoperations. The optimization:DATE_FORMATpatterns to equivalentDATE_TRUNCintervalsExample optimization:
becomes:
Optimized plan:
Closes #114772