@fang-xing-esql
Member

@fang-xing-esql fang-xing-esql commented Aug 13, 2025

This is an enhancement on top of #132512: it merges the range query generated from the RoundTo function with a range query on the same field generated from predicates. The motivation is to measure whether performance improves when the range queries are merged where possible.

#132512 generates a list of queries with tags, and each query leg attaches a range query to the main query. If the main query already contains a range query on the same field, the two range queries are kept separate and are not merged.

A query leg looks like the following:

QueryBuilderAndTags{queryBuilder=[{
  "bool" : {
    "filter" : [
      {
        "esql_single_value" : {
          "field" : "date",
          "next" : {
            "range" : {
              "date" : {
                "gte" : "2023-10-19T00:00:00.000Z",
                "lte" : "2023-10-24T00:00:00.000Z",
                "time_zone" : "Z",
                "format" : "strict_date_optional_time",
                "boost" : 0.0
              }
            }
          },
          "source" : "date >= \"2023-10-19\"@2:9"
        }
      },
      {
        "esql_single_value" : {
          "field" : "date",
          "next" : {
            "range" : {
              "date" : {
                "gte" : "2023-10-21T00:00:00.000Z",
                "lt" : "2023-10-22T00:00:00.000Z",
                "time_zone" : "Z",
                "format" : "strict_date_optional_time",
                "boost" : 0.0
              }
            }
          },
          "source" : "date_trunc(1 day, date)@3:25"
        }
      }
    ],
    "boost" : 1.0
  }
}], tags=[1697846400000]
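
As a rough illustration of how one tagged leg per bucket could be derived from the rounding points, here is a minimal Python sketch. It assumes each leg covers the half-open interval between two consecutive points and is tagged with the lower point in epoch milliseconds, which matches the leg shown above. The helper name `round_to_legs` is hypothetical, and the handling of values outside the points and of nulls is deliberately left out; this is not the code from #132512.

```python
from datetime import datetime, timezone


def round_to_legs(points):
    """Return (tag, range_clause) pairs for the buckets between sorted points.

    Hypothetical helper: each leg is [lower, upper) and is tagged with the
    lower bound in epoch milliseconds. Edge legs and nulls are not modelled.
    """
    legs = []
    for lower, upper in zip(points, points[1:]):
        tag = int(lower.timestamp() * 1000)
        legs.append((tag, {"gte": lower.isoformat(), "lt": upper.isoformat()}))
    return legs


# The four rounding points from the round_to(...) call in this description.
points = [datetime(2023, 10, day, tzinfo=timezone.utc) for day in (20, 21, 22, 23)]
legs = round_to_legs(points)
```

The second leg produced here carries the tag 1697846400000, the same value as in the dump above.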

This PR merges the two range queries above into a single range query when applicable (that is, when one is a superset of the other). The query leg above then looks like this:

QueryBuilderAndTags[query={
      {
        "esql_single_value" : {
          "field" : "date",
          "next" : {
            "range" : {
              "date" : {
                "gte" : "2023-10-21T00:00:00.000Z",
                "lt" : "2023-10-22T00:00:00.000Z",
                "time_zone" : "Z",
                "format" : "strict_date_optional_time",
                "boost" : 0.0
              }
            }
          },
          "source" : "round_to(date, \"2023-10-20\"::date,\"2023-10-21\"::date,\"2023-10-22\"::date,\"2023-10-23\"::date)@3:25"
        }
      }
}, tags=[1697846400000]]
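
The merge condition described above (one range is a superset of the other) can be sketched in Python as follows. This is only an illustration under assumptions: ranges are modelled as tuples of epoch milliseconds with inclusive flags, and the names `contains` and `merge_if_nested` are hypothetical rather than taken from the PR.

```python
from typing import Optional, Tuple

# (lower, lower_inclusive, upper, upper_inclusive), bounds in epoch millis.
Range = Tuple[int, bool, int, bool]


def contains(outer: Range, inner: Range) -> bool:
    """True if every value matched by `inner` is also matched by `outer`."""
    o_lo, o_lo_inc, o_hi, o_hi_inc = outer
    i_lo, i_lo_inc, i_hi, i_hi_inc = inner
    # On equal bounds, outer must be inclusive unless inner is exclusive.
    lower_ok = o_lo < i_lo or (o_lo == i_lo and (o_lo_inc or not i_lo_inc))
    upper_ok = o_hi > i_hi or (o_hi == i_hi and (o_hi_inc or not i_hi_inc))
    return lower_ok and upper_ok


def merge_if_nested(a: Range, b: Range) -> Optional[Range]:
    """Return the narrower range if one contains the other, else None."""
    if contains(a, b):
        return b
    if contains(b, a):
        return a
    return None  # partial overlap or disjoint: keep both filters


DAY = 86_400_000
BUCKET_START = 1697846400000  # 2023-10-21T00:00:00Z, the tag shown above

# date >= "2023-10-19" AND date <= "2023-10-24"
predicate: Range = (BUCKET_START - 2 * DAY, True, BUCKET_START + 3 * DAY, True)
# the bucket [2023-10-21, 2023-10-22)
bucket: Range = (BUCKET_START, True, BUCKET_START + DAY, False)

merged = merge_if_nested(predicate, bucket)
```

Here `merged` equals `bucket`, mirroring the single range in the merged query leg. Ranges that only partially overlap yield `None` in this sketch and would stay as two separate filters.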

Here are some performance measurements on main, on the branch without merging range queries, and on this branch, which merges range queries:

Q1 - monthly interval on 12 months of data, 12 buckets
FROM nyc_taxis 
| where dropoff_datetime < "2016-01-01" AND dropoff_datetime >= "2015-01-01" 
| eval dropoffs_over_time=date_trunc(1 month, dropoff_datetime) 
| stats c = count(dropoff_datetime) by dropoffs_over_time 
| sort dropoffs_over_time

main: 1790 ms
without merging range queries: 579 ms, 68% faster
merge range queries: 642 ms, 64% faster
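
The percentages reported with each timing appear to be the reduction in elapsed time relative to main. A small Python sketch of that arithmetic, using the Q1 numbers (the function name `percent_faster` is hypothetical):

```python
def percent_faster(main_ms: float, branch_ms: float) -> int:
    """Reduction in elapsed time relative to main, as a whole percent.

    A negative result means the branch was slower than main.
    """
    return round((main_ms - branch_ms) / main_ms * 100)


q1_without_merge = percent_faster(1790, 579)
q1_with_merge = percent_faster(1790, 642)
```

With the Q1 timings this gives 68 and 64, matching the figures above.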

Q2 - monthly interval on 9 months of data, 9 buckets
FROM nyc_taxis 
| where dropoff_datetime < "2015-10-01" AND dropoff_datetime >= "2015-01-01" 
| eval dropoffs_over_time=date_trunc(1 month, dropoff_datetime) 
| stats c = count(dropoff_datetime) by dropoffs_over_time 
| sort dropoffs_over_time

main: 1312 ms
without merging range queries: 447 ms, 66% faster
merge range queries: 473 ms, 64% faster

Q3 - monthly interval on 3 months of data, 3 buckets
FROM nyc_taxis 
| where dropoff_datetime <= "2015-03-01" AND dropoff_datetime >= "2015-01-01" 
| eval dropoffs_over_time=date_trunc(1 month, dropoff_datetime) 
| stats c = count(dropoff_datetime) by dropoffs_over_time 
| sort dropoffs_over_time

main: 413 ms
without merging range queries: 171 ms, 59% faster
merge range queries: 181 ms, 56% faster

Q4 - weekly interval on 12 weeks of data, 12 buckets
FROM nyc_taxis 
| where dropoff_datetime < "2015-03-23" AND dropoff_datetime >= "2015-01-01" 
| eval dropoffs_over_time=date_trunc(1 week, dropoff_datetime) 
| stats c = count(dropoff_datetime) by dropoffs_over_time 
| sort dropoffs_over_time

main: 585 ms
without merging range queries: 362 ms, 38% faster
merge range queries: 328 ms, 44% faster

Q5 - weekly interval on 9 weeks of data, 9 buckets
FROM nyc_taxis 
| where dropoff_datetime < "2015-03-01" AND dropoff_datetime >= "2015-01-01" 
| eval dropoffs_over_time=date_trunc(1 week, dropoff_datetime) 
| stats c = count(dropoff_datetime) by dropoffs_over_time 
| sort dropoffs_over_time

main: 456 ms
without merging range queries: 230 ms, 50% faster
merge range queries: 254 ms, 44% faster

Q6 - weekly interval on 3 weeks of data, 3 buckets
FROM nyc_taxis 
| where dropoff_datetime <= "2015-01-19" AND dropoff_datetime >= "2015-01-01" 
| eval dropoffs_over_time=date_trunc(1 week, dropoff_datetime) 
| stats c = count(dropoff_datetime) by dropoffs_over_time 
| sort dropoffs_over_time

main: 167 ms
without merging range queries: 152 ms, 25% faster
merge range queries: 136 ms, 19% faster

Q7 - daily interval on 12 days of data, 12 buckets
FROM nyc_taxis 
| where dropoff_datetime < "2015-01-13" AND dropoff_datetime >= "2015-01-01" 
| eval dropoffs_over_time=date_trunc(1 day, dropoff_datetime) 
| stats c = count(dropoff_datetime) by dropoffs_over_time 
| sort dropoffs_over_time

main: 136 ms
without merging range queries: 127 ms, 6% faster
merge range queries: 132 ms, 3% faster

Q8 - daily interval on 9 days of data, 9 buckets
FROM nyc_taxis 
| where dropoff_datetime < "2015-01-10" AND dropoff_datetime >= "2015-01-01" 
| eval dropoffs_over_time=date_trunc(1 day, dropoff_datetime) 
| stats c = count(dropoff_datetime) by dropoffs_over_time 
| sort dropoffs_over_time

main: 99 ms
without merging range queries: 93 ms, 6% faster
merge range queries: 101 ms, 2% slower

Q9 - daily interval on 3 days of data, 3 buckets
FROM nyc_taxis 
| where dropoff_datetime <= "2015-01-04" AND dropoff_datetime >= "2015-01-01" 
| eval dropoffs_over_time=date_trunc(1 day, dropoff_datetime) 
| stats c = count(dropoff_datetime) by dropoffs_over_time 
| sort dropoffs_over_time

main: 48 ms
without merging range queries: 50 ms, 5% slower
merge range queries: 46 ms, 2% faster

@elasticsearchmachine
Collaborator

Hi @fang-xing-esql, I've created a changelog YAML for you.

@fang-xing-esql
Member Author

#132512 has been merged. We are not going to merge range queries for now.
