@fang-xing-esql
Member

@fang-xing-esql fang-xing-esql commented Jul 22, 2025

This is a subtask of #131341.

The RoundToLinearSearchEvaluator does not outperform the manual evaluators. In some cases (depending on data volume), it is even slower than the DateTruncDatetimeEvaluator.

Below are detailed profiling results (copied from PR #131341). This PR proposes replacing the linear search evaluators with manual ones: if the number of buckets is less than 11, the corresponding manual evaluator is used instead. This should help the performance regression tests once #131341 is merged.

{
  "operator" : "EvalOperator[evaluator=RoundToLongBinarySearchEvaluator[field=Attribute[channel=1]]]",
  "status" : {
    "process_nanos" : 47546640,
    "pages_processed" : 718,
    "rows_received" : 5131188,
    "rows_emitted" : 5131188
  }
},
{
  "operator" : "EvalOperator[evaluator=RoundToLongLinearSearchEvaluator[field=Attribute[channel=1]]]",
  "status" : {
    "process_nanos" : 36383473,
    "pages_processed" : 718,
    "rows_received" : 5131188,
    "rows_emitted" : 5131188
  }
},
{
  "operator" : "EvalOperator[evaluator=RoundToLong9Evaluator[field=Attribute[channel=1], p0=1419811200000, p1=1420416000000, p2=1421020800000, p3=1421625600000, p4=1422230400000, p5=1422835200000, p6=1423440000000, p7=1424044800000, p8=1424649600000]]",
  "status" : {
    "process_nanos" : 24817116,
    "pages_processed" : 718,
    "rows_received" : 5131188,
    "rows_emitted" : 5131188
  }
},
{
  "operator" : "EvalOperator[evaluator=DateTruncDatetimeEvaluator[fieldVal=Attribute[channel=1], rounding=Rounding[WEEK_OF_WEEKYEAR in Z][fixed to midnight]]]",
  "status" : {
    "process_nanos" : 33711389,
    "pages_processed" : 761,
    "rows_received" : 5699770,
    "rows_emitted" : 5699770
  }
}
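
Since the DateTruncDatetimeEvaluator run received more rows than the RoundTo runs, the raw process_nanos values are easier to compare after normalizing per row. A minimal sketch of that arithmetic, using only the figures from the profile output above (the class and method names are hypothetical and not part of the PR):

```java
final class ProfileNanosPerRow {

    /** Average processing time per received row, in nanoseconds. */
    static double nanosPerRow(long processNanos, long rowsReceived) {
        if (rowsReceived <= 0) {
            throw new IllegalArgumentException("rowsReceived must be positive");
        }
        return (double) processNanos / rowsReceived;
    }

    public static void main(String[] args) {
        // process_nanos and rows_received copied from the profile output.
        System.out.printf("RoundToLongBinarySearchEvaluator: %.2f ns/row%n", nanosPerRow(47_546_640L, 5_131_188L));
        System.out.printf("RoundToLongLinearSearchEvaluator: %.2f ns/row%n", nanosPerRow(36_383_473L, 5_131_188L));
        System.out.printf("RoundToLong9Evaluator:            %.2f ns/row%n", nanosPerRow(24_817_116L, 5_131_188L));
        System.out.printf("DateTruncDatetimeEvaluator:       %.2f ns/row%n", nanosPerRow(33_711_389L, 5_699_770L));
    }
}
```

This works out to roughly 9.3, 7.1, 4.8 and 5.9 ns per row respectively, so in this single run the linear search evaluator is slower per row than DateTruncDatetimeEvaluator, while the manual 9-point evaluator is the fastest of the four.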

The correctness of these manual evaluators is well covered by the existing RoundToTests.
There may be more opportunities to fine-tune performance by experimenting with a wider variety of bucket counts and data volumes. From the current observations, smaller data volumes appear to be more sensitive to the choice of evaluator.
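
The selection rule described above (fewer than 11 buckets uses a manual evaluator) could be sketched roughly as follows. This is an illustration, not the code in the PR: the class, the `chooseEvaluator` helper, the generalization of the `RoundToLong9Evaluator` name to other counts, and the fallback to the binary search evaluator for larger bucket counts are all assumptions. So is the behavior for values below the first point, which are mapped to the first point here.

```java
import java.util.Arrays;

final class RoundToDispatchSketch {

    // From the PR description: manual evaluators are used when the bucket count is less than 11.
    static final int MANUAL_LIMIT = 11;

    /** Hypothetical helper: returns the name of the evaluator that would be picked. */
    static String chooseEvaluator(int bucketCount) {
        if (bucketCount < 1) {
            throw new IllegalArgumentException("need at least one rounding point");
        }
        if (bucketCount < MANUAL_LIMIT) {
            // Naming pattern assumed from RoundToLong9Evaluator in the profile output.
            return "RoundToLong" + bucketCount + "Evaluator";
        }
        // Assumption: larger bucket counts use the generic binary search evaluator.
        return "RoundToLongBinarySearchEvaluator";
    }

    /** Generic round-down over sorted points, the kind of fallback assumed for larger counts. */
    static long binarySearchRoundDown(long field, long[] sortedPoints) {
        int idx = Arrays.binarySearch(sortedPoints, field);
        if (idx >= 0) {
            return sortedPoints[idx]; // exact hit on a rounding point
        }
        int insertionPoint = -idx - 1;
        // Assumption: values below the first point map to the first point.
        return sortedPoints[Math.max(0, insertionPoint - 1)];
    }
}
```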

@fang-xing-esql fang-xing-esql added :Performance All issues related to Elasticsearch performance including regressions and investigations :Analytics/ES|QL AKA ESQL v9.2.0 labels Jul 22, 2025
"src/main/java/org/elasticsearch/xpack/esql/parser/EsqlBaseParser*.java",
"src/main/generated/**/*.java",
"src/main/generated-src/generated/**/*.java"
"src/main/generated-src/**/*.java"
Member Author

@fang-xing-esql fang-xing-esql Jul 24, 2025

Skip the format check on all files under src/main/generated-src/; I don't see a src/main/generated-src/generated/ subfolder.

Member

I liked that we were running spotless on these - it forces us to write the templates in a way that keeps the style consistent. It's a pain though. Can you try removing this and fixing the templates? I know it's really picky and annoying. But it helps keep the code more readable.

Member Author

Thanks for the tricks!

@fang-xing-esql fang-xing-esql marked this pull request as ready for review July 24, 2025 15:26
@fang-xing-esql fang-xing-esql requested a review from nik9000 July 24, 2025 15:26
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Performance Meta label for performance team labels Jul 24, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-perf (Team:Performance)

@gbanasiak gbanasiak removed :Performance All issues related to Elasticsearch performance including regressions and investigations Team:Performance Meta label for performance team labels Jul 24, 2025
@elasticsearchmachine
Collaborator

Hi @fang-xing-esql, I've created a changelog YAML for you.

}

@Evaluator(extraName = "5")
static double process(double field, @Fixed double p0, @Fixed double p1, @Fixed double p2, @Fixed double p3, @Fixed double p4) {
Member

Could you add a comment to each of the process methods saying that it's basically a hand-unrolled binary search?

@Fixed double p5,
@Fixed double p6
) {
if (field < p3) {
Member

Should this be p4 instead?

Member Author

p3 is the median of these 7 numbers.
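
To illustrate why the first comparison is against p3, here is a minimal sketch of a hand-unrolled binary search over 7 sorted points, next to a plain linear scan for comparison. It is not the generated evaluator from the PR: the class and method names are hypothetical, and it assumes that the points are sorted ascending and that values below p0 round to p0.

```java
final class RoundToUnrolledSketch {

    /** Hand-unrolled binary search: split on the median (p3), then on each half. */
    static double roundTo7(double field, double p0, double p1, double p2, double p3, double p4, double p5, double p6) {
        if (field < p3) {
            // Lower half: the answer is one of p0, p1, p2.
            if (field < p1) {
                return p0; // assumption: values below p0 also land here
            }
            if (field < p2) {
                return p1;
            }
            return p2;
        }
        // Upper half: the answer is one of p3, p4, p5, p6.
        if (field < p5) {
            if (field < p4) {
                return p3;
            }
            return p4;
        }
        if (field < p6) {
            return p5;
        }
        return p6;
    }

    /** Reference linear scan with the same assumed semantics. */
    static double roundToLinear(double field, double[] sortedPoints) {
        double previous = sortedPoints[0];
        for (int i = 1; i < sortedPoints.length; i++) {
            if (field < sortedPoints[i]) {
                return previous;
            }
            previous = sortedPoints[i];
        }
        return previous;
    }
}
```

With 7 points, starting at p3 leaves three candidates below and four at or above, so any input is resolved in at most three comparisons. Starting at p4 would leave four candidates below and three above, which is no better.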

@fang-xing-esql
Member Author

Thanks for reviewing, @nik9000! I've added comments to the manual binary search evaluators.

@fang-xing-esql fang-xing-esql merged commit 33fbe66 into elastic:main Jul 25, 2025
33 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Jul 25, 2025
…-tracking

* upstream/main: (106 commits)
  Pipelines: Add `created_date` and `modified_date` (elastic#130847)
  add thread pool change availability (elastic#131734)
  Add failure store availability info / and port over privileges (elastic#131729)
  add availability information for ssl handshake timeout settings (elastic#131786)
  add availability information for rescore_vector (elastic#131710)
  add availability to oversample value of 0 (elastic#131707)
  clarify hnsw filter heuristic setting availability (elastic#131715)
  add availability info for default heap dump path change (elastic#131713)
  clarify default algorithms per stack version (elastic#131728)
  Refine error messages in `Fork` for correctness and clarity. (elastic#131701)
  [ES|QL] Replace RoundTo linear search evaluator with manual evaluators (elastic#131733)
  ESQL: Fix buildParams in tests with --configuration-cache (elastic#131826)
  Unmute `CrossClusterEsqlRCS2EnrichUnavailableRemotesIT#testEsqlEnrichWithSkipUnavailable` (elastic#131916)
  Allow templates for `.chat-*` index template (elastic#131914)
  ESQL: Fix NPE on empty to_lower/to_upper call (elastic#131917)
  Fix score computation in ES91Int4VectorsScorer (elastic#131905)
  Register a blob cache long counter metric for total evicted regions (elastic#131862)
  Move plan attribute resolution to its own component (elastic#131830)
  Make restore support multi-project (elastic#131661)
  Use logically more correct expression (elastic#131869)
  ...

Labels

:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0

4 participants