Skip to content

Conversation

@tvernum
Copy link
Contributor

@tvernum tvernum commented Aug 13, 2025

This changes the implementation of Glob (used by FilterPath) to use a non-recursive algorithm for improved efficiency and stability

This changes the implementation of `Glob` (used by `FilterPath`)
to use a non-recursive algorithm for improved efficiency and
stability
@tvernum tvernum requested a review from rjernst August 13, 2025 09:02
@tvernum tvernum requested a review from a team as a code owner August 13, 2025 09:02
@tvernum tvernum added :Core/Infra/REST API REST infrastructure and utilities >refactoring auto-backport Automatically create backport pull requests when merged v9.2.0 v8.18.6 v9.0.6 v9.1.3 v8.19.3 labels Aug 13, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Aug 13, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@ldematte
Copy link
Contributor

Very interesting! Do you have benchmarks/a rough indication of the speedup?

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a minor suggestion. Thanks for all the tests!

// There is another star, with a literal in between the current position and that '*'
// That is, we have "*literal*"
// We want the first '*' to consume everything up until the first occurrence of "literal" in the input string
int match = str.indexOf(pattern.substring(patternIndex, nextStar), strIndex);
Copy link
Member

@rjernst rjernst Aug 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could move this to a helper method that performs charAt itself using a range of indices instead of needing to construct a substring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually tried that, but it was slower (almost an order of magnitude so).

I suspect that's becauseString.indexOf marked as @IntrinsicCandidate and the intrinsic indexOf is fast enough to offset the cost of the substring copy.

@tvernum
Copy link
Contributor Author

tvernum commented Aug 18, 2025

Very interesting! Do you have benchmarks/a rough indication of the speedup?

AFAIK, the only existing benchmark we have in this space is FilterContentBenchmark and it doesn't really show much because it doesn't use complex Globs.

I wrote a small benchmark here (I don't think it's worth merging because then we need to maintain the code and it doesn't have long term value).

The new implementation is faster in all but 2 cases:

  • It's identical for a single literal "*" (because both implementations optimize for that)
  • It's slightly slower for simple infix expressions ("*something*") and I haven't worked out exactly why, but the difference isn't significant enough to really worry about:
Benchmark main (ns/op) PR (ns/op) PR / main (smaller is better)
GlobBenchmark.complex 4978.795 2920.159 59%
GlobBenchmark.infix 1101.739 1219.746 111%
GlobBenchmark.multipleAsterisk 325463.383 55009.392 17%
GlobBenchmark.pathological 11549.677 3335.185 29%
GlobBenchmark.prefix 2062.921 672.972 33%
GlobBenchmark.singleAsterisk 29.35 29.288 100%
GlobBenchmark.suffix 851.258 542.771 64%

It's the multiple asterisk and pathological cases that are the main target of the PR, but the prefix and suffix improvements are a nice bonus.

@ldematte
Copy link
Contributor

It's the multiple asterisk and pathological cases that are the main target of the PR, but the prefix and suffix improvements are a nice bonus.

Very cool, and nice improvements! 👍

@tvernum tvernum merged commit d2387cf into elastic:main Aug 19, 2025
34 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.18
9.0
9.1
8.19

tvernum added a commit to tvernum/elasticsearch that referenced this pull request Aug 19, 2025
This changes the implementation of `Glob` (used by `FilterPath`)
to use a non-recursive algorithm for improved efficiency and
stability
tvernum added a commit to tvernum/elasticsearch that referenced this pull request Aug 19, 2025
This changes the implementation of `Glob` (used by `FilterPath`)
to use a non-recursive algorithm for improved efficiency and
stability
tvernum added a commit to tvernum/elasticsearch that referenced this pull request Aug 19, 2025
This changes the implementation of `Glob` (used by `FilterPath`)
to use a non-recursive algorithm for improved efficiency and
stability
elasticsearchmachine pushed a commit that referenced this pull request Aug 19, 2025
This changes the implementation of `Glob` (used by `FilterPath`)
to use a non-recursive algorithm for improved efficiency and
stability
elasticsearchmachine pushed a commit that referenced this pull request Aug 19, 2025
This changes the implementation of `Glob` (used by `FilterPath`)
to use a non-recursive algorithm for improved efficiency and
stability
elasticsearchmachine pushed a commit that referenced this pull request Aug 19, 2025
This changes the implementation of `Glob` (used by `FilterPath`)
to use a non-recursive algorithm for improved efficiency and
stability
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 19, 2025
…improv

* upstream/main: (92 commits)
  ESQL: mark LOOKUP JOIN as ExecutesOn.Any by default (elastic#133064)
  Fix 404s in REST API landing page (elastic#133086)
  Fix release tests for OptimizerVerificationTests (elastic#133100)
  Make Glob non-recursive (elastic#132798)
  Update ES|QL function list for release versions (elastic#133096)
  Split transport version func test into abstract base (elastic#133035)
  Omit project ID from snapshot metrics (elastic#133098)
  Mute org.elasticsearch.xpack.esql.analysis.AnalyzerTests testNoDenseVectorFailsForMagnitude elastic#133013
  Mute org.elasticsearch.xpack.esql.optimizer.OptimizerVerificationTests testRemoteEnrichAfterCoordinatorOnlyPlans elastic#133015
  Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/160_exists_query/Test exists query on _id field} elastic#133097
  Rename initial to unreferenced in transport versions (elastic#133082)
  Rename exception type header (elastic#133045)
  ESQL: Pluggable tests for Operator status (elastic#132876)
  ESQL: Mark new signatures in MIN and MAX (elastic#132980)
  Don't try to serialize half-baked cluster info (elastic#132756)
  migrate ml_rollover_legacy_indices transport version (elastic#133008)
  Enable `exclude_source_vectors` by default for new indices (elastic#131907)
  Expose APIs needed by flush during translog replay (elastic#132960)
  Change reporting_user role to leverage reserved kibana privileges (elastic#132766)
  Update TasksIT for batched execution (elastic#132762)
  ...
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 19, 2025
* upstream/main: (58 commits)
  ESQL: mark LOOKUP JOIN as ExecutesOn.Any by default (elastic#133064)
  Fix 404s in REST API landing page (elastic#133086)
  Fix release tests for OptimizerVerificationTests (elastic#133100)
  Make Glob non-recursive (elastic#132798)
  Update ES|QL function list for release versions (elastic#133096)
  Split transport version func test into abstract base (elastic#133035)
  Omit project ID from snapshot metrics (elastic#133098)
  Mute org.elasticsearch.xpack.esql.analysis.AnalyzerTests testNoDenseVectorFailsForMagnitude elastic#133013
  Mute org.elasticsearch.xpack.esql.optimizer.OptimizerVerificationTests testRemoteEnrichAfterCoordinatorOnlyPlans elastic#133015
  Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/160_exists_query/Test exists query on _id field} elastic#133097
  Rename initial to unreferenced in transport versions (elastic#133082)
  Rename exception type header (elastic#133045)
  ESQL: Pluggable tests for Operator status (elastic#132876)
  ESQL: Mark new signatures in MIN and MAX (elastic#132980)
  Don't try to serialize half-baked cluster info (elastic#132756)
  migrate ml_rollover_legacy_indices transport version (elastic#133008)
  Enable `exclude_source_vectors` by default for new indices (elastic#131907)
  Expose APIs needed by flush during translog replay (elastic#132960)
  Change reporting_user role to leverage reserved kibana privileges (elastic#132766)
  Update TasksIT for batched execution (elastic#132762)
  ...
elasticsearchmachine pushed a commit that referenced this pull request Aug 20, 2025
* Make Glob non-recursive (#132798)

This changes the implementation of `Glob` (used by `FilterPath`)
to use a non-recursive algorithm for improved efficiency and
stability

* [Test] Don't include '*' in glob pattern literals (#133114) (#133125)

This changes the random patterns that are generated inside `GlobTests`
to not generate `*` characters when a literal string is intended
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :Core/Infra/REST API REST infrastructure and utilities >refactoring Team:Core/Infra Meta label for core/infra team v8.18.6 v8.19.3 v9.0.6 v9.1.3 v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants