Make Glob non-recursive #132798
This changes the implementation of `Glob` (used by `FilterPath`) to use a non-recursive algorithm for improved efficiency and stability
Pinging @elastic/es-core-infra (Team:Core/Infra)
Very interesting! Do you have benchmarks or a rough indication of the speedup?
rjernst left a comment
Looks good, just a minor suggestion. Thanks for all the tests!
// There is another star, with a literal in between the current position and that '*'
// That is, we have "*literal*"
// We want the first '*' to consume everything up until the first occurrence of "literal" in the input string
int match = str.indexOf(pattern.substring(patternIndex, nextStar), strIndex);
Could we move this to a helper method that performs `charAt` itself over a range of indices, instead of needing to construct a substring?
I actually tried that, but it was slower (almost an order of magnitude so).
I suspect that's because `String.indexOf` is marked as `@IntrinsicCandidate`, and the intrinsic `indexOf` is fast enough to offset the cost of the substring copy.
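To make the two options in this thread concrete, here is a minimal sketch of both ways to find the literal that sits between two `*` characters. The class and method names are hypothetical and are not taken from the PR; the sketch only shows that the two approaches return the same index for a non-empty literal, and says nothing about their relative speed.

```java
/** Hypothetical illustration only; these names do not come from the Elasticsearch code base. */
final class LiteralSearchSketch {
    private LiteralSearchSketch() {}

    /** Copies pattern[start, end) into a new String, then delegates to String.indexOf. */
    static int findWithSubstring(String str, int from, String pattern, int start, int end) {
        return str.indexOf(pattern.substring(start, end), from);
    }

    /** Compares characters in place with charAt, so no substring is allocated. */
    static int findWithCharAt(String str, int from, String pattern, int start, int end) {
        int len = end - start; // assumed to be > 0 (a non-empty literal)
        outer:
        for (int i = Math.max(from, 0); i + len <= str.length(); i++) {
            for (int j = 0; j < len; j++) {
                if (str.charAt(i + j) != pattern.charAt(start + j)) {
                    continue outer; // mismatch: try the next start position
                }
            }
            return i; // every character of the literal matched at position i
        }
        return -1;
    }
}
```

For the pattern `*foo*` and the input `xxfooyyfoo`, both methods locate `foo` at index 2 when searching from index 0. Any performance difference would have to be measured with a proper benchmark harness rather than inferred from this sketch.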
AFAIK, the only existing benchmark we have in this space is I wrote a small benchmark here (I don't think it's worth merging because then we need to maintain the code and it doesn't have long term value). The new implementation is faster in all but 2 cases:
It's the multiple asterisk and pathological cases that are the main target of the PR, but the prefix and suffix improvements are a nice bonus.
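For readers unfamiliar with the general technique, the following is a minimal sketch of a non-recursive wildcard matcher. It is the classic "remember the last `*` and retry" loop, not the implementation in this PR (which, per the snippet under review, uses `indexOf` to jump to the next literal). The class name `IterativeGlobSketch` and method `globMatch` are assumptions made for illustration, and only `*` is treated as special.

```java
/** Hypothetical sketch of iterative glob matching; not the Elasticsearch Glob class. */
final class IterativeGlobSketch {
    private IterativeGlobSketch() {}

    /** Returns true if str matches pattern, where '*' matches any run of characters, including none. */
    static boolean globMatch(String pattern, String str) {
        if (pattern == null || str == null) {
            return false;
        }
        int p = 0;       // current position in pattern
        int s = 0;       // current position in str
        int starP = -1;  // pattern position just after the most recent '*', or -1 if none seen
        int starS = -1;  // position in str where that '*' currently stops consuming
        while (s < str.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Record the star and first try letting it match the empty string.
                starP = ++p;
                starS = s;
            } else if (p < pattern.length() && pattern.charAt(p) == str.charAt(s)) {
                p++;
                s++;
            } else if (starP >= 0) {
                // Mismatch: let the most recent '*' consume one more character and retry.
                p = starP;
                s = ++starS;
            } else {
                return false;
            }
        }
        // The input is exhausted; any remaining pattern characters must all be '*'.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

Because the loop keeps only the most recent `*` as its single backtrack point, stack depth stays constant regardless of how many asterisks the pattern contains, which is the stability property a recursive matcher lacks.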
Very cool, and nice improvements! 👍
* Make Glob non-recursive (#132798): This changes the implementation of `Glob` (used by `FilterPath`) to use a non-recursive algorithm for improved efficiency and stability
* [Test] Don't include '*' in glob pattern literals (#133114) (#133125): This changes the random patterns that are generated inside `GlobTests` to not generate `*` characters when a literal string is intended