Fixed Character Class and Ranges WildCardQuery Optimizations #126154
…ents to regexp in lucene (14193)
...ugin/wildcard/src/main/java/org/elasticsearch/xpack/wildcard/mapper/WildcardFieldMapper.java
I am a bit on the fence here: we are trying to optimize more than before, so would we need to expand our tests and run some performance tests?
@iverase Good question. I'm not sure we need more tests. I believe this is essentially optimizing what was previously optimized. We may be catching some additional cases, but my guess is that we probably did catch those cases previously and then gradually started missing them here as Lucene has been optimized away from Union operations. So I would buy that we just need more test coverage here in general. Sorry, I should have provided some of this detail in the summary. The reason I think this is pretty close to the same is that previously a Union for a character class or range was a combination of the single characters that were part of that range (or a set of ranges, for a character class). Those single code points were optimized into the
Having said that, I can completely understand the apprehension here. My only counter is that this particular use case will be a performance regression from prior versions. I'll defer to your and the group's wisdom here, though. I can put up a PR that removes that specific test (^) instead for now, which should cause the tests to pass as that case is no longer being optimized, and target this PR against main for a subsequent release. Thoughts?
We are now optimising … I am thinking if we might only support …
By the way, you might want to merge the latest changes in the lucene_snapshot branch to get rid of most of the CI issues.
Makes sense and seems like a reasonable compromise. I've updated the code to reflect that. I'll pull this out of draft as it seems like we might be narrowing down to a good solution for now.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
...ugin/wildcard/src/main/java/org/elasticsearch/xpack/wildcard/mapper/WildcardFieldMapper.java
LGTM. Thanks for iterating on it.
fixed character class and ranges lacking optimizations after improvements to regexp in lucene (14193)