ESQL: Pushdown constructs doing case-insensitive regexes #128393
Changes from 12 commits
caf2c94
3525880
f9b8e79
ae057c6
fdcdcb9
07411e1
7fa6686
bc25956
17b6db2
f70bc27
cb105fa
03c9202
10cd899
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,6 @@` |
| | | `+ pr: 128393` |
| | | `+ summary: Pushdown constructs doing case-insensitive regexes` |
| | | `+ area: ES\|QL` |
| | | `+ type: enhancement` |
| | | `+ issues:` |
| | | `+  - 127479` |
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -21,9 +21,10 @@ public RLikePattern(String regexpPattern) {` |
| | | `}` |
| | | |
| | | `@Override` |
| | | `- public Automaton createAutomaton() {` |
| | | `+ public Automaton createAutomaton(boolean ignoreCase) {` |

Expose `ignoreCase` as a property in `StringPattern`, since it affects both the `Automaton` and `javaRegex`. The former can carry the mode, but the latter can't, so we need a way to bubble it up.

Reply: The pattern is independent of how it's used for matching, casing-wise. The Java regex version has its own mechanism for flagging case insensitivity, and I'm not sure it would be trivial, "safe", or even needed to modify it based on a method parameter.

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `+ int matchFlags = ignoreCase ? RegExp.CASE_INSENSITIVE : 0;` |
| | | `return Operations.determinize(` |
| | | `- new RegExp(regexpPattern, RegExp.ALL \| RegExp.DEPRECATED_COMPLEMENT).toAutomaton(),` |
| | | `+ new RegExp(regexpPattern, RegExp.ALL \| RegExp.DEPRECATED_COMPLEMENT, matchFlags).toAutomaton(),` |
| | | `Operations.DEFAULT_DETERMINIZE_WORK_LIMIT` |
| | | `);` |
| | | `}` |
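The reply points out that Java regexes carry case insensitivity through their own mechanism. As a JDK-only illustration (not code from this PR, and not Lucene's `RegExp`), `java.util.regex.Pattern` accepts the mode either as a compile flag, mirroring the `matchFlags` argument in the diff, or as an inline `(?i)` group. `JavaRegexCaseDemo` is a hypothetical name used only for this sketch:

```java
import java.util.regex.Pattern;

// JDK-only illustration: one pattern string, matched with or without case
// sensitivity. Hypothetical class; not part of the PR or of Lucene.
final class JavaRegexCaseDemo {

    // The mode is chosen at compile time via a flag, much like the
    // matchFlags value passed to Lucene's RegExp constructor in the diff.
    static boolean matches(String regex, String input, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Equivalent inline form: "(?i)" enables case-insensitive matching
    // for the remainder of the pattern.
    static boolean matchesInline(String regex, String input) {
        return Pattern.compile("(?i)" + regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("foo.*", "FOObar", false)); // false
        System.out.println(matches("foo.*", "FOObar", true));  // true
        System.out.println(matchesInline("foo.*", "FOObar"));  // true
    }
}
```

By default `Pattern.CASE_INSENSITIVE` folds only US-ASCII letters; combining it with `Pattern.UNICODE_CASE` extends case folding to Unicode.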
This file was deleted.
Same comment as above: make the parameter a class property.
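The suggestion to make `ignoreCase` a class property instead of a `createAutomaton` parameter could be sketched as follows, assuming a pattern type that owns its matching mode and exposes it to every consumer. `CaseAwarePattern`, `RegexPattern` and `toJavaPattern` are hypothetical names, not the `StringPattern` API, and only the `java.util.regex` consumer is shown because building a Lucene automaton is out of scope here:

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the reviewer's suggestion: the case mode lives on the
// pattern object, so any consumer (automaton builder or Java regex) can read it.
// Names are illustrative only; this is not the StringPattern API.
interface CaseAwarePattern {
    // The pattern in java.util.regex syntax, with no case flag applied.
    String asJavaRegex();

    // Whether matching should ignore case.
    boolean ignoreCase();

    // Bubble the mode up into the Java regex as a compile flag.
    default Pattern toJavaPattern() {
        int flags = ignoreCase() ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(asJavaRegex(), flags);
    }
}

// A regex pattern that carries its own mode; the record accessor
// ignoreCase() implements the interface method.
record RegexPattern(String regex, boolean ignoreCase) implements CaseAwarePattern {
    @Override
    public String asJavaRegex() {
        return regex;
    }
}
```

The trade-off raised in the earlier reply still applies: storing the mode on the pattern ties it to one way of matching, whereas the diff above passes it as an argument to `createAutomaton`.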
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -7,6 +7,7 @@` |
| | | `package org.elasticsearch.xpack.esql.core.util;` |
| | | |
| | | `import org.apache.lucene.document.InetAddressPoint;` |
| | | `+ import org.apache.lucene.search.WildcardQuery;` |
| | | `import org.apache.lucene.search.spell.LevenshteinDistance;` |
| | | `import org.apache.lucene.util.BytesRef;` |
| | | `import org.apache.lucene.util.CollectionUtil;` |
| | | `@@ -178,6 +179,44 @@ public static String wildcardToJavaPattern(String pattern, char escape) {` |
| | | `return regex.toString();` |
| | | `}` |
| | | |
| | | `+ /**` |
| | | `+ * Translates a Lucene wildcard pattern to a Lucene RegExp one.` |
| | | `+ * @param wildcard Lucene wildcard pattern` |
| | | `+ * @return Lucene RegExp pattern` |
| | | `+ */` |
| | | `+ public static String luceneWildcardToRegExp(String wildcard) {` |
| | | `+ StringBuilder regex = new StringBuilder();` |
| | | `+` |
| | | `+ for (int i = 0, wcLen = wildcard.length(); i < wcLen; i++) {` |

👍

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `+ char c = wildcard.charAt(i); // this will work chunking through Unicode as long as all values matched are ASCII` |
| | | `+ switch (c) {` |
| | | `+ case WildcardQuery.WILDCARD_STRING -> regex.append(".*");` |
| | | `+ case WildcardQuery.WILDCARD_CHAR -> regex.append(".");` |
| | | `+ case WildcardQuery.WILDCARD_ESCAPE -> {` |
| | | `+ if (i + 1 < wcLen) {` |
| | | `+ // consume the wildcard escaping, consider the next char` |
| | | `+ char next = wildcard.charAt(i + 1);` |
| | | `+ i++;` |
| | | `+ switch (next) {` |
| | | `+ case WildcardQuery.WILDCARD_STRING, WildcardQuery.WILDCARD_CHAR, WildcardQuery.WILDCARD_ESCAPE ->` |
| | | ``+ // escape `*`, `.`, `\`, since these are special chars in RegExp as well`` |
| | | `+ regex.append("\\");` |
| | | `+ // default: unnecessary escaping -- just ignore the escaping` |
| | | `+ }` |
| | | `+ regex.append(next);` |
| | | `+ } else {` |
| | | `+ // "else fallthru, lenient parsing with a trailing \" -- according to WildcardQuery#toAutomaton` |
| | | `+ regex.append("\\\\");` |
| | | `+ }` |
| | | `+ }` |
| | | `+ case '$', '(', ')', '+', '.', '[', ']', '^', '{', '\|', '}' -> regex.append("\\").append(c);` |
| | | `+ default -> regex.append(c);` |
| | | `+ }` |
| | | `+ }` |
| | | `+` |
| | | `+ return regex.toString();` |
| | | `+ }` |
| | | `+` |
| | | `/**` |
| | | `* Translates a like pattern to a Lucene wildcard.` |
| | | `* This methods pays attention to the custom escape char which gets converted into \ (used by Lucene).` |
Dropped the now-useless proxy class.
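For experimenting with `luceneWildcardToRegExp` outside Elasticsearch, here is a standalone port of the translation in the `StringUtils` hunk, assuming the values of Lucene's `WildcardQuery` constants (`WILDCARD_STRING` is `*`, `WILDCARD_CHAR` is `?`, `WILDCARD_ESCAPE` is `\`) so it runs without Lucene on the classpath. `WildcardToRegExpDemo` is a hypothetical class name, and `java.util.regex` serves only as a convenient JDK stand-in for a quick case-insensitive sanity check; its syntax is not identical to Lucene's `RegExp`:

```java
import java.util.regex.Pattern;

// Standalone port of the wildcard-to-RegExp translation from the diff.
// Lucene's WildcardQuery constants are replaced by their literal values:
// WILDCARD_STRING = '*', WILDCARD_CHAR = '?', WILDCARD_ESCAPE = '\\'.
final class WildcardToRegExpDemo {

    static String luceneWildcardToRegExp(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0, len = wildcard.length(); i < len; i++) {
            char c = wildcard.charAt(i);
            switch (c) {
                case '*' -> regex.append(".*"); // any sequence of chars
                case '?' -> regex.append('.');  // exactly one char
                case '\\' -> {
                    if (i + 1 < len) {
                        char next = wildcard.charAt(++i); // consume the escaped char
                        // Wildcard metachars stay escaped: they are special in the regex too.
                        if (next == '*' || next == '?' || next == '\\') {
                            regex.append('\\');
                        }
                        // Any other escape is unnecessary, so only the char is kept.
                        regex.append(next);
                    } else {
                        // Lenient: a trailing lone '\' becomes a literal backslash.
                        regex.append("\\\\");
                    }
                }
                // Escape characters that are special in the target regex syntax.
                case '$', '(', ')', '+', '.', '[', ']', '^', '{', '|', '}' -> regex.append('\\').append(c);
                default -> regex.append(c);
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String regex = luceneWildcardToRegExp("log-*.v?");
        System.out.println(regex); // log-.*\.v.
        // Sanity check with java.util.regex, matching case-insensitively.
        boolean hit = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher("LOG-2024.V1").matches();
        System.out.println(hit); // true
    }
}
```

For example, `log-*.v?` becomes `log-.*\.v.`; an escaped `\*` stays escaped as `\*`, an unnecessary escape such as `\a` collapses to `a`, and a trailing lone `\` becomes `\\`, which matches a literal backslash.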