-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Adding Contains ESQL String function #133016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
Hi @mjmbischoff, I've created a changelog YAML for you. |
|
If merged, can rewrite the following: But a nicer way to check multi valued fields for a value is still needed: #120782 |
|
#98545 (comment) should probably be updated. |
…ssumed it was a marker for documentation. But it's used to ignore the test cases. Re-enabling the tests and fixing them.
…ompatibility test environment. - not sure how to test it as, I feel like the version should be main on main / dev. Doing the dance for now.
I'm so sorry we haven't written a better way yet. |
nik9000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you drop the null checks in the eval method please? They are confusing because those things can't be null because of the @Evaluator annotation. Makes a reader have to go and look up stuff in the generated code.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/string.csv-spec
Outdated
Show resolved
Hide resolved
| } | ||
| String utf8ToString = str.utf8ToString(); | ||
| return utf8ToString.contains(substr.utf8ToString()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably faster to do it by hand without converting to a String first. But this is fine for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so as well, but having been bitten by character encoding subtleties before.. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. ESQL deals in utf-8 naturally like Elasticsearch and Lucene mostly do. It's probably fine to do the contains on raw strings because utf-8 is like that. But it's worth making super sure before you do it. So it can wait.
...l/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/Contains.java
Show resolved
Hide resolved
|
Looks great - I asked for two things that should happen before merging (the capability and the null check) and one thing you can do, or it can wait for a while (replace utf8ToString with a loop checking). |
|
While I think we need a 'passAlongNulls'/'dontHandleNulls' option for the generator; Here I'm perfectly fine with the handling, as all string functions seem to return null if any of the arguments are null. (For some it's undocumented - should probably be fixed) |
95% of our functions have "any null input returns null" semantics which is very SQL. Very in the spirit of "null means unknown" which SQL likes. In Elasticsearch But in this case I think "any null input returns null" is natural and good. |
If you have a list or want to add the docs, that'd be awesome. |
I've now added it to my (long) TODO list. 😅 |
💔 Backport failedThe backport operation could not be completed due to the following error: You can use sqren/backport to manually backport by running |
When we compile the code for `CONTAINS` we generate an evaluator java class and commit that, as is our ancient custom. But because elastic#133016 didn't see elastic#133392, we committed out of date code. That's fine because we regenerate the code on every compile. But it's annoying because every clone is out of date. This updates the generated file. You may be asking "why do you commit the generated code if you just generate it at compile time?" That's a good question! It's a grand tradition, one that we will probably one day leave behind. But let's celebrate it today by committing more code.
When we compile the code for `CONTAINS` we generate an evaluator java class and commit that, as is our ancient custom. But because #133016 didn't see #133392, we committed out of date code. That's fine because we regenerate the code on every compile. But it's annoying because every clone is out of date. This updates the generated file. You may be asking "why do you commit the generated code if you just generate it at compile time?" That's a good question! It's a grand tradition, one that we will probably one day leave behind. But let's celebrate it today by committing more code.
Adding
containsES:QL function, this function checks if a substring is contained within the given string, returning a boolean. This is equivalent with usinglocateand checkling for!=0