-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Avoid unintended attribute removal #127563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Avoid unintended attribute removal #127563
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
buildkite test this |
|
I'm really confused. I can get a result of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kanoshiou I really appreciate you provided this PR and you tried to understand where the fix needs to go.
Unfortunately, this is a trivial fix that doesn't quite catch the true reason for the initial failure.
The essence of the failure is the fact that lookup join can override the type user-provided field with its own type field. The type from eval is of type integer while the type that comes from the lookup join is of type keyword. Because the list of field names didn't contain type the Analyzer couldn't learn about that type that exists in message_types_lookup and the only thing it knew about its type is that it comes from eval.
lookup join (like enrich and maybe inlinestats) are special in the sense that they add some columns to the result, so some of the rules we have in EsqlSession.fieldNames need to be adjusted to account for these special characteristics. Because lookup join can add a field of the same name as type coming from an eval before the said lookup join command, we have a special check in fieldNames that forbids the removal of those Aliases.
Your proposed solution does this:
- places the input references of grok in a new variable, instead of the
referenceBuilderthat was used before - the special check above for lookup join is not applied to
referenceBuilderforgrokbecause the input reference is not there anymore - you add to
referenceBuilderthe content of the grok-special variable
This is a bypass of the lookup join special removal check.
Instead, let the grok references in referenceBuilder where they are and don't let the special lookup join check remove it. The issue is with the lookup join check, not with grok, so adjust that one. I have the feeling that check is applicable in other cases, only that we didn't catch those yet.
|
@astefan Thank you for your review and for taking the time to provide such a detailed explanation of the issue's root cause. Your insights have been incredibly helpful, and I truly appreciate your guidance on my code, it has given me a much deeper understanding. I’ve learned a lot from your feedback! I have reverted the bypass and implemented a modification that I believe addresses the issue. If there are any areas that need improvement or further adjustments, please let me know, I’d be happy to refine it further. |
|
@kanoshiou I have a simpler suggestion for the change in EsqlSession. Instead of the original code in we can do: It will slightly change some of the existent tests in Please, try it out and let me know your thoughts. |
...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java
Show resolved
Hide resolved
...in/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java
Show resolved
Hide resolved
|
buildkite test this |
|
Thank you for your review and suggestions, @astefan! I believe the statement |
Indeed, I forgot about this part. UnresolvedRelation needs to also be accounted for in
Yes
Yes, it's a fine balance. Being too aggressive in minimizing the list of fields could lead to some of the issues we've had recently. Having more fields in that list, but still avoiding to ask for all the fields ( |
…moval # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Outdated
Show resolved
Hide resolved
|
buildkite test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…moval # Conflicts: # x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/session/IndexResolverFieldNamesTests.java
|
@kanoshiou please, look into the conflicts. Most likely, they are coming from the freshly merged PR. |
|
@astefan conflicts resolved. |
|
buildkite test this |
|
Thank you @astefan! |
--------- Co-authored-by: Andrei Stefan <[email protected]>
--------- Co-authored-by: kanoshiou <[email protected]>
* ESQL: Avoid unintended attribute removal (#127563) --------- Co-authored-by: Andrei Stefan <[email protected]> * Checkstyle * Checkstyle again * Slightly change the test because 8.19 has fewer indices in the index pattern used (9.x also has host_inventory index). --------- Co-authored-by: kanoshiou <[email protected]>
Because we aim to minimize the number of attributes required from
field_capsat pre-analysis time, a removal check helps eliminate fields defined by users (such aseval). However, commands likelookup joinandenrichcan introduce fields with the same names as user-defined fields. To accommodate this, we ensure that aliases are not removed from the attributes list when a subsequentJOINoccurs.For the query:
The plan
grok type "%{WORD:b}passes the removal check and initiates alias removal down the tree. However, the removal process should halt atlookup joindue to the reason mentioned above. The issue arises because the original code allows progression toeval type = 1, which inadvertently removes thetypefield. As a result, the fieldtypeends up being of typeintegerdefined byeval type = 1, rather than preserving thekeywordtype fieldtypeobtained frommessage_types_lookup.Closes #127468