- 
                Notifications
    You must be signed in to change notification settings 
- Fork 25.6k
[ES|QL] Assign new id to alias created by ReplaceMissingFieldWithNull when there is lookup join #125462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] Assign new id to alias created by ReplaceMissingFieldWithNull when there is lookup join #125462
Changes from 8 commits
13b35e6
              45553bb
              4f5fcc6
              95a461f
              f705ed4
              2a21697
              e6a055b
              3723510
              25dc590
              92985fa
              a825a5b
              File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| pr: 125462 | ||
| summary: Assign new id to alias created by `ReplaceMissingFieldWithNull` when there | ||
| is lookup join | ||
| area: ES|QL | ||
| type: bug | ||
| issues: [] | 
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
|  | @@ -81,7 +81,11 @@ else if (plan instanceof Project project) { | |
| Alias nullAlias = nullLiteral.get(f.dataType()); | ||
| // save the first field as null (per datatype) | ||
| if (nullAlias == null) { | ||
| Alias alias = new Alias(f.source(), f.name(), Literal.of(f, null), f.id()); | ||
| // In case of batch executions on data nodes and join exists, SearchStats may not always be available for all, | ||
| // fields, creating a new alias for null with the same id as the field id can potentially cause planEval to add a | ||
| // duplicated ChannelSet to a layout, and Layout.builder().build() could throw a NullPointerException. | ||
| // As a workaround, assign a new alias id to the null alias when join exists and SearchStats is not available. | ||
| Alias alias = new Alias(f.source(), f.name(), Literal.of(f, null), joinAttributes.isEmpty() ? f.id() : null); | ||
| There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks a lot @fang-xing-esql for getting to the bottom of this! I think this approach unfortunately still has problems. There could be downstream commands that still reference the old name id, but now the attribute with the old id is gone for good (projected away). For instance, there might be another  I guess there are 2 approaches of eliminating missing fields: either, define a new attribute pointing to a null literal with an own id - and update all downstream references to point to the new attribute; OR update the attribute in place by defining a null literal with the same name id + make sure we never extract the original attribute in the first place; the latter is sketchy because, as we can see here, it can lead to re-defining the same name id multiple times. Now, I think a part of the correct long-term solution is that we avoid the lookup join when the left hand side join key is missing. But that's not sufficient: I can see that we're introducing multiple Projects for the same missing attribute even if it is not a join key, like so (taken from your csv test): The problem is that there are 2 Projects for  Therefore, I think the second, more crucial, part of a correct solution is that we avoid replacing attributes by other attributes with the same name id, at all. Instead, I think we should update  | ||
| nullLiteral.put(dt, alias); | ||
| projection = alias.toAttribute(); | ||
| } | ||
|  | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the following, and it gives "no child with doc id found" in
InsertFieldExtraction:That's the diff from
ReplaceMissingFieldWithNull:Notice how
MvExpandnow references a missing attribute.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing @alex-spies!
Today,
ReplaceMissingFieldWithNullonly replace missing fields inProject,Eval,Filter,OrderBy,RegexExtractandTopN, it doesn't replace missing fields in the other commands likeMvExpand. One option I can think of is whether covering more commands in this rule can solve this problem, I'll give it a try, if birth_date can be identified as a missing field forMvExpand, and if replacing it with a null alias works.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for catching this issue with
MvExpand@alex-spies ! I addedMvExpandinReplaceMissingFieldWithNull, so that the missing field(target) referenced byMvExpandis replaced by null, expanded's id is kept unchanged, and also added some tests forMvExpandandAggregationafter lookup joins.ReplaceMissingFieldWithNulldoes not replace missing fields in aggregations, however I haven't seen queries failed because of it yet, perhaps the downstream processes aggregations in a smarter way.Another option to address #121754 is to remove duplicated entries(
ChannelSet) in the layout, in case duplicated entries are expected or not avoidable, so that theDefaultLayout.inverse()does not create a nullChannelSetin the list.