Skip to content
Closed
6 changes: 6 additions & 0 deletions docs/changelog/125462.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
pr: 125462
summary: Assign new id to alias created by `ReplaceMissingFieldWithNull` when there
is lookup join
area: ES|QL
type: bug
issues: []
Original file line number Diff line number Diff line change
Expand Up @@ -986,6 +986,24 @@ public void testDoubleParamsWithLookupJoin() throws IOException {
);
}

public void testMultipleBatchesWithLookupJoin() throws IOException {
// create 20 indices to trigger multiple batches of data node planning and execution
for (int i = 1; i <= 20; i++) {
createIndex("idx" + i, false);
}
bulkLoadTestDataLookupMode(10);
var query = requestObjectBuilder().query(format(null, "from * | lookup join {} on integer | sort integer", testIndexName()));
Map<String, Object> result = runEsql(query);
var columns = as(result.get("columns"), List.class);
assertEquals(20, columns.size());
var values = as(result.get("values"), List.class);
assertEquals(10, values.size());
// clean up
for (int i = 1; i <= 20; i++) {
assertThat(deleteIndex("idx" + i).isAcknowledged(), is(true));
}
}

private void validateResultsOfDoubleParametersForIdentifiers(RequestObjectBuilder query) throws IOException {
Map<String, Object> result = runEsql(query);
Map<String, String> colA = Map.of("name", "boolean", "type", "boolean");
Expand Down
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the following, and it gives "no child with doc id found" in InsertFieldExtraction:

from *
| rename city.country.continent.planet.name as message
| lookup join message_types_lookup on message
| keep birth_date, language_code
| mv_expand birth_date
| sort language_code, birth_date
| limit 1
;

That's the diff from ReplaceMissingFieldWithNull:

Project[[birth_date{r}#868, language_code{f}#859]]                                                                       = Project[[birth_date{r}#868, language_code{f}#859]]
\_Limit[1[INTEGER],false]                                                                                                = \_Limit[1[INTEGER],false]
  \_OrderBy[[Order[language_code{f}#859,ASC,LAST], Order[birth_date{r}#868,ASC,LAST]]]                                   =   \_OrderBy[[Order[language_code{f}#859,ASC,LAST], Order[birth_date{r}#868,ASC,LAST]]]
    \_MvExpand[birth_date{f}#865,birth_date{r}#868]                                                                      =     \_MvExpand[birth_date{f}#865,birth_date{r}#868]
      \_EsqlProject[[birth_date{f}#865, language_code{f}#859]]                                                           !       \_Project[[birth_date{r}#881, language_code{f}#859]]
        \_Join[LEFT,[message{r}#851],[message{r}#851],[message{f}#867]]                                                  !         \_Eval[[null[DATETIME] AS birth_date]]
          |_EsqlProject[[birth_date{f}#865, city.country.continent.planet.name{f}#864 AS message, language_code{f}#859]] !           \_Join[LEFT,[message{r}#851],[message{r}#851],[message{f}#867]]
          | \_EsRelation[*][birth_date{f}#865, city.country.continent.planet.na..]                                       !             |_Project[[birth_date{r}#865, city.country.continent.planet.name{f}#864 AS message, language_code{f}#859]]
          \_EsRelation[message_types_lookup][LOOKUP][message{f}#867]                                                     !             | \_Eval[[null[DATETIME] AS birth_date]]
                                                                                                                         !             |   \_EsRelation[*][birth_date{f}#865, city.country.continent.planet.na..]
                                                                                                                         !             \_EsRelation[message_types_lookup][LOOKUP][message{f}#867]

Notice how MvExpand now references a missing attribute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing @alex-spies!

Today, ReplaceMissingFieldWithNull only replace missing fields in Project, Eval, Filter, OrderBy, RegexExtract and TopN, it doesn't replace missing fields in the other commands like MvExpand. One option I can think of is whether covering more commands in this rule can solve this problem, I'll give it a try, if birth_date can be identified as a missing field for MvExpand, and if replacing it with a null alias works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching this issue with MvExpand @alex-spies ! I added MvExpand in ReplaceMissingFieldWithNull, so that the missing field(target) referenced by MvExpand is replaced by null, expanded's id is kept unchanged, and also added some tests for MvExpand and Aggregation after lookup joins.

ReplaceMissingFieldWithNull does not replace missing fields in aggregations, however I haven't seen queries failed because of it yet, perhaps the downstream processes aggregations in a smarter way.

Another option to address #121754 is to remove duplicated entries(ChannelSet) in the layout, in case duplicated entries are expected or not avoidable, so that the DefaultLayout.inverse() does not create a null ChannelSet in the list.

Original file line number Diff line number Diff line change
Expand Up @@ -1452,3 +1452,29 @@ emp_no:integer | language_code:integer | language_name:keyword
10092 | 1 | English
10093 | 3 | Spanish
;

multipleBatches
required_capability: join_lookup_v12
required_capability: remove_redundant_sort

from addresses, client*, languages*, message_types*, employees*, k8s
| dissect street "%{height_range} %{MNyXV}"
| rename env AS kxpCK, pod AS etUHW, language_code AS city.country.continent.planet.galaxy
| rename city.country.continent.planet.name as message
| lookup join message_types_lookup on message
| rename languages.int as language_code
| lookup join languages_lookup on language_code
| rename language_code as language_code
| lookup join languages_lookup on language_code
| stats c = count(*) by language_code
| sort language_code
| limit 5
;

c:long | language_code:long
15 | 1
19 | 2
17 | 3
18 | 4
21 | 5
;
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,9 @@ else if (plan instanceof Project project) {
Alias nullAlias = nullLiteral.get(f.dataType());
// save the first field as null (per datatype)
if (nullAlias == null) {
Alias alias = new Alias(f.source(), f.name(), Literal.of(f, null), f.id());
Alias alias = joinAttributes.isEmpty()
? new Alias(f.source(), f.name(), Literal.of(f, null), f.id())
: new Alias(f.source(), f.name(), Literal.of(f, null));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. please add an explanation on why this occurs
  2. use the ternary operator for the id to pick either f.id or null and call the constructor only once
    new Alias(f.source(), f.name(), Literal.of(), joinAttributes.isEmpty() ? f.id(): null))

As a follow-up the fact that the rule has to be aware of the join is fishy.

nullLiteral.put(dt, alias);
projection = alias.toAttribute();
}
Expand Down
Loading