Contributor

@jackpan123 jackpan123 commented Oct 30, 2024

Enhance MV_APPEND to support 2 to n arguments (Closes #114436)

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.0.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Oct 30, 2024
@gbanasiak gbanasiak added the :Analytics/ES|QL AKA ESQL label Nov 22, 2024
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Nov 22, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@alex-spies alex-spies self-assigned this Nov 22, 2024
@astefan astefan added the ES|QL-ui Impacts ES|QL UI label Nov 22, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

Contributor

@alex-spies alex-spies left a comment

Hi @jackpan123 and thanks a lot for your contribution.

I believe making MV_APPEND work on n arguments will be very nice in terms of usability.

If you don't mind, I'd like to iterate on this PR a little. I had a first look and a couple of things stood out to me. I'll have another look in the New Year.

break;
}
}
if (count1 == 0 || field2AllCountZero) {
Contributor

This doesn't look correct. Shouldn't that be count1 == 0 && field2AllCountZero?

Contributor Author

We have two sets of data. If the total count of one side is 0, we only need to keep the other side without performing any data operations, which is why appendNull() is used.

Contributor

Oh, you are right. Your change is consistent with MV_APPEND's current behavior: when either the left or right field is null, we just return null.

But this behavior itself may be a bug. I opened #121286.
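To make the two conditions discussed in this thread concrete, here is a minimal sketch that models a multivalue as a List and a null field as null. It is an illustration only: the class and method names (MvAppendNullSketch, appendNullIfEither, appendNullIfAll) are hypothetical and are not the evaluator code from the PR.

```java
import java.util.ArrayList;
import java.util.List;

final class MvAppendNullSketch {
    // Mirrors the `||` condition: if either side is null, the result is null.
    // This matches the behavior described above as MV_APPEND's current behavior.
    static <T> List<T> appendNullIfEither(List<T> left, List<T> right) {
        if (left == null || right == null) {
            return null;
        }
        List<T> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Mirrors the `&&` condition asked about in the review: the result is
    // null only if both sides are null; otherwise the non-null side is kept.
    static <T> List<T> appendNullIfAll(List<T> left, List<T> right) {
        if (left == null && right == null) {
            return null;
        }
        List<T> out = new ArrayList<>();
        if (left != null) {
            out.addAll(left);
        }
        if (right != null) {
            out.addAll(right);
        }
        return out;
    }
}
```

With `left = [1]` and `right = null`, the first variant returns null while the second returns `[1]`; which of the two is the desired behavior is the open question tracked in #121286.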

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvAppendTests.java
@jackpan123 jackpan123 requested a review from alex-spies January 22, 2025 03:59
@jackpan123
Contributor Author

cc @alex-spies

@alex-spies
Contributor

alex-spies commented Jan 24, 2025

Thank you for your patience @jackpan123 and sorry for not getting back to you earlier.

I plan to take another look next week, or re-assign this to someone else if I don't get a chance to. Is that okay with you?

@jackpan123
Contributor Author

@alex-spies That works for me. Thanks for letting me know and looking forward to it!

Contributor

@alex-spies alex-spies left a comment

Thanks @jackpan123, this looks a lot closer already. Summary of my review:

  • There's a compilation failure in MvAppendErrorTests. (These were added quite recently.) The MvAppendErrorTests also likely need to be updated to consider 3+ arguments.
  • I think the type resolution in MvAppend.java does not cover everything. Indeed, AnalyzerTests.testMvAppendValidation is failing. In addition to fixing the failure, I believe we need to expand that validation test to cover cases with 3+ arguments.
  • I noticed the serialization tests don't do exactly what they should (not on you, that was wrong before). I made a remark on how to fix this, which should hopefully be easy.
  • There are many test failures if you run the MvAppendTests. Also, we never test cases with 3 or more arguments.

Finally, this change will require both a new capability and a bump in the transport versions. When this PR is close to being done, I think it's best that I or someone from our team takes care of this as this stuff can be a bit nuanced and tricky to get right. It's not a big problem, though.

Contributor

I think we should also add a test case with 4 or more arguments, so that we don't only exercise a code path that might somehow be specialized for 3 args.

A good place to do that may be string.csv-spec as that has a couple more usages of MV_APPEND.

  private MvAppend(StreamInput in) throws IOException {
-     this(Source.readFrom((PlanStreamInput) in), in.readNamedWriteable(Expression.class), in.readNamedWriteable(Expression.class));
+     this(Source.readFrom((PlanStreamInput) in), in.readNamedWriteable(Expression.class),
+         in.readNamedWriteableCollectionAsList(Expression.class));
Contributor

We're changing how MvAppend is being serialized. This requires a TransportVersion bump, so we don't break bwc with older nodes. When we notice that we're sending to an older node, we probably want to turn a single MV_APPEND(field1, field2, field3, ...) into multiple nested MV_APPENDs, like MV_APPEND(field1, MV_APPEND(field2, ...)), although that's not strictly speaking required.

@jackpan123, let us help you with this step, as it can be a bit tricky.
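To illustrate the nested form mentioned in this comment, here is a minimal sketch that folds an argument list into right-nested two-argument calls. It works on plain strings rather than on the real Expression tree, and the names NestedMvAppendSketch and nest are hypothetical; the actual bwc handling would live in the serialization code and may look quite different.

```java
import java.util.List;

final class NestedMvAppendSketch {
    // Rewrites MV_APPEND(a, b, c, ...) into MV_APPEND(a, MV_APPEND(b, ...)).
    // Arguments are plain strings here; this is an assumption for illustration.
    static String nest(List<String> args) {
        if (args.size() < 2) {
            throw new IllegalArgumentException("MV_APPEND needs at least 2 arguments");
        }
        int n = args.size();
        // Start with the innermost call over the last two arguments...
        String expr = "MV_APPEND(" + args.get(n - 2) + ", " + args.get(n - 1) + ")";
        // ...then wrap it with each earlier argument, moving outward.
        for (int i = n - 3; i >= 0; i--) {
            expr = "MV_APPEND(" + args.get(i) + ", " + expr + ")";
        }
        return expr;
    }
}
```

For example, `nest(List.of("field1", "field2", "field3"))` yields `MV_APPEND(field1, MV_APPEND(field2, field3))`, and a two-argument list is left as a single call.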

Contributor

We'll require a new EsqlCapabilities entry, and any new/changed csv-spec tests will have to require this capability. Otherwise, bwc tests against older nodes will fail.

@jackpan123, that's also something we can gladly help with.


- private final Expression field1, field2;
+ private final Expression field1;
+ private final List<? extends Expression> field2;
Contributor

nit: the name field2 is a bit misleading now that it holds a list. As in Concat.java, I'd prefer we call this rest.

- dataType = field2.dataType().noText();
- return isType(field2, DataType::isRepresentable, sourceText(), SECOND, "representable");
+ for (Expression value : field2) {
+     dataType = value.dataType().noText();
Contributor

I believe the placement of the condition if (dataType == DataType.NULL) is wrong. It should probably be here.

The logic should be: until we encounter a field with non-null data type, we don't know the output data type.

The current implementation is: we only check the datatypes of the other args if the first is null. Then we say that the output datatype is the same as the type of the last input field - this can't be right.

We need to fix this and double check that our tests can catch this.
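The intended rule (the output type stays unknown until the first argument with a non-null type is seen) can be sketched as below. This is a simplified illustration, not the code from MvAppend.java: the Type enum, the noText helper and resolveOutputType are hypothetical stand-ins for DataType and the real resolution logic, and per-argument validation is omitted.

```java
import java.util.List;

final class MvAppendTypeSketch {
    // Hypothetical, reduced stand-in for the real DataType enum.
    enum Type { NULL, KEYWORD, TEXT, LONG }

    // Assumed to behave like DataType.noText(): text is reported as keyword.
    static Type noText(Type t) {
        return t == Type.TEXT ? Type.KEYWORD : t;
    }

    // The output type is the type of the first non-null argument.
    // If every argument is null, the output type stays NULL.
    static Type resolveOutputType(List<Type> argTypes) {
        for (Type t : argTypes) {
            if (t != Type.NULL) {
                return noText(t);
            }
        }
        return Type.NULL;
    }
}
```

A test input such as `[NULL, LONG, NULL]` separates the two behaviors: this sketch resolves it to LONG, whereas taking the type of the last argument would resolve it to NULL.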

Contributor

Thank you @jackpan123 , the changes to EvaluatorImplementer look quite nice now.

I think @nik9000 should have a look at this specifically, because we do something new here: I think it's the first time we have an evaluator built from a process method that takes an arbitrary number of blocks as arguments. (@nik9000, see the MvAppend.process implementations for reference.)

Contributor

@nik9000, I just noticed: the way the test cases are built, we never build full blocks, embed the scalar values inside them, and run the evaluations on those full blocks. Or do we?

In cases like MvAppend, we'd need to run the evaluators on non-trivial blocks to ensure they are correct in all cases. Running only on scalars could miss code paths where we e.g. skip a position, or use values from the next position in the block.
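As a rough illustration of why multi-position input matters, the sketch below models a block as a list of positions, each holding a multivalue or null, and appends an arbitrary number of such blocks position by position. Everything here is hypothetical (BlockAppendSketch, appendBlocks, the List-based block model, and the assumption that all blocks have the same number of positions); it is not the compute engine's Block API or the generated evaluator.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockAppendSketch {
    // Each "block" is a list of positions; each position is a multivalue or null.
    // All blocks are assumed to have the same number of positions.
    static List<List<Integer>> appendBlocks(List<List<List<Integer>>> blocks) {
        int positions = blocks.get(0).size();
        List<List<Integer>> out = new ArrayList<>();
        for (int p = 0; p < positions; p++) {
            List<Integer> row = new ArrayList<>();
            boolean anyNull = false;
            for (List<List<Integer>> block : blocks) {
                // Read position p of every block; reading p + 1 here is the
                // kind of bug a single-position test could not catch.
                List<Integer> values = block.get(p);
                if (values == null) {
                    anyNull = true;
                    break;
                }
                row.addAll(values);
            }
            // Follows the "any null argument gives null" behavior discussed earlier.
            out.add(anyNull ? null : row);
        }
        return out;
    }
}
```

A test over three positions, with a null in the middle position of one block, checks both that no position is skipped and that values do not leak between neighboring positions.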

}));
}
}
suppliers.add(new TestCaseSupplier(List.of(DataType.KEYWORD, DataType.KEYWORD), () -> {
Contributor

I think all three test cases here were already added in the for loops above. Why are we duplicating them here, @jackpan123?

new TestCaseSupplier.TypedData(field2, DataType.TEXT, "field2")
),
"MvAppendBytesRefEvaluator[field1=Attribute[channel=0], field2=[Attribute[channel=1]]]",
DataType.TEXT,
Contributor

This doesn't look right. The output data type should be keyword in all cases.

Contributor

@jackpan123, the test cases here only cover situations where MV_APPEND is given 2 arguments. Please add randomized test cases with 3 or more arguments. You can take a look at ConcatTests.java and follow the pattern there, except that we'll need to consider different data types as well.

Contributor

github-actions bot commented Feb 5, 2025

It looks like this PR modifies one or more .asciidoc files. These files are being migrated to Markdown, and any changes merged now will be lost. See the migration guide for details.

# Conflicts:
#	x-pack/plugin/esql/compute/gen/src/main/java/org/elasticsearch/compute/gen/Types.java
Labels

  • :Analytics/ES|QL (AKA ESQL)
  • ES|QL-ui (Impacts ES|QL UI)
  • external-contributor (Pull request authored by a developer outside the Elasticsearch team)
  • Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))
  • v9.3.0

Development

Successfully merging this pull request may close these issues.

[ES|QL] Support 2-n arguments for MV_APPEND

5 participants