-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add enhancement for MV_APPEND supports 2-n number of arguments #115948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Documentation preview: |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/kibana-esql (ES|QL-ui) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jackpan123 and thanks a lot for your contribution.
I believe making MV_APPEND
work on n
arguments will be very nice in terms of usability.
If you don't mind, I'd like to iterate on this PR a little. I had a first look and a couple of things stood out to me. I'll have another look in the New Year.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec
Show resolved
Hide resolved
...lugin/esql/compute/gen/src/main/java/org/elasticsearch/compute/gen/EvaluatorImplementer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/compute/gen/src/main/java/org/elasticsearch/compute/gen/Types.java
Outdated
Show resolved
Hide resolved
break; | ||
} | ||
} | ||
if (count1 == 0 || field2AllCountZero) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look correct. Shouldn't that be count1 == 0 && field2AllCountZero
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have two sets of data. If the total count
of one side is 0
, we only need to keep the other side without performing any data operations, which is why appendNull()
is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you are right. Your change is consistent with MV_APPEND's current behavior: when either the left or right field is null, we just return null.
But this behavior itself may be a bug. I opened #121286.
...asticsearch/xpack/esql/expression/function/scalar/multivalue/MvAppendSerializationTests.java
Outdated
Show resolved
Hide resolved
...elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvAppendBooleanEvaluator.java
Show resolved
Hide resolved
...elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvAppendBooleanEvaluator.java
Outdated
Show resolved
Hide resolved
...elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvAppendBooleanEvaluator.java
Outdated
Show resolved
Hide resolved
# Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/union_types.csv-spec # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvAppendTests.java
cc @alex-spies |
Thank you for your patience @jackpan123 and sorry for not getting back to you earlier. I plan to take another look next week, or re-assign this to someone else if I don't get a chance to. Is that okay with you? |
@alex-spies That works for me. Thanks for letting me know and looking forward to it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jackpan123 , this looks a lot closer already. Summary of my review:
- There's a compilation failure in
MvAppendErrorTests
. (These were added quite recently.) And theMvAppendErrorTests
likely need to be updated to consider 3+ arguments. - I think the type resolution in
MvAppend.java
does not cover everything. Indeed,AnalyzerTests.testMvAppendValidation
is failing. In addition to fixing the failure, I believe we need to expand that validation test to cover cases with 3+ arguments. - I noticed the serialization tests don't do exactly what they should (not on you, that was wrong before). I made a remark on how to fix this, which should be hopefully easy.
- There are many test failures if you run the
MvAppendTests
. Also we never test cases with 3 or more arguments.
Finally, this change will require both a new capability and a bump in the transport versions. When this PR is close to being done, I think it's best that I or someone from our team takes care of this as this stuff can be a bit nuanced and tricky to get right. It's not a big problem, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also add a test case with 4 or more arguments, so we avoid testing a code path that somehow specializes on 3 args.
A good place to do that may be string.csv-spec
as that has a couple more usages of MV_APPEND
.
private MvAppend(StreamInput in) throws IOException { | ||
this(Source.readFrom((PlanStreamInput) in), in.readNamedWriteable(Expression.class), in.readNamedWriteable(Expression.class)); | ||
this(Source.readFrom((PlanStreamInput) in), in.readNamedWriteable(Expression.class), | ||
in.readNamedWriteableCollectionAsList(Expression.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're changing how MvAppend
is being serialized. This requires a TransportVersion bump, so we don't break bwc with older nodes. When we notice that we're sending to an older node, we probably want to turn a single MV_APPEND(field1, field2, field3, ...)
into multiple nested MV_APPEND
s, like MV_APPEND(field1, MV_APPEND(field2, ...))
, although that's not strictly speaking required.
@jackpan123 , let us help you with this step as this can be a bit tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll require a new EsqlCapabilities
entry, and any new/changed csv-spec
tests will have to require this capability. Otherwise, bwc tests against older nodes will fail.
@jackpan123 , that's also something we can gladly help with.
|
||
private final Expression field1, field2; | ||
private final Expression field1; | ||
private final List<? extends Expression> field2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the name field2
is maybe a bit misleading. Similarly to Concat.java
, I prefer we call this rest
.
dataType = field2.dataType().noText(); | ||
return isType(field2, DataType::isRepresentable, sourceText(), SECOND, "representable"); | ||
for (Expression value : field2) { | ||
dataType = value.dataType().noText(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the placement of the condition if (dataType == DataType.NULL)
is wrong. It should probably be here.
The logic should be: until we encounter a field with non-null data type, we don't know the output data type.
The current implementation is: we only check the datatypes of the other args if the first is null. Then we say that the output datatype is the same as the type of the last input field - this can't be right.
We need to fix this and double check that our tests can catch this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jackpan123 , the changes to EvaluatorImplementer
look quite nice now.
I think @nik9000 should have a look at this specifically, because we do something new here: I think it's the first time we have an evaluator built from a process
method that takes an arbitrary number of blocks as argument. (@nik9000 , see the MvAppend.process
implementations for reference.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik9000 , I just noticed: the way that the test cases are built, we don't ever build full blocks and embed the scalar values inside them and run the evaluations on the full blocks - or do we?
Because in cases like MvAppend
, it'd be necessary to run the evaluators on non-trivial blocks to ensure that they are correct in all cases - just running on scalars could accidentally omit code paths where we e.g. accidentally skip a position, or maybe accidentally use values from the next position in the block etc.
})); | ||
} | ||
} | ||
suppliers.add(new TestCaseSupplier(List.of(DataType.KEYWORD, DataType.KEYWORD), () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all three test cases here were already added in the for
loops above. Why are we duplicating them here @jackpan123 ?
new TestCaseSupplier.TypedData(field2, DataType.TEXT, "field2") | ||
), | ||
"MvAppendBytesRefEvaluator[field1=Attribute[channel=0], field2=[Attribute[channel=1]]]", | ||
DataType.TEXT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right. The output data type should be keyword
in all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jackpan123 the test cases here only cover situations when MV_APPEND
is supplied 2 arguments. Please add randomized test cases with 3 and more arguments. You can take a look at ConcatTests.java
and follow the pattern there - except that we'll need to consider different data types as well.
It looks like this PR modifies one or more |
# Conflicts: # x-pack/plugin/esql/compute/gen/src/main/java/org/elasticsearch/compute/gen/Types.java
# Conflicts: # docs/reference/esql/functions/description/mv_append.asciidoc
Add enhancement for MV_APPEND supports 2-n number of arguments (Closes #114436 )