mjmbischoff
Contributor

@mjmbischoff mjmbischoff commented Aug 27, 2025

This is a fixup for #133099, which was reverted from main because ESQL: Track memory in evaluators (#133392) was merged to main at the same time, causing compile errors.

mjmbischoff and others added 30 commits August 19, 2025 01:46
…entation improvements.

Fix issue with byteref being empty, which caused fold to fail.
…ompatibility test environment. - not sure how to test it as, I feel like the version should be main on main / dev. Doing the dance for now.
…lifecycle metadata

Documentation rewording.

Co-authored-by: Liam Thompson <[email protected]>
- Fixing tests by removing logic to return null if all parameters are null. The standard generator had to be circumvented, should follow up with separate PR to make it more intelligent to avoid it.
- Overwritten part of the test methods to avoid the null expectation.
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @mjmbischoff, I've created a changelog YAML for you.

Contributor

@ivancea ivancea left a comment

After a second look at the evaluators, I found what looks like a bug. I also added a comment on how we can test that case so bugs like this are easier to catch.

Comment on lines +2061 to +2068
ROW a = "a", b = ["a", "b", "c"], n = null
| EVAL aa = mv_contains(a, a),
bb = mv_contains(b, b),
ab = mv_contains(a, b),
ba = mv_contains(b,a),
na = mv_contains(n, a),
an = mv_contains(a, n),
nn = mv_contains(n,n)
Contributor

This won't be enough to reproduce the null problem from my other comment, because this query only ever works with a single row.

A possible way to generate the data for that could be:

ROW a = [1, 2, 3], n = null
| MV_EXPAND a // Now we have multiple rows
| EVAL a = CASE(a == 2, null, a) // And we add a null to the non-null column
| EVAL
    an = mv_contains(a, n),
    na = mv_contains(n, a)

If this works the way I think it does and actually exercises the MvContainsNullEvaluator, it should end with an error.
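A toy Java model of the query above may help show why several rows are needed: an evaluator produces one result per position, so a column that is null on only some rows takes a different path than a column that is null everywhere. This is a sketch, not Elasticsearch code. The class and method names (MixedNullColumnSketch, evaluate) are hypothetical, plain int[] arrays stand in for block values, and it assumes that a null position behaves like an empty set of values, which this thread does not confirm.

```java
import java.util.Arrays;
import java.util.List;

public final class MixedNullColumnSketch {
    private static final int[] EMPTY = new int[0];

    private MixedNullColumnSketch() {}

    /** Evaluates one result per row; a null list entry models a null at that position. */
    public static boolean[] evaluate(List<int[]> supersets, List<int[]> subsets) {
        boolean[] result = new boolean[supersets.size()];
        for (int position = 0; position < result.length; position++) {
            // ASSUMPTION (not confirmed by the PR thread): null behaves like an empty set.
            int[] superset = orEmpty(supersets.get(position));
            int[] subset = orEmpty(subsets.get(position));
            result[position] = containsAll(superset, subset);
        }
        return result;
    }

    private static int[] orEmpty(int[] values) {
        return values == null ? EMPTY : values;
    }

    /** True when every value of subset occurs somewhere in superset. */
    private static boolean containsAll(int[] superset, int[] subset) {
        for (int wanted : subset) {
            boolean found = false;
            for (int candidate : superset) {
                if (candidate == wanted) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Column a after MV_EXPAND and CASE: 1, null, 3. Column n: null on every row.
        List<int[]> a = Arrays.asList(new int[] {1}, null, new int[] {3});
        List<int[]> n = Arrays.asList((int[]) null, (int[]) null, (int[]) null);
        System.out.println("an = " + Arrays.toString(evaluate(a, n)));
        System.out.println("na = " + Arrays.toString(evaluate(n, a)));
    }
}
```

In this toy model the second row is the interesting one: both inputs are null at that position only, which a single-row ROW query can never produce.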

Contributor Author

I would really love a ROWS source command that can take multiple rows, for both examples and test cases.

Contributor

I'm unresolving this, as the test for mixed nulls (mixed within the same column) hasn't been added yet 👀
A ROWS or similar syntax would be handy, but it would be a breaking change and isn't planned. The test indices usually have enough data to test with anyway, or a new index can be added (not a quick change, so I would avoid that here).
For custom index tests we would write a YAML test (Example), but we rarely do that for functions, as CSV tests are usually enough.

Contributor Author

Addressed in 66ba735. Let me know if I missed any combinations.

Contributor

@ivancea ivancea Aug 29, 2025

Looks good, thanks for adding it! You could add the same CASE(...) for the subset too (blocks with some nulls inside for subsets). Since the custom evaluator was removed, this isn't as important, but it could be a nice test to have, especially if we autogenerate the evaluator at a later stage.

mjmbischoff and others added 3 commits August 28, 2025 04:32
Refactor null handling in `MvContains` evaluators and add `MvContainsNullSupersetEvaluator` for better type-specific evaluation logic.
@mjmbischoff mjmbischoff requested a review from ivancea August 29, 2025 08:28
final var valueCount = subset.getValueCount(position);
final var startIndex = subset.getFirstValueIndex(position);
for (int valueIndex = startIndex; valueIndex < startIndex + valueCount; valueIndex++) {
var value = valueExtractor.extractValue(subset, valueIndex);
Contributor

Uhm I didn't see this before, but I think boxing could be a problem performance-wise.

We usually have an evaluator/vector/block per type for 2 reasons:

  • Avoid itable accesses
  • Avoid primitive boxing

I don't know how much extra time this will take. MV_CONTAINS has more overhead than other scalar functions, so maybe it's not that important, but I'm not sure.

There are 2 optimizations that would be ideal here:

  1. Having a method per type
  2. Having a specialization for sorted values (See Block#mvOrdering())

The second one can be done later. The first one too, I guess?

It's in preview, so I think it's fine either way. If it's merged as-is, I would create an issue to improve it later.
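To make the two optimizations above concrete, here is a minimal standalone Java sketch. It is not Elasticsearch code: the class and method names (MvContainsSketch, containsAllInts, containsAllIntsSorted) are hypothetical, and plain int[] arrays stand in for block values. The first method shows what a per-type loop might look like, with no boxing and no interface dispatch in the inner loop. The second assumes the superset is already sorted ascending, as one might be able to rely on when Block#mvOrdering() reports sorted values, and swaps the inner scan for a binary search.

```java
import java.util.Arrays;

public final class MvContainsSketch {
    private MvContainsSketch() {}

    /** Per-type variant: works on primitive ints, so no Integer objects are allocated. */
    public static boolean containsAllInts(int[] superset, int[] subset) {
        for (int wanted : subset) {
            boolean found = false;
            for (int candidate : superset) {
                if (candidate == wanted) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    /**
     * Sorted variant. ASSUMPTION: the caller guarantees sortedSuperset is in
     * ascending order; Arrays.binarySearch gives undefined results otherwise.
     */
    public static boolean containsAllIntsSorted(int[] sortedSuperset, int[] subset) {
        for (int wanted : subset) {
            if (Arrays.binarySearch(sortedSuperset, wanted) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] superset = {1, 2, 3, 5, 8};
        System.out.println(containsAllInts(superset, new int[] {2, 8}));
        System.out.println(containsAllIntsSorted(superset, new int[] {2, 4}));
    }
}
```

The linear scan costs on the order of superset length times subset length per position, while the sorted variant costs subset length times the logarithm of superset length. Whether that difference matters for typical multivalue sizes is exactly what a microbenchmark would need to show.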

Contributor

Btw, to do this per type, whether now or later, StringTemplates (Example) could be used to avoid repeated code.
These functions could be extracted into their own static classes (or the same could be done for the full evaluators, with the functions directly inside).

Contributor Author

Yes, I should create a follow-up issue for this. I also want to follow up on improving the Implementers so the Evaluators here can be autogenerated instead of hand-written. Leaving this unresolved until after merge so I don't forget to open an issue.

}));
}

// Adjusted from static method anyNullIsNull in {@code AbstractScalarFunctionTestCase#}
Contributor

Can you add a line describing exactly what changed here? That will make it easier to remove this override later if/when we extend the original to handle this case.

Contributor Author

Added in 23d92f0, but I want to leave this unresolved as well so I can follow up; I need to look into making the test class more adaptive.

Contributor

@ivancea ivancea left a comment

Thanks for the changes!
And as a summary, the things to work on later that I remember are:

  • Multivalue evaluator implementer with nulls
  • Performance improvements: avoiding boxing and using inherent sorting when available (Probably to be solved before removing the preview label? Some microbenchmarks would be interesting here too)

@mjmbischoff mjmbischoff merged commit 97abc87 into elastic:main Aug 29, 2025
33 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:

There are no branches to backport to. Aborting.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 133636

Labels

  • :Analytics/ES|QL (AKA ESQL)
  • auto-backport (Automatically create backport pull requests when merged)
  • backport pending
  • >enhancement
  • external-contributor (Pull request authored by a developer outside the Elasticsearch team)
  • Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))
  • v9.2.0

3 participants