Skip to content

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Aug 28, 2024

This speeds up the CASE function when it has two or three arguments and both of the arguments are constants or fields. This works because CASE is lazy so it can avoid warnings in cases like

CASE(foo != 0, 2 / foo, 1)

And, in the case where the function is very slow, it can avoid the computations.

But if the lhs and rhs of the CASE are constant then there isn't any work to avoid.

The performance improvment is pretty substantial:

 (operation)  Before   Error   After    Error  Units
 case_1_lazy  97.422 ± 1.048  101.571 ± 0.737  ns/op
case_1_eager  79.312 ± 1.190    4.601 ± 0.049  ns/op

The top line is a CASE that has to be lazy - it shouldn't change. The 4 nanos change here is noise. The eager version improves by about 94%.

This speeds up the `CASE` function when it has two or three arguments
and both of the arguments are constants or fields. This works because
`CASE` is lazy so it can avoid warnings in cases like
```
CASE(foo != 0, 2 / foo, 1)
```

And, in the case where the function is *very* slow, it can avoid the
computations.

But if the lhs  and rhs of the `CASE` are constant then there isn't any
work to avoid.

The performance improvment is pretty substantial:
```
 (operation)  Before   Error   After    Error  Units
 case_1_lazy  97.422 ± 1.048  101.571 ± 0.737  ns/op
case_1_eager  79.312 ± 1.190    4.601 ± 0.049  ns/op
```

The top line is a `CASE` that has to be lazy - it shouldn't change. The
4 nanos change here is noise. The eager version improves by about 94%.
@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

*/
default boolean safeToEvalInLazy() {
return false;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is something that we could derive from the Expression - but it felt simpler to put it here. And it's just a boolean at this point - though maybe it should be a cost estimate at some point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of constants, the unvisited branches can be removed and the case simplified.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works if the expression is constant, but not if the values are constant. We still have to evaluate in that case. And when we do we can do it the fast way with something like this.

@nik9000
Copy link
Member Author

nik9000 commented Aug 30, 2024

I'm waiting until I merge #112401 to pick this one back up.

&& conditionsFactories.get(0).value.safeToEvalInLazy()
&& elseValueFactory.safeToEvalInLazy()) {

return new EagerEvaluator(context, conditionsFactories.get(0).apply(context), elseValueFactory.get(context));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell if the logic is wrong here or if the naming is just confusing. To me, this reads like "if it is safe to evaluate in lazy, return an eager evaluator". I would normally say "eager evaluation" was the opposite of "lazy evaluation".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad name, I think. It's "if this is safe to eagerly evaluate when we expect a lazy"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rename this now that the CASE bugs are fixed.

@nik9000 nik9000 marked this pull request as ready for review September 9, 2024 19:24
@nik9000 nik9000 requested a review from a team as a code owner September 9, 2024 19:24
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 9, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

* that evaluate quickly and can not produce warnings may override this to
* {@code true} to get a significant speed-up in {@code CASE}-like operations.
*/
default boolean eagerEvalSafeInLazy() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another alternative is to be extract this as a marking interface that gets implemented by certain factories.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a lot less readable.

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
The method names are a bit cryptic indeed, but the javadoc helps a lot

if (lhsOrRhs.hadMultivaluedFields()) {
condition.registerMultivalue();
}
for (int p = 0; p < lhs.getPositionCount(); p++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could leverage BooleanVector.isConstant() here, eg.

                BooleanVector mask = lhsOrRhs.mask();
                if (mask.isConstant()) {
                    builder.copyFrom(mask.getBoolean(0) ? lhs : rhs, 0, lhs.getPositionCount());
                } else {
                    for (int p = 0; p < lhs.getPositionCount(); p++) {
                    ...

Or even return lhs/rhs...? Not sure it's safe for memory accounting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is safe, but tests can figure it out. I'll try it.

@nik9000 nik9000 merged commit 5c91edd into elastic:main Sep 24, 2024
15 checks passed
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 24, 2024
This speeds up the `CASE` function when it has two or three arguments
and both of the arguments are constants or fields. This works because
`CASE` is lazy so it can avoid warnings in cases like
```
CASE(foo != 0, 2 / foo, 1)
```

And, in the case where the function is *very* slow, it can avoid the
computations.

But if the lhs  and rhs of the `CASE` are constant then there isn't any
work to avoid.

The performance improvment is pretty substantial:
```
 (operation)  Before   Error   After    Error  Units
 case_1_lazy  97.422 ± 1.048  101.571 ± 0.737  ns/op
case_1_eager  79.312 ± 1.190    4.601 ± 0.049  ns/op
```

The top line is a `CASE` that has to be lazy - it shouldn't change. The
4 nanos change here is noise. The eager version improves by about 94%.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

elasticsearchmachine pushed a commit that referenced this pull request Sep 24, 2024
This speeds up the `CASE` function when it has two or three arguments
and both of the arguments are constants or fields. This works because
`CASE` is lazy so it can avoid warnings in cases like
```
CASE(foo != 0, 2 / foo, 1)
```

And, in the case where the function is *very* slow, it can avoid the
computations.

But if the lhs  and rhs of the `CASE` are constant then there isn't any
work to avoid.

The performance improvment is pretty substantial:
```
 (operation)  Before   Error   After    Error  Units
 case_1_lazy  97.422 ± 1.048  101.571 ± 0.737  ns/op
case_1_eager  79.312 ± 1.190    4.601 ± 0.049  ns/op
```

The top line is a `CASE` that has to be lazy - it shouldn't change. The
4 nanos change here is noise. The eager version improves by about 94%.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jan 14, 2025
```
                      before              after
     (operation)   Score   Error       Score   Error  Units
 coalesce_2_noop  75.949 ± 3.961  ->   0.010 ±  0.001 ns/op  99.9%
coalesce_2_eager  99.299 ± 6.959  ->   4.292 ±  0.227 ns/op  95.7%
 coalesce_2_lazy 113.118 ± 5.747  ->  26.746 ±  0.954 ns/op  76.4%
```

We tend to advise folks that "COALESCE is faster than CASE", but, as of
8.16.0/elastic#112295 that wasn't the true. I was working with someone a few
days ago to port a scripted_metric aggregation to ESQL and we saw
COALESCE taking ~60% of the time. That won't do.

The trouble is that CASE and COALESCE have to be *lazy*, meaning that
operations like:
```
COALESCE(a, 1 / b)
```
should never emit a warning if `a` is not `null`, even if `b` is `0`. In
8.16/elastic#112295 CASE grew an optimization where it could operate non-lazily
if it was flagged as "safe". This brings a similar optimization to
COALESCE, see it above as "case_2_eager", a 95.7% improvement.

It also brings and arguably more important optimization - entire-block
execution for COALESCE. The schort version is that, if the first
parameter of COALESCE returns no nulls we can return it without doing
anything lazily. There are a few more cases, but the upshot is that
COALESCE is pretyt much *free* in cases where long strings of results
are `null` or not `null`. That's the `coalesce_2_noop` line.

Finally, when there mixed null and non-null values we were using a
single builder with some fairly inefficient paths. This specializes them
per type and skips some slow null-checking where possible. That's the
`coalesce_2_lazy` result, a more modest 76.4%.
nik9000 added a commit that referenced this pull request Jan 23, 2025
```
                      before              after
     (operation)   Score   Error       Score   Error  Units
 coalesce_2_noop  75.949 ± 3.961  ->   0.010 ±  0.001 ns/op  99.9%
coalesce_2_eager  99.299 ± 6.959  ->   4.292 ±  0.227 ns/op  95.7%
 coalesce_2_lazy 113.118 ± 5.747  ->  26.746 ±  0.954 ns/op  76.4%
```

We tend to advise folks that "COALESCE is faster than CASE", but, as of
8.16.0/#112295 that wasn't the true. I was working with someone a few
days ago to port a scripted_metric aggregation to ESQL and we saw
COALESCE taking ~60% of the time. That won't do.

The trouble is that CASE and COALESCE have to be *lazy*, meaning that
operations like:
```
COALESCE(a, 1 / b)
```
should never emit a warning if `a` is not `null`, even if `b` is `0`. In
8.16/#112295 CASE grew an optimization where it could operate non-lazily
if it was flagged as "safe". This brings a similar optimization to
COALESCE, see it above as "case_2_eager", a 95.7% improvement.

It also brings and arguably more important optimization - entire-block
execution for COALESCE. The schort version is that, if the first
parameter of COALESCE returns no nulls we can return it without doing
anything lazily. There are a few more cases, but the upshot is that
COALESCE is pretty much *free* in cases where long strings of results
are `null` or not `null`. That's the `coalesce_2_noop` line.

Finally, when there mixed null and non-null values we were using a
single builder with some fairly inefficient paths. This specializes them
per type and skips some slow null-checking where possible. That's the
`coalesce_2_lazy` result, a more modest 76.4%.

NOTE: These %s of improvements on COALESCE itself, or COALESCE with some load-overhead operators like `+`. If COALESCE isn't taking a *ton* time in your query don't get particularly excited about this. It's fun though.

Closes #119953
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jan 23, 2025
```
                      before              after
     (operation)   Score   Error       Score   Error  Units
 coalesce_2_noop  75.949 ± 3.961  ->   0.010 ±  0.001 ns/op  99.9%
coalesce_2_eager  99.299 ± 6.959  ->   4.292 ±  0.227 ns/op  95.7%
 coalesce_2_lazy 113.118 ± 5.747  ->  26.746 ±  0.954 ns/op  76.4%
```

We tend to advise folks that "COALESCE is faster than CASE", but, as of
8.16.0/elastic#112295 that wasn't the true. I was working with someone a few
days ago to port a scripted_metric aggregation to ESQL and we saw
COALESCE taking ~60% of the time. That won't do.

The trouble is that CASE and COALESCE have to be *lazy*, meaning that
operations like:
```
COALESCE(a, 1 / b)
```
should never emit a warning if `a` is not `null`, even if `b` is `0`. In
8.16/elastic#112295 CASE grew an optimization where it could operate non-lazily
if it was flagged as "safe". This brings a similar optimization to
COALESCE, see it above as "case_2_eager", a 95.7% improvement.

It also brings and arguably more important optimization - entire-block
execution for COALESCE. The schort version is that, if the first
parameter of COALESCE returns no nulls we can return it without doing
anything lazily. There are a few more cases, but the upshot is that
COALESCE is pretty much *free* in cases where long strings of results
are `null` or not `null`. That's the `coalesce_2_noop` line.

Finally, when there mixed null and non-null values we were using a
single builder with some fairly inefficient paths. This specializes them
per type and skips some slow null-checking where possible. That's the
`coalesce_2_lazy` result, a more modest 76.4%.

NOTE: These %s of improvements on COALESCE itself, or COALESCE with some load-overhead operators like `+`. If COALESCE isn't taking a *ton* time in your query don't get particularly excited about this. It's fun though.

Closes elastic#119953
elasticsearchmachine pushed a commit that referenced this pull request Jan 23, 2025
```
                      before              after
     (operation)   Score   Error       Score   Error  Units
 coalesce_2_noop  75.949 ± 3.961  ->   0.010 ±  0.001 ns/op  99.9%
coalesce_2_eager  99.299 ± 6.959  ->   4.292 ±  0.227 ns/op  95.7%
 coalesce_2_lazy 113.118 ± 5.747  ->  26.746 ±  0.954 ns/op  76.4%
```

We tend to advise folks that "COALESCE is faster than CASE", but, as of
8.16.0/#112295 that wasn't the true. I was working with someone a few
days ago to port a scripted_metric aggregation to ESQL and we saw
COALESCE taking ~60% of the time. That won't do.

The trouble is that CASE and COALESCE have to be *lazy*, meaning that
operations like:
```
COALESCE(a, 1 / b)
```
should never emit a warning if `a` is not `null`, even if `b` is `0`. In
8.16/#112295 CASE grew an optimization where it could operate non-lazily
if it was flagged as "safe". This brings a similar optimization to
COALESCE, see it above as "case_2_eager", a 95.7% improvement.

It also brings and arguably more important optimization - entire-block
execution for COALESCE. The schort version is that, if the first
parameter of COALESCE returns no nulls we can return it without doing
anything lazily. There are a few more cases, but the upshot is that
COALESCE is pretty much *free* in cases where long strings of results
are `null` or not `null`. That's the `coalesce_2_noop` line.

Finally, when there mixed null and non-null values we were using a
single builder with some fairly inefficient paths. This specializes them
per type and skips some slow null-checking where possible. That's the
`coalesce_2_lazy` result, a more modest 76.4%.

NOTE: These %s of improvements on COALESCE itself, or COALESCE with some load-overhead operators like `+`. If COALESCE isn't taking a *ton* time in your query don't get particularly excited about this. It's fun though.

Closes #119953
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants