Skip to content

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented May 21, 2025

Creates a ROUND_TO function that rounds it's input to one of the provided values. Like so:

ROUND_TO(v, 0, 5000, 10000, 20000, 40000, 100000)

   v   | ROUND_TO
     0 | 0
   100 | 0
  6000 | 5000
 45001 | 40000
999999 | 100000

For some sequences of numbers you could do this with the / operator - but for arbitrary sequences of numbers you needed CASE which is quite slow. And hard to read!

Rewriting the example above would look like:

CASE (
  v <   5000,     0,
  v <  10000,  5000,
  v <  20000, 10000,
  v <  40000, 20000,
  v < 100000, 40000,
  100000
)

Even better, this is fast:

        (operation)  Mode  Cnt    Score   Error  Units
round_to_4_via_case  avgt    7  138.124 ± 0.738  ns/op
         round_to_4  avgt    7    0.805 ± 0.011  ns/op
         round_to_3  avgt    7    0.739 ± 0.011  ns/op
         round_to_2  avgt    7    0.651 ± 0.009  ns/op
         date_trunc  avgt    7    2.425 ± 0.018  ns/op

I've included a comparison to DATE_TRUNC above because we should be able to rewrite DATE_TRUNC into ROUND_TO when we know the date range of the index. This doesn't do it now, but it should be possible.

Creates a `ROUND_TO` function that rounds it's input to one of the
provided values. Like so:
```
ROUND_TO(v, 0, 5000, 10000, 20000, 40000, 100000)

   v   | ROUND_TO
     0 | 0
   100 | 0
  6000 | 5000
 45001 | 40000
999999 | 100000
```

For some sequences of numbers you could do this with the `/` operator -
but for arbitrary sequences of numbers you needed `CASE` which is quite
slow. And hard to read!

Rewriting the example above would look like:
```
CASE (
  v <   5000,     0,
  v <  10000,  5000,
  v <  20000, 10000,
  v <  40000, 20000,
  v < 100000, 40000,
  100000
)
```

Even better, this is *fast*:
```
        (operation)  Mode  Cnt    Score   Error  Units
round_to_4_via_case  avgt    7  138.124 ± 0.738  ns/op
         round_to_4  avgt    7    0.805 ± 0.011  ns/op
         round_to_3  avgt    7    0.739 ± 0.011  ns/op
         round_to_2  avgt    7    0.651 ± 0.009  ns/op
         date_trunc  avgt    7    2.425 ± 0.018  ns/op
```

I've included a comparison to `DATE_TRUNC` above because we should be
able to rewrite `DATE_TRUNC` into `ROUND_TO` when we know the date range
of the index. This doesn't do it now, but it should be possible.
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 21, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

Copy link
Member Author

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is fun: I realized that I was linearly searching a list and I ask binary search questions in interviews all the time. Gives us an excuse to talk about O with a topic everyone understand. You know. Super obvious.

Well, any guess how big the array has to be before binary search gets faster?

Anyway, no we have a linear search and a binary search.

@nik9000
Copy link
Member Author

nik9000 commented May 22, 2025

FWIW, I suspect https://www.felixcloutier.com/x86/pcmpgtq but have no decompiled.

@nik9000
Copy link
Member Author

nik9000 commented May 22, 2025

Also, the magic isn't entirely building this fast function - it's for when we can merge it into lucene's structures like doc-value-skipper.

@nik9000 nik9000 added the auto-backport Automatically create backport pull requests when merged label May 23, 2025
@nik9000 nik9000 merged commit 45bfaab into elastic:main May 23, 2025
18 checks passed
@nik9000
Copy link
Member Author

nik9000 commented May 23, 2025

Thanks @idegtiarenko !

@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 128278

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 23, 2025
Creates a `ROUND_TO` function that rounds it's input to one of the
provided values. Like so:
```
ROUND_TO(v, 0, 5000, 10000, 20000, 40000, 100000)

   v   | ROUND_TO
     0 | 0
   100 | 0
  6000 | 5000
 45001 | 40000
999999 | 100000
```

For some sequences of numbers you could do this with the `/` operator -
but for arbitrary sequences of numbers you needed `CASE` which is quite
slow. And hard to read!

Rewriting the example above would look like:
```
CASE (
  v <   5000,     0,
  v <  10000,  5000,
  v <  20000, 10000,
  v <  40000, 20000,
  v < 100000, 40000,
  100000
)
```

Even better, this is *fast*:
```
        (operation)  Mode  Cnt    Score   Error  Units
round_to_4_via_case  avgt    7  138.124 ± 0.738  ns/op
         round_to_4  avgt    7    0.805 ± 0.011  ns/op
         round_to_3  avgt    7    0.739 ± 0.011  ns/op
         round_to_2  avgt    7    0.651 ± 0.009  ns/op
         date_trunc  avgt    7    2.425 ± 0.018  ns/op
```

I've included a comparison to `DATE_TRUNC` above because we should be
able to rewrite `DATE_TRUNC` into `ROUND_TO` when we know the date range
of the index. This doesn't do it now, but it should be possible.
@nik9000
Copy link
Member Author

nik9000 commented May 23, 2025

Backport is #128397

elasticsearchmachine pushed a commit that referenced this pull request May 23, 2025
Creates a `ROUND_TO` function that rounds it's input to one of the
provided values. Like so:

```
ROUND_TO(v, 0, 5000, 10000, 20000, 40000, 100000)

   v   | ROUND_TO
     0 | 0
   100 | 0
  6000 | 5000
 45001 | 40000
999999 | 100000
```

For some sequences of numbers you could do this with the `/` operator -
but for arbitrary sequences of numbers you needed `CASE` which is quite
slow. And hard to read!

Rewriting the example above would look like:

```
CASE (
  v <   5000,     0,
  v <  10000,  5000,
  v <  20000, 10000,
  v <  40000, 20000,
  v < 100000, 40000,
  100000
)
```

Even better, this is *fast*:

```
        (operation)  Mode  Cnt    Score   Error  Units
round_to_4_via_case  avgt    7  138.124 ± 0.738  ns/op
         round_to_4  avgt    7    0.805 ± 0.011  ns/op
         round_to_3  avgt    7    0.739 ± 0.011  ns/op
         round_to_2  avgt    7    0.651 ± 0.009  ns/op
         date_trunc  avgt    7    2.425 ± 0.018  ns/op
```

I've included a comparison to `DATE_TRUNC` above because we should be
able to rewrite `DATE_TRUNC` into `ROUND_TO` when we know the date range
of the index. This doesn't do it now, but it should be possible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants