@nik9000 nik9000 commented Oct 27, 2025

Creates a BlockLoader for pushing the LENGTH function down into the loader for keyword fields. It takes advantage of the terms dictionary so we only need to calculate the code point count once per unique term loaded.
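To illustrate the once-per-unique-term idea, here is a minimal sketch. It is not the `BlockLoader` from this PR and does not use any Elasticsearch or Lucene API: the class `OrdinalLengthSketch`, the `termsByOrd` array standing in for the terms dictionary, and the `docOrds` array standing in for per-document ordinals are all hypothetical names chosen for the example. It assumes single-valued fields and valid UTF-8 terms.

```java
import java.util.Arrays;

/** Hypothetical stand-in for illustration; not the Elasticsearch BlockLoader API. */
final class OrdinalLengthSketch {
    /**
     * @param termsByOrd UTF-8 bytes of each unique term, indexed by ordinal (assumed layout)
     * @param docOrds    the ordinal of each document's single value
     * @return the code point count of each document's value
     */
    static int[] lengths(byte[][] termsByOrd, int[] docOrds) {
        // One cache slot per unique term; -1 means "not counted yet".
        int[] cache = new int[termsByOrd.length];
        Arrays.fill(cache, -1);
        int[] result = new int[docOrds.length];
        for (int i = 0; i < docOrds.length; i++) {
            int ord = docOrds[i];
            if (cache[ord] < 0) {
                // Count code points only the first time this term is seen.
                cache[ord] = codePointCount(termsByOrd[ord]);
            }
            result[i] = cache[ord];
        }
        return result;
    }

    /** Counts code points in valid UTF-8 by skipping continuation bytes (10xxxxxx). */
    static int codePointCount(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }
}
```

With many documents and few unique terms, the counting loop runs once per term rather than once per document; every other document costs an array read.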

This BlockLoader implementation isn't plugged into the infrastructure for emitting it because we're waiting on the infrastructure we've started in #137002. We'll make a follow up PR to plug this in.

We're doing this mostly to demonstrate another function that we can push into field loading, in addition to the vector similarity functions we're building in #137002. We don't expect LENGTH to be a super hot function. If it happens to be then this'll help.

Before we plug this in we'll have to figure out how to emit warnings from functions that we've fused to field loading, because LENGTH can emit a warning, specifically when it hits a multivalued field.

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 27, 2025
@elasticsearchmachine

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000 nik9000 requested a review from dnhatn October 28, 2025 15:00

@dnhatn dnhatn left a comment

In cases with large indices and many ordinals - such as an index with 10M documents and 10K ordinals - it might be more efficient to look up ordinals in order. However, this isn't a big concern. This looks great. Thank you, Nik!

@nik9000

nik9000 commented Oct 28, 2025

> In cases with large indices and many ordinals - such as an index with 10M documents and 10K ordinals - it might be more efficient to look up ordinals in order. However, this isn't a big concern. This looks great. Thank you, Nik!

I think what you are saying is "this is fine, but the Immediate version will be slow because it's random, instead do the same read ords/sort/read+count dance but this time without a cache". That'd be faster, yeah.
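The "read ords/sort/read+count" dance without a cache could be sketched roughly as follows. This is an assumption about the shape of that approach, not code from the PR: `SortedOrdinalLengthSketch` and the `lookupOrd` function (a stand-in for a terms dictionary lookup) are hypothetical, and the sketch assumes single-valued fields with non-negative ordinals.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

/** Hypothetical illustration of looking ordinals up in sorted order, without a cache. */
final class SortedOrdinalLengthSketch {
    /**
     * @param docOrds   the ordinal of each document's single value
     * @param lookupOrd stand-in for a terms dictionary lookup returning UTF-8 bytes
     */
    static int[] lengths(int[] docOrds, IntFunction<byte[]> lookupOrd) {
        // Pack (ordinal, position) into one long so a primitive sort orders by ordinal.
        long[] packed = new long[docOrds.length];
        for (int i = 0; i < docOrds.length; i++) {
            packed[i] = ((long) docOrds[i] << 32) | i;
        }
        Arrays.sort(packed);

        int[] result = new int[docOrds.length];
        int lastOrd = -1;
        int lastLength = 0;
        for (long p : packed) {
            int ord = (int) (p >>> 32);
            int position = (int) p;
            if (ord != lastOrd) {
                // Equal ordinals are adjacent after sorting, so each distinct
                // ordinal is looked up exactly once, in ascending order.
                lastLength = codePointCount(lookupOrd.apply(ord));
                lastOrd = ord;
            }
            result[position] = lastLength;
        }
        return result;
    }

    /** Counts code points in valid UTF-8 by skipping continuation bytes (10xxxxxx). */
    static int codePointCount(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }
}
```

The trade-off in this sketch is a sort over the loaded positions in exchange for ascending dictionary lookups and no per-ordinal cache array.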

@nik9000 nik9000 merged commit 84014cd into elastic:main Oct 28, 2025
34 checks passed
chrisparrinello pushed a commit to chrisparrinello/elasticsearch that referenced this pull request Nov 3, 2025

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0
