@carlosdelest carlosdelest commented Oct 9, 2025

Vector similarity functions currently can't be pushed down to Lucene as scripts. This harms performance, because vector fields have to be deserialized so that the similarities can be calculated on the compute engine.

This PoC pushes down TopN expressions to Lucene, translating them to sort scripts. We're using sort scripts, as:

  • Lucene does not allow for negative scoring values. Negative values can happen when pushing down unnormalized similarity functions, or when they are part of an expression.
  • This allows for a more general solution that doesn't involve scoring, like sorting by length(text_field).
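To illustrate the first point, here is a minimal sketch in plain Java (not Lucene or ESQL code) of why a sort key is a better fit than a score. An unnormalized dot product can be negative, which a Lucene score can't represent, while a plain numeric sort key orders such values without trouble. The class name, method names and data are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NegativeSortKeySketch {

    // Unnormalized dot product: the result can be negative.
    static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Returns the ids of the n documents with the highest similarity.
    // The similarity is used as a sort key, so negative values are fine.
    static List<Integer> topN(float[][] docs, float[] query, int n) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble((Integer i) -> dot(docs[i], query)).reversed());
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }

    public static void main(String[] args) {
        float[][] docs = { { 1f, 0f }, { -1f, 0f }, { 0.5f, 0f } };
        float[] query = { 1f, 0f };
        System.out.println(dot(docs[1], query)); // negative similarity
        System.out.println(topN(docs, query, 3)); // ids ordered by similarity, descending
    }
}
```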

Expressions define two new methods:

  • getPushableOptions: Returns whether the expression is pushable to Lucene:
    • PREFERRED: We should push down whenever possible. An expression tree that contains at least one PREFERRED expression and no NOT_SUPPORTED expression can be pushed down to Lucene.
    • SUPPORTED: The expression can be pushed down, but pushing it down on its own does not offer a significant performance advantage, or its pushability depends on the other expressions it contains.
    • NOT_SUPPORTED: Expression can't be pushed down.
  • asScript: Returns a String with the Painless script that the expression translates to.
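A rough sketch of how these options could combine across an expression tree, assuming the rule described above (at least one PREFERRED node and no NOT_SUPPORTED node). The `Expr` class, `resolve` and `shouldPush` are hypothetical stand-ins for illustration, not the classes in this PR, and the script strings are only examples of what asScript might return.

```java
import java.util.List;

public class PushabilitySketch {

    enum Pushable { PREFERRED, SUPPORTED, NOT_SUPPORTED }

    // Hypothetical stand-in for an expression node.
    static final class Expr {
        final Pushable own;        // what this node reports for itself
        final String script;       // illustrative Painless translation, null if none
        final List<Expr> children;

        Expr(Pushable own, String script, Expr... children) {
            this.own = own;
            this.script = script;
            this.children = List.of(children);
        }
    }

    // Combines the options of a node and all of its descendants:
    // any NOT_SUPPORTED wins, otherwise any PREFERRED wins, otherwise SUPPORTED.
    static Pushable resolve(Expr e) {
        if (e.own == Pushable.NOT_SUPPORTED) {
            return Pushable.NOT_SUPPORTED;
        }
        boolean preferred = e.own == Pushable.PREFERRED;
        for (Expr child : e.children) {
            Pushable p = resolve(child);
            if (p == Pushable.NOT_SUPPORTED) {
                return Pushable.NOT_SUPPORTED;
            }
            preferred |= p == Pushable.PREFERRED;
        }
        return preferred ? Pushable.PREFERRED : Pushable.SUPPORTED;
    }

    // Push down only when the tree as a whole resolves to PREFERRED.
    static boolean shouldPush(Expr e) {
        return resolve(e) == Pushable.PREFERRED;
    }

    public static void main(String[] args) {
        Expr dot = new Expr(Pushable.PREFERRED, "dotProduct(params.q, 'vec')");
        Expr one = new Expr(Pushable.SUPPORTED, "1.0");
        Expr sum = new Expr(Pushable.SUPPORTED, "(" + dot.script + " + " + one.script + ")", dot, one);
        Expr opaque = new Expr(Pushable.NOT_SUPPORTED, null);
        Expr mixed = new Expr(Pushable.SUPPORTED, null, dot, opaque);

        System.out.println(shouldPush(sum));   // true: contains a PREFERRED node, nothing unsupported
        System.out.println(shouldPush(one));   // false: SUPPORTED only, no advantage in pushing
        System.out.println(shouldPush(mixed)); // false: contains a NOT_SUPPORTED node
    }
}
```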

The PushTopNToSource optimization rule is modified to add a script sort to the EsQueryExec when a pushable expression is found in a Top N.
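For reference, a script sort in the search DSL takes roughly the shape built below. This is only a sketch of the equivalent `_script` sort clause as JSON; the PoC attaches the sort to the EsQueryExec internally rather than building JSON like this. The helper name `scriptSortJson` and the field name `text_field` are assumptions for the example, and the field is assumed to have doc values.

```java
public class ScriptSortSketch {

    // Builds the JSON for a numeric "_script" sort clause from a Painless source string.
    static String scriptSortJson(String painlessSource, boolean descending) {
        // Minimal escaping so the source can sit inside a JSON string.
        String escaped = painlessSource.replace("\\", "\\\\").replace("\"", "\\\"");
        String order = descending ? "desc" : "asc";
        return "{\"_script\":{\"type\":\"number\",\"order\":\"" + order + "\","
            + "\"script\":{\"lang\":\"painless\",\"source\":\"" + escaped + "\"}}}";
    }

    public static void main(String[] args) {
        // Sorting by the length of a field, as in the length(text_field) example above.
        System.out.println(scriptSortJson("doc['text_field'].value.length()", true));
    }
}
```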

The LuceneTopNSourceOperator retrieves the sort values for the script sorts, so they are usable on the compute engine as the result of the evaluation.

@carlosdelest carlosdelest changed the title from "ESQL - Push down Top N expressions to Lucene" to "[PoC] ESQL - Push down Top N expressions to Lucene" on Oct 9, 2025
@carlosdelest
Member Author

Closing, as we will use BlockLoaders for this, similar to #103636
