ESQL - KNN function uses LIMIT for setting top k #129353

carlosdelest · 2025-06-12T15:32:00Z

KNN differs from the other text functions in ESQL: it retrieves only the top K results, instead of returning all matching results up to the number dictated by LIMIT.

To avoid returning fewer results than the user expects, and to optimize the underlying KNN search, the KNN function uses the LIMIT(s) that apply to it to set the k parameter, unless the user has already set k as an option.

FROM test
| WHERE KNN(vector, [0, 1, 2])
| LIMIT 100

will mean that KNN will set k to 100.

A LIMIT that comes later in the query, after other commands that follow KNN, is also taken into account:

FROM test METADATA _score
| WHERE KNN(vector, [0, 1, 2])
| EVAL t = x + 1
| SORT _score
| LIMIT 100

will use k = 100 as well.

The only command that stops LIMIT propagation to KNN is STATS.

If there are multiple LIMIT clauses, the lesser one will be applied:

FROM test METADATA _score
| WHERE KNN(vector, [0, 1, 2])
| LIMIT 20
| EVAL t = x + 1
| SORT _score
| LIMIT 10

will have k = 10.
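A minimal sketch of the propagation rule described so far, assuming a simplified pipeline model. `KnnLimitPropagationSketch`, `Command`, and `propagatedLimit` are hypothetical names used for illustration only; this is not the optimizer rule from this PR.

```java
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical model of the commands that follow KNN; not the Elasticsearch plan classes. */
final class KnnLimitPropagationSketch {

    /** A pipeline command; the limit value is only meaningful when name is "LIMIT". */
    record Command(String name, int limit) {
        static Command of(String name) {
            return new Command(name, -1);
        }

        static Command limit(int n) {
            return new Command("LIMIT", n);
        }
    }

    /**
     * Scans the commands that follow the KNN filter and returns the smallest LIMIT
     * seen before the first STATS, or empty if no LIMIT reaches KNN.
     */
    static OptionalInt propagatedLimit(List<Command> commandsAfterKnn) {
        OptionalInt smallest = OptionalInt.empty();
        for (Command command : commandsAfterKnn) {
            if (command.name().equals("STATS")) {
                break; // STATS stops LIMIT propagation to KNN
            }
            if (command.name().equals("LIMIT")) {
                int candidate = command.limit();
                smallest = smallest.isPresent()
                    ? OptionalInt.of(Math.min(smallest.getAsInt(), candidate))
                    : OptionalInt.of(candidate);
            }
        }
        return smallest;
    }
}
```

For the query above, passing `LIMIT 20`, `EVAL`, `SORT`, `LIMIT 10` to `propagatedLimit` yields 10, while a `STATS` placed before any `LIMIT` yields an empty result.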

If the user has already specified a k value in the KNN function via the k option, the k option takes precedence over any LIMIT:

FROM test
| WHERE KNN(vector, [0, 1, 2], {k: 10})
| LIMIT 100

will have k = 10.

k needs to be specified either implicitly (via LIMIT) or explicitly (via the k option). A default LIMIT of 1000 is applied implicitly, except in some cases such as STATS:

FROM test
| WHERE KNN(vector, [0, 1, 2])
| STATS c = count(*)

The query above will fail, as no default LIMIT is applied and the k option is not set.
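A minimal sketch of how k could be resolved under the rules in this description, assuming the explicit k option, the LIMIT that reaches KNN, and whether the default LIMIT applies are already known. `KnnKResolutionSketch` and `resolveK` are hypothetical names, and the error message is illustrative rather than the one produced by this PR.

```java
import java.util.OptionalInt;

/** Hypothetical helper illustrating the precedence rules; not the code in this PR. */
final class KnnKResolutionSketch {

    /** Default LIMIT mentioned in the description above. */
    static final int DEFAULT_LIMIT = 1000;

    /**
     * @param explicitK              the k option, if the user set it
     * @param userLimit              the LIMIT that reaches KNN, if any
     * @param defaultLimitReachesKnn false when something like STATS prevents the default LIMIT from applying
     */
    static int resolveK(OptionalInt explicitK, OptionalInt userLimit, boolean defaultLimitReachesKnn) {
        if (explicitK.isPresent()) {
            return explicitK.getAsInt(); // the k option takes precedence over any LIMIT
        }
        if (userLimit.isPresent()) {
            return userLimit.getAsInt(); // k set implicitly from LIMIT
        }
        if (defaultLimitReachesKnn) {
            return DEFAULT_LIMIT; // no LIMIT written by the user, default applies
        }
        throw new IllegalArgumentException("KNN needs k: set the k option or add a LIMIT");
    }
}
```

With these assumptions, `resolveK(OptionalInt.of(10), OptionalInt.of(100), true)` returns 10, and a STATS query with neither a k option nor a LIMIT throws.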

Open questions:

Should we just make K a mandatory param and not depend on LIMIT?
Should KNN use K as an optional param instead of an option?
Should the default LIMIT of 1000 be considered an error for KNN, and force the user to either set a LIMIT or set K in KNN?

carlosdelest · 2025-06-12T16:50:31Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/knn-function.csv-spec

I changed these tests to:

Remove the "k" option and replace it by LIMIT

Remove STATS use cases, as I've removed the ability to use KNN in STATS for now

carlosdelest · 2025-06-12T16:51:03Z

...rc/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/FullTextFunction.java

     * @param failures failures found
     */
-    private static void checkFullTextQueryFunctions(LogicalPlan plan, Failures failures) {
+    private void checkFullTextQueryFunctions(LogicalPlan plan, Failures failures) {


Changed visibility of these methods to allow overriding in KNN

carlosdelest · 2025-06-12T16:54:02Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

            new EsqlCCSUtils.CssPartialErrorsActionListener(executionInfo, listener) {
                @Override
                public void onResponse(LogicalPlan analyzedPlan) {
+                    LogicalPlan optimizedPlan = optimizedPlan(analyzedPlan);


Changed PreMapper to run after the plan has been optimized. This way, KNN can update k before the QueryBuilder is created, so the QueryBuilder uses the k that comes from the optimization.

carlosdelest · 2025-06-12T16:55:36Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Knn.java

+     * KNN should not be used in aggregations, as it is a top-N query and not a filtering query
+     */
+    @Override
+    protected void checkFullTextFunctionsInAggs(Aggregate agg, Failures failures) {


I removed the ability to use KNN in aggs. It does not make sense to me - KNN retrieves a top K in terms of similarity, not as a filtering function like other FTFs.

We can undo that decision later, for example by forcing it to use a minimum similarity, but I still think it's too much of an edge case for now, and it complicates testing.

benwtrent · 2025-06-13T14:05:40Z

FROM test METADATA _score
| WHERE KNN(vector, [0, 1, 2])
| LIMIT 20
| EVAL t = x + 1
| SORT _score
| LIMIT 10

Having knn take the lesser one doesn't make sense to me. Why shouldn't it take the first one?

As a user, I would assume it takes the first one, then allows me to do whatever I want with the table of k:20, then further limit based on some subsequent actions.

carlosdelest · 2025-06-13T15:38:16Z

Having knn take the lesser one doesn't make sense to me. Why shouldn't it take the first one?

As a user, I would assume it takes the first one, then allows me to do whatever I want with the table of k:20, then further limit based on some subsequent actions.

@benwtrent If a later LIMIT reduces the result set, then whatever the user computed for the extra rows gets lost when that lesser LIMIT is applied.

Given the following:

| WHERE KNN(..)
| LIMIT 20
| EVAL k = x + 1
| LIMIT 10

Then it doesn't matter what I calculated for the bottom 10 of those 20 rows, as they will be discarded by the last LIMIT 10.

This comes from an optimization already done for pushing down and combining limits. I wanted to keep the same semantics for KNN, as the returned rows that don't pass a subsequent LIMIT would be discarded as well.

The only exception I can think of is if we're changing the SORT order between limits. I don't see that taken into account in the push down limits optimization, I'll think about it more.
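A minimal sketch of why a SORT between the two limits matters, using made-up in-memory rows instead of a real index. `KnnSortOrderSketch`, `Row`, the sample data, and sorting on a field `x` are all assumptions for illustration; the sketch models `KNN | LIMIT 20 | SORT x | LIMIT 10` with k taken either from the first LIMIT (20) or from the lesser one (10).

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical simulation; rows and scores are invented for the example. */
final class KnnSortOrderSketch {

    record Row(int id, double score, int x) {}

    /** Top-k rows by descending score, standing in for the KNN search. */
    static List<Row> knn(List<Row> rows, int k) {
        return rows.stream()
            .sorted(Comparator.comparingDouble(Row::score).reversed())
            .limit(k)
            .collect(Collectors.toList());
    }

    /** Models KNN with the given k, followed by SORT x (ascending) and LIMIT n; returns row ids. */
    static List<Integer> knnThenSortByXAndLimit(List<Row> rows, int k, int n) {
        return knn(rows, k).stream()
            .sorted(Comparator.comparingInt(Row::x))
            .limit(n)
            .map(Row::id)
            .collect(Collectors.toList());
    }

    /** 30 rows where both score and x decrease as id grows, so ascending x reverses the score order. */
    static List<Row> sampleRows() {
        return IntStream.range(0, 30)
            .mapToObj(i -> new Row(i, 1.0 - i / 100.0, 30 - i))
            .collect(Collectors.toList());
    }
}
```

On this sample data, k = 20 returns ids 19 down to 10, while k = 10 returns ids 9 down to 0: the two choices of k produce disjoint results once the sort order changes between the limits.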

benwtrent · 2025-06-13T15:57:04Z

@carlosdelest yes, sort order changing is exactly my concern

carlosdelest · 2025-09-15T06:10:52Z

Done as part of #132944

carlosdelest added 10 commits June 10, 2025 13:43

Add limit to Knn query, set it in PushDownAndCombineLimits

887615c

First test - default limit

56d4317

Add tests

625da5b

Add tests

ac0d231

Include push down limits past sorting

a6562f8

KNN can't be used in stats functions

967a415

Separate push down limits to knn into its own rule class

0e049ae

Perform premapping after optimization so FTFs can be optimized before…

f6cb5d9

… the query builder is retrieved

Fix tests

1fe1a89

Merge remote-tracking branch 'origin/main' into non-issue/esql-knn-fu…

3f16640

…nction-limit

elasticsearchmachine added the v9.1.0 label Jun 12, 2025

carlosdelest and others added 7 commits June 12, 2025 17:46

Spotless

67280fc

[CI] Auto commit changes from spotless

44c87cb

Guard knn tests with capability

81f84a2

LIMITs should not pass through STATS

89d8907

K must be set either explicitly or implicitly

9167cab

Merge remote-tracking branch 'carlosdelest/non-issue/esql-knn-functio…

c5467d2

…n-limit' into non-issue/esql-knn-function-limit

Add javadoc

54e6407

carlosdelest commented Jun 12, 2025

carlosdelest and others added 3 commits June 13, 2025 10:22

Fix tests

6931643

[CI] Auto commit changes from spotless

b810ca6

Merge branch 'main' into non-issue/esql-knn-function-limit

6b10677

carlosdelest mentioned this pull request Jun 20, 2025

ESQL - Add K mandatory param for KNN function #129763

Merged

elasticsearchmachine added v9.2.0 and removed v9.1.0 labels Jun 26, 2025

carlosdelest mentioned this pull request Sep 1, 2025

ES|QL - dense_vector approximate nearest neighbour search support (GA) #126710

Open

8 tasks

carlosdelest closed this Sep 15, 2025
