Vector rescoring - Simplify code for k == null #118997

carlosdelest · 2024-12-18T17:47:51Z

Following up on #116663 and #118774

#118774 makes k always specified, defaulting to the request size when it is null.

This allows some simplifications in rescoring, as we will always have k to limit the results returned per shard.
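A minimal sketch of the defaulting rule, assuming a hypothetical KDefaults.resolveK helper (this is not the actual code from #118774): when k is null the request size is used, so every shard always has a concrete limit on the results it returns.

```java
import java.util.Objects;

/** Hypothetical illustration of the "k defaults to size" rule; not Elasticsearch source. */
public final class KDefaults {

    private KDefaults() {}

    /**
     * Returns the number of results each shard should return.
     *
     * @param k           the k from the knn query, or null if the user omitted it
     * @param requestSize the search request's size parameter
     */
    public static int resolveK(Integer k, int requestSize) {
        // When k is not specified, fall back to the request size
        return Objects.requireNonNullElse(k, requestSize);
    }

    public static void main(String[] args) {
        System.out.println(resolveK(null, 10)); // 10
        System.out.println(resolveK(5, 10));    // 5
    }
}
```

Because the fallback happens once, up front, downstream rescoring code can treat k as a plain int and never needs a null branch.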


carlosdelest · 2024-12-18T17:55:37Z

...r/src/main/java/org/elasticsearch/index/mapper/vectors/VectorSimilarityFloatValueSource.java

 * original vector values stored in the index
 */
-public class VectorSimilarityFloatValueSource extends DoubleValuesSource implements QueryProfilerProvider {
+public class VectorSimilarityFloatValueSource extends DoubleValuesSource {


Profiling can be simplified, as we always know the number of results to return, so this class no longer needs to implement QueryProfilerProvider.

carlosdelest · 2024-12-18T17:56:07Z

server/src/main/java/org/elasticsearch/search/vectors/RescoreKnnVectorQuery.java

-
        // Retrieve top k documents from the rescored query
        TopDocs topDocs = searcher.search(query, k);
+        vectorOperations = topDocs.totalHits.value();


We know in advance the number of vector comparisons performed: one per document matched by the rescored query.
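To illustrate why the count is known up front, here is a self-contained sketch of exact rescoring in plain Java. RescoreSketch, rescore and the dot-product similarity are hypothetical stand-ins, not the Lucene or Elasticsearch API: every candidate gets exactly one full-precision comparison, so the number of vector operations equals the number of candidates, loosely mirroring vectorOperations = topDocs.totalHits.value() in the diff.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of exact rescoring; not Lucene or Elasticsearch code. */
public final class RescoreSketch {

    private RescoreSketch() {}

    /** Indices of the top-k candidates plus the number of similarity computations performed. */
    public record Result(List<Integer> topK, long vectorOperations) {}

    /** Dot product used as a stand-in similarity function. */
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static Result rescore(float[] query, float[][] candidates, int k) {
        // One full-precision comparison per candidate: the count is known before scoring starts
        long vectorOperations = candidates.length;
        float[] scores = new float[candidates.length];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < candidates.length; i++) {
            scores[i] = dotProduct(query, candidates[i]);
            order.add(i);
        }
        // Sort candidate indices by descending exact score, then keep at most k of them
        order.sort(Comparator.comparingDouble((Integer idx) -> scores[idx]).reversed());
        List<Integer> topK = List.copyOf(order.subList(0, Math.min(k, order.size())));
        return new Result(topK, vectorOperations);
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[][] candidates = {{0.1f, 0.9f}, {0.9f, 0.1f}, {0.5f, 0.5f}};
        Result r = rescore(query, candidates, 2);
        System.out.println(r.topK());             // [1, 2]
        System.out.println(r.vectorOperations()); // 3
    }
}
```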

carlosdelest · 2024-12-18T17:57:00Z

server/src/test/java/org/elasticsearch/search/vectors/RescoreKnnVectorQueryTests.java

    }
-
-    @ParametersFactory
-    public static Iterable<Object[]> parameters() {


We always have a specific k, so it makes no sense to parameterize the test.

carlosdelest · 2024-12-18T18:00:26Z

@elasticmachine update branch

carlosdelest · 2025-01-03T13:15:04Z

@elasticmachine update branch


elasticsearchmachine · 2025-01-03T16:20:10Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

benwtrent

Good cleanup. Also, I removed the v8.9 tag; I think you added it by accident.

benwtrent

Actually, one of the things I thought we should do is switch to using k, rather than num_candidates, as the basis for oversampling. That was one of the original reasons for the overall change of eagerly setting k to the request size.

carlosdelest · 2025-01-09T07:37:50Z

> Actually, one of the things I thought we should do is switch to use k as the oversampling and not num candidates. That was one of the original reasons for the overall change of eagerly setting k to request size.

Oh yeah, it all started with this conversation.

I can work on that as a follow-up and go back to the original design:

{
    "query": {
        "knn": {
            "field": "emb",
            "query_vector": [...],
            "k": 10,
            "num_candidates": 100,
            "rescore_vector": {
                "oversample": 2.0
            }
        }
    }
}

This will mean rescoring k * oversample from the num_candidates retrieved on each shard, and returning the top k out of them.
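The sizing arithmetic for that proposal can be sketched as follows. OversampleSizing.rescoreWindow is a hypothetical helper; rounding up with Math.ceil, capping the window at num_candidates and the input validation are assumptions, not behavior confirmed in this PR. With the example query above (k = 10, num_candidates = 100, oversample = 2.0), each shard would rescore 20 hits and return 10.

```java
/**
 * Hypothetical sketch of per-shard sizing for rescore_vector.oversample.
 * Assumptions (not confirmed Elasticsearch behavior): the window is rounded up,
 * capped at num_candidates, and oversample must be at least 1.0.
 */
public final class OversampleSizing {

    private OversampleSizing() {}

    /** Number of approximate hits each shard would rescore with full-precision vectors. */
    public static int rescoreWindow(int k, int numCandidates, double oversample) {
        // Assumed validation: positive k, num_candidates >= k, oversample >= 1.0
        if (k <= 0 || numCandidates < k || oversample < 1.0) {
            throw new IllegalArgumentException("invalid k, num_candidates or oversample");
        }
        int window = (int) Math.ceil(k * oversample);
        // A shard cannot rescore more hits than it retrieved
        return Math.min(window, numCandidates);
    }

    public static void main(String[] args) {
        int k = 10;
        int numCandidates = 100;
        double oversample = 2.0;
        // After rescoring, only the top k of the window are returned per shard
        System.out.println("rescored per shard: " + rescoreWindow(k, numCandidates, oversample));
        System.out.println("returned per shard: " + k);
    }
}
```

Keying the window on k rather than num_candidates keeps the rescoring cost proportional to the number of results actually requested.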

I'll merge this and start working on it. Draft PR: #119835

elasticsearchmachine · 2025-01-09T07:39:28Z

💔 Backport failed

Status	Branch	Result
❌	8.x	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 118997

carlosdelest · 2025-01-09T09:29:43Z

💚 All backports created successfully

Status	Branch	Result
✅	8.x

Questions?

Please refer to the Backport tool documentation

Use request size when k is null to calculate the number of results to retrieve from each shard

e03f240

carlosdelest added the >non-issue, auto-backport, v8.9.0 and v8.18.0 labels Dec 18, 2024

elasticsearchmachine added the v9.0.0 label Dec 18, 2024

carlosdelest commented Dec 18, 2024


elasticmachine and others added 2 commits December 18, 2024 18:00

Merge branch 'main' into non-issue/rescore-vector-use-size-as-k

1674d50

Merge branch 'main' into non-issue/rescore-vector-use-size-as-k

72b8779

carlosdelest marked this pull request as ready for review December 19, 2024 08:46

elasticsearchmachine added the needs:triage label Dec 19, 2024

carlosdelest requested a review from benwtrent December 19, 2024 08:48

elasticmachine and others added 3 commits January 3, 2025 13:15

Merge branch 'main' into non-issue/rescore-vector-use-size-as-k

89fae7c

Remove unnecessary override

0a2f895

Merge remote-tracking branch 'carlosdelest/non-issue/rescore-vector-use-size-as-k' into non-issue/rescore-vector-use-size-as-k

dcbe0bf

carlosdelest added the :Search Relevance/Vectors and Team:Search Relevance labels Jan 3, 2025

elasticsearchmachine removed the needs:triage label Jan 3, 2025

carlosdelest added 2 commits January 8, 2025 08:02

Merge branch 'main' into non-issue/rescore-vector-use-size-as-k

17979d5

Remove request size as it is already provided from the query using k

f1e2972

carlosdelest changed the title ~~Vector rescoring - Use request size when k is null~~ Vector rescoring - Simplify code for k == null Jan 8, 2025

benwtrent removed the v8.9.0 label Jan 8, 2025

benwtrent approved these changes Jan 8, 2025


benwtrent reviewed Jan 8, 2025


carlosdelest merged commit 0cf2ebb into elastic:main Jan 9, 2025
16 checks passed

elasticsearchmachine added the backport pending label Jan 9, 2025

carlosdelest mentioned this pull request Jan 9, 2025

[8.x] Vector rescoring - Simplify code for k == null (#118997) #119838

Merged


4 participants
