[8.x] Vector rescoring oversamples k instead of num_candidates #119887

carlosdelest · 2025-01-09T17:15:32Z

It makes more sense to apply rescoring to an oversampled k instead of num_candidates. Rescoring only a fraction of the candidates is more performant and still offers good recall, especially when k is small compared to num_candidates.

This changes the API to use oversample instead of num_candidates_factor:

GET msmarco-v2-bbq/_search
{
    "query": {
        "knn": {
            "field": "emb",
            "query_vector": [...],
            "k": 10,
            "num_candidates": 100,
            "rescore_vector": {
                "oversample": 2.5
            }
        }
    }
}

This means that, out of the num_candidates retrieved on each shard, the top k * oversample are rescored and the top k of them are returned.
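The rescoring step described above, for a single shard, can be sketched in Python as follows. This is an illustrative sketch only, not the Elasticsearch implementation: the function name `rescore_shard`, the tuple layout of the candidates, dot product as the exact similarity, and rounding k * oversample up (capped at the number of retrieved candidates) are all assumptions. The approximate scores stand in for whatever the shard's index (for example a quantized one) produced when collecting num_candidates.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    # Assumed exact similarity: plain dot product on full-precision vectors.
    return sum(x * y for x, y in zip(a, b))


def rescore_shard(
    query: Vector,
    candidates: List[Tuple[str, float, Vector]],  # (doc_id, approx_score, full_precision_vector)
    k: int,
    oversample: float,
    exact_similarity: Callable[[Vector, Vector], float] = dot,
) -> List[Tuple[str, float]]:
    # Assumption: round k * oversample up and never exceed the candidates retrieved.
    window = min(len(candidates), math.ceil(k * oversample))
    # Keep only the best `window` candidates according to the approximate score.
    by_approx = sorted(candidates, key=lambda c: c[1], reverse=True)[:window]
    # Rescore just that window with the exact similarity, then keep the top k.
    rescored = [(doc_id, exact_similarity(query, vec)) for doc_id, _, vec in by_approx]
    rescored.sort(key=lambda r: r[1], reverse=True)
    return rescored[:k]


# Hypothetical shard with num_candidates = 5, k = 2, oversample = 1.5 (window of 3).
query = [1.0, 0.0]
candidates = [
    ("a", 0.90, [0.5, 0.5]),
    ("b", 0.85, [0.9, 0.1]),
    ("c", 0.80, [0.95, 0.0]),
    ("d", 0.70, [1.0, 0.0]),
    ("e", 0.60, [0.1, 0.9]),
]
result = rescore_shard(query, candidates, k=2, oversample=1.5)
print(result)  # [('c', 0.95), ('b', 0.9)]
```

In this toy data, "d" has the best exact score but falls outside the rescoring window, which illustrates the trade-off: a smaller window rescores fewer vectors at some cost to recall, and raising oversample widens it.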

Follow-up to #116663

We start with 8.x and will backport to main: this introduces an incompatible API change, and it needs to land in 8.x first so that the BwC and REST compatibility tests pass in main.

elasticsearchmachine · 2025-01-09T19:55:59Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

carlosdelest · 2025-01-13T07:06:55Z

Closing in favor of #119835

carlosdelest added 3 commits January 9, 2025 18:07

Use oversample to modify k instead of num_candidates for rescoring (0f956b4)

Renaming typo (ab484b5)

Fix test (401fede)

carlosdelest added the :Search Relevance/Vectors, Team:Search Relevance, auto-backport, v9.0.0, v8.18.0 and >non-issue labels Jan 9, 2025

carlosdelest marked this pull request as ready for review January 9, 2025 19:55

carlosdelest requested a review from benwtrent January 9, 2025 19:55

carlosdelest closed this Jan 13, 2025
