
Conversation

benwtrent
Member

Since we are quantizing per posting-list centroid, I think we can get away with fewer optimization iterations.

Dropping the iteration count from 5 to 2 reduces latency when hitting many centroids, with no recall impact (at least on my data sets).
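To make the knob concrete, here is a minimal, hypothetical sketch of what an iteration-bounded scalar-quantization refinement loop can look like (this is illustrative only, not the actual Lucene implementation; the function name `quantize` and the bound-refinement step are assumptions). The `n_iters` parameter plays the role of the optimization iteration count being dropped from 5 to 2: each extra pass refines the clipping bounds a little more, at extra per-centroid cost.

```python
import numpy as np

def quantize(vector, n_iters=2, bits=4):
    """Scalar-quantize `vector` to `bits` bits by iteratively refining
    clipping bounds [lo, hi]. More iterations refine the bounds further;
    fewer iterations are cheaper per centroid. Hypothetical sketch."""
    levels = (1 << bits) - 1
    lo, hi = float(vector.min()), float(vector.max())
    for _ in range(n_iters):
        step = (hi - lo) / levels
        codes = np.clip(np.round((vector - lo) / step), 0, levels)
        recon = lo + codes * step
        err = vector - recon
        # One refinement pass: nudge each bound by the mean
        # reconstruction error of the points pinned to it.
        if (codes == 0).any():
            lo += float(err[codes == 0].mean())
        if (codes == levels).any():
            hi += float(err[codes == levels].mean())
    step = (hi - lo) / levels
    codes = np.clip(np.round((vector - lo) / step), 0, levels).astype(np.uint8)
    return codes, lo, hi
```

The point of the sketch: the per-iteration work is a full pass over the vector, so when a query touches many centroids (each requiring a fresh quantization), cutting iterations directly cuts query latency, and if the bounds converge quickly, recall is unaffected.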

baseline:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100         2.43              0.00           0.00   411.52    0.91  23766.65

candidate:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100         1.84              0.00           0.00   543.48    0.91  23766.65

Here is a more extreme case (many segments):

baseline:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100        36.10              0.00           0.00   27.70    0.87  364480.37

candidate:

index_name                      index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited
------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------
cohere-wikipedia-docs-768d.vec         ivf      100        24.94              0.00           0.00   40.10    0.87  364480.37

Need to test against more data sets, but this is a nice improvement.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 2, 2025
Contributor

@john-wagster left a comment


Agreed, this needs more testing, but the results are good and I buy that it makes sense; LGTM.

@benwtrent
Member Author

@john-wagster

I am still running more benchmarks, but I ran over

glove-normalized-200.train
corpus-dbpedia-entity-arctic-0.fvec
cohere-wikipedia-docs-768d.vec
corpus-quora-E5-small.fvec.flat

And observed no recall difference.

I am running a larger 8M run of cohere-wikipedia-docs-768d.vec. But even with iter=1 at query time, there has been zero impact on recall, with nice query-time latency improvements when hitting many centroids.

@benwtrent
Member Author

@john-wagster I ran over all 8M of the cohere 768, observed no recall difference between 5 iterations and 1 iteration for multiple segments and force merged to one.

I am thinking we should call it. If we notice funky recall in other data sets, we can revisit.

@benwtrent benwtrent added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jul 3, 2025
@elasticsearchmachine elasticsearchmachine merged commit c0374c2 into elastic:main Jul 7, 2025
34 checks passed
@benwtrent benwtrent deleted the adj-quantization-iters-ivf branch July 7, 2025 14:59