@john-wagster john-wagster commented Jul 25, 2025

This PR quantizes all centroids at 1 bit, then re-quantizes a subset of them at 4 bits. The benchmarks below show lower query latency at similar recall on most datasets, and the approach will likely help with scaling, but we need to run larger datasets to be sure.
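
To make the two-pass idea concrete, here is a minimal, self-contained sketch. It is not the code in this PR: the class and method names are hypothetical, the 1-bit code is assumed to be a plain sign bit per dimension, the 4-bit code is assumed to be a uniform scalar quantizer over a fixed `[min, max]` range, and centroids are quantized at query time only to keep the example short.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative two-pass centroid ranking (hypothetical names, not the PR's implementation). */
public class TwoPassCentroidSketch {

    /** Assumed 1-bit code: one sign bit per dimension, packed into longs. */
    public static long[] signBits(float[] v) {
        long[] bits = new long[(v.length + 63) / 64];
        for (int i = 0; i < v.length; i++) {
            if (v[i] > 0f) {
                bits[i >> 6] |= 1L << (i & 63);
            }
        }
        return bits;
    }

    /** Coarse similarity: number of dimensions whose sign bits agree. */
    public static int signAgreement(long[] a, long[] b, int dims) {
        int differing = 0;
        for (int i = 0; i < a.length; i++) {
            differing += Long.bitCount(a[i] ^ b[i]);
        }
        return dims - differing;
    }

    /** Assumed 4-bit code: each component clamped to [min, max] and mapped to a level in [0, 15]. */
    public static byte[] quantize4Bit(float[] v, float min, float max) {
        byte[] code = new byte[v.length];
        float scale = 15f / (max - min);
        for (int i = 0; i < v.length; i++) {
            float clamped = Math.max(min, Math.min(max, v[i]));
            code[i] = (byte) Math.round((clamped - min) * scale);
        }
        return code;
    }

    /** Approximate dot product between a float query and a dequantized 4-bit code. */
    public static float dot4Bit(float[] query, byte[] code, float min, float max) {
        float step = (max - min) / 15f;
        float sum = 0f;
        for (int i = 0; i < query.length; i++) {
            sum += query[i] * (min + code[i] * step);
        }
        return sum;
    }

    /** Pass 1 keeps numCandidates centroids by 1-bit score; pass 2 returns the best nProbe by 4-bit score. */
    public static int[] selectCentroids(float[] query, float[][] centroids, int numCandidates, int nProbe, float min, float max) {
        final long[] queryBits = signBits(query);
        final int[] coarse = new int[centroids.length];
        Integer[] ords = new Integer[centroids.length];
        for (int i = 0; i < centroids.length; i++) {
            ords[i] = i;
            coarse[i] = signAgreement(queryBits, signBits(centroids[i]), query.length);
        }
        // Highest sign agreement first; the sort is stable, so ties keep ordinal order.
        Arrays.sort(ords, Comparator.comparingInt((Integer o) -> -coarse[o]));
        final int keep = Math.min(numCandidates, centroids.length);
        Integer[] candidates = Arrays.copyOf(ords, keep);
        // Only the surviving candidates pay for the more precise 4-bit score.
        final float[] fine = new float[centroids.length];
        for (int o : candidates) {
            fine[o] = dot4Bit(query, quantize4Bit(centroids[o], min, max), min, max);
        }
        Arrays.sort(candidates, (a, b) -> Float.compare(fine[b], fine[a]));
        int[] result = new int[Math.min(nProbe, keep)];
        for (int i = 0; i < result.length; i++) {
            result[i] = candidates[i];
        }
        return result;
    }
}
```

The point of the sketch is the cost split: every centroid gets a cheap bitwise comparison, and only `numCandidates` of them get the more expensive 4-bit score that decides the final `nProbe` ordering.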

Part of exploring #131234

main vs 1bit rescore
# main 10m dbpedia
index_name                             index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------------  ----------  --------  --------------  --------------------  ------------  
corpus-dbpedia-entity-E5-small-0.fvec         ivf  10000000           90237                337852             0

index_name                             index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
-------------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------  
corpus-dbpedia-entity-E5-small-0.fvec         ivf      100         2.04              0.00           0.00  491.40    0.74   43437.83                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      200         3.24              0.00           0.00  308.64    0.79   85848.81                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      500         6.42              0.00           0.00  155.70    0.85  209564.45                1.00

# bbq_quantizing_rescoring 10m dbpedia 1bit filter pass oversample 1x nProbe (100, 200, 500) for ~10k centroids w bulk read/write for 1bit centroids w always full queue
index_name                             index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------------  ----------  --------  --------------  --------------------  ------------  
corpus-dbpedia-entity-E5-small-0.fvec         ivf  10000000           82835                228470             0

index_name                             index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
-------------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------  
corpus-dbpedia-entity-E5-small-0.fvec         ivf      100         1.48              0.00           0.00  676.82    0.74   44297.70                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      200         2.51              0.00           0.00  398.80    0.80   87600.88                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      500         5.70              0.00           0.00  175.52    0.86  213596.79                1.00


# main 3m cohere
index_name                          index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
----------------------------------  ----------  --------  --------------  --------------------  ------------  
cohere-wikipedia-10m-docs-768d.vec         ivf   3000000          184590                291532             0

index_name                          index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
----------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------  
cohere-wikipedia-10m-docs-768d.vec         ivf      100         1.77              0.00           0.00  565.29    0.84   18526.90                1.00
cohere-wikipedia-10m-docs-768d.vec         ivf      200         2.61              0.00           0.00  383.73    0.90   40016.92                1.00
cohere-wikipedia-10m-docs-768d.vec         ivf      500         5.36              0.00           0.00  186.53    0.94  108335.49                1.00

# bbq_quantizing_rescoring 3m cohere 1bit filter pass oversample 1x nProbe (100, 200, 500)
index_name                          index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
----------------------------------  ----------  --------  --------------  --------------------  ------------  
cohere-wikipedia-10m-docs-768d.vec         ivf   3000000           83569                288610             0

index_name                          index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
----------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------  
cohere-wikipedia-10m-docs-768d.vec         ivf      100         1.24              0.00           0.00  806.45    0.84   18601.59                1.00
cohere-wikipedia-10m-docs-768d.vec         ivf      200         2.14              0.00           0.00  466.64    0.90   39888.18                1.00
cohere-wikipedia-10m-docs-768d.vec         ivf      500         4.74              0.00           0.00  211.19    0.94  107405.52   


# main 10m glove
index_name                         index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
---------------------------------  ----------  --------  --------------  --------------------  ------------  
enwiki-20120502-lines-1k-200d.vec         ivf  10000000           75592                273178             0

index_name                         index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
---------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------  
enwiki-20120502-lines-1k-200d.vec         ivf      100         2.51              0.00           0.00  398.25    0.63   41179.24                1.00
enwiki-20120502-lines-1k-200d.vec         ivf      200         3.31              0.00           0.00  302.11    0.66   82515.54                1.00
enwiki-20120502-lines-1k-200d.vec         ivf      500         6.14              0.00           0.00  162.95    0.71  204451.47                1.00
enwiki-20120502-lines-1k-200d.vec         ivf      800         8.72              0.00           0.00  114.68    0.73  324602.77                1.00

# bbq_quantizing_rescoring 10m glove 1bit filter pass oversample 1x nProbe (100, 200, 500, 800)
index_name                         index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
---------------------------------  ----------  --------  --------------  --------------------  ------------  
enwiki-20120502-lines-1k-200d.vec         ivf  10000000          199676                302940             0

index_name                         index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
---------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------  
enwiki-20120502-lines-1k-200d.vec         ivf      100         1.51              0.00           0.00  663.57    0.59   43242.75                1.00
enwiki-20120502-lines-1k-200d.vec         ivf      200         2.35              0.00           0.00  425.89    0.63   86191.62                1.00
enwiki-20120502-lines-1k-200d.vec         ivf      500         5.03              0.00           0.00  198.61    0.69  212031.32                1.00
enwiki-20120502-lines-1k-200d.vec         ivf      800         7.86              0.00           0.00  127.19    0.71  335586.37                1.00


# main hotpotqa 5m
index_name                       index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------  ----------  --------  --------------  --------------------  ------------  
corpus-hotpotqa-E5-small-0.fvec         ivf   5000000           84688                250279             0

index_name                       index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
-------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------  
corpus-hotpotqa-E5-small-0.fvec         ivf      100         2.03              0.00           0.00  492.61    0.74   46490.24                1.00
corpus-hotpotqa-E5-small-0.fvec         ivf      200         3.05              0.00           0.00  327.60    0.78   90759.95                1.00
corpus-hotpotqa-E5-small-0.fvec         ivf      500         6.27              0.00           0.00  159.49    0.83  219369.09                1.00

# bbq_quantizing_rescoring hotpotqa 5m 1bit filter pass oversample 1x nProbe (100, 200, 500)
index_name                       index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------  ----------  --------  --------------  --------------------  ------------  
corpus-hotpotqa-E5-small-0.fvec         ivf   5000000           83199                265299             0

index_name                       index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
-------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------  
corpus-hotpotqa-E5-small-0.fvec         ivf      100         1.53              0.00           0.00  653.59    0.74   47434.10                1.00
corpus-hotpotqa-E5-small-0.fvec         ivf      200         2.69              0.00           0.00  371.75    0.79   92671.01                1.00
corpus-hotpotqa-E5-small-0.fvec         ivf      500         6.11              0.00           0.00  163.80    0.83  222184.33                1.00


# main quora-e5small 5m
index_name                  index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
--------------------------  ----------  --------  --------------  --------------------  ------------  
corpus-quora-E5-small.fvec         ivf   5000000            8309                 14576             0

index_name                  index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count      QPS  recall   visited  filter_selectivity
--------------------------  ----------  -------  -----------  ----------------  -------------  -------  ------  --------  ------------------  
corpus-quora-E5-small.fvec         ivf       15         0.35              0.00           0.00  2890.17    0.89   7715.37                1.00
corpus-quora-E5-small.fvec         ivf       40         0.57              0.00           0.00  1742.16    0.94  19676.86                1.00
corpus-quora-E5-small.fvec         ivf       70         0.87              0.00           0.00  1150.75    0.96  33377.72                1.00

# bbq_quantizing_rescoring quora-e5small 5m 1bit filter pass oversample 1x nProbe (15, 40, 70)
index_name                  index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
--------------------------  ----------  --------  --------------  --------------------  ------------  
corpus-quora-E5-small.fvec         ivf   5000000            7814                 14280             0

index_name                  index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count      QPS  recall   visited  filter_selectivity
--------------------------  ----------  -------  -----------  ----------------  -------------  -------  ------  --------  ------------------  
corpus-quora-E5-small.fvec         ivf       15         0.31              0.00           0.00  3205.13    0.89   7704.26                1.00
corpus-quora-E5-small.fvec         ivf       40         0.56              0.00           0.00  1788.91    0.94  19718.86                1.00
corpus-quora-E5-small.fvec         ivf       70         0.88              0.00           0.00  1136.36    0.96  33515.73                1.00

// FIXME: do l2normalize here?
final float[] scratchTarget = new float[targetQuery.length];
System.arraycopy(targetQuery, 0, scratchTarget, 0, targetQuery.length);
if (fieldInfo.getVectorSimilarityFunction() == COSINE) {

@john-wagster john-wagster Jul 25, 2025

Not sure why we weren't normalizing for COSINE here previously; bug maybe?
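
For reference, a minimal sketch of what normalizing a copy of the query for COSINE could look like. The class and method names are hypothetical, and returning an all-zero vector unchanged is an assumption of this sketch rather than behavior taken from the PR.

```java
/** Illustrative L2 normalization of a query copy (hypothetical helper, not the PR's code). */
public class CosineQueryPrepSketch {

    /** Returns a unit-length copy of targetQuery; the input array is left untouched. */
    public static float[] normalizedCopy(float[] targetQuery) {
        float[] scratch = targetQuery.clone();
        double normSq = 0.0;
        for (float x : scratch) {
            normSq += (double) x * x;
        }
        if (normSq == 0.0) {
            // Assumption: a zero vector cannot be normalized, so return it as-is.
            return scratch;
        }
        float inv = (float) (1.0 / Math.sqrt(normSq));
        for (int i = 0; i < scratch.length; i++) {
            scratch[i] *= inv;
        }
        return scratch;
    }
}
```

Working on a copy matters here because the caller's `targetQuery` may be reused elsewhere, which is presumably why the diff copies it into `scratchTarget` first.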

}
CentroidIterator centroidIterator = getCentroidIterator(fieldInfo, entry.numCentroids, entry.centroidSlice(ivfCentroids), target);

final int numOversampled = Math.min((int) (nProbe * NPROBE_OVERSAMPLE), entry.numCentroids());

@iverase iverase Jul 25, 2025

Can we add this computation in DefaultIVFVectors reader and pass nProbe to the getCentroidIterator method? This is an implementation detail of the strategy you are implementing and should not leak here.
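
One possible reading of this suggestion, sketched with hypothetical types: the caller passes only `nProbe`, and the concrete strategy decides how many centroids to oversample. The interface, class names, and the oversample factor below are illustrative assumptions, not the actual reader API.

```java
/** Illustrative encapsulation of the oversampling computation (hypothetical types). */
public class CentroidOversampleSketch {

    /** Hypothetical strategy hook: callers pass nProbe and never see the oversample factor. */
    public interface CentroidStrategy {
        int candidatesToScore(int numCentroids, int nProbe);
    }

    /** A rescoring strategy that keeps its oversample factor private. */
    public static final class RescoringStrategy implements CentroidStrategy {
        private final float oversample; // assumed placeholder for NPROBE_OVERSAMPLE

        public RescoringStrategy(float oversample) {
            this.oversample = oversample;
        }

        @Override
        public int candidatesToScore(int numCentroids, int nProbe) {
            // Same expression as in the diff, now hidden behind the strategy.
            return Math.min((int) (nProbe * oversample), numCentroids);
        }
    }
}
```

With this shape, a strategy that does no rescoring could return `Math.min(nProbe, numCentroids)` without the shared reader code changing.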
