Add support for nested queries for ivf indices #128782

benwtrent · 2025-06-02T20:19:24Z

This is a first pass at adding nested query support for bbq_ivf indices.

The support is deliberately simple for now: we keep exploring clusters until we have collected at least k results. This covers the case where the nested docs are all tightly clustered and the typical nprobe explores too few clusters to actually return k docs.

I have some weird test failures I need to debug, so opening as draft for now.

elasticsearchmachine · 2025-06-04T15:20:52Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

benwtrent · 2025-06-04T15:25:10Z

.../src/main/java/org/elasticsearch/search/vectors/DiversifyingNearestChildrenKnnCollector.java

+ * This collects the nearest children vectors. Diversifying the results over the provided parent
+ * filter. This means the nearest children vectors are returned, but only one per parent
+ */
+class DiversifyingNearestChildrenKnnCollector extends AbstractKnnCollector {


This is mostly copied from Lucene. It is package-private there, so we cannot use it wholesale. We may end up modifying it to support IVF more directly, but this is just the first step.
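For readers unfamiliar with the "one child per parent" behaviour described in the Javadoc, here is a minimal sketch of the idea. It is not the Lucene or Elasticsearch collector: `OnePerParentSketch` and `topKOnePerParent` are hypothetical names, it uses `java.util.BitSet` and a map instead of Lucene's heap-based implementation, and it assumes the usual block-join layout in which each parent doc id is a set bit and its children are the doc ids immediately before it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration of "nearest children, one per parent"; not the real collector. */
final class OnePerParentSketch {

    /**
     * Returns up to k child doc ids, best score first, keeping at most one child per parent.
     * Assumption: parentBits marks parent doc ids, and a child's parent is the next set bit.
     */
    static List<Integer> topKOnePerParent(int[] childDocs, float[] scores, BitSet parentBits, int k) {
        // parent doc id -> index (into childDocs/scores) of its best child so far
        Map<Integer, Integer> bestPerParent = new HashMap<>();
        for (int i = 0; i < childDocs.length; i++) {
            int parent = parentBits.nextSetBit(childDocs[i]);
            if (parent < 0) {
                continue; // no parent follows this doc, so it cannot be a nested child
            }
            Integer current = bestPerParent.get(parent);
            if (current == null || scores[i] > scores[current]) {
                bestPerParent.put(parent, i);
            }
        }
        List<Integer> winners = new ArrayList<>(bestPerParent.values());
        // sort the surviving children by descending score
        winners.sort((a, b) -> Float.compare(scores[b], scores[a]));
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < winners.size() && i < k; i++) {
            result.add(childDocs[winners.get(i)]);
        }
        return result;
    }
}
```

With parents at doc ids 3 and 7, children 0 and 1 both belong to parent 3, so only the better-scoring one survives even if both outscore every child of parent 7.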

benwtrent · 2025-06-04T15:25:49Z

server/src/main/java/org/elasticsearch/search/vectors/IVFKnnFloatVectorQuery.java

        LeafReader reader = context.reader();
        FloatVectorValues floatVectorValues = reader.getFloatVectorValues(field);
-        if (floatVectorValues == null) {
+        if (floatVectorValues == null || knnCollector == null) {


A null collector is now possible if the parent bit set is invalid.

benwtrent · 2025-06-04T15:26:14Z

...t/java/org/elasticsearch/search/vectors/DiversifyingChildrenIVFKnnFloatVectorQueryTests.java

+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.join.BitSetProducer;
+
+public class DiversifyingChildrenIVFKnnFloatVectorQueryTests extends AbstractDiversifyingChildrenIVFKnnVectorQueryTestCase {


I added the abstract base class on the assumption that we will add byte support in the future.

benwtrent · 2025-06-04T15:27:48Z

server/src/main/java/org/elasticsearch/index/codec/vectors/IVFVectorsReader.java

+        // TODO do we need to handle nested doc counts similarly to how we handle
+        // filtering? E.g. keep exploring until we hit an expected number of parent documents vs. child vectors?
+        while (centroidQueue.size() > 0 && centroidsVisited < nProbe && knnCollectorImpl.numCollected() < knnCollector.k()) {


I did consider doing something similar to our filtering logic, tracking the number of visited parent docs rather than the number of visited vectors, but I am not 100% sure it's necessary.

If it is necessary, we will need to add some bit set logic to the collector to keep track of the visited parent docs. A simple incremental count will not work, because we might visit the same parent document multiple times.
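A minimal sketch of what that bit set bookkeeping could look like, assuming the block-join layout where a child's parent is the next set bit in the parent bit set. `VisitedParentCounter` and its methods are hypothetical names for illustration, and it uses `java.util.BitSet` rather than Lucene's bit set types; nothing like it exists in the PR.

```java
import java.util.BitSet;

/** Toy counter for distinct parents reached via visited children; names are hypothetical. */
final class VisitedParentCounter {
    private final BitSet parentBits;                  // set bit = parent doc id
    private final BitSet seenParents = new BitSet();  // parents already counted
    private int distinctParents;

    VisitedParentCounter(BitSet parentBits) {
        this.parentBits = parentBits;
    }

    /** Records the parent of childDoc; returns true only the first time that parent is seen. */
    boolean visitChild(int childDoc) {
        int parent = parentBits.nextSetBit(childDoc);
        if (parent < 0 || seenParents.get(parent)) {
            return false; // no parent, or parent already counted
        }
        seenParents.set(parent);
        distinctParents++;
        return true;
    }

    int distinctParents() {
        return distinctParents;
    }
}
```

The loop could then compare `distinctParents()` against an expected parent count instead of comparing a raw vector count, at the cost of one extra bit set per collector.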

If we want to keep going until we have collected k documents and visited at least nProbe centroids, shouldn't the condition be:

centroidQueue.size() > 0 && (centroidsVisited < nProbe || knnCollectorImpl.numCollected() < knnCollector.k())
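To make the difference between the two predicates concrete, here is a minimal sketch that counts how many centroids each one visits. It is not the IVFVectorsReader code: `ProbeLoopSketch`, `docsPerCentroid`, and `keepGoingUntilK` are hypothetical names, and each centroid is reduced to the number of docs it would contribute to the collector.

```java
/** Toy model of the centroid exploration loop; names and structure are illustrative only. */
final class ProbeLoopSketch {

    /**
     * docsPerCentroid[i] is the number of docs collected when the i-th nearest centroid is scanned.
     * Returns how many centroids are visited before the loop stops.
     */
    static int centroidsVisited(int[] docsPerCentroid, int nProbe, int k, boolean keepGoingUntilK) {
        int visited = 0;
        int collected = 0;
        // visited < docsPerCentroid.length plays the role of centroidQueue.size() > 0
        while (visited < docsPerCentroid.length) {
            boolean underProbeBudget = visited < nProbe;
            boolean needMoreDocs = collected < k;
            boolean keepGoing = keepGoingUntilK
                ? (underProbeBudget || needMoreDocs)   // predicate suggested in the review
                : (underProbeBudget && needMoreDocs);  // predicate as written in the diff
            if (!keepGoing) {
                break;
            }
            collected += docsPerCentroid[visited];
            visited++;
        }
        return visited;
    }
}
```

With one doc per centroid, nProbe = 2, and k = 4, the `&&` form stops after 2 centroids with only 2 docs collected, while the `||` form continues to 4 centroids. When the first centroid already yields more than k docs, the `&&` form stops after 1 centroid and the `||` form still visits nProbe of them.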

iverase · 2025-06-05T13:13:47Z

server/src/main/java/org/elasticsearch/search/vectors/IVFKnnFloatVectorQuery.java

    ) throws IOException {
        KnnCollector knnCollector = knnCollectorManager.newCollector(visitedLimit, searchStrategy, context);
        LeafReader reader = context.reader();
        FloatVectorValues floatVectorValues = reader.getFloatVectorValues(field);


If the collector is null, we might not want to do this; it is not free.
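The reordering being suggested can be sketched roughly as follows: check the collector first and only then pay for loading the vector values. This is a stand-alone illustration, not the actual IVFKnnFloatVectorQuery code; `EarlyExitSketch`, `shouldSearch`, and the `Supplier` standing in for `reader.getFloatVectorValues(field)` are assumptions made for the example.

```java
import java.util.function.Supplier;

/** Toy illustration of checking the cheap condition before the expensive load. */
final class EarlyExitSketch {

    /**
     * Returns true only if there is a collector and vector values to search.
     * The loader is never invoked when the collector is null.
     */
    static <C, V> boolean shouldSearch(C collector, Supplier<V> vectorValuesLoader) {
        if (collector == null) {
            return false; // skip the load entirely
        }
        return vectorValuesLoader.get() != null;
    }
}
```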

iverase · 2025-06-05T13:41:57Z

.../org/elasticsearch/search/vectors/AbstractDiversifyingChildrenIVFKnnVectorQueryTestCase.java

+    @Before
+    public void setUp() throws Exception {
+        super.setUp();
+        format = new IVFVectorsFormat(128);


Would it make sense to randomize the number of vectors per cluster?

@iverase I can do that

john-wagster

lgtm

iverase

LGTM


benwtrent added 3 commits May 30, 2025 16:21

Adding nested support for ivf knn search

12367e0

adding tests

2f22dd9

adding tests

04ce871

benwtrent added the >non-issue, :Search Relevance/Vectors (Vector search), and v9.1.0 labels Jun 2, 2025

benwtrent added 4 commits June 3, 2025 15:37

iter

d7f95d7

removing debugging stuffs

b927078

Merge remote-tracking branch 'upstream/main' into nested-ivf-queries

c860ebb

iter

114db10

benwtrent marked this pull request as ready for review June 4, 2025 15:20

elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Jun 4, 2025

benwtrent commented Jun 4, 2025


benwtrent added 2 commits June 5, 2025 07:33

Merge remote-tracking branch 'upstream/main' into nested-ivf-queries

6d2b3b1

iter

38e658a

iverase reviewed Jun 5, 2025


john-wagster approved these changes Jun 5, 2025


benwtrent added 2 commits June 5, 2025 15:09

Merge remote-tracking branch 'upstream/main' into nested-ivf-queries

235ab87

addressing pr comments, etc.

a363163

iverase approved these changes Jun 6, 2025


benwtrent added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) Jun 6, 2025

benwtrent added 2 commits June 6, 2025 11:08

Merge branch 'main' into nested-ivf-queries

aee6d17

Merge branch 'main' into nested-ivf-queries

67fa286

elasticsearchmachine merged commit b5d5229 into elastic:main Jun 9, 2025
18 checks passed

benwtrent deleted the nested-ivf-queries branch June 9, 2025 17:01


4 participants
