
Commit 5643712

kaivalnp authored and benwtrent committed
Fix failing BaseVectorSimilarityQueryTestCase#testApproximate (#12922)
Discovered in #12921 and introduced in #12679.

The first issue is that we weren't advancing the `VectorScorer` [here](https://github.com/apache/lucene/blob/cf13a9295052288b748ed8f279f05ee26f3bfd5f/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java#L257-L262), so it was still unpositioned when we tried to compute the similarity score.

Earlier in the PR, the underlying delegate of the `FilteredDocIdSetIterator` was `scorer.iterator()` (see [here](https://github.com/apache/lucene/blob/cad565439be512ac6e95a698007b1fc971173f00/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java#L107)), so we didn't need to advance it explicitly.

Later, we decided to maintain parity with `AbstractKnnVectorQuery` and introduce filtering in `AbstractVectorSimilarityQuery` (see [this commit](5096790)) to determine the `visitLimit` of approximate search. After that change, the underlying iterator became the accepted docs (see [here](https://github.com/apache/lucene/blob/5096790f281e477c529a7c8311aeb353ccdffdeb/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java#L255)), and I missed advancing the `VectorScorer` explicitly.

With that fixed, we no longer get the original `java.lang.ArrayIndexOutOfBoundsException`, but `BaseVectorSimilarityQueryTestCase#testApproximate` starts failing because it falls back to exact search: the limit of the prefilter is met during graph search.

Relaxed the parameters of the test to fix this (making the filter less restrictive, and trying to visit fewer nodes so that approximate search completes without hitting its limit).

Sorry for missing this earlier!
1 parent 2df6908 commit 5643712


3 files changed: +9 -2 lines changed


lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java

Lines changed: 5 additions & 0 deletions
@@ -256,6 +256,11 @@ static VectorSimilarityScorer fromAcceptDocs(
           new FilteredDocIdSetIterator(acceptDocs) {
             @Override
             protected boolean match(int doc) throws IOException {
+              // Advance the scorer
+              if (!scorer.advanceExact(doc)) {
+                return false;
+              }
+
               // Compute the dot product
               float score = scorer.score();
               cachedScore[0] = score * boost;
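
To illustrate the positioning issue outside of Lucene, here is a minimal, self-contained sketch. `ToyScorer`, `matching`, and the sample vectors are hypothetical stand-ins invented for this example, not Lucene classes; the sketch only mirrors the pattern in the hunk above, where the filter iterates over accepted docs and the scorer has to be moved to each doc before `score()` is called.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionedScorerSketch {

  /** Hypothetical stand-in for a vector scorer: unpositioned (-1) until advanceExact is called. */
  static final class ToyScorer {
    private final float[][] vectors; // vectors[doc] == null means the doc has no vector
    private final float[] query;
    private int doc = -1;

    ToyScorer(float[][] vectors, float[] query) {
      this.vectors = vectors;
      this.query = query;
    }

    /** Moves to the target doc and reports whether it has a vector. */
    boolean advanceExact(int target) {
      doc = target;
      return vectors[target] != null;
    }

    /** Dot product of the query and the current doc's vector; fails fast when unpositioned. */
    float score() {
      if (doc == -1) {
        throw new IllegalStateException("ToyScorer is not positioned");
      }
      float[] v = vectors[doc];
      float dot = 0f;
      for (int i = 0; i < v.length; i++) {
        dot += v[i] * query[i];
      }
      return dot;
    }
  }

  /** Returns the accepted docs that have a vector and score at least the threshold. */
  static List<Integer> matching(ToyScorer scorer, int[] acceptDocs, float threshold) {
    List<Integer> out = new ArrayList<>();
    for (int doc : acceptDocs) {
      // The iteration is driven by the accepted docs, so the scorer must be advanced explicitly.
      if (!scorer.advanceExact(doc)) {
        continue;
      }
      if (scorer.score() >= threshold) {
        out.add(doc);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, null, {0.5f, 0.5f}, {0f, 1f}};
    ToyScorer scorer = new ToyScorer(vectors, new float[] {1f, 0f});
    System.out.println(matching(scorer, new int[] {0, 1, 2, 3}, 0.5f)); // prints [0, 2]
  }
}
```

Dropping the `advanceExact` call from `matching` would leave the toy scorer at `-1`, and the first `score()` call would throw, which is roughly the situation the real fix addresses.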

lucene/core/src/java/org/apache/lucene/search/VectorScorer.java

Lines changed: 2 additions & 0 deletions
@@ -87,6 +87,7 @@ public boolean advanceExact(int doc) throws IOException {

     @Override
     public float score() throws IOException {
+      assert values.docID() != -1 : getClass().getSimpleName() + " is not positioned";
       return similarity.compare(query, values.vectorValue());
     }
   }
@@ -117,6 +118,7 @@ public boolean advanceExact(int doc) throws IOException {

     @Override
     public float score() throws IOException {
+      assert values.docID() != -1 : getClass().getSimpleName() + " is not positioned";
       return similarity.compare(query, values.vectorValue());
     }
   }

lucene/core/src/test/org/apache/lucene/search/BaseVectorSimilarityQueryTestCase.java

Lines changed: 2 additions & 2 deletions
@@ -433,8 +433,8 @@ public void testFallbackToExact() throws IOException {

   public void testApproximate() throws IOException {
     // Non-restrictive filter, along with similarity to visit a small number of nodes
-    int numFiltered = random().nextInt(numDocs - (numDocs * 4) / 5) + (numDocs * 4) / 5;
-    int targetVisited = random().nextInt(numFiltered / 8 - numFiltered / 10) + numFiltered / 10;
+    int numFiltered = numDocs - 1;
+    int targetVisited = random().nextInt(numFiltered / 10 - 1) + 1;

     V[] vectors = getRandomVectors(numDocs, dim);
     V queryVector = getRandomVector(dim);
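
To make the effect of the relaxed parameters concrete, the sketch below works out the value ranges that the old and new expressions can produce. It assumes `numDocs = 1000` purely for illustration (the value used by the test is not shown in this diff), and the class and helper names are hypothetical. The only fact it relies on is that `nextInt(bound)` returns a value in `[0, bound)` and requires a positive bound.

```java
import java.util.Arrays;

public class ApproximateTestRanges {

  /** Inclusive [min, max] of nextInt(bound) + offset, i.e. offset .. offset + bound - 1. */
  static int[] range(int bound, int offset) {
    if (bound <= 0) {
      throw new IllegalArgumentException("bound must be positive");
    }
    return new int[] {offset, offset + bound - 1};
  }

  // Old: random().nextInt(numDocs - (numDocs * 4) / 5) + (numDocs * 4) / 5
  static int[] oldNumFiltered(int numDocs) {
    int lower = (numDocs * 4) / 5;
    return range(numDocs - lower, lower);
  }

  // Old: random().nextInt(numFiltered / 8 - numFiltered / 10) + numFiltered / 10
  static int[] oldTargetVisited(int numFiltered) {
    return range(numFiltered / 8 - numFiltered / 10, numFiltered / 10);
  }

  // New: numDocs - 1 (fixed, no randomness)
  static int newNumFiltered(int numDocs) {
    return numDocs - 1;
  }

  // New: random().nextInt(numFiltered / 10 - 1) + 1
  static int[] newTargetVisited(int numFiltered) {
    return range(numFiltered / 10 - 1, 1);
  }

  public static void main(String[] args) {
    int numDocs = 1000; // assumed value, for illustration only
    System.out.println(Arrays.toString(oldNumFiltered(numDocs)));   // [800, 999]
    System.out.println(Arrays.toString(oldTargetVisited(800)));     // [80, 99]
    System.out.println(Arrays.toString(oldTargetVisited(999)));     // [99, 123]
    System.out.println(newNumFiltered(numDocs));                    // 999
    System.out.println(Arrays.toString(newTargetVisited(999)));     // [1, 98]
  }
}
```

Under this assumption, the filter now accepts all but one document, and the visit target can be as low as 1 and never exceeds about a tenth of the filtered docs, whereas the old target started at a tenth and could reach about an eighth.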
