Fix mergedbytevectorvalues lastord by finnroblin · Pull Request #15553 · apache/lucene

finnroblin · 2026-01-07T21:45:00Z

Description

Fixes MergedByteVectorValues behavior as described in #14992. Currently, MergedByteVectorValues::nextDoc() does not increment lastOrd when the iterator is advanced. If nextDoc() is called several times and a vector is then loaded, the stale lastOrd causes an exception. One case where this occurs is in OpenSearch k-NN, where we split a list of vectors into multiple parts so that the partitions can be uploaded in parallel. (See opensearch-project/k-NN#2803 for more details about the use case this bugfix solves.)
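To make the ordinal bookkeeping concrete, here is a minimal, self-contained Java model of the forward-only check. It is a hypothetical sketch, not Lucene's code. The names `LastOrdSketch`, `MergedValues` and `skipThenLoad` are made up, there is a single segment with no deletions (so doc IDs equal ordinals), and the `fixed` flag switches between advancing `lastOrd` in `vectorValue` (the pre-fix behavior described in #14992) and advancing it in `nextDoc` (the approach this PR takes).

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the lastOrd bookkeeping discussed in this PR; NOT Lucene's
 * MergedByteVectorValues. Single segment, no deletions, so doc IDs equal vector ordinals.
 */
public class LastOrdSketch {

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  static final class MergedValues {
    private final List<byte[]> vectors; // one vector per doc, in doc order
    private final boolean fixed; // false: pre-fix bookkeeping; true: bookkeeping after this PR
    private int doc = -1;
    private int lastOrd = -1;

    MergedValues(List<byte[]> vectors, boolean fixed) {
      this.vectors = vectors;
      this.fixed = fixed;
    }

    int nextDoc() {
      if (doc == NO_MORE_DOCS || doc + 1 >= vectors.size()) {
        doc = NO_MORE_DOCS;
        return doc;
      }
      doc++;
      if (fixed) {
        lastOrd++; // after the fix: advancing the iterator also advances the ordinal
      }
      return doc;
    }

    byte[] vectorValue(int ord) {
      // Pre-fix: lastOrd only moves here, so ordinals must be read 0, 1, 2, ... with no gaps.
      // After the fix: lastOrd already tracks the current position, so only it may be read.
      int expected = fixed ? lastOrd : lastOrd + 1;
      if (ord != expected) {
        throw new IllegalStateException(
            "only supports forward iteration: ord=" + ord + ", lastOrd=" + lastOrd);
      }
      if (!fixed) {
        lastOrd = ord;
      }
      return vectors.get(doc);
    }
  }

  /** Skips doc 0 without loading its vector, then loads the vector of doc 1. */
  static boolean skipThenLoad(boolean fixed) {
    MergedValues values = new MergedValues(List.of(new byte[] {0}, new byte[] {1}), fixed);
    values.nextDoc(); // doc 0: vector never loaded
    values.nextDoc(); // doc 1
    try {
      return values.vectorValue(1)[0] == 1;
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("pre-fix: " + skipThenLoad(false));
    System.out.println("fixed:   " + skipThenLoad(true));
  }
}
```

In this model the pre-fix variant fails with `only supports forward iteration: ord=1, lastOrd=-1`, the same shape as the `ord=3, lastOrd=-1` failure in the test output, while the fixed variant returns the vector for doc 1.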

This PR includes a unit test that fails without the bugfix.

Thanks @0ctopus13prime for the original RFC and bugfix!

Failed test output before the bugfix (use the first commit in this PR to reproduce; the second commit contains the bugfix):

TestMergedByteVectorValues > testSkipThenLoadByteVectorDuringMerge FAILED
    java.lang.AssertionError: Test failed during merge
        at __randomizedtesting.SeedInfo.seed([CDF9B6BFD3014176:37C8D3B5F864CA9F]:0)
        at org.apache.lucene.codecs.TestMergedByteVectorValues.testSkipThenLoadByteVectorDuringMerge(TestMergedByteVectorValues.java:201)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:565)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:52)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
        at java.base/java.lang.Thread.run(Thread.java:1474)

        Caused by:
        java.lang.IllegalStateException: only supports forward iteration: ord=3, lastOrd=-1
            at org.apache.lucene.codecs.KnnVectorsWriter$MergedVectorValues$MergedByteVectorValues.vectorValue(KnnVectorsWriter.java:418)
            at org.apache.lucene.codecs.TestMergedByteVectorValues$1$1.mergeOneField(TestMergedByteVectorValues.java:137)
            at org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.mergeOneField(PerFieldKnnVectorsFormat.java:128)
            at org.apache.lucene.codecs.KnnVectorsWriter.merge(KnnVectorsWriter.java:105)
            at org.apache.lucene.index.SegmentMerger.mergeVectorValues(SegmentMerger.java:272)
            at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:315)
            at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:159)
            at org.apache.lucene.index.IndexWriter.addIndexesReaderMerge(IndexWriter.java:3467)
            at org.apache.lucene.index.IndexWriter$AddIndexesMergeSource.merge(IndexWriter.java:3349)
            at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:661)
            at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:720)


:lucene:core:test (FAILURE): 1 test, 1  failure

1 test completed, 1 failed

> Task :checkAnyTestIncludedAfterFiltering
> Task :lucene:core:wipeTaskTemp UP-TO-DATE

ERROR: 1 test has failed:

  - org.apache.lucene.codecs.TestMergedByteVectorValues.testSkipThenLoadByteVectorDuringMerge (:lucene:core)
    Test output: /Users/finnrobl/Documents/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.codecs.TestMergedByteVectorValues.txt
    Reproduce with: gradlew :lucene:core:test --tests "org.apache.lucene.codecs.TestMergedByteVectorValues.testSkipThenLoadByteVectorDuringMerge" -Ptests.asserts=true -Ptests.file.encoding=UTF-8 -Ptests.gui=false "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.jvms=6 -Ptests.seed=CDF9B6BFD3014176 -Ptests.vectorsize=default

navneet1v · 2026-01-07T22:10:50Z

Tagging a few folks: @mikemccand, @kaivalnp and @vigyasharma, to see if they can review the PR.

kaivalnp

The fix LGTM (it makes the logic the same as in MergedFloat32VectorValues). One small comment about testing, though.

kaivalnp · 2026-01-08T18:49:44Z

lucene/core/src/test/org/apache/lucene/codecs/TestMergedByteVectorValues.java

+   * <p>The bug: MergedByteVectorValues.nextDoc() does not increment lastOrd, so when you skip N
+   * vectors and then try to load vectorValue(N), it fails because lastOrd is still -1.
+   */
+  public void testSkipThenLoadByteVectorDuringMerge() throws IOException {


The setup of this test seems complicated to me -- considering the gist is:

    // Run the test
    ByteVectorValues values =
        KnnVectorsWriter.MergedVectorValues.mergeByteVectorValues(fieldInfo, mergeState);

    // Skip doc 0 and load doc 1
    values.iterator().nextDoc(); // doc 0
    values.iterator().nextDoc(); // doc 1

    // Read vector for doc 1
    assertArrayEquals(expected, values.vectorValue(1)); // throws IllegalStateException on main

It's probably because ByteVectorValuesSub and MergedByteVectorValues are private, so we have to go through public APIs to instantiate them and execute the test?

Perhaps we can make them package-private for testing? It would simplify the test to something like:

    /**
     * Test that skipping vectors in MergedByteVectorValues via nextDoc() and then loading a
     * subsequent vector via vectorValue() works correctly.
     */
    public void testSkipsInMergedByteVectorValues() throws IOException {
      // Data
      List<byte[]> vectors = List.of(new byte[] {0}, new byte[] {1});

      // Setup
      KnnVectorsWriter.ByteVectorValuesSub sub =
          new KnnVectorsWriter.ByteVectorValuesSub(x -> x, ByteVectorValues.fromBytes(vectors, 1));
      MergeState state =
          new MergeState(
              null, null, null, null, null, null, null, null, null, null, null, null, null, null,
              null, false);

      // Run the test
      ByteVectorValues values =
          new KnnVectorsWriter.MergedVectorValues.MergedByteVectorValues(List.of(sub), state);

      // Skip doc 0 and load doc 1
      values.iterator().nextDoc(); // doc 0
      values.iterator().nextDoc(); // doc 1

      // Read vector for doc 1
      assertArrayEquals(vectors.get(1), values.vectorValue(1));
    }

Although I'll wait for someone else to weigh in on this.

github-actions · 2026-01-23T00:29:55Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

kaivalnp

LGTM, thanks!

This bug only affects users doing something specific during merging that skips a vector? But it's nice to fix nevertheless.

Can you also add a CHANGES.txt entry?

kaivalnp · 2026-01-23T19:13:04Z

lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java

  }

-  private static class ByteVectorValuesSub extends DocIDMerger.Sub {
+  static class ByteVectorValuesSub extends DocIDMerger.Sub {


Can we add some Javadocs with the @lucene.internal tag to discourage users from depending on this class, now that it's more accessible?

kaivalnp · 2026-01-23T19:14:01Z

lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java

      ByteVectorValuesSub current;

-      private MergedByteVectorValues(List<ByteVectorValuesSub> subs, MergeState mergeState)
+      MergedByteVectorValues(List<ByteVectorValuesSub> subs, MergeState mergeState)


Can you add a comment specifying that this is package-private for testing?

kaivalnp · 2026-01-23T19:15:35Z

lucene/core/src/test/org/apache/lucene/codecs/TestMergedVectorValues.java

+   * Test that skipping vectors in MergedByteVectorValues via nextDoc() and then loading a
+   * subsequent vector via vectorValue() works correctly.
+   */
+  public void testSkipsInMergedByteVectorValues() throws IOException {


Should we include an equivalent test for the float vector values (making the class more generic, like TestMergedVectorValues)?

Signed-off-by: Finn Roblin <finnrobl@amazon.com>

finnroblin · 2026-02-02T18:49:50Z

Thanks @kaivalnp! I've added @lucene.internal annotations and package-private comments as suggested, and added a test for the merged float vector values. I also added a CHANGES.txt entry under the 10.4.0 section.

> This bug only affects users doing something specific during merging that skips a vector?

Yes. The bugfix will allow OpenSearch k-NN to partition the vector values into different parts during merging so we can upload the vectors in parallel for a GPU acceleration use case (more detail here). I suppose it will also affect any user who calls nextDoc() and defers the vectorValue() read, but I can't think of any use cases besides partitioning a segment.
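A rough sketch of the partitioning pattern this fix enables, under loudly stated assumptions: `PartitionedLoadSketch`, `ForwardOnlyVectors`, `loadPartition` and `loadInParallel` are hypothetical names, each worker is assumed to get its own independent forward-only iterator over the merged vectors, and nothing here reflects how OpenSearch k-NN actually implements its upload. Each worker advances with `nextDoc()` past the vectors before its partition without loading them, then reads only the vectors in its own `[start, end)` range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedLoadSketch {

  /** Hypothetical forward-only reader using the fixed bookkeeping: nextDoc() advances the ordinal. */
  static final class ForwardOnlyVectors {
    private final List<byte[]> vectors;
    private int ord = -1;

    ForwardOnlyVectors(List<byte[]> vectors) {
      this.vectors = vectors;
    }

    /** Advances to the next vector; returns false once exhausted. */
    boolean nextDoc() {
      if (ord < vectors.size()) {
        ord++;
      }
      return ord < vectors.size();
    }

    /** Only the vector at the current position may be read. */
    byte[] vectorValue(int requestedOrd) {
      if (requestedOrd != ord) {
        throw new IllegalStateException(
            "only supports forward iteration: ord=" + requestedOrd + ", lastOrd=" + ord);
      }
      return vectors.get(ord);
    }
  }

  /** Skips the first {@code start} vectors without loading them, then loads [start, end). */
  static List<byte[]> loadPartition(List<byte[]> all, int start, int end) {
    ForwardOnlyVectors it = new ForwardOnlyVectors(all); // assumption: one iterator per worker
    List<byte[]> loaded = new ArrayList<>();
    for (int ord = 0; ord < end && it.nextDoc(); ord++) {
      if (ord >= start) {
        loaded.add(it.vectorValue(ord)); // vectors before `start` are never read
      }
    }
    return loaded;
  }

  /** Splits the vectors into contiguous partitions and loads each one on its own thread. */
  static List<List<byte[]>> loadInParallel(List<byte[]> all, int numPartitions) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(numPartitions);
    try {
      List<Future<List<byte[]>>> futures = new ArrayList<>();
      for (int p = 0; p < numPartitions; p++) {
        int start = (int) ((long) all.size() * p / numPartitions);
        int end = (int) ((long) all.size() * (p + 1) / numPartitions);
        futures.add(pool.submit(() -> loadPartition(all, start, end)));
      }
      List<List<byte[]>> partitions = new ArrayList<>();
      for (Future<List<byte[]>> f : futures) {
        partitions.add(f.get()); // a worker failure surfaces here as an ExecutionException
      }
      return partitions;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<byte[]> vectors = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      vectors.add(new byte[] {(byte) i});
    }
    List<List<byte[]>> partitions = loadInParallel(vectors, 3);
    for (int p = 0; p < partitions.size(); p++) {
      List<byte[]> part = partitions.get(p);
      System.out.println("partition " + p + ": " + part.size() + " vectors, first=" + part.get(0)[0]);
    }
  }
}
```

With the pre-fix bookkeeping, every worker except the one starting at ordinal 0 would hit the forward-iteration exception on its first `vectorValue` call, because its skipped `nextDoc()` calls never advanced `lastOrd`.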

kaivalnp

LGTM!

Thanks @finnroblin, the build was failing so I pushed a small commit to tidy, hope you don't mind :)

Also tagging @benwtrent who's cutting the 10.4 branch soon (tomorrow!) -- this bug fix seems small + good to include? (please feel free to merge if you agree)

benwtrent · 2026-02-03T19:58:16Z

@kaivalnp freeze is in effect, but we are blocked by #15662

I am ok with this small change.

FYI, I have cut the branch.

kaivalnp · 2026-02-03T20:58:18Z

@benwtrent sorry I'm a bit unfamiliar with what can and cannot be merged in the freeze -- do you mean we can put this bug-fix in 10.4? (i.e. first commit to main, then to branch_10x, then raise PR against branch_10_4).

Or does it have to go into 10.5 now?

benwtrent · 2026-02-03T21:16:11Z

Freeze means only justified bug fixes can go in.

Since we are blocked, I think this one can go in, but please backport to the 10.4 branch as soon as possible. I hope to restart the release process soon.

Move internal ordinal tracking in `MergedByteVectorValues` from `vectorValue` -> `nextDoc` to allow loading only a subset of vectors during iteration.

kaivalnp · 2026-02-03T22:28:40Z

Thanks @benwtrent -- I merged, backported to branch_10x and opened #15664 for pushing to branch_10_4

Edit: please let me know if the PR was unnecessary because we are blocked, and if I should push it instead.

finnroblin · 2026-02-03T23:20:29Z

Thanks @kaivalnp @benwtrent @navneet1v !

github-actions bot added the module:core/codecs label Jan 7, 2026

finnroblin marked this pull request as ready for review January 7, 2026 21:57

finnroblin mentioned this pull request Jan 7, 2026

MergedByteVectorValues is missing increasing lastOrd when advancing. #14992

Closed

navneet1v approved these changes Jan 7, 2026

kaivalnp reviewed Jan 8, 2026

github-actions bot added the Stale label Jan 23, 2026

finnroblin force-pushed the fix-mergedbytevectorvalues-lastord branch from cabb630 to f62ff8a January 23, 2026 10:03

kaivalnp reviewed Jan 23, 2026

github-actions bot removed the Stale label Jan 24, 2026

github-actions bot removed module:core/FSTs module:queryparser module:join module:test-framework module:expressions module:sandbox module:spatial3d module:misc module:queries module:monitor module:core/hnsw module:build-infra labels Feb 2, 2026

finnroblin and others added 2 commits February 2, 2026 10:34

fix CHANGES conflict

ff98fc1

Signed-off-by: Finn Roblin <finnrobl@amazon.com>

Merge branch 'main' into fix-mergedbytevectorvalues-lastord

102485a

github-actions bot modified the milestones: 11.0.0, 10.4.0 Feb 2, 2026

finnroblin requested a review from kaivalnp February 2, 2026 18:49

Tidy comments / javadocs

8d1e4fa

kaivalnp approved these changes Feb 3, 2026

kaivalnp merged commit 48e78c5 into apache:main Feb 3, 2026
13 checks passed

kaivalnp pushed a commit that referenced this pull request Feb 3, 2026

Fix MergedByteVectorValues internal ordinal tracking (#15553)

06d868d

kaivalnp mentioned this pull request Feb 3, 2026

Backport "Fix MergedByteVectorValues internal ordinal tracking" to 10.4 #15664

Closed

kaivalnp pushed a commit that referenced this pull request Feb 4, 2026

Fix MergedByteVectorValues internal ordinal tracking (#15553)

62c9777
