
Fix mergedbytevectorvalues lastord #15553

Merged
kaivalnp merged 5 commits into apache:main from finnroblin:fix-mergedbytevectorvalues-lastord
Feb 3, 2026

@finnroblin
Contributor

Description

Fixes MergedByteVectorValues behavior as described in #14992. Currently, MergedByteVectorValues::nextDoc() does not increment lastOrd when the iterator is advanced. If nextDoc is called several times before a vector is loaded, the stale lastOrd causes vectorValue to throw an IllegalStateException. One case where this occurs is in OpenSearch k-NN, where we split a list of vectors into multiple parts to upload the partitions in parallel (see opensearch-project/k-NN#2803 for more details about the use case this bugfix solves).
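
For readers who want to see the failure mode in isolation, here is a minimal sketch. It is not the Lucene code: `LastOrdSketch`, `ForwardOnlyVectors`, and the `trackInNextDoc` flag are hypothetical names made up for illustration, and the class only models the ordinal bookkeeping, assuming the buggy variant updates `lastOrd` solely inside `vectorValue` while the fixed variant updates it in `nextDoc`.

```java
import java.util.List;

class LastOrdSketch {

  /** Simplified stand-in for a merged, forward-only view over byte vectors (hypothetical). */
  static final class ForwardOnlyVectors {
    private final List<byte[]> vectors;
    private final boolean trackInNextDoc; // true models the behavior after the fix
    private int index = -1; // ordinal the iterator is positioned on
    private int lastOrd = -1; // bookkeeping checked by vectorValue()

    ForwardOnlyVectors(List<byte[]> vectors, boolean trackInNextDoc) {
      this.vectors = vectors;
      this.trackInNextDoc = trackInNextDoc;
    }

    /** Advances the iterator and returns the new ordinal, or -1 when exhausted. */
    int nextDoc() {
      if (index + 1 >= vectors.size()) {
        return -1;
      }
      index++;
      if (trackInNextDoc) {
        lastOrd = index; // fixed variant: bookkeeping follows the iterator
      }
      return index;
    }

    /** Returns the vector at the iterator's position, enforcing forward-only access. */
    byte[] vectorValue(int ord) {
      int expected = trackInNextDoc ? lastOrd : lastOrd + 1;
      if (ord != expected) {
        throw new IllegalStateException(
            "only supports forward iteration: ord=" + ord + ", lastOrd=" + lastOrd);
      }
      lastOrd = ord; // no-op in the fixed variant; the only update in the buggy one
      return vectors.get(index);
    }
  }

  public static void main(String[] args) {
    List<byte[]> vectors =
        List.of(new byte[] {0}, new byte[] {1}, new byte[] {2}, new byte[] {3});
    for (boolean fixed : new boolean[] {false, true}) {
      ForwardOnlyVectors values = new ForwardOnlyVectors(vectors, fixed);
      for (int i = 0; i < 4; i++) {
        values.nextDoc(); // skip ahead to ordinal 3 without loading anything
      }
      try {
        System.out.println("fixed=" + fixed + " -> " + values.vectorValue(3)[0]);
      } catch (IllegalStateException e) {
        System.out.println("fixed=" + fixed + " -> " + e.getMessage());
      }
    }
  }
}
```

In this model the buggy variant rejects `vectorValue(3)` after four `nextDoc()` calls because `lastOrd` is still -1, while the fixed variant returns the fourth vector.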

This PR includes a unit test that fails without the bugfix.

Thanks @0ctopus13prime for the original RFC and bugfix!

Failed test output before the fix (use the first commit in this PR to reproduce; the second commit contains the bugfix):

TestMergedByteVectorValues > testSkipThenLoadByteVectorDuringMerge FAILED
    java.lang.AssertionError: Test failed during merge
        at __randomizedtesting.SeedInfo.seed([CDF9B6BFD3014176:37C8D3B5F864CA9F]:0)
        at org.apache.lucene.codecs.TestMergedByteVectorValues.testSkipThenLoadByteVectorDuringMerge(TestMergedByteVectorValues.java:201)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:565)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:52)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
        at java.base/java.lang.Thread.run(Thread.java:1474)

        Caused by:
        java.lang.IllegalStateException: only supports forward iteration: ord=3, lastOrd=-1
            at org.apache.lucene.codecs.KnnVectorsWriter$MergedVectorValues$MergedByteVectorValues.vectorValue(KnnVectorsWriter.java:418)
            at org.apache.lucene.codecs.TestMergedByteVectorValues$1$1.mergeOneField(TestMergedByteVectorValues.java:137)
            at org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.mergeOneField(PerFieldKnnVectorsFormat.java:128)
            at org.apache.lucene.codecs.KnnVectorsWriter.merge(KnnVectorsWriter.java:105)
            at org.apache.lucene.index.SegmentMerger.mergeVectorValues(SegmentMerger.java:272)
            at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:315)
            at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:159)
            at org.apache.lucene.index.IndexWriter.addIndexesReaderMerge(IndexWriter.java:3467)
            at org.apache.lucene.index.IndexWriter$AddIndexesMergeSource.merge(IndexWriter.java:3349)
            at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:661)
            at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:720)


:lucene:core:test (FAILURE): 1 test, 1  failure

1 test completed, 1 failed

> Task :checkAnyTestIncludedAfterFiltering
> Task :lucene:core:wipeTaskTemp UP-TO-DATE

ERROR: 1 test has failed:

  - org.apache.lucene.codecs.TestMergedByteVectorValues.testSkipThenLoadByteVectorDuringMerge (:lucene:core)
    Test output: /Users/finnrobl/Documents/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.codecs.TestMergedByteVectorValues.txt
    Reproduce with: gradlew :lucene:core:test --tests "org.apache.lucene.codecs.TestMergedByteVectorValues.testSkipThenLoadByteVectorDuringMerge" -Ptests.asserts=true -Ptests.file.encoding=UTF-8 -Ptests.gui=false "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.jvms=6 -Ptests.seed=CDF9B6BFD3014176 -Ptests.vectorsize=default

@navneet1v
Contributor

Tagging a few folks, @mikemccand, @kaivalnp and @vigyasharma, to see if they can review the PR.

Contributor

@kaivalnp kaivalnp left a comment

The fix LGTM (makes logic same as MergedFloat32VectorValues), one small comment about testing though

* <p>The bug: MergedByteVectorValues.nextDoc() does not increment lastOrd, so when you skip N
* vectors and then try to load vectorValue(N), it fails because lastOrd is still -1.
*/
public void testSkipThenLoadByteVectorDuringMerge() throws IOException {

The setup of this test seems complicated to me -- considering the gist is:

// Run the test
ByteVectorValues values =
    KnnVectorsWriter.MergedVectorValues.mergeByteVectorValues(
        fieldInfo, mergeState);

// Skip doc 0 and load doc 1
values.iterator().nextDoc(); // doc 0
values.iterator().nextDoc(); // doc 1

// Read vector for doc 1
assertArrayEquals(expected, values.vectorValue(1)); // throws IllegalStateException on main

It's probably because ByteVectorValuesSub and MergedByteVectorValues are private, so we have to go through public APIs to instantiate them + execute the test?

Perhaps we can make them package-private for testing? It would simplify the test to something like:

/**
 * Test that skipping vectors in MergedByteVectorValues via nextDoc() and then loading a
 * subsequent vector via vectorValue() works correctly.
 */
public void testSkipsInMergedByteVectorValues() throws IOException {
  // Data
  List<byte[]> vectors = List.of(new byte[] {0}, new byte[] {1});

  // Setup
  KnnVectorsWriter.ByteVectorValuesSub sub =
      new KnnVectorsWriter.ByteVectorValuesSub(x -> x, ByteVectorValues.fromBytes(vectors, 1));
  MergeState state =
      new MergeState(
          null, null, null, null, null, null, null, null, null, null, null, null, null, null,
          null, false);

  // Run the test
  ByteVectorValues values =
      new KnnVectorsWriter.MergedVectorValues.MergedByteVectorValues(List.of(sub), state);

  // Skip doc 0 and load doc 1
  values.iterator().nextDoc(); // doc 0
  values.iterator().nextDoc(); // doc 1

  // Read vector for doc 1
  assertArrayEquals(vectors.get(1), values.vectorValue(1));
}

Although I'll wait for someone else to weigh in on this..

@github-actions
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 23, 2026
@finnroblin finnroblin force-pushed the fix-mergedbytevectorvalues-lastord branch from cabb630 to f62ff8a January 23, 2026 10:03
Contributor

@kaivalnp kaivalnp left a comment

LGTM, thanks!

This bug only affects users doing something specific during merging that skips a vector? But it's nice to fix nevertheless..

Can you also add a CHANGES.txt entry?

}

- private static class ByteVectorValuesSub extends DocIDMerger.Sub {
+ static class ByteVectorValuesSub extends DocIDMerger.Sub {

Can we add some Javadocs with the @lucene.internal tag to discourage users from depending on this class, now that it's more accessible?

ByteVectorValuesSub current;

- private MergedByteVectorValues(List<ByteVectorValuesSub> subs, MergeState mergeState)
+ MergedByteVectorValues(List<ByteVectorValuesSub> subs, MergeState mergeState)

Can you add a comment specifying that this is package-private for testing?

* Test that skipping vectors in MergedByteVectorValues via nextDoc() and then loading a
* subsequent vector via vectorValue() works correctly.
*/
public void testSkipsInMergedByteVectorValues() throws IOException {

Should we include an equivalent test for the float vector values? (making the class more generic like TestMergedVectorValues)

finnroblin and others added 2 commits February 2, 2026 10:34
@github-actions github-actions bot modified the milestones: 11.0.0, 10.4.0 Feb 2, 2026
@finnroblin
Contributor Author

Thanks @kaivalnp ! I've added @lucene.internal annotations/package-private comments as suggested and added a mergedFloatVectorValues test. Also, I added to CHANGES.txt under the 10.4.0 section.

> This bug only affects users doing something specific during merging that skips a vector?

Yes, the bugfix will allow OpenSearch k-NN to partition different parts of the vectorvalues during merging so we can upload the vectors in parallel for a GPU acceleration use case (more detail here). I suppose it will also affect any user who uses nextDoc -> delayed vectorValue read, but I can't think of any use cases besides partitioning a segment.
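
To make the "nextDoc -> delayed vectorValue read" pattern concrete, here is a rough sketch of reading one partition from a forward-only source. Everything in it is an assumption for illustration: `PartitionReadSketch`, `ForwardOnlySource`, and `readPartition` are hypothetical names, the source models only the post-fix behavior, and the way k-NN actually partitions and uploads vectors is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionReadSketch {

  /** Hypothetical forward-only source: nextDoc() advances, vectorValue() serves the current ordinal. */
  static final class ForwardOnlySource {
    private final byte[][] vectors;
    private int ord = -1; // ordinal the iterator is positioned on

    ForwardOnlySource(byte[][] vectors) {
      this.vectors = vectors;
    }

    /** Moves to the next vector; returns false once the source is exhausted. */
    boolean nextDoc() {
      if (ord + 1 >= vectors.length) {
        return false;
      }
      ord++;
      return true;
    }

    /** Loads the vector for the current ordinal; any other ordinal is rejected. */
    byte[] vectorValue(int requested) {
      if (requested != ord) {
        throw new IllegalStateException(
            "only supports forward iteration: ord=" + requested + ", lastOrd=" + ord);
      }
      return vectors[ord];
    }
  }

  /** Reads ordinals [start, end), skipping the first `start` vectors without loading them. */
  static List<byte[]> readPartition(ForwardOnlySource source, int start, int end) {
    List<byte[]> partition = new ArrayList<>();
    for (int ord = 0; ord < end && source.nextDoc(); ord++) {
      if (ord >= start) {
        partition.add(source.vectorValue(ord)); // first load happens only after the skips
      }
    }
    return partition;
  }

  public static void main(String[] args) {
    byte[][] vectors = {{0}, {1}, {2}, {3}, {4}};
    List<byte[]> partition = readPartition(new ForwardOnlySource(vectors), 2, 4);
    System.out.println("vectors in partition: " + partition.size());
  }
}
```

Each partition would use its own source instance in this sketch, since a forward-only iterator cannot be rewound.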

@finnroblin finnroblin requested a review from kaivalnp February 2, 2026 18:49
Contributor

@kaivalnp kaivalnp left a comment

LGTM!

Thanks @finnroblin, the build was failing so I pushed a small commit to tidy, hope you don't mind :)

Also tagging @benwtrent who's cutting the 10.4 branch soon (tomorrow!) -- this bug fix seems small + good to include? (please feel free to merge if you agree)

@benwtrent
Member

@kaivalnp freeze is in effect, but we are blocked by #15662

I am ok with this small change.

FYI, I have cut the branch.

@kaivalnp
Contributor

kaivalnp commented Feb 3, 2026

@benwtrent sorry I'm a bit unfamiliar with what can and cannot be merged in the freeze -- do you mean we can put this bug-fix in 10.4? (i.e. first commit to main, then to branch_10x, then raise PR against branch_10_4).

Or does it have to go into 10.5 now?

@benwtrent
Member

Freeze means only justified bug fixes

Since we are blocked, I think this one can go in, but please backport to the 10.4 branch as soon as possible. I hope to restart the release process soon.

@kaivalnp kaivalnp merged commit 48e78c5 into apache:main Feb 3, 2026
13 checks passed
kaivalnp pushed a commit that referenced this pull request Feb 3, 2026
Move internal ordinal tracking in `MergedByteVectorValues` from `vectorValue` -> `nextDoc` to allow loading only a subset of vectors during iteration.
kaivalnp pushed a commit to kaivalnp/lucene that referenced this pull request Feb 3, 2026
Move internal ordinal tracking in `MergedByteVectorValues` from `vectorValue` -> `nextDoc` to allow loading only a subset of vectors during iteration.
@kaivalnp
Contributor

kaivalnp commented Feb 3, 2026

Thanks @benwtrent -- I merged, backported to branch_10x and opened #15664 for pushing to branch_10_4

Edit: please let me know if the PR was unnecessary because we are blocked, and if I should push it instead.

@finnroblin
Contributor Author

Thanks @kaivalnp @benwtrent @navneet1v !

kaivalnp pushed a commit that referenced this pull request Feb 4, 2026
Move internal ordinal tracking in `MergedByteVectorValues` from `vectorValue` -> `nextDoc` to allow loading only a subset of vectors during iteration.