Skip to content

Virtual Shards Phase 2 - Filter and Extract#20896

Open
atris wants to merge 3 commits intoopensearch-project:mainfrom
atris:vshards_p2
Open

Virtual Shards Phase 2 - Filter and Extract#20896
atris wants to merge 3 commits intoopensearch-project:mainfrom
atris:vshards_p2

Conversation

@atris
Copy link
Contributor

@atris atris commented Mar 17, 2026

This change adds the Phase 2 virtual-shard extraction primitive.

It introduces IndexShard#extractVirtualShard(int, Path), backed by a Lucene filter-on-write extraction path (VirtualShardFilteredMergePolicy) that writes a standalone index containing only documents for the target virtual shard.

Routing parity is preserved by sharing virtual-shard computation with OperationRouting through VirtualShardRoutingHelper.computeVirtualShardId(...). IndexMetadata also caches index.number_of_virtual_shards for direct access.

What is included:

  • virtual-shard extraction utility in server merge layer
  • IndexShard integration entrypoint with validation
  • shared vShard computation helper used by routing and extraction
  • tests for extraction correctness, empty extraction, routing-field behavior, partitioned routing parity, override-map invariants, closed-shard behavior, bounds checks, and overwrite behavior for existing output paths

Out of scope:

  • transport orchestration/state machine
  • remap workflow APIs and cluster-level move sequencing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

atris added 2 commits March 17, 2026 20:13
Add the Phase 2 storage primitive for virtual shards by extracting documents for a target vShard into a standalone Lucene index.

  - add VirtualShardFilteredMergePolicy (filter-on-write extraction)
  - add IndexShard#extractVirtualShard(int, Path) with validation
  - add VirtualShardRoutingHelper#computeVirtualShardId and use it from OperationRouting to keep routing/extraction parity

Also address post-PR review findings for phase 1 PR.

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit d737bf4.

PathLineSeverityDescription
server/src/main/java/org/opensearch/index/shard/IndexShard.java1694mediumextractVirtualShard() accepts a caller-controlled java.nio.file.Path with no path validation, sandboxing, or authorization check beyond verifyNotClosed(). If this method is wired to a REST/transport handler in a subsequent commit, it becomes an arbitrary filesystem write primitive that could copy shard data to any path accessible by the JVM process.
server/src/main/java/org/opensearch/index/merge/VirtualShardFilteredMergePolicy.java68mediumisolateVirtualShard() opens FSDirectory.open(outputPath) with OpenMode.CREATE (line ~89), which silently overwrites any existing index at that path. Combined with the unchecked caller-supplied path, this is a potential data-overwrite vector should the API surface be extended; the absence of path canonicalization or a chroot-style check is anomalous for a production code path that touches the filesystem.
server/src/main/java/org/opensearch/cluster/metadata/VirtualShardRoutingHelper.java19lowThe @publicapi(since="3.6.0") annotation was removed from VirtualShardRoutingHelper without explanation. This reduces API-stability tracking and could be an attempt to make the class harder to audit as a public contract, though it may also reflect a legitimate decision to keep the class internal.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 2 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 9194a9f.

PathLineSeverityDescription
server/src/main/java/org/opensearch/index/merge/VirtualShardFilteredMergePolicy.java64lowisolateVirtualShard() writes Lucene index data to a caller-supplied outputPath via FSDirectory.open() with no path validation or containment checks. If this method is surfaced through an admin or REST API without strict authorization controls, it could allow writing index data to arbitrary filesystem locations accessible by the JVM process. The concern is a design-level path traversal risk rather than intentional malice — the feature appears legitimately scoped to internal shard operations, but no call-site validation is enforced here.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Virtual Shard routing helper and IndexMetadata caching

Relevant files:

  • server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java
  • server/src/main/java/org/opensearch/cluster/metadata/VirtualShardRoutingHelper.java
  • server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java
  • server/src/test/java/org/opensearch/cluster/metadata/VirtualShardRoutingHelperTests.java

Sub-PR theme: Virtual Shard extraction primitive and IndexShard integration

Relevant files:

  • server/src/main/java/org/opensearch/index/merge/VirtualShardFilteredMergePolicy.java
  • server/src/main/java/org/opensearch/index/shard/IndexShard.java
  • server/src/test/java/org/opensearch/index/merge/VirtualShardFilteredMergePolicyTests.java
  • server/src/test/java/org/opensearch/index/shard/IndexShardVirtualShardTests.java

⚡ Recommended focus areas for review

Double liveDocs Check

In VirtualShardFilterReader, the liveDocs Bits implementation re-checks srcLive at query time, but the membership array was already built by skipping deleted docs (lines 126-128). This double-check is redundant and could mask correctness issues if sourceLiveDocs changes between construction and use. The membership array should be the sole source of truth for the filtered live docs.

this.liveDocs = new Bits() {
    @Override
    public boolean get(int index) {
        if (srcLive != null && !srcLive.get(index)) {
            return false;
        }
        return membershipFinal[index];
    }

    @Override
    public int length() {
        return maxDoc;
    }
};
Null CacheHelper

getCoreCacheHelper() and getReaderCacheHelper() both return null. Returning null from these methods can cause NullPointerException in Lucene internals that rely on cache helpers (e.g., query caches, field caches). Consider delegating to the wrapped reader's cache helpers or documenting why null is safe here.

public CacheHelper getCoreCacheHelper() {
    return null;
}

@Override
public CacheHelper getReaderCacheHelper() {
    return null;
}
Skipped Segments

In isolateVirtualShard, if a leaf reader cannot be unwrapped to a CodecReader, it is silently skipped with only a trace log. This means documents in that segment are silently excluded from the extraction output without any warning or error, potentially producing an incomplete virtual shard index.

if (codecReader == null) {
    logger.trace("skipping leaf [{}]: unable to unwrap to CodecReader", leafReader.getClass());
    continue;
Routing Parity Risk

computeVirtualShardId recomputes effectiveRouting and partitionOffset independently from OperationRouting.generateShardId. If OperationRouting is updated (e.g., changes to how effectiveRouting is derived), the two implementations can silently diverge, causing extraction to assign documents to wrong virtual shards. Consider consolidating the hash logic into a single shared method.

public static int computeVirtualShardId(IndexMetadata indexMetadata, String id, String routing) {
    int numVirtualShards = indexMetadata.getNumberOfVirtualShards();
    if (numVirtualShards <= 0) {
        throw new IllegalStateException("virtual shards are not enabled on index [" + indexMetadata.getIndex() + "]");
    }
    String effectiveRouting = (routing != null) ? routing : id;
    int partitionOffset = indexMetadata.isRoutingPartitionedIndex()
        ? Math.floorMod(Murmur3HashFunction.hash(id), indexMetadata.getRoutingPartitionSize())
        : 0;
    int hash = Murmur3HashFunction.hash(effectiveRouting) + partitionOffset;
    return Math.floorMod(hash, numVirtualShards);
}
Override Map Test

testExtractWithOverrideMap builds an overriddenMetadata object but never uses it in an actual extraction call. The test only validates resolvePhysicalShardId with the override map, but does not verify that extraction with an overridden metadata produces the correct output. The test name is misleading relative to what is actually being tested.

public void testExtractWithOverrideMap() throws Exception {
    IndexShard shard = newStartedShard(true, virtualShardSettings());
    int docCount = 50;
    for (int i = 0; i < docCount; i++) {
        indexDoc(shard, "_doc", String.valueOf(i), "{\"value\":" + i + "}");
    }
    shard.refresh("test");
    flushShard(shard, true);

    int targetVShard = 0;

    int expectedCount = 0;
    IndexMetadata indexMetadata = shard.indexSettings.getIndexMetadata();
    for (int i = 0; i < docCount; i++) {
        int vShardId = VirtualShardRoutingHelper.computeVirtualShardId(indexMetadata, String.valueOf(i), null);
        if (vShardId == targetVShard) {
            expectedCount++;
        }
    }

    Path tempPath = createTempDir("vshard_override");
    shard.extractVirtualShard(targetVShard, tempPath);

    try (Directory dir = FSDirectory.open(tempPath); DirectoryReader reader = DirectoryReader.open(dir)) {
        assertEquals("Override map should not change vShard membership", expectedCount, reader.numDocs());
    }

    int defaultPhysical = VirtualShardRoutingHelper.resolvePhysicalShardId(shard.indexSettings.getIndexMetadata(), targetVShard);
    assertEquals(0, defaultPhysical);

    Map<String, String> overrides = new HashMap<>();
    overrides.put(String.valueOf(targetVShard), "3");
    IndexMetadata.Builder overrideBuilder = IndexMetadata.builder("test_override")
        .settings(
            Settings.builder()
                .put(IndexMetadata.SETTING_VERSION_CREATED, Version.CURRENT)
                .put(IndexMetadata.SETTING_NUMBER_OF_VIRTUAL_SHARDS, TOTAL_VIRTUAL_SHARDS)
        )
        .numberOfShards(5)
        .numberOfReplicas(1);
    overrideBuilder.putCustom(VirtualShardRoutingHelper.VIRTUAL_SHARDS_CUSTOM_METADATA_KEY, overrides);
    IndexMetadata overriddenMetadata = overrideBuilder.build();
    int overriddenPhysical = VirtualShardRoutingHelper.resolvePhysicalShardId(overriddenMetadata, targetVShard);
    assertEquals(3, overriddenPhysical);

    closeShards(shard);
}

@github-actions
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Pass pre-computed effective routing to avoid inconsistency

The original code used effectiveRouting (which defaults to id when routing is null)
and partitionOffset already computed above. The new computeVirtualShardId call
recomputes both internally, but passes the raw routing parameter instead of
effectiveRouting. While computeVirtualShardId handles null routing by falling back
to id, the partitionOffset is now computed twice — once here and once inside the
helper — which is redundant but not incorrect. However, the effectiveRouting
variable computed just before this block is now unused, which may cause a compiler
warning and indicates a subtle inconsistency in the refactoring.

server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java [541]

-int vShardId = VirtualShardRoutingHelper.computeVirtualShardId(indexMetadata, id, routing);
+int vShardId = VirtualShardRoutingHelper.computeVirtualShardId(indexMetadata, id, effectiveRouting);
Suggestion importance[1-10]: 3

__

Why: While computeVirtualShardId handles null routing by falling back to id internally, passing effectiveRouting instead of routing would be more consistent with the surrounding code. However, the behavior is functionally equivalent since both handle the null-routing case the same way, making this a minor style/consistency improvement.

Low
Guard test assumption about empty virtual shard

The test assumes that (occupiedVShard + 1) % TOTAL_VIRTUAL_SHARDS is an empty
virtual shard, but with only one document this is likely true. However, this is not
guaranteed — if by chance the next vShard also maps to a document (not possible with
one doc, but fragile as a pattern). More importantly, the test does not assert that
emptyVShard is actually different from occupiedVShard, which would be a problem if
TOTAL_VIRTUAL_SHARDS == 1. Add an assertion to guard this assumption.

server/src/test/java/org/opensearch/index/shard/IndexShardVirtualShardTests.java [107-115]

-public void testExtractEmptyVirtualShard() throws Exception {
-    IndexShard shard = newStartedShard(true, virtualShardSettings());
-    indexDoc(shard, "_doc", "1", "{\"value\":1}");
-    shard.refresh("test");
-    flushShard(shard, true);
+IndexMetadata indexMetadata = shard.indexSettings.getIndexMetadata();
+int occupiedVShard = VirtualShardRoutingHelper.computeVirtualShardId(indexMetadata, "1", null);
+int emptyVShard = (occupiedVShard + 1) % TOTAL_VIRTUAL_SHARDS;
+assumeTrue("Need at least 2 virtual shards for this test", TOTAL_VIRTUAL_SHARDS > 1);
+assertNotEquals(occupiedVShard, emptyVShard);
 
-    IndexMetadata indexMetadata = shard.indexSettings.getIndexMetadata();
-    int occupiedVShard = VirtualShardRoutingHelper.computeVirtualShardId(indexMetadata, "1", null);
-    int emptyVShard = (occupiedVShard + 1) % TOTAL_VIRTUAL_SHARDS;
-
Suggestion importance[1-10]: 3

__

Why: Since TOTAL_VIRTUAL_SHARDS is a constant set to 20, the guard assumeTrue("Need at least 2 virtual shards", TOTAL_VIRTUAL_SHARDS > 1) would always pass, making it a no-op. The assertNotEquals is also trivially true with 20 virtual shards. This is a minor defensive improvement with negligible practical impact.

Low
Document reader lifecycle to prevent resource confusion

The wrappedReaders are FilterCodecReader instances wrapping readers from the
searcher acquired outside this try block. The IndexWriter is closed before the
searcher is closed (since the searcher's try-with-resources wraps this block), so
the underlying readers remain valid. However, if wrappedReaders is empty,
writer.commit() still runs and creates an empty index — this is intentional per the
method contract, but the forceMerge(1) call inside the if block means an empty index
won't be force-merged, which is fine. The real concern is that wrappedReaders are
never explicitly closed; FilterCodecReader does not own the underlying reader, so
this is acceptable, but it should be documented.

server/src/main/java/org/opensearch/index/merge/VirtualShardFilteredMergePolicy.java [84-91]

 try (Directory outDir = FSDirectory.open(outputPath); IndexWriter writer = new IndexWriter(outDir, iwc)) {
     if (!wrappedReaders.isEmpty()) {
         writer.addIndexes(wrappedReaders.toArray(new CodecReader[0]));
         writer.forceMerge(1);
     }
+    // commit even if empty to produce a valid index at outputPath
     writer.commit();
 }
+// wrappedReaders wrap the searcher's readers and are not owned here; closed via searcher
Suggestion importance[1-10]: 1

__

Why: This suggestion only adds comments to the existing code without changing any logic. The improved_code is functionally identical to the existing_code, making this a documentation-only change with minimal impact.

Low
Possible issue
Reset field visitor state between documents

The leftToVisit counter is decremented every time _id or _routing fields are
visited, but needsField can be called multiple times for the same field name across
different documents in a batch. Since VShardFieldVisitor is reused per document in
the loop, leftToVisit will reach 0 after the first document and cause early STOP for
subsequent documents. A new VShardFieldVisitor instance should be created per
document, or leftToVisit should be reset between documents.

server/src/main/java/org/opensearch/index/merge/VirtualShardFilteredMergePolicy.java [203-212]

-@Override
-public Status needsField(FieldInfo fieldInfo) {
-    switch (fieldInfo.name) {
-        case IdFieldMapper.NAME:
-        case RoutingFieldMapper.NAME:
-            leftToVisit--;
-            return Status.YES;
-        default:
-            return leftToVisit == 0 ? Status.STOP : Status.NO;
+for (int docId = 0; docId < maxDoc; docId++) {
+    if (sourceLiveDocs != null && !sourceLiveDocs.get(docId)) {
+        continue;
+    }
+    VShardFieldVisitor visitor = new VShardFieldVisitor();  // new instance per doc
+    storedFields.document(docId, visitor);
+    if (visitor.id == null) {
+        continue;
+    }
+    int computedVShard = VirtualShardRoutingHelper.computeVirtualShardId(indexMetadata, visitor.id, visitor.routing);
+    if (computedVShard == targetVShardId) {
+        membership[docId] = true;
+        count++;
     }
 }
Suggestion importance[1-10]: 2

__

Why: Looking at the actual code in the PR, a new VShardFieldVisitor instance is already created per document in the loop (VShardFieldVisitor visitor = new VShardFieldVisitor() at line 129), so leftToVisit is reset for each document. The suggestion's concern about reuse is not applicable here, making this suggestion incorrect.

Low

@github-actions
Copy link
Contributor

✅ Gradle check result for 9194a9f: SUCCESS

@codecov
Copy link

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 73.00000% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.32%. Comparing base (6562711) to head (9194a9f).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...h/index/merge/VirtualShardFilteredMergePolicy.java 67.94% 14 Missing and 11 partials ⚠️
...in/java/org/opensearch/index/shard/IndexShard.java 80.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20896      +/-   ##
============================================
+ Coverage     73.31%   73.32%   +0.01%     
- Complexity    72506    72509       +3     
============================================
  Files          5819     5820       +1     
  Lines        331095   331192      +97     
  Branches      47829    47853      +24     
============================================
+ Hits         242749   242860     +111     
+ Misses        68831    68812      -19     
- Partials      19515    19520       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant