
[WIP] Native engine abstractions #20821

Draft

bharath-techie wants to merge 6 commits into opensearch-project:main from bharath-techie:native-eng

Conversation


@bharath-techie bharath-techie commented Mar 10, 2026

Description

  • This PR wires the IndexShard to the backend plugin engines.
  • It includes changes to CatalogSnapshot, DataFormatRegistry, CompositeEngine, IndexFileDeleter, etc., which may also be part of other indexing PRs such as #20675; they are included here to clarify the flow.
  • The search exec engine is an extension of the existing engine bridge, but it also holds shard-specific state, such as the reader manager, which ties the index/catalog snapshot lifecycle to the backend engine and plugins; in the future it will also tie into caches.
  • Reader managers in backend plugins are not ref-counted. Readers can only be retrieved through a catalog snapshot, and a catalog snapshot is ref-counted as it is acquired and released, so reader and file management relies entirely on the catalog snapshot.
  • WIP:
    • Wire the response-side flow in DF.
    • Data format class.
    • Naming conventions for the index/source providers and their interfaces; these can change. What this PR intends to do is associate context.
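The snapshot-based reader lifecycle described above can be sketched as a ref-counting contract. This is a minimal illustration with assumed names (`CatalogSnapshotSketch` stands in for the PR's CatalogSnapshot); it shows only the acquire/release rule that readers remain valid while at least one reference is held:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a snapshot whose backing files stay valid while refCount > 0.
class CatalogSnapshotSketch {
    private final AtomicInteger refCount = new AtomicInteger(1); // creator holds one ref
    private volatile boolean released = false;

    // Acquire a reference; fails once the snapshot has been fully released.
    boolean tryIncRef() {
        int cur;
        do {
            cur = refCount.get();
            if (cur == 0) {
                return false; // already released; files may be deleted
            }
        } while (!refCount.compareAndSet(cur, cur + 1));
        return true;
    }

    // Release a reference; the last release makes the files eligible for deletion.
    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            released = true;
        }
    }

    boolean isReleased() {
        return released;
    }
}
```

Under this contract a reader manager would call tryIncRef() before handing out a reader and decRef() when the caller is done; once the count hits zero, the snapshot's files become eligible for cleanup.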

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

public CompositeEngine(List<SearchEnginePlugin> plugins, ShardPath shardPath) throws IOException {
    Map<String, List<SearchExecEngine<?, ?>>> engines = new HashMap<>();
    for (SearchEnginePlugin plugin : plugins) {
        SearchExecEngine<?, ?> engine = plugin.createSearchExecEngine(shardPath);
bharath-techie (Contributor, Author) commented:

Reworking to avoid this in the composite engine; we will do it in the index shard. Here we'll just tie in the SPIs of backend plugins that listen to deletes and refreshes.
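The delete/refresh listener SPI described in this comment might look roughly like the following; all names here (`SnapshotLifecycleListener`, `SnapshotListenerRegistry`) are hypothetical stand-ins, not the PR's actual interfaces:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: instead of constructing engines inside CompositeEngine,
// the shard keeps a registry of backend-plugin SPIs that are notified when a
// catalog snapshot is refreshed or fully deleted. Names are illustrative.
interface SnapshotLifecycleListener {
    void onRefresh(String snapshotId); // a new catalog snapshot became visible
    void onDeleted(String snapshotId); // a catalog snapshot was fully released
}

class SnapshotListenerRegistry {
    private final List<SnapshotLifecycleListener> listeners = new ArrayList<>();

    void register(SnapshotLifecycleListener listener) {
        listeners.add(listener);
    }

    void notifyRefresh(String snapshotId) {
        for (SnapshotLifecycleListener l : listeners) l.onRefresh(snapshotId);
    }

    void notifyDeleted(String snapshotId) {
        for (SnapshotLifecycleListener l : listeners) l.onDeleted(snapshotId);
    }
}
```

The design point is that the shard owns the fan-out, while each backend plugin only implements the callbacks it cares about.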

@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 4aad14e.

• sandbox/plugins/engine-datafusion/build.gradle, line 22 (low): Security manager is explicitly disabled for tests (`tests.security.manager = false`). While common for JNI-based plugins requiring privileged operations, this removes sandbox protections during test execution and could mask unsafe native library behavior. The same pattern is repeated in engine-lucene/build.gradle.
• sandbox/plugins/engine-lucene/src/main/java/org/opensearch/lucene/LuceneEngineSearcher.java, line 64 (low): Static `ConcurrentHashMap` instances `activeWeights` and `activeScorers` are shared global state accessible to JNI callbacks from native code. If the companion native library (`opensearch_datafusion_jni`) were malicious or compromised, it could enumerate or abuse these opaque pointer handles to access Lucene Weight/DocIdSetIterator objects across unrelated search operations. No capacity bound is enforced, which also permits unbounded memory growth if release methods are never called.
• server/src/main/java/org/opensearch/index/shard/IndexShard.java, line 2215 (low): `setCompositeEngine(CompositeEngine)` is a public method added to `IndexShard` with no authorization or access-control guard. Any plugin code running in the same JVM could replace the active CompositeEngine with a custom implementation, potentially intercepting or manipulating all search execution for a shard. Consider restricting visibility or requiring a privilege check.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 0 | Low: 3


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions

github-actions bot commented Mar 10, 2026

PR Reviewer Guide 🔍

(Review updated until commit 7f5f3e6)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Core DataFormatAwareEngine abstractions and lifecycle management

Relevant files:

  • server/src/main/java/org/opensearch/index/engine/DataFormatAwareEngine.java
  • server/src/main/java/org/opensearch/index/engine/exec/DataFormatAwareEngineFactory.java
  • server/src/main/java/org/opensearch/index/engine/exec/IndexFileDeleter.java
  • server/src/main/java/org/opensearch/index/engine/exec/DataFormatEngineCatalogSnapshotListener.java
  • server/src/main/java/org/opensearch/index/engine/exec/CollectorQueryLifecycleManager.java
  • server/src/main/java/org/opensearch/index/engine/exec/FileMetadata.java
  • server/src/test/java/org/opensearch/index/engine/dataformat/DataFormatPluginTests.java

Sub-PR theme: DataFusion native engine plugin integration

Relevant files:

  • sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java
  • sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionService.java
  • sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DatafusionContext.java
  • sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DatafusionReaderManager.java
  • sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DatafusionResultStream.java
  • sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DatafusionSearcher.java
  • sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/NativeRuntimeHandle.java
  • sandbox/libs/analytics-framework/src/main/java/org/opensearch/analytics/backend/jni/NativeHandle.java

Sub-PR theme: Lucene backend plugin and analytics engine plan executor wiring

Relevant files:

  • sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/LuceneIndexFilterProvider.java
  • sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/LuceneIndexFilterContext.java
  • sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/LuceneReaderManager.java
  • sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/LuceneSearchContext.java
  • sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/LuceneSourceProvider.java
  • sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java
  • sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/AnalyticsPlugin.java

⚡ Recommended focus areas for review

Race Condition

In setLatestSnapshot, the read of this.latestSnapshot and the write are not atomic. Under concurrent calls (e.g., two refreshes racing), the old snapshot could be decRef'd twice or the new snapshot could be missed. The volatile field alone does not provide the compare-and-swap semantics needed here; an AtomicReference with getAndSet would be safer.

public void setLatestSnapshot(CatalogSnapshot snapshot) {
    CatalogSnapshot prev = this.latestSnapshot;
    this.latestSnapshot = snapshot;
    if (prev != null) {
        prev.decRef();
    }
}
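A minimal sketch of the AtomicReference-based fix suggested here. Note the field declaration itself must change from a volatile reference to an AtomicReference, not just the method body; `SnapshotHolder` and `SnapshotRef` are hypothetical stand-ins for the PR's engine and CatalogSnapshot types:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch under assumed names: atomic snapshot replacement so each displaced
// snapshot is released exactly once, even under concurrent refreshes.
class SnapshotHolder {
    static class SnapshotRef {
        int refs = 1;                  // stand-in for the real ref count
        void decRef() { refs--; }
    }

    // AtomicReference replaces the PR's `volatile CatalogSnapshot latestSnapshot`
    // field so the read-and-swap happens in one atomic step.
    private final AtomicReference<SnapshotRef> latestSnapshot = new AtomicReference<>();

    void setLatestSnapshot(SnapshotRef snapshot) {
        SnapshotRef prev = latestSnapshot.getAndSet(snapshot); // no lost update
        if (prev != null) {
            prev.decRef(); // the displaced snapshot is released exactly once
        }
    }

    SnapshotRef latest() {
        return latestSnapshot.get();
    }
}
```

With getAndSet, two racing refreshes each observe a distinct previous value, so no snapshot can be decRef'd twice and none is skipped.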
Snapshot Leak on Close

DataFormatAwareEngine.close() does not decRef latestSnapshot. If the engine is closed while holding a snapshot reference (set via setLatestSnapshot), the snapshot's ref count will never reach zero and its files will never be cleaned up.

public void close() throws IOException {
    List<Exception> exceptions = new ArrayList<>();
    closeSupplierInstances(engineSuppliers.values(), exceptions);
    closeSupplierInstances(indexFilterProviderSuppliers.values(), exceptions);
    closeSupplierInstances(sourceProviderSuppliers.values(), exceptions);
    for (EngineReaderManager<?> rm : readerManagers.values()) {
        if (rm instanceof Closeable) {
            try {
                ((Closeable) rm).close();
            } catch (Exception e) {
                exceptions.add(e);
            }
        }
    }
    if (exceptions.isEmpty() == false) {
        IOException ioException = new IOException("Failed to close CompositeEngine resources");
        for (Exception e : exceptions) {
            ioException.addSuppressed(e);
        }
        throw ioException;
    }
}
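A hedged sketch of the missing cleanup step: close() should release the held snapshot reference exactly once. The names below are hypothetical stand-ins for the PR's types; using getAndSet(null) keeps a repeated close() (or a racing setter) from double-releasing:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch under assumed names: an engine that releases its latestSnapshot
// reference when closed, so the snapshot's ref count can reach zero.
class EngineSketch {
    static class SnapshotRef {
        int refs = 1;                  // stand-in for the real ref count
        void decRef() { refs--; }
    }

    private final AtomicReference<SnapshotRef> latestSnapshot = new AtomicReference<>();

    void setLatestSnapshot(SnapshotRef snapshot) {
        SnapshotRef prev = latestSnapshot.getAndSet(snapshot);
        if (prev != null) prev.decRef();
    }

    void close() {
        // Release the snapshot reference held by this engine, exactly once:
        // getAndSet(null) makes a second close() a no-op for this step.
        SnapshotRef held = latestSnapshot.getAndSet(null);
        if (held != null) held.decRef();
    }
}
```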
NPE Risk

In onDeleted, readers.remove(catalogSnapshot) can return null if the snapshot was never registered (e.g., afterRefresh was skipped). Calling .close() on null will throw a NullPointerException. A null check should be added before calling close().

public void onDeleted(CatalogSnapshot catalogSnapshot) throws IOException {
    readers.remove(catalogSnapshot).close();
}
NPE Risk

Same pattern as DatafusionReaderManager.onDeleted: readers.remove(catalogSnapshot) may return null, and calling .close() on it will throw a NullPointerException.

public void onDeleted(CatalogSnapshot catalogSnapshot) throws IOException {
    readers.remove(catalogSnapshot).close();
}
Incomplete Implementation

segregateFilesByFormat always returns an empty map (TODO body), meaning addFileReferences and removeFileReferences never track any files. This makes the entire IndexFileDeleter a no-op, so file lifecycle notifications will never fire and files will never be cleaned up.

private Map<DataFormat, Collection<String>> segregateFilesByFormat(CatalogSnapshot snapshot) {
    Map<DataFormat, Collection<String>> dfSegregatedFiles = new HashMap<>();
    // TODO
    return dfSegregatedFiles;
}
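Purely for illustration, one possible shape of the segregation is to key each file by its extension. This is a hypothetical sketch under assumed types (plain strings instead of DataFormat and CatalogSnapshot), not the implementation the TODO will actually take:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: group a snapshot's file names by a format key derived
// from the file extension. Real code would map extensions to DataFormat
// instances via the registry instead of using raw strings.
class FileSegregator {
    static Map<String, Collection<String>> segregateByExtension(Collection<String> files) {
        Map<String, Collection<String>> byFormat = new HashMap<>();
        for (String file : files) {
            int dot = file.lastIndexOf('.');
            String ext = dot >= 0 ? file.substring(dot + 1) : "unknown";
            byFormat.computeIfAbsent(ext, k -> new ArrayList<>()).add(file);
        }
        return byFormat;
    }
}
```

Until something like this is filled in, addFileReferences and removeFileReferences track nothing, which is why the review flags the deleter as a no-op.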

@github-actions

github-actions bot commented Mar 10, 2026

PR Code Suggestions ✨

Latest suggestions up to 7f5f3e6

Explore these optional code suggestions:

Possible issue
Fix race condition in snapshot replacement

The setLatestSnapshot method is not thread-safe. Between reading this.latestSnapshot
into prev and writing the new snapshot, another thread could call setLatestSnapshot
concurrently, causing the old snapshot to be decRef'd twice or the new snapshot to
be overwritten. Use an AtomicReference with getAndSet to make this operation atomic.

server/src/main/java/org/opensearch/index/engine/DataFormatAwareEngine.java [91-97]

 public void setLatestSnapshot(CatalogSnapshot snapshot) {
-    CatalogSnapshot prev = this.latestSnapshot;
-    this.latestSnapshot = snapshot;
+    CatalogSnapshot prev = ((AtomicReference<CatalogSnapshot>) latestSnapshotRef).getAndSet(snapshot);
     if (prev != null) {
         prev.decRef();
     }
 }
Suggestion importance[1-10]: 7


Why: The setLatestSnapshot method has a real race condition between reading latestSnapshot and writing the new value. However, the improved code references latestSnapshotRef which doesn't exist in the PR (the field is declared as volatile CatalogSnapshot latestSnapshot), making the improved code incorrect as-is. The issue is valid but the fix needs to also change the field declaration.

Impact: Medium
Prevent NullPointerException on missing snapshot deletion

If readers.remove(catalogSnapshot) returns null (e.g., the snapshot was never added
or already removed), calling .close() on it will throw a NullPointerException. Add a
null check before calling close().

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DatafusionReaderManager.java [52-54]

 @Override
 public void onDeleted(CatalogSnapshot catalogSnapshot) throws IOException {
-    readers.remove(catalogSnapshot).close();
+    DatafusionReader reader = readers.remove(catalogSnapshot);
+    if (reader != null) {
+        reader.close();
+    }
 }
Suggestion importance[1-10]: 7


Why: The onDeleted method calls .close() directly on the result of readers.remove() without a null check, which will throw a NullPointerException if the snapshot was never added or already removed. The fix is straightforward and prevents a real NPE bug.

Impact: Medium
Fix incorrect pointer used in native stream close

The doClose() method uses ptr (the base handle pointer) to close the stream, but
StreamHandle has a separate streamPtr field that represents the native stream
pointer. If the stream and the base handle are distinct native resources, streamPtr
should be used here instead of ptr to avoid a resource leak or incorrect pointer
usage.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/jni/StreamHandle.java [31-33]

 @Override
 protected void doClose() {
-    NativeBridge.streamClose(ptr);
+    NativeBridge.streamClose(streamPtr);
 }
Suggestion importance[1-10]: 7


Why: StreamHandle holds two pointers: ptr (base handle) and streamPtr (native stream). Using ptr in streamClose may close the wrong resource or cause a leak if they are distinct native objects. However, without knowing the native API contract, this could also be intentional.

Impact: Medium
Ensure search context is always closed

The engineContext created by searchEngine.createContext(...) is never closed, which
may leak resources (e.g., native memory, file handles). The context should be closed
in a try-with-resources or finally block after execution.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [67-79]

 try (DataFormatAwareEngine.DataFormatAwareReader dataFormatAwareReader = dataFormatAwareEngine.acquireReader()) {
     Object reader = dataFormatAwareReader.getReader(format);
     SearchExecEngine searchEngine = dataFormatAwareEngine.getSearchExecEngine(format);
     Object plan = searchEngine.convertFragment(logicalFragment);
     var engineContext = searchEngine.createContext(reader, plan, null, null, null);
-    Object result = searchEngine.execute(engineContext);
-
-    // TODO: consume result stream into rows
-    logger.info("[DefaultPlanExecutor] Executed via [{}]", plugin.name());
-    return new ArrayList<>();
+    try {
+        Object result = searchEngine.execute(engineContext);
+        // TODO: consume result stream into rows
+        logger.info("[DefaultPlanExecutor] Executed via [{}]", plugin.name());
+        return new ArrayList<>();
+    } finally {
+        engineContext.close();
+    }
 }
Suggestion importance[1-10]: 6


Why: The engineContext is never closed, which could leak resources like native memory or file handles. The suggested fix wraps execution in a try-finally to ensure engineContext.close() is always called, which is a valid resource management improvement.

Impact: Low
Prevent silent resource leaks in default close method

The default close() implementation is a no-op, which means implementations that hold
native or I/O resources will silently leak them if they forget to override close().
Since the Javadoc says "Callers should use try-with-resources to ensure cleanup,"
the default should either throw an UnsupportedOperationException or be removed to
force implementors to provide a real cleanup.

server/src/main/java/org/opensearch/index/engine/exec/SegmentCollector.java [35-36]

 @Override
-default void close() {}
+default void close() {
+    throw new UnsupportedOperationException("SegmentCollector implementations must override close()");
+}
Suggestion importance[1-10]: 4


Why: A no-op default close() can silently swallow resource cleanup for implementations that forget to override it. However, throwing UnsupportedOperationException by default is an unusual pattern for Closeable and may break existing or future implementations that legitimately have nothing to close.

Impact: Low
General
Improve type safety of reader retrieval method

The getReader method returns a raw Object, which loses type safety and forces
callers to perform unchecked casts. Consider using a generic type parameter or a
bounded return type to make the API safer and more expressive.

server/src/main/java/org/opensearch/index/engine/exec/CatalogSnapshot.java [138]

-public abstract Object getReader(DataFormat dataFormat);
+public abstract <T> T getReader(DataFormat dataFormat);
Suggestion importance[1-10]: 5


Why: Returning raw Object from getReader forces callers to perform unchecked casts, reducing type safety. Using a generic return type <T> T getReader(DataFormat dataFormat) is a reasonable improvement, though it still requires unchecked casts internally.

Impact: Low
Return defensive copy to protect mutable internal state

Returning the internal byte[] array directly exposes the mutable internal state of
the object, allowing callers to modify the array contents. Return a defensive copy
to preserve immutability of the stored plan bytes.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DatafusionQuery.java [29-31]

 public byte[] getSubstraitBytes() {
-    return substraitBytes;
+    return substraitBytes == null ? null : substraitBytes.clone();
 }
Suggestion importance[1-10]: 4


Why: Returning the internal byte[] directly exposes mutable state, which is a valid concern for immutability. However, for performance-sensitive serialized plan bytes, defensive copying may have overhead, and the impact depends on usage patterns.

Impact: Low

Previous suggestions

Suggestions up to commit 338bc6e
Suggestions up to commit 4aad14e
Possible issue
Prevent double-release of reader reference

The reader is released twice: once when the DatafusionSearcher is closed (via the
onClose lambda) and once when the supplier's doClose() is called. This
double-release will decrement the ref count below the expected value, potentially
causing premature native resource cleanup. The supplier's doClose should only
release the reader if no searcher was acquired, or the searcher's close callback
should be the sole release path.

sandbox/plugins/engine-datafusion/src/main/java/org/opensearch/datafusion/DatafusionSearchExecEngine.java [88-111]

 @Override
 public EngineSearcherSupplier<DatafusionSearcher> acquireSearcherSupplier() throws IOException {
     DatafusionReader reader = readerManager.acquire();
     return new EngineSearcherSupplier<>() {
+        private final java.util.concurrent.atomic.AtomicBoolean searcherAcquired = new java.util.concurrent.atomic.AtomicBoolean(false);
+
         @Override
         protected DatafusionSearcher acquireSearcherInternal(String source) {
+            searcherAcquired.set(true);
             return new DatafusionSearcher(source, reader.getReaderPtr(), () -> {
                 try {
                     readerManager.release(reader);
                 } catch (IOException e) {
                     throw new UncheckedIOException(e);
                 }
             });
         }
 
         @Override
         protected void doClose() {
-            try {
-                readerManager.release(reader);
-            } catch (IOException e) {
-                throw new UncheckedIOException(e);
+            // Only release if no searcher was acquired (searcher's onClose handles release otherwise)
+            if (!searcherAcquired.get()) {
+                try {
+                    readerManager.release(reader);
+                } catch (IOException e) {
+                    throw new UncheckedIOException(e);
+                }
             }
         }
     };
 }
Suggestion importance[1-10]: 8


Why: The doClose() method and the searcher's onClose lambda both call readerManager.release(reader), causing a double-decrement of the ref count. This is a real correctness bug that would lead to premature native resource cleanup once the native bridge is wired.

Impact: Medium
Fix incorrect guard condition for runtime pointer

The condition ptr == 0L && lifecycle.started() == false only throws when both
conditions are true, meaning it silently returns 0 when the service is started but
the runtime pointer is still 0 (placeholder). This could cause
null-pointer-equivalent bugs in JNI calls. The check should throw if the pointer is
0 regardless of lifecycle state, or at minimum use || instead of &&.

sandbox/plugins/engine-datafusion/src/main/java/org/opensearch/datafusion/DataFusionService.java [81-87]

 public long getRuntimePointer() {
+    if (lifecycle.started() == false) {
+        throw new IllegalStateException("DataFusionService has not been started");
+    }
     long ptr = runtimePointer;
-    if (ptr == 0L && lifecycle.started() == false) {
-        throw new IllegalStateException("DataFusionService has not been started");
+    if (ptr == 0L) {
+        throw new IllegalStateException("DataFusionService runtime pointer is not initialized");
     }
     return ptr;
 }
Suggestion importance[1-10]: 7


Why: The && condition means a started service with runtimePointer == 0L (the current placeholder state) silently returns 0, which would cause JNI crashes. Separating the lifecycle check from the pointer check is more correct, though the issue is partially masked by the TODO placeholder nature of the code.

Impact: Medium
Fix race condition in reader acquisition

There is a TOCTOU race condition: the reader could be swapped out (set to null or
replaced) between currentReader.get() and reader.incRef(). If the old reader is
closed concurrently, incRef would increment a ref count on an already-closed reader.
The acquire should be done atomically, checking that the reader hasn't been replaced
after incrementing the ref count.

sandbox/plugins/engine-datafusion/src/main/java/org/opensearch/datafusion/DatafusionReaderManager.java [33-40]

 @Override
 public DatafusionReader acquire() throws IOException {
-    DatafusionReader reader = currentReader.get();
-    if (reader == null) {
-        throw new IOException("No DataFusion reader available");
+    while (true) {
+        DatafusionReader reader = currentReader.get();
+        if (reader == null) {
+            throw new IOException("No DataFusion reader available");
+        }
+        reader.incRef();
+        // Verify the reader hasn't been swapped out after we incremented
+        if (currentReader.get() == reader) {
+            return reader;
+        }
+        // Reader was replaced; undo the incRef and retry
+        reader.decRef();
     }
-    reader.incRef();
-    return reader;
 }
Suggestion importance[1-10]: 6


Why: There is a genuine TOCTOU race between currentReader.get() and reader.incRef() in acquire(). The retry loop approach is a valid fix, though the impact is limited since afterRefresh is currently a stub and concurrent swaps won't happen yet.

Impact: Low
Guard against empty data files array access

environment.dataFiles() could return an empty array, causing an
ArrayIndexOutOfBoundsException. Additionally, calling .getParent() on the first data
file path could return null if the path has no parent. A null check or a fallback
should be added.

sandbox/plugins/engine-datafusion/src/main/java/org/opensearch/datafusion/DataFusionPlugin.java [88]

-String spillDir = environment.dataFiles()[0].getParent().resolve("tmp").toAbsolutePath().toString();
+java.nio.file.Path[] dataFiles = environment.dataFiles();
+if (dataFiles == null || dataFiles.length == 0) {
+    throw new IllegalStateException("No data directories configured for DataFusion spill");
+}
+java.nio.file.Path parent = dataFiles[0].getParent();
+if (parent == null) {
+    throw new IllegalStateException("Data directory has no parent path: " + dataFiles[0]);
+}
+String spillDir = parent.resolve("tmp").toAbsolutePath().toString();
Suggestion importance[1-10]: 5


Why: Accessing environment.dataFiles()[0] without checking for an empty array could cause an ArrayIndexOutOfBoundsException. The null/empty check is a reasonable defensive guard, though in practice OpenSearch always configures at least one data directory.

Impact: Low

@github-actions

❌ Gradle check result for 4aad14e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 3e4d286.

• server/src/main/java/org/opensearch/search/internal/ContextIndexSearcher.java, line 72 (high): Core search infrastructure import redirected from 'org.opensearch.lucene.util.CombinedBitSet' to 'org.opensearch.be.lucene.util.CombinedBitSet' (sandbox plugin namespace). CombinedBitSet is used in document-level filtering within ContextIndexSearcher, which is a security-sensitive component used by security plugins (e.g., document-level security). The replacement class is not defined anywhere in this diff, meaning its implementation cannot be verified. If the replacement silently returns all bits set or alters filter combination logic, it could bypass document-level security access controls. Redirecting a core search utility to an unverified sandbox implementation without showing the replacement source is a suspicious pattern.
• server/src/main/java/org/opensearch/index/engine/exec/IndexFileDeleter.java, line 96 (low): Error reporting in notifyFilesAdded and notifyFilesDeleted uses 'System.err.println' directly instead of the log4j logging framework used throughout the codebase. This bypasses the standard structured logging/auditing pipeline. While not clearly malicious, silently swallowing file-tracking failures and printing to stderr (which may not be captured in audit logs) is anomalous for production-grade OpenSearch code and could hide evidence of file manipulation errors.
• server/src/test/java/org/opensearch/index/engine/EngineIntegrationTests.java, line 1 (low): Test files (EngineIntegrationTests.java and SearchExecEngineTests.java) reference APIs that are inconsistent with the production code added in this diff — e.g., CompositeEngine(List, null) constructor signature, EngineSearcherSupplier, EngineReaderManager.acquire()/release(), getReferenceManager(), acquireSearcherSupplier(). None of these match the SearchExecEngine or CompositeEngine interfaces defined in this PR. These tests would fail to compile, suggesting they may be intentional noise to obscure the actual functionality being introduced, or are artifacts of a different design that were not properly cleaned up.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 1 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 7ef7f9c.

• server/src/main/java/org/opensearch/search/internal/ContextIndexSearcher.java, line 72 (medium): Import of CombinedBitSet redirected from core module org.opensearch.lucene.util to sandbox plugin package org.opensearch.be.lucene.util. ContextIndexSearcher is in the critical search hot path; the replacement class implementation is not present in this diff and cannot be verified, creating an unreviewed dependency on a sandbox plugin within core server code.
• server/src/test/java/org/opensearch/index/engine/EngineIntegrationTests.java, line 1 (medium): Test file references APIs that do not match any interface defined in this diff: CompositeEngine(List, null) constructor, getReadEngines(), getPrimaryReadEngine(), EngineSearcherSupplier, and EngineReaderManager.acquire/release. The SearchExecEngine.createContext signature also differs from the defined interface. Suggests these tests target a hidden implementation not included in this PR.
• server/src/main/java/org/opensearch/index/engine/exec/IndexFileDeleter.java, line 97 (low): Error handling uses System.err.println instead of the Log4j logger used everywhere else in the codebase, bypassing structured logging and making file deletion notification failures harder to audit in production.
• sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionService.java, line 55 (low): System.loadLibrary loads native library opensearch_datafusion_jni whose binary is not included in this diff. While expected for JNI integration, the native binary executes outside JVM security controls and its contents are unreviewed here.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 0 | Medium: 2 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit ade5591.

• server/src/main/java/org/opensearch/search/internal/ContextIndexSearcher.java, line 72 (medium): Core server class `ContextIndexSearcher` has its `CombinedBitSet` import redirected from `org.opensearch.lucene.util.CombinedBitSet` to `org.opensearch.be.lucene.util.CombinedBitSet`. The target package (`org.opensearch.be.lucene`) is a new plugin package introduced in this PR, and the `util.CombinedBitSet` class is not defined anywhere in this diff. This is architecturally inverted (server depending on plugin code) and unrelated to the stated PR purpose of adding a DataFusion analytics backend. The change could redirect core search bitset operations through plugin-controlled code.
• sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionService.java, line 57 (low): `System.loadLibrary(NATIVE_LIBRARY_NAME)` loads a native library named `opensearch_datafusion_jni` at runtime with no integrity check (no hash verification, no signature check). While JNI loading is a standard pattern, loading an unsigned native library at node startup from a path controlled by the filesystem represents a potential binary substitution vector. This is expected for JNI but warrants noting.
• server/src/main/java/org/opensearch/index/engine/exec/IndexFileDeleter.java, line 107 (low): Error handling in `notifyFilesAdded` and `notifyFilesDeleted` uses `System.err.println` instead of the Log4j logger used elsewhere in the codebase. While likely just poor coding practice, bypassing the logging framework in error paths means these failures are invisible to standard OpenSearch log monitoring and auditing.
• server/src/test/java/org/opensearch/index/engine/EngineIntegrationTests.java, line 1 (low): Test files (`EngineIntegrationTests.java`, `SearchExecEngineTests.java`) reference interfaces and method signatures (`getReadEngines()`, `getPrimaryReadEngine()`, `EngineSearcherSupplier`, `EngineReaderManager.acquire()/release()`, `CompositeEngine(List, null)`) that do not match the actual classes defined in this diff. These tests appear to target a different version of the API than what is implemented, suggesting they may be copied from a separate non-public branch or codebase. This inconsistency warrants investigation to confirm the tests correspond to the intended implementation.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 0 | Medium: 1 | Low: 3


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.
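The `System.loadLibrary` finding above could be mitigated by pinning the library to an absolute path and checking a known digest before loading. A minimal sketch, assuming the expected SHA-256 ships alongside the plugin; `VerifiedNativeLoader` and its method names are illustrative, not part of this PR:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public final class VerifiedNativeLoader {

    private VerifiedNativeLoader() {}

    /** Computes the SHA-256 digest of a file as lowercase hex. */
    public static String sha256(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(md.digest(Files.readAllBytes(file)));
    }

    /** Loads the library only if its digest matches the expected value. */
    public static void loadVerified(Path library, String expectedSha256) throws Exception {
        String actual = sha256(library);
        if (!actual.equalsIgnoreCase(expectedSha256)) {
            throw new SecurityException("native library digest mismatch: " + actual);
        }
        // System.load takes an absolute path, avoiding java.library.path lookup.
        System.load(library.toAbsolutePath().toString());
    }
}
```

Using `System.load` with an absolute path also sidesteps the `java.library.path` substitution concern the analyzer raises for bare `System.loadLibrary` names.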

@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit ac2edae.

| Path | Line | Severity | Description |
| --- | --- | --- | --- |
| server/src/main/java/org/opensearch/search/internal/ContextIndexSearcher.java | 72 | medium | Import changed from core server package 'org.opensearch.lucene.util.CombinedBitSet' to sandbox plugin package 'org.opensearch.be.lucene.util.CombinedBitSet'. This makes a core server component depend on a sandbox/plugin package, which is an unusual architectural coupling. No definition of the new class appears in this diff, raising questions about whether the replacement implementation is identical or subtly different in behavior affecting search result correctness. |
| server/src/main/java/org/opensearch/index/engine/exec/IndexFileDeleter.java | 102 | low | Errors in file notification callbacks are silently swallowed using System.err.println instead of the standard logging framework. This deviates from project conventions and could suppress visibility of failures during file add/delete operations, obscuring issues in file lifecycle management. |
| sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionService.java | 58 | low | Native library 'opensearch_datafusion_jni' is loaded via System.loadLibrary without any integrity verification (e.g., checksum or signature check). While JNI usage is expected for DataFusion integration, the absence of library validation means a tampered native binary could execute arbitrary code with the JVM's privileges. |
| sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/LuceneEngineSearcher.java | 49 | low | Static ConcurrentHashMaps (activeWeights, activeScorers) are used to hold Weight and Scorer contexts keyed by opaque long pointers shared across all instances. If releaseWeight/releaseCollector are not reliably called (e.g., on exception paths), entries accumulate indefinitely. This is a potential resource exhaustion vector, though more likely a design oversight than intentional. |


Total: 4 | Critical: 0 | High: 0 | Medium: 1 | Low: 3


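One way to address the static-map finding on `LuceneEngineSearcher` is to scope the handle registry to the searcher instance and release entries in a `finally` block. A sketch under those assumptions; `HandleRegistry` is a hypothetical name, not a class in this PR:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Instance-scoped registry for native-callback handles. */
public final class HandleRegistry<T> {

    // Instance fields, not static: the registry's lifetime is tied to its owner.
    private final ConcurrentMap<Long, T> handles = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Stores a value and returns an opaque id for JNI callbacks. */
    public long register(T value) {
        long id = nextId.incrementAndGet();
        handles.put(id, value);
        return id;
    }

    public T get(long id) {
        return handles.get(id);
    }

    /** Removes and returns the entry so it cannot be stranded. */
    public T release(long id) {
        return handles.remove(id);
    }

    public int size() {
        return handles.size();
    }
}
```

A caller would pair `register` with `release` in `try`/`finally` so an exception thrown from native code cannot strand an entry, and making the registry an instance field ties its lifetime to the searcher rather than the whole JVM.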

```diff
 * @opensearch.internal
 */
-public interface AnalyticsBackEndPlugin {
+public interface AnalyticsBackEndPlugin extends SearchAnalyticsBackEndPlugin {
```
Member

I think we need to keep these two things separate - the APIs in SearchAnalyticsBackEndPlugin aren't going to be used by the analytics engine and are largely used for registration/indexing with composite.

Contributor

Maybe we could have the same class extend and implement?

Member

makes sense

```java
 * Create a search context. The reader is provided by {@link org.opensearch.index.engine.CompositeEngine}
 * which owns all reader managers.
 */
C createContext(
```
@mch2 (Member), Mar 17, 2026

I don't know that we need this exposed - this was my motive for the bridge interface as is. The bridge can manage the lifecycle of the engine-specific context vs. exposing it outside, and it is built from the given snapshot.

Contributor Author

We need to expose either a common view of the catalog snapshot or a reader over that common view. That, plus the plan, is what we always send to this method.

Mainly this enables filter delegates, and in general all search actions, to maintain their own contexts through the query lifecycle.

I'm okay with not exposing this as well, as long as we are able to provide the context when we initialize the delegates.

Member

Here's roughly what i'm thinking - where snapshot below is replaced with the shared reader context.

```java
try (CompositeEngine.ReleasableRef<CatalogSnapshot> snapshot = engine.acquireSnapshot()) {
    EngineBridge<byte[], ? extends EngineResultStream, RelNode> bridge =
        (EngineBridge<byte[], ? extends EngineResultStream, RelNode>) plugin.bridge(engine, snapshot.getRef());

    byte[] converted = bridge.convertFragment(logicalFragment);

    List<Object[]> rows = new ArrayList<>();
    try (EngineResultStream resultStream = bridge.execute(converted)) {
...
        }
    }
}
```

The bridge interface (poorly named) was intended to be the point-in-time searcher specific to the back-end, built from the shared snapshot/reader. Which is basically this one, so it makes sense to replace it, but I'm thinking it should be the responsibility of AnalyticsBackEndPlugin to vend it on demand from the point-in-time state/reader managed by the composite engine. Sort of like CompositeEngine's getSearchExecEngine, but rather than requiring the CompositeEngine to maintain that, it's only worrying about refreshing readers behind the scenes and vending the composite reader.

@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit cc1fb42.

| Path | Line | Severity | Description |
| --- | --- | --- | --- |
| server/src/main/java/org/opensearch/search/internal/ContextIndexSearcher.java | 72 | medium | Import of CombinedBitSet redirected from 'org.opensearch.lucene.util' to 'org.opensearch.be.lucene.util' — a new plugin-owned package added in this PR. This ties a core search component to a new package whose implementation is not shown in this diff. If the new class differs in behavior from the original, it could intercept or alter search result filtering for all queries processed by ContextIndexSearcher. |
| sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionService.java | 58 | low | System.loadLibrary('opensearch_datafusion_jni') loads a native JNI library by bare name, relying on the JVM's library path resolution. No path pinning, signature verification, or integrity check is performed on the native binary before loading. A malicious library with the same name placed earlier in java.library.path would be silently loaded instead. |
| server/src/main/java/org/opensearch/index/engine/exec/IndexFileDeleter.java | 101 | low | Error handling uses System.err.println instead of the configured logger. Errors are silently swallowed from the logging framework, making file deletion/addition failures invisible to operators and monitoring systems. This also bypasses any log-level controls and audit trails, which is unusual given every other class in this PR uses Log4j. |
| sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/LuceneEngineSearcher.java | 44 | low | Static ConcurrentHashMaps (activeWeights, activeScorers) keyed by incrementing long IDs are shared across all shard instances with no TTL or maximum size bound. If releaseWeight/releaseCollector are not called (e.g., due to exceptions in native Rust code), the maps grow unboundedly, holding references to Lucene Weight/Scorer objects and their associated index readers indefinitely, preventing garbage collection. |


Total: 4 | Critical: 0 | High: 0 | Medium: 1 | Low: 3


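The `IndexFileDeleter` finding (errors printed via `System.err.println`) would normally be fixed with the Log4j logger this codebase already uses; the sketch below substitutes `java.util.logging` only to stay self-contained, and the class name and `Runnable` parameter are illustrative:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Routes notification failures through a logger instead of System.err. */
public final class FileNotifier {

    private static final Logger logger = Logger.getLogger(FileNotifier.class.getName());

    /** Runs a file-lifecycle notification, logging (not swallowing) failures. */
    public static void notifyFilesAdded(Runnable notification) {
        try {
            notification.run();
        } catch (RuntimeException e) {
            // A WARN-level record reaches standard log monitoring; System.err does not.
            logger.log(Level.WARNING, "failed to notify engine of added files", e);
        }
    }
}
```

The Log4j equivalent would be `logger.warn("failed to notify engine of added files", e)` on the class's existing `Logger` field, which also respects configured log levels and appenders.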

@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 86a9747.

| Path | Line | Severity | Description |
| --- | --- | --- | --- |
| server/src/main/java/org/opensearch/search/internal/ContextIndexSearcher.java | 72 | medium | Import of CombinedBitSet redirected from the established core package 'org.opensearch.lucene.util' to a new, unverified plugin package 'org.opensearch.be.lucene.util'. CombinedBitSet is used in critical document-level filtering in the search path. The replacement class is not introduced anywhere in this diff, raising the question of whether a pre-existing class with that path silently substitutes behavior, or if this is an intentional package reorganization that warrants verification of the new implementation. |
| sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionService.java | 60 | low | System.loadLibrary('opensearch_datafusion_jni') loads a native library at node startup via the DataFusionService lifecycle. While standard JNI practice, the library is loaded without path pinning or integrity verification, meaning a tampered or substituted native library on the node's library path would execute with full JVM privileges. This is contextually consistent with the stated DataFusion integration purpose. |
| sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/LuceneEngineSearcher.java | 47 | low | Static, JVM-wide ConcurrentHashMaps (activeWeights, activeScorers) keyed by auto-incrementing long IDs store per-query Weight and Scorer contexts shared across all shard and engine instances. In a multi-tenant or multi-shard environment, predictable ID sequences could in principle allow one query's execution context to observe or interfere with another's. Intended for JNI callbacks but the static scope creates unintended cross-query reachability. |
| server/src/main/java/org/opensearch/index/engine/exec/IndexFileDeleter.java | 109 | low | Error handling in notifyFilesAdded() and notifyFilesDeleted() uses System.err.println() rather than a logger, and silently swallows all exceptions from CompositeEngine notifications. This means failures to notify the engine about added or deleted files (including potential cache poisoning or stale-reader conditions) are invisible to standard log monitoring and alerting pipelines. |


Total: 4 | Critical: 0 | High: 0 | Medium: 1 | Low: 3



@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 5f761ba.

| Path | Line | Severity | Description |
| --- | --- | --- | --- |
| server/src/main/java/org/opensearch/search/internal/ContextIndexSearcher.java | 72 | medium | Import of CombinedBitSet changed from core package 'org.opensearch.lucene.util' to plugin package 'org.opensearch.be.lucene.util'. The class at the new location is not defined anywhere in this diff. If this resolves to a plugin-supplied class with altered behavior, it could influence how query bit-sets are computed in all search operations through this core class. |
| sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionService.java | 60 | medium | System.loadLibrary("opensearch_datafusion_jni") loads a native library by bare name, resolved via java.library.path at runtime. No integrity check, path pinning, or signature verification is performed. A malicious actor with control over java.library.path or LD_LIBRARY_PATH could substitute a malicious native library that runs with full JVM privileges. |
| sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionService.java | 68 | low | Hardcoded fake native pointer 'long ptr = 1L' is used as a placeholder NativeRuntimeHandle. If this placeholder reaches production and native JNI code dereferences pointer 1, it will cause a JVM crash (SIGSEGV). This is a placeholder comment, but the value is live in the code path. |
| sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java | 150 | low | getSupportedFormats() returns null instead of an empty list. CompositeEngineFactory iterates over this return value with a for-each loop, which will throw a NullPointerException at startup for every shard using this plugin. This could be used as a denial-of-service against shard initialization. |
| server/src/main/java/org/opensearch/index/engine/exec/WriterFileSet.java | 155 | low | getTotalSize() constructs file paths from user-controlled 'directory' and 'file' fields and calls Files.size() on them, silently catching and discarding IOExceptions. While read-only, this probes arbitrary paths on the filesystem and suppresses all errors, masking path traversal or permission issues. |
| sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DatafusionReaderManager.java | 41 | low | readers field is a non-synchronized HashMap accessed from methods (getReader, afterRefresh, onDeleted) that can be called concurrently during refresh and shard close. This is a data race that can corrupt internal state, potentially causing stale or double-closed native readers. |
| server/src/test/java/org/opensearch/index/engine/EngineIntegrationTests.java | 1 | low | Test class references APIs (CompositeEngine(List, null), EngineSearcherSupplier, EngineReaderManager.acquire/release, SearchExecEngine with ReaderContext/BigArrays parameters) that do not match the interfaces defined in the production code of this same diff. These tests will not compile, suggesting they were written against a different or future API version and included prematurely, creating dead code that obscures the actual API contract. |


Total: 7 | Critical: 0 | High: 0 | Medium: 2 | Low: 5


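For the `getSupportedFormats()` null-return finding, returning an immutable empty list keeps for-each callers safe at shard startup. A sketch; the class name and `String` element type are illustrative stand-ins for the plugin API:

```java
import java.util.Collections;
import java.util.List;

/** Null-safe supported-formats accessor: for-each callers never see null. */
public final class FormatRegistry {

    private final List<String> configured;

    public FormatRegistry(List<String> configured) {
        this.configured = configured;
    }

    /** Returns the configured formats, or an immutable empty list instead of null. */
    public List<String> getSupportedFormats() {
        return configured == null ? Collections.emptyList() : List.copyOf(configured);
    }
}
```

With this contract, a caller such as CompositeEngineFactory can iterate the result directly without a null check, and an unconfigured plugin simply contributes no formats.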


```java
    return readerManagers.get(format);
}

public SearchExecEngine<?, ?> getSearchExecEngine(DataFormat format) throws IOException {
```
Member

When we search from the analytics engine, we're going to end up with a fragment that is assigned to a particular engine from query planning. We'll need to fetch by engine name, not data format - probably just as simple as maintaining a reader-by-name map.

Member

But I think the back-ends can build the searcher / SearchExecEngine on the fly, vs. having to maintain this state. We'd just need to fetch the reader to build the searcher from?


bharath-techie and others added 5 commits March 20, 2026 14:13
Signed-off-by: bharath-techie <bharath78910@gmail.com>
Signed-off-by: bharath-techie <bharath78910@gmail.com>
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
* Refactor CompositeEngine to use factory

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

* Introduce SegmentCollector

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

* Introduce SegmentCollector

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

* Introduce SegmentCollector

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

---------

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
* Refactor CompositeEngine to use factory

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

* Introduce SegmentCollector

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

* Introduce SegmentCollector

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

* Introduce SegmentCollector

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

* De-couple and simplify index file deleter

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

* De-couple and simplify index file deleter

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

* De-couple and simplify index file deleter, handle scorer and weight query lifecycle

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

* De-couple and simplify index file deleter, handle scorer and weight query lifecycle

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

* De-couple and simplify index file deleter, handle scorer and weight query lifecycle

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>

---------

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 338bc6e

@github-actions
Contributor

❌ Gradle check result for 338bc6e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: bharath-techie <bharath78910@gmail.com>
```diff
 testingConventions.enabled = false

-// analytics-framework does not depend on server
+// analytics-framework depends on server for SearchAnalyticsBackEndPlugin SPI
```
Contributor Author

Do we let the bridge talk to the engine, if we need to remove this dependency?

We can come back to this after we figure out if context is required in analytics plugin for delegates.

@github-actions
Contributor

Persistent review updated to latest commit 7f5f3e6

@github-actions
Contributor

❌ Gradle check result for 7f5f3e6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

```java
 *
 * @param readerManagers the per-format reader managers that receive notifications
 */
public CatalogSnapshotLifecycleListener createCatalogSnapshotListener(Map<DataFormat, EngineReaderManager<?>> readerManagers) {
```
Member

Where will this be called? From the composite engine?
