Skip to content

initial commit of analytics engine plugin to sandbox#20697

Merged
mch2 merged 23 commits intoopensearch-project:mainfrom
mch2:planning-poc
Mar 16, 2026
Merged

initial commit of analytics engine plugin to sandbox#20697
mch2 merged 23 commits intoopensearch-project:mainfrom
mch2:planning-poc

Conversation

@mch2
Copy link
Member

@mch2 mch2 commented Feb 20, 2026

Description

This PR introduces a set of experimental plugins that we are building out related to this issue, that starts moving feature/engine-datafusion into main.

This introduces:
analytics-backend-datafusion - responsible for executing queries against datafusion back-end. No-Op right now.
analytics-engine - main "hub" for wiring up front-ends, back-ends, and a plan executor (component that is responsible for executing a query end-to-end).

For now its largely no-op, providing wiring for back-ends and the engine using ExtensiblePlugin. There is a test front-end inside of the engine that uses the UnifiedQueryAPI from sql plugin that will invoke our executor.

back-ends will implement AnalyticsBackEndPlugin and provide to the engine via spi. that for now defines its capabilities in terms of supported calcite operators/functions, and a path to execution via an EngineBridge. These common types are kept in a sandbox lib.

Related Issues

Related issues:
#19902
#18416

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

This comment was marked as outdated.

@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 937ad71.

PathLineSeverityDescription
gradle/forbidden-dependencies.gradle30mediumSandbox projects exempted from forbidden dependency checks, potentially allowing unsafe dependencies to bypass security validation
sandbox/plugins/extensible-engine/src/main/java/org/opensearch/eqe/ExtensibleQueryEnginePlugin.java62mediumSPI-based extension loading accepts arbitrary ExecutionEngine implementations without validation, allowing potentially malicious engines to be loaded and execute code
sandbox/plugins/extensible-engine/build.gradle86lowUnsafe Netty operations explicitly enabled (io.netty.tryUnsafe=true, io.netty.noUnsafe=false) which allow direct memory access bypassing Java safety checks
sandbox/plugins/extensible-engine/build.gradle80lowJVM module encapsulation weakened with --add-opens flags exposing internal java.base packages to ALL-UNNAMED modules, increasing exploit surface

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 0 | Medium: 2 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

Copy link
Contributor

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the initial draft Marc, looks like a good start
The one high-level concern I have is that one single plugin is responsible for planning, orchestration, substrait conversion etc would become unwieldy overtime, espl when we bring in plan optimisers and orchestrators(stage-based).
Would it make sense to tease out planner/optimisers into a plugin and the maybe orchestrators as yet another while letting server have the stitching logic across these plugins. Also I am not sure if we need the ExecutionEngine to be invoked from the extensible plugin(avoiding the SPI complexity for native execution engine)

@mch2
Copy link
Member Author

mch2 commented Feb 24, 2026

Would it make sense to tease out planner/optimisers into a plugin and the maybe orchestrators as yet another while letting server have the stitching logic across these plugins.

Still trying this out, my thinking with this is to keep it self contained (at least while we're in sandbox) & opt in so we aren't bloating server for users who don't use this plugin. Perhaps we can move planner/optimizer to /libs but it will also carry the calcite dependency with it.

Also I am not sure if we need the ExecutionEngine to be invoked from the extensible plugin(avoiding the SPI complexity for native execution engine)

The query engine is going to have fragments we'll need to send to the native engine for both data nodes and coordinators. So we can wire it via SPI, or through pluggable interface in core but I think the driver should remain in the plugin itself.

@mch2
Copy link
Member Author

mch2 commented Feb 24, 2026

Ok gave this a shot and I think it may work out. Ultimately we have a query-planner plugin, the engine plugin (implementing ExecutionEnginePlugin), and a module/ for the rest (what used to be extensible-engine) - thinking about pulling what i can out to a lib there as well, but this should remove the direct call from the engine into the native executor.

@Bukhtawar
Copy link
Contributor

Ok gave this a shot and I think it may work out. Ultimately we have a query-planner plugin, the engine plugin (implementing ExecutionEnginePlugin), and a module/ for the rest (what used to be extensible-engine) - thinking about pulling what i can out to a lib there as well, but this should remove the direct call from the engine into the native executor.

Thanks Marc for the change, looks awesome. The plugin segregations across planner and executor look pretty neat(thanks for moving the executor to modules to simplify plugin installation for various FE lang). Overall love the simplification while balancing extensibility.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 3190e36.

PathLineSeverityDescription
.gitignore4lowBroad addition of `*.md` to the root .gitignore would silently prevent all future markdown files repo-wide from being tracked by git. While files already tracked are unaffected, this is an unusually broad suppression that could be used to hide sensitive notes or documentation files added later. No clear legitimate rationale for suppressing all markdown at the repo root.
sandbox/plugins/analytics-frontend-ppl/build.gradle28lowJitPack (https://jitpack.io) is added as a Maven repository. JitPack builds artifacts directly from GitHub source at resolution time, which can introduce supply-chain risk if the upstream repo changes. Although the `includeGroup` filter restricts resolution to `com.github.babbel`, the constraint relies on Gradle content filtering being correctly enforced at all times.
sandbox/plugins/analytics-frontend-ppl/build.gradle22lowA snapshot Maven repository (`https://ci.opensearch.org/ci/dbc/snapshots/maven/`) is added for resolving `unified-query-*:3.6.0.0-SNAPSHOT` artifacts. Snapshot dependencies are mutable: the same version string can resolve to different artifacts across builds, making the build non-reproducible and potentially allowing silent artifact substitution if the CI repository is ever compromised.
sandbox/plugins/analytics-engine/src/main/java/org/opensearch/fe/ppl/PlanExecutor.java43lowQuery plan details are logged at INFO level: `logger.info("[PlanExecutor] Executing fragment with {} fields: {}", fieldCount, fragment.explain())`. The `explain()` output of a RelNode can expose internal schema structure, predicate values, and query shapes. In a production deployment this constitutes unintentional information disclosure in application logs.
sandbox/modules/query-engine/src/main/java/org/opensearch/qe/DefaultQueryExecutionEngine.java9lowThe file is located under the `org/opensearch/qe/` source path but declares `package org.opensearch.engine;`. This package/path mismatch is anomalous — it causes the class to be invisible under its filesystem location and could create ambiguity about which module owns the type, complicating security review and audit of the class.

The table above displays the top 10 most important findings.

Total: 5 | Critical: 0 | High: 0 | Medium: 0 | Low: 5


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

PR Reviewer Guide 🔍

(Review updated until commit aef649d)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Analytics framework SPI and shared library types

Relevant files:

  • sandbox/libs/analytics-framework/src/main/java/org/opensearch/analytics/backend/EngineCapabilities.java
  • sandbox/libs/analytics-framework/src/main/java/org/opensearch/analytics/backend/EngineBridge.java
  • sandbox/libs/analytics-framework/src/main/java/org/opensearch/analytics/spi/AnalyticsBackEndPlugin.java

Sub-PR theme: Analytics engine plugin core implementation and unit tests

Relevant files:

  • sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/AnalyticsPlugin.java
  • sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/schema/OpenSearchSchemaBuilder.java
  • sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java
  • sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rel/OpenSearchBoundaryTableScan.java
  • sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/PushDownPlanner.java
  • sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rules/AbsorbFilterRule.java
  • sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rules/AbsorbProjectRule.java
  • sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rules/AbsorbAggregateRule.java
  • sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rules/AbsorbSortRule.java
  • sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rules/BoundaryTableScanRule.java
  • sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/action/UnifiedQueryService.java
  • sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/action/TestPPLTransportAction.java
  • sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/compiler/OpenSearchQueryCompiler.java

Sub-PR theme: DataFusion backend plugin and ClickBench integration tests

Relevant files:

  • sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionBridge.java
  • sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java
  • sandbox/plugins/analytics-engine/src/internalClusterTest/java/org/opensearch/fe/planner/unified/ClickBenchUnifiedPipelineIT.java

⚡ Recommended focus areas for review

Naming Convention

The factory method create(QueryPlanExecutor QueryPlanExecutor) uses a parameter name that starts with an uppercase letter (QueryPlanExecutor), which is a Java naming violation. The parameter name should be lowercase (e.g., queryPlanExecutor) to follow Java conventions and avoid confusion with the type name.

public static BoundaryTableScanRule create(QueryPlanExecutor QueryPlanExecutor) {
    return new BoundaryTableScanRule(
        Config.INSTANCE.withConversion(LogicalTableScan.class, Convention.NONE, EnumerableConvention.INSTANCE, "BoundaryTableScanRule"),
        QueryPlanExecutor
    );
}
Fragile Source Replacement

The runClickBenchQuery method uses multiple sequential String.replace() calls to rewrite source=hits to source=opensearch.hits. This approach is fragile and may miss edge cases or produce incorrect results if the pattern appears in unexpected positions (e.g., inside string literals in the query). A regex-based replacement or a proper PPL AST rewrite would be more robust.

String ppl = rawPpl.replace("source=hits", "source=opensearch.hits")
    .replace("source =hits", "source =opensearch.hits")
    .replace("source= hits", "source= opensearch.hits")
    .replace("source = hits", "source = opensearch.hits");
Null DataContext

The execute() method calls bind(null), passing a null DataContext. Inside bind(), the dataContext parameter is passed to planExecutor.execute(). If any executor implementation uses the DataContext (e.g., for variable bindings or timezone), this will cause a NullPointerException at runtime. The DataContext should be properly threaded through or documented as intentionally null.

public Enumerable<Object[]> execute() {
    return bind(null);
}
Test Isolation

The createTestServiceWithContextTracking method overrides execute() and replicates the real execution logic inline rather than delegating to super.execute(). This means the test is not actually testing the real UnifiedQueryService.execute() implementation for resource cleanup — it's testing a hand-rolled copy. If the real implementation changes, these tests won't catch regressions in cleanup behavior.

private UnifiedQueryService createTestServiceWithContextTracking(PreparedStatement mockStatement, AtomicBoolean contextClosed) {
    return new UnifiedQueryService(mockPlanner, testSchemaProvider()) {
        @Override
        protected PreparedStatement compileAndPrepare(UnifiedQueryContext context, RelNode mixedPlan) {
            return mockStatement;
        }

        @Override
        public PPLResponse execute(String pplText, ClusterState cs) {
            // Replicate the real execute logic but track context cleanup
            RelNode mixed = mockPlanner.plan(mockLogicalPlan);

            try {
                try (PreparedStatement statement = mockStatement) {
                    ResultSet rs = statement.executeQuery();
                    ResultSetMetaData metaData = rs.getMetaData();
                    int columnCount = metaData.getColumnCount();
                    List<String> columns = new ArrayList<>();
                    for (int i = 1; i <= columnCount; i++) {
                        columns.add(metaData.getColumnName(i));
                    }
                    List<Object[]> rows = new ArrayList<>();
                    while (rs.next()) {
                        Object[] row = new Object[columnCount];
                        for (int i = 1; i <= columnCount; i++) {
                            row[i - 1] = rs.getObject(i);
                        }
                        rows.add(row);
                    }
                    return new PPLResponse(columns, rows);
                }
            } catch (Exception e) {
                if (e instanceof RuntimeException) throw (RuntimeException) e;
                throw new RuntimeException(e.getMessage(), e);
            } finally {
                contextClosed.set(true);
            }
        }
    };
}
TODO Comment

There is a TODO comment indicating this schema builder is for illustration only and should be replaced with the sql plugin's version. This is production code in src/main and should not be shipped with placeholder/illustration logic.

*/

@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

PR Code Suggestions ✨

Latest suggestions up to aef649d

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incomplete recursion in boundary node search

The findBoundary method only recurses into nodes with exactly one input (size() ==
1), which means it will silently skip boundary nodes reachable through multi-input
nodes (e.g., joins). This could cause absorbIntoBoundary to be called with a null
boundary, leading to a NullPointerException. The recursion should not be restricted
to single-input nodes.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rules/AbsorbRuleUtils.java [38-52]

 static OpenSearchBoundaryTableScan findBoundary(RelNode node) {
     for (RelNode rawInput : node.getInputs()) {
         RelNode input = unwrap(rawInput);
         if (input instanceof OpenSearchBoundaryTableScan) {
             return (OpenSearchBoundaryTableScan) input;
         }
-        if (input.getInputs().size() == 1) {
-            OpenSearchBoundaryTableScan found = findBoundary(input);
-            if (found != null) {
-                return found;
-            }
+        OpenSearchBoundaryTableScan found = findBoundary(input);
+        if (found != null) {
+            return found;
         }
     }
     return null;
 }
Suggestion importance[1-10]: 7

__

Why: The findBoundary method restricts recursion to single-input nodes, which can silently miss boundary nodes in multi-input subtrees and cause a NullPointerException in absorbIntoBoundary. Removing the size() == 1 guard makes the search more robust and correct.

Medium
Ensure ResultSet is closed via try-with-resources

The ResultSet obtained from statement.executeQuery() is never explicitly closed.
While closing the PreparedStatement may close the ResultSet in some JDBC drivers, it
is not guaranteed by the JDBC spec. The ResultSet should be closed in a
try-with-resources block to ensure proper resource cleanup.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/action/UnifiedQueryService.java [71-93]

 PreparedStatement statement = compileAndPrepare(context, mixedPlan);
-try (statement) {
-    ResultSet rs = statement.executeQuery();
-    ...
+try (statement; ResultSet rs = statement.executeQuery()) {
+    ResultSetMetaData metaData = rs.getMetaData();
+    int columnCount = metaData.getColumnCount();
+    List<String> columns = new ArrayList<>();
+    for (int i = 1; i <= columnCount; i++) {
+        columns.add(metaData.getColumnName(i));
+    }
+    List<Object[]> rows = new ArrayList<>();
+    while (rs.next()) {
+        Object[] row = new Object[columnCount];
+        for (int i = 1; i <= columnCount; i++) {
+            row[i - 1] = rs.getObject(i);
+        }
+        rows.add(row);
+    }
     return new PPLResponse(columns, rows);
 }
Suggestion importance[1-10]: 6

__

Why: The ResultSet is not explicitly closed, which can lead to resource leaks in JDBC drivers that don't automatically close it when the PreparedStatement is closed. The fix using try-with-resources for both statement and rs is valid and improves resource management.

Low
Avoid returning null for engine capabilities

Returning null from getEngineCapabilities() can cause a NullPointerException when
callers use the returned capabilities without null-checking. The AnalyticsPlugin
binds EngineCapabilities.defaultCapabilities() globally, but any code that calls
getEngineCapabilities() directly on the plugin will receive null. Return a
meaningful default (e.g., EngineCapabilities.defaultCapabilities()) instead.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [37-39]

 @Override
 public EngineCapabilities getEngineCapabilities() {
-    return null;
+    return EngineCapabilities.defaultCapabilities();
 }
Suggestion importance[1-10]: 6

__

Why: Returning null from getEngineCapabilities() can cause NullPointerException in callers that don't null-check. Returning EngineCapabilities.defaultCapabilities() is a safer default, though the actual impact depends on how callers use this method.

Low
General
Use regex for robust source table name rewriting

The string replacement logic for rewriting source=hits is fragile and
order-dependent. If the raw PPL contains source=hits it will be replaced first, but
subsequent replacements for variants with spaces will not match since they were not
present. A single regex replacement would be more robust and handle all whitespace
variants in one pass.

sandbox/plugins/analytics-engine/src/internalClusterTest/java/org/opensearch/fe/planner/unified/ClickBenchUnifiedPipelineIT.java [365-370]

 private void runClickBenchQuery(String queryId) throws Exception {
     String rawPpl = loadQuery(queryId);
-    String ppl = rawPpl.replace("source=hits", "source=opensearch.hits")
-        .replace("source =hits", "source =opensearch.hits")
-        .replace("source= hits", "source= opensearch.hits")
-        .replace("source = hits", "source = opensearch.hits");
+    String ppl = rawPpl.replaceAll("source\\s*=\\s*hits", "source=opensearch.hits");
Suggestion importance[1-10]: 5

__

Why: The multiple replace() calls are fragile and miss other whitespace variants. The regex approach replaceAll("source\\s*=\\s*hits", "source=opensearch.hits") is more robust and handles all whitespace combinations in a single pass, though the improved code changes the output format slightly (normalizing spaces around =).

Low
Explicitly enumerate supported operators instead of using all

SqlStdOperatorTable.instance().getOperatorList() returns all standard SQL operators,
which may include operators that the engine does not actually support. This could
cause unsupported operators to be incorrectly pushed down to the engine. The default
supported functions set should be explicitly enumerated to only include operators
the engine truly supports.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/backend/EngineCapabilities.java [47-52]

 public static EngineCapabilities defaultCapabilities() {
     return new EngineCapabilities(
         Set.of(LogicalTableScan.class, LogicalFilter.class, LogicalAggregate.class, LogicalSort.class),
-        new HashSet<>(SqlStdOperatorTable.instance().getOperatorList())
+        Set.of(
+            SqlStdOperatorTable.EQUALS,
+            SqlStdOperatorTable.NOT_EQUALS,
+            SqlStdOperatorTable.GREATER_THAN,
+            SqlStdOperatorTable.GREATER_THAN_OR_EQUAL,
+            SqlStdOperatorTable.LESS_THAN,
+            SqlStdOperatorTable.LESS_THAN_OR_EQUAL,
+            SqlStdOperatorTable.AND,
+            SqlStdOperatorTable.OR,
+            SqlStdOperatorTable.NOT,
+            SqlStdOperatorTable.IS_NULL,
+            SqlStdOperatorTable.IS_NOT_NULL,
+            SqlStdOperatorTable.LIKE,
+            SqlStdOperatorTable.IN,
+            SqlStdOperatorTable.BETWEEN,
+            SqlStdOperatorTable.COUNT,
+            SqlStdOperatorTable.SUM,
+            SqlStdOperatorTable.MIN,
+            SqlStdOperatorTable.MAX,
+            SqlStdOperatorTable.AVG
+        )
     );
 }
Suggestion importance[1-10]: 5

__

Why: Using all operators from SqlStdOperatorTable as defaults could cause incorrect push-down of unsupported operators. However, this appears to be intentional for a default/illustrative implementation, and the tests in EngineCapabilitiesTests explicitly test with restricted capabilities for unsupported operators, suggesting the current design is deliberate.

Low
Fix parameter name shadowing the type name

The parameter name QueryPlanExecutor starts with an uppercase letter, which shadows
the type name QueryPlanExecutor and is a Java naming convention violation that can
cause confusion or compilation issues in some contexts. Rename the parameter to
follow camelCase convention (e.g., queryPlanExecutor).

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rules/BoundaryTableScanRule.java [36-41]

-public static BoundaryTableScanRule create(QueryPlanExecutor QueryPlanExecutor) {
+public static BoundaryTableScanRule create(QueryPlanExecutor queryPlanExecutor) {
     return new BoundaryTableScanRule(
         Config.INSTANCE.withConversion(LogicalTableScan.class, Convention.NONE, EnumerableConvention.INSTANCE, "BoundaryTableScanRule"),
-        QueryPlanExecutor
+        queryPlanExecutor
     );
 }
Suggestion importance[1-10]: 5

__

Why: The parameter QueryPlanExecutor shadows the type name QueryPlanExecutor, which is a Java naming convention violation. While it may compile, it reduces readability and can cause confusion. Renaming to queryPlanExecutor is the correct fix.

Low
Prevent exception masking in error message construction

Calling logicalFragment.explain() inside an exception handler can itself throw an
exception, potentially masking the original error. The explain() call should be
wrapped in a try-catch or moved outside the exception message construction to avoid
swallowing the root cause.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rel/OpenSearchBoundaryTableScan.java [110-124]

 @SuppressWarnings("unchecked")
 public Enumerable<Object[]> bind(DataContext dataContext) {
     try {
         Iterable<Object[]> result = (Iterable<Object[]>) planExecutor.execute(logicalFragment, dataContext);
         return Linq4j.asEnumerable(result);
     } catch (Exception e) {
+        String fragmentDesc;
+        try {
+            fragmentDesc = logicalFragment.explain();
+        } catch (Exception ex) {
+            fragmentDesc = logicalFragment.toString();
+        }
         throw new RuntimeException(
             "Engine execution failed for table ["
                 + getTable().getQualifiedName()
                 + "] with logical fragment: "
-                + logicalFragment.explain(),
+                + fragmentDesc,
             e
         );
     }
 }
Suggestion importance[1-10]: 4

__

Why: Calling logicalFragment.explain() inside an exception handler could theoretically throw, but this is an unlikely edge case. The suggestion is valid but has low practical impact since explain() rarely throws in normal Calcite usage.

Low
Restrict field visibility to private

The fields executor, schemaProvider, and capabilities are package-private and not
initialized, so getExecutor(), getSchemaProvider(), and getCapabilities() can return
null if the @Inject method is never called (e.g., when Guice injection is skipped or
optional injection is not satisfied). These fields should be declared private and
callers should guard against null returns.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/AnalyticsEngineAdapter.java [30-32]

-QueryPlanExecutor executor;
-SchemaProvider schemaProvider;
-EngineCapabilities capabilities;
+private QueryPlanExecutor executor;
+private SchemaProvider schemaProvider;
+private EngineCapabilities capabilities;
Suggestion importance[1-10]: 4

__

Why: The fields executor, schemaProvider, and capabilities are package-private, which is a minor encapsulation issue. Making them private is a good practice, but this is a style/maintainability concern with low functional impact.

Low

Previous suggestions

Suggestions up to commit 9acc4b2
CategorySuggestion                                                                                                                                    Impact
Possible issue
Close ResultSet in try-with-resources block

The ResultSet obtained from statement.executeQuery() is never explicitly closed.
While closing the PreparedStatement may close the ResultSet in some JDBC drivers, it
is not guaranteed by the JDBC specification. The ResultSet should be closed in a
try-with-resources block to ensure proper resource cleanup.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/action/UnifiedQueryService.java [71-93]

 PreparedStatement statement = compileAndPrepare(context, mixedPlan);
-try (statement) {
-    ResultSet rs = statement.executeQuery();
-    ...
+try (statement; ResultSet rs = statement.executeQuery()) {
+    ResultSetMetaData metaData = rs.getMetaData();
+    int columnCount = metaData.getColumnCount();
+    List<String> columns = new ArrayList<>();
+    for (int i = 1; i <= columnCount; i++) {
+        columns.add(metaData.getColumnName(i));
+    }
+    List<Object[]> rows = new ArrayList<>();
+    while (rs.next()) {
+        Object[] row = new Object[columnCount];
+        for (int i = 1; i <= columnCount; i++) {
+            row[i - 1] = rs.getObject(i);
+        }
+        rows.add(row);
+    }
     return new PPLResponse(columns, rows);
 }
Suggestion importance[1-10]: 6

__

Why: The ResultSet is not explicitly closed, which can cause resource leaks in JDBC drivers that don't auto-close it when the PreparedStatement is closed. The fix is valid and improves resource management.

Low
Avoid returning null capabilities

Returning null from getEngineCapabilities() will cause a NullPointerException when
the hub or planner calls capabilities.supportsOperator(...) or similar methods on
the result. Return a proper default or empty capabilities object instead.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [37-39]

 @Override
 public EngineCapabilities getEngineCapabilities() {
-    return null;
+    return EngineCapabilities.defaultCapabilities();
 }
Suggestion importance[1-10]: 6

__

Why: Returning null from getEngineCapabilities() can cause NullPointerException when the planner calls methods on the result. However, the AnalyticsBackEndPlugin interface's Javadoc explicitly states "or null" as a valid return, so callers should already guard against null. This is a moderate improvement.

Low
Fix incomplete boundary node traversal

The findBoundary method only recurses into nodes with exactly one input (size() ==
1), which means it will fail to find a boundary node in subtrees where intermediate
nodes have multiple inputs (e.g., joins). This restriction should be removed to
allow full traversal, or at minimum the logic should be consistent with
replaceWithFragment which recurses into all inputs unconditionally.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rules/AbsorbRuleUtils.java [38-52]

 static OpenSearchBoundaryTableScan findBoundary(RelNode node) {
     for (RelNode rawInput : node.getInputs()) {
         RelNode input = unwrap(rawInput);
         if (input instanceof OpenSearchBoundaryTableScan) {
             return (OpenSearchBoundaryTableScan) input;
         }
-        if (input.getInputs().size() == 1) {
-            OpenSearchBoundaryTableScan found = findBoundary(input);
-            if (found != null) {
-                return found;
-            }
+        OpenSearchBoundaryTableScan found = findBoundary(input);
+        if (found != null) {
+            return found;
         }
     }
     return null;
 }
Suggestion importance[1-10]: 6

__

Why: The size() == 1 restriction in findBoundary is inconsistent with replaceWithFragment which recurses into all inputs, and could miss boundary nodes in multi-input subtrees. The fix is straightforward and improves correctness.

Low
Guard against null SchemaProvider injection

The SchemaProvider is set unconditionally in loadExtensions, but loadExtensions may
be called before the cluster state is available, and the schemaProviderRef is set
even if no back-ends are loaded. More critically, the SchemaProvider lambda is set
here but the Guice provider schemaProviderRef::get will return null if
loadExtensions is never called (e.g., no extensions present), causing injection
failures. The schemaProviderRef should be initialized with a default value or a
guard should be added in the Guice provider.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/AnalyticsPlugin.java [50-54]

 @Override
 public void loadExtensions(ExtensionLoader loader) {
     backEnds.addAll(loader.loadExtensions(AnalyticsBackEndPlugin.class));
     SchemaProvider schemaProvider = clusterState -> OpenSearchSchemaBuilder.buildSchema((ClusterState) clusterState);
-    schemaProviderRef.set(schemaProvider);
+    schemaProviderRef.compareAndSet(null, schemaProvider);
 }
 
+// In createGuiceModules, guard against null:
+// b.bind(SchemaProvider.class).toProvider(() -> {
+//     SchemaProvider sp = schemaProviderRef.get();
+//     if (sp == null) throw new IllegalStateException("SchemaProvider not initialized");
+//     return sp;
+// });
+
Suggestion importance[1-10]: 4

__

Why: The concern about schemaProviderRef being null if loadExtensions is never called is valid, but the improved_code mixes inline code with comments and doesn't cleanly reflect the change. The suggestion is partially correct but the implementation is incomplete and inconsistent.

Low
General
Use explicit curated set for default supported functions

SqlStdOperatorTable.instance().getOperatorList() returns all operators including
those that may not be safely executable by the engine. This creates an overly
permissive default that could allow unsupported operators to be pushed down, leading
to runtime failures. The default supported functions should be an explicit, curated
set rather than the entire standard operator table.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/backend/EngineCapabilities.java [47-52]

-/** Returns capabilities covering standard Calcite logical operators and all built-in functions. */
+/** Returns capabilities covering standard Calcite logical operators and common built-in functions. */
 public static EngineCapabilities defaultCapabilities() {
     return new EngineCapabilities(
         Set.of(LogicalTableScan.class, LogicalFilter.class, LogicalAggregate.class, LogicalSort.class),
-        new HashSet<>(SqlStdOperatorTable.instance().getOperatorList())
+        Set.of(
+            SqlStdOperatorTable.EQUALS,
+            SqlStdOperatorTable.NOT_EQUALS,
+            SqlStdOperatorTable.GREATER_THAN,
+            SqlStdOperatorTable.GREATER_THAN_OR_EQUAL,
+            SqlStdOperatorTable.LESS_THAN,
+            SqlStdOperatorTable.LESS_THAN_OR_EQUAL,
+            SqlStdOperatorTable.AND,
+            SqlStdOperatorTable.OR,
+            SqlStdOperatorTable.NOT,
+            SqlStdOperatorTable.IS_NULL,
+            SqlStdOperatorTable.IS_NOT_NULL,
+            SqlStdOperatorTable.PLUS,
+            SqlStdOperatorTable.MINUS,
+            SqlStdOperatorTable.MULTIPLY,
+            SqlStdOperatorTable.DIVIDE,
+            SqlStdOperatorTable.COUNT,
+            SqlStdOperatorTable.SUM,
+            SqlStdOperatorTable.MIN,
+            SqlStdOperatorTable.MAX,
+            SqlStdOperatorTable.AVG
+        )
     );
 }
Suggestion importance[1-10]: 5

__

Why: Using all operators from SqlStdOperatorTable as defaults is overly permissive and could allow unsupported operators to be pushed down. However, the comment in the code indicates this is illustrative/sandbox code, and the tests explicitly rely on the current broad default behavior.

Low
Remove unused planner call in test override

The mockLogicalPlan used in mockPlanner.plan(mockLogicalPlan) is never set up with
when(mockPlanner.plan(mockLogicalPlan)), only
when(mockPlanner.plan(any(RelNode.class))) is configured in setUp(). The result
mixed is also never used in the method body, making this call misleading and
potentially masking bugs in the test's intent to verify planner interaction.

sandbox/plugins/analytics-engine/src/test/java/fe/ppl/action/UnifiedQueryServiceTests.java [216-245]

 @Override
 public PPLResponse execute(String pplText, ClusterState cs) {
     // Replicate the real execute logic but track context cleanup
-    RelNode mixed = mockPlanner.plan(mockLogicalPlan);
-
     try {
         try (PreparedStatement statement = mockStatement) {
             ResultSet rs = statement.executeQuery();
-            ...
+            ResultSetMetaData metaData = rs.getMetaData();
+            int columnCount = metaData.getColumnCount();
+            List<String> columns = new ArrayList<>();
+            for (int i = 1; i <= columnCount; i++) {
+                columns.add(metaData.getColumnName(i));
+            }
+            List<Object[]> rows = new ArrayList<>();
+            while (rs.next()) {
+                Object[] row = new Object[columnCount];
+                for (int i = 1; i <= columnCount; i++) {
+                    row[i - 1] = rs.getObject(i);
+                }
+                rows.add(row);
+            }
+            return new PPLResponse(columns, rows);
         }
     } catch (Exception e) {
         if (e instanceof RuntimeException) throw (RuntimeException) e;
         throw new RuntimeException(e.getMessage(), e);
     } finally {
         contextClosed.set(true);
     }
 }
Suggestion importance[1-10]: 4

__

Why: The mockPlanner.plan(mockLogicalPlan) call result is unused and misleading in the test override, but this is a test-only issue with limited functional impact. The suggestion is valid but minor.

Low
Guard explain() call inside exception handler

Calling logicalFragment.explain() inside the exception handler can itself throw an
exception, which would suppress the original exception and make debugging very
difficult. The explain string should be computed before the try block or guarded
with its own try-catch.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rel/OpenSearchBoundaryTableScan.java [110-124]

 @SuppressWarnings("unchecked")
 public Enumerable<Object[]> bind(DataContext dataContext) {
     try {
         Iterable<Object[]> result = (Iterable<Object[]>) planExecutor.execute(logicalFragment, dataContext);
         return Linq4j.asEnumerable(result);
     } catch (Exception e) {
+        String fragmentDesc;
+        try {
+            fragmentDesc = logicalFragment.explain();
+        } catch (Exception ex) {
+            fragmentDesc = "<unable to explain fragment>";
+        }
         throw new RuntimeException(
             "Engine execution failed for table ["
                 + getTable().getQualifiedName()
                 + "] with logical fragment: "
-                + logicalFragment.explain(),
+                + fragmentDesc,
             e
         );
     }
 }
Suggestion importance[1-10]: 4

__

Why: Calling logicalFragment.explain() inside a catch block can throw and suppress the original exception, making debugging harder. The suggestion is valid but represents a defensive coding improvement rather than a critical bug fix.

Low
Restrict field visibility to prevent unguarded null access

The fields executor, schemaProvider, and capabilities are package-private and not
initialized to any default value. If setEngineComponents is never called (e.g.,
Guice injection is skipped or optional injection is not triggered), calling
getExecutor(), getSchemaProvider(), or getCapabilities() will return null, leading
to NullPointerExceptions in callers. These fields should be declared private and
callers should null-check or the class should enforce initialization.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/AnalyticsEngineAdapter.java [30-32]

-QueryPlanExecutor executor;
-SchemaProvider schemaProvider;
-EngineCapabilities capabilities;
+private volatile QueryPlanExecutor executor;
+private volatile SchemaProvider schemaProvider;
+private volatile EngineCapabilities capabilities;
Suggestion importance[1-10]: 3

__

Why: Making fields private volatile improves encapsulation and thread safety, but doesn't actually prevent NullPointerExceptions from callers — it only restricts direct field access. The suggestion is a minor style improvement with limited functional impact.

Low
Suggestions up to commit 259df47
CategorySuggestion                                                                                                                                    Impact
Possible issue
Ensure ResultSet is properly closed

The ResultSet obtained from statement.executeQuery() is never explicitly closed.
While closing the PreparedStatement may implicitly close the ResultSet in some JDBC
drivers, it is not guaranteed by the JDBC specification. The ResultSet should be
closed in its own try-with-resources block to ensure proper resource cleanup.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/action/UnifiedQueryService.java [71-93]

 PreparedStatement statement = compileAndPrepare(context, mixedPlan);
-try (statement) {
-    ResultSet rs = statement.executeQuery();
-    ...
+try (statement; ResultSet rs = statement.executeQuery()) {
+    ResultSetMetaData metaData = rs.getMetaData();
+    int columnCount = metaData.getColumnCount();
+    List<String> columns = new ArrayList<>();
+    for (int i = 1; i <= columnCount; i++) {
+        columns.add(metaData.getColumnName(i));
+    }
+    List<Object[]> rows = new ArrayList<>();
+    while (rs.next()) {
+        Object[] row = new Object[columnCount];
+        for (int i = 1; i <= columnCount; i++) {
+            row[i - 1] = rs.getObject(i);
+        }
+        rows.add(row);
+    }
     return new PPLResponse(columns, rows);
 }
Suggestion importance[1-10]: 6

__

Why: The ResultSet is not explicitly closed in a try-with-resources block. While closing the PreparedStatement often implicitly closes the ResultSet, this is not guaranteed by the JDBC spec. The improved code correctly adds ResultSet to the try-with-resources statement.

Low
Avoid returning null capabilities causing NPE

Returning null from getEngineCapabilities() will cause a NullPointerException when
the hub or planner calls capabilities.supportsOperator(...) or similar methods on
the returned value. The AnalyticsBackEndPlugin interface documents that null is
allowed, but callers in AbsorbFilterRule, AbsorbAggregateRule, and AbsorbSortRule
dereference the result without null checks. Either return a default/empty
capabilities object or ensure all callers guard against null.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [34-36]

 @Override
 public EngineCapabilities getEngineCapabilities() {
-    return null;
+    return EngineCapabilities.defaultCapabilities();
 }
Suggestion importance[1-10]: 6

__

Why: Returning null from getEngineCapabilities() is documented as allowed in the interface, but callers in the absorb rules dereference it without null checks. This is a valid concern, though the interface explicitly permits null, so callers should guard against it rather than forcing a non-null return here.

Low
Initialize schema provider reference with a default value

The SchemaProvider is set unconditionally in loadExtensions, but loadExtensions may
be called before the cluster state is available, and the schemaProviderRef is set
regardless of whether any back-ends were loaded. More critically, the SchemaProvider
lambda is created here but the Guice binding uses schemaProviderRef::get, which will
return null if loadExtensions is never called (e.g., no extending plugins), causing
injection failures. The schemaProviderRef should be initialized with a default value
or the binding should handle the null case.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/AnalyticsPlugin.java [47-51]

+private final AtomicReference<SchemaProvider> schemaProviderRef = new AtomicReference<>(
+    clusterState -> OpenSearchSchemaBuilder.buildSchema((ClusterState) clusterState)
+);
+
 @Override
 public void loadExtensions(ExtensionLoader loader) {
     backEnds.addAll(loader.loadExtensions(AnalyticsBackEndPlugin.class));
-    SchemaProvider schemaProvider = clusterState -> OpenSearchSchemaBuilder.buildSchema((ClusterState) clusterState);
-    schemaProviderRef.set(schemaProvider);
 }
Suggestion importance[1-10]: 5

__

Why: The schemaProviderRef could be null if loadExtensions is never called, causing Guice injection failures. Initializing it with a default value is a reasonable defensive improvement, though in practice loadExtensions is always called by the plugin framework.

Low
General
Fix incomplete boundary search in multi-input subtrees

The findBoundary method only recurses into nodes with exactly one input (size() ==
1), which means it will fail to find a boundary scan in multi-input subtrees (e.g.,
joins). While this may be intentional for now, it also skips recursing into nodes
with zero inputs that are not a boundary scan, which is fine, but nodes with more
than one input (other than joins) could also contain a boundary. Consider removing
the size() == 1 restriction and always recursing, or documenting this limitation
explicitly.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rules/AbsorbRuleUtils.java [38-52]

 static OpenSearchBoundaryTableScan findBoundary(RelNode node) {
     for (RelNode rawInput : node.getInputs()) {
         RelNode input = unwrap(rawInput);
         if (input instanceof OpenSearchBoundaryTableScan) {
             return (OpenSearchBoundaryTableScan) input;
         }
-        if (input.getInputs().size() == 1) {
-            OpenSearchBoundaryTableScan found = findBoundary(input);
-            if (found != null) {
-                return found;
-            }
+        OpenSearchBoundaryTableScan found = findBoundary(input);
+        if (found != null) {
+            return found;
         }
     }
     return null;
 }
Suggestion importance[1-10]: 5

__

Why: The size() == 1 restriction in findBoundary prevents recursing into nodes with multiple inputs, which could cause the boundary scan to not be found in valid plan shapes. Removing this restriction makes the search more robust, though the current design may intentionally limit to single-input chains.

Low
Remove unused variable causing misleading test logic

The overridden execute() method calls mockPlanner.plan(mockLogicalPlan) but
mockLogicalPlan is a class-level mock that is never passed to the real execute()
method — the actual PPL text and cluster state parameters are ignored. This means
the test does not verify that the correct arguments are forwarded to the planner,
making the resource-cleanup test misleading. The mixed variable is also unused in
the execution path.

sandbox/plugins/analytics-engine/src/test/java/fe/ppl/action/UnifiedQueryServiceTests.java [216-246]

 @Override
 public PPLResponse execute(String pplText, ClusterState cs) {
     // Replicate the real execute logic but track context cleanup
-    RelNode mixed = mockPlanner.plan(mockLogicalPlan);
-
     try {
         try (PreparedStatement statement = mockStatement) {
-            ...
+            ResultSet rs = statement.executeQuery();
+            ResultSetMetaData metaData = rs.getMetaData();
+            int columnCount = metaData.getColumnCount();
+            List<String> columns = new ArrayList<>();
+            for (int i = 1; i <= columnCount; i++) {
+                columns.add(metaData.getColumnName(i));
+            }
+            List<Object[]> rows = new ArrayList<>();
+            while (rs.next()) {
+                Object[] row = new Object[columnCount];
+                for (int i = 1; i <= columnCount; i++) {
+                    row[i - 1] = rs.getObject(i);
+                }
+                rows.add(row);
+            }
+            return new PPLResponse(columns, rows);
         }
     } catch (Exception e) {
         if (e instanceof RuntimeException) throw (RuntimeException) e;
         throw new RuntimeException(e.getMessage(), e);
     } finally {
         contextClosed.set(true);
     }
 }
Suggestion importance[1-10]: 4

__

Why: The mixed variable from mockPlanner.plan(mockLogicalPlan) is computed but never used in the execution path, and the test ignores the actual pplText and cs parameters. This is a test-only helper method meant to simulate resource cleanup, so the impact is limited, but removing the unused variable improves clarity.

Low
Prevent exception suppression in error handler

The logicalFragment.explain() call in the exception handler may itself throw an
exception, which would suppress the original exception and make debugging harder.
The explain call should be wrapped in a try-catch or moved outside the catch block
to avoid masking the root cause.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/planner/rel/OpenSearchBoundaryTableScan.java [111-124]

 @SuppressWarnings("unchecked")
 public Enumerable<Object[]> bind(DataContext dataContext) {
     try {
         Iterable<Object[]> result = (Iterable<Object[]>) planExecutor.execute(logicalFragment, dataContext);
         return Linq4j.asEnumerable(result);
     } catch (Exception e) {
+        String fragmentDesc;
+        try {
+            fragmentDesc = logicalFragment.explain();
+        } catch (Exception ignored) {
+            fragmentDesc = logicalFragment.toString();
+        }
         throw new RuntimeException(
             "Engine execution failed for table ["
                 + getTable().getQualifiedName()
                 + "] with logical fragment: "
-                + logicalFragment.explain(),
+                + fragmentDesc,
             e
         );
     }
 }
Suggestion importance[1-10]: 4

__

Why: The logicalFragment.explain() call inside the catch block could itself throw an exception, potentially suppressing the original exception. Wrapping it in a try-catch with a fallback to toString() is a reasonable defensive measure.

Low
Handle mapping parse errors gracefully per index

The mapping.sourceAsMap() call can throw an IOException (or a wrapped
RuntimeException depending on the implementation), but there is no error handling
around it. If a single index has a corrupt or unparseable mapping, the entire
buildSchema() call will fail, preventing any schema from being built. This should be
wrapped in a try-catch to skip problematic indices and log a warning.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/schema/OpenSearchSchemaBuilder.java [51-53]

-@SuppressWarnings("unchecked")
-Map<String, Object> sourceMap = mapping.sourceAsMap();
+Map<String, Object> sourceMap;
+try {
+    @SuppressWarnings("unchecked")
+    Map<String, Object> rawMap = mapping.sourceAsMap();
+    sourceMap = rawMap;
+} catch (Exception e) {
+    // Skip indices with unparseable mappings
+    continue;
+}
 @SuppressWarnings("unchecked")
 Map<String, Object> properties = (Map<String, Object>) sourceMap.get("properties");
Suggestion importance[1-10]: 4

__

Why: If mapping.sourceAsMap() throws for a corrupt index mapping, the entire buildSchema() call fails. Wrapping it in a try-catch to skip problematic indices would make the schema building more resilient, though in practice sourceAsMap() typically wraps IOExceptions in RuntimeExceptions already.

Low
Restrict field visibility to prevent unguarded null access

The fields executor, schemaProvider, and capabilities have package-private
visibility and no initialization, so if the optional injection is never triggered,
getExecutor(), getSchemaProvider(), and getCapabilities() will return null, leading
to NullPointerException in callers. These fields should be declared private and
callers should null-check the returned values, or the injection should not be
optional if these components are required.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/ppl/AnalyticsEngineAdapter.java [30-39]

-QueryPlanExecutor executor;
-SchemaProvider schemaProvider;
-EngineCapabilities capabilities;
+private volatile QueryPlanExecutor executor;
+private volatile SchemaProvider schemaProvider;
+private volatile EngineCapabilities capabilities;
 
 @Inject(optional = true)
 void setEngineComponents(QueryPlanExecutor executor, SchemaProvider schemaProvider, EngineCapabilities engineCapabilities) {
     this.executor = executor;
     this.schemaProvider = schemaProvider;
     this.capabilities = engineCapabilities;
 }
Suggestion importance[1-10]: 4

__

Why: The package-private fields could return null if optional injection is never triggered, but the improved_code only adds private volatile without addressing the null-safety concern for callers. The suggestion is partially valid but the volatile addition is unrelated to the core null-safety issue.

Low
Suggestions up to commit fa4ebf8
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix mismatched handler name assertion in test

The test asserts the handler name is "unified_ppl_action", but
RestUnifiedPPLAction.getName() returns "ppl_action". This mismatch will cause the
test to fail. Update the expected value to match the actual implementation.

sandbox/plugins/analytics-frontend-ppl/src/test/java/fe/ppl/action/RestUnifiedPPLActionTests.java [43-45]

 public void testGetName() {
-    assertThat(action.getName(), equalTo("unified_ppl_action"));
+    assertThat(action.getName(), equalTo("ppl_action"));
 }
Suggestion importance[1-10]: 9

__

Why: The test asserts "unified_ppl_action" but RestUnifiedPPLAction.getName() returns "ppl_action". This is a clear bug that will cause the test to fail at runtime.

High
Fix mismatched route path assertion in test

The test asserts the route path is "/_plugins/_query_engine/_unified/ppl", but
RestUnifiedPPLAction.ROUTE_PATH is defined as "/_plugins/_analytics_engine/ppl".
This mismatch will cause the test to fail. The test should use
RestUnifiedPPLAction.ROUTE_PATH or the correct literal path.

sandbox/plugins/analytics-frontend-ppl/src/test/java/fe/ppl/action/RestUnifiedPPLActionTests.java [50-55]

 public void testRoutes() {
     List<RestUnifiedPPLAction.Route> routes = action.routes();
     assertThat(routes.size(), equalTo(1));
     assertThat(routes.get(0).getMethod(), equalTo(RestRequest.Method.POST));
-    assertThat(routes.get(0).getPath(), equalTo("/_plugins/_query_engine/_unified/ppl"));
+    assertThat(routes.get(0).getPath(), equalTo(RestUnifiedPPLAction.ROUTE_PATH));
 }
Suggestion importance[1-10]: 9

__

Why: The test hardcodes "/_plugins/_query_engine/_unified/ppl" but ROUTE_PATH is "/_plugins/_analytics_engine/ppl". This mismatch will cause the test to fail; using RestUnifiedPPLAction.ROUTE_PATH is the correct fix.

High
Close ResultSet to prevent resource leaks

The ResultSet obtained from statement.executeQuery() is never closed, which can
cause resource leaks. It should be wrapped in a try-with-resources block to ensure
it is closed after use.

sandbox/plugins/analytics-frontend-ppl/src/main/java/org/opensearch/fe/action/UnifiedQueryService.java [72-75]

-ResultSet rs = statement.executeQuery();
+try (ResultSet rs = statement.executeQuery()) {
+    // Step 4: Extract column metadata
+    ResultSetMetaData metaData = rs.getMetaData();
 
-// Step 4: Extract column metadata
-ResultSetMetaData metaData = rs.getMetaData();
-
Suggestion importance[1-10]: 7

__

Why: The ResultSet is never closed, which can cause resource leaks. Wrapping it in a try-with-resources block is a valid improvement, though the PreparedStatement is already closed via try-with-resources which may implicitly close the ResultSet in some JDBC drivers.

Medium
Fix incorrect ordering of sort and fields

The fields projection is applied before sort, which may cause issues if the sort
relies on fields not yet projected or if the engine requires sorting before field
selection. To match the SQL semantics (ORDER BY SearchPhrase after WHERE), the sort
should come before fields, consistent with q25.ppl and q27.ppl.

sandbox/plugins/analytics-frontend-ppl/src/test/resources/clickbench/queries/q26.ppl [4-8]

 source=hits
 | where SearchPhrase != ''
+| sort SearchPhrase
 | fields SearchPhrase
-| sort SearchPhrase
 | head 10
Suggestion importance[1-10]: 7

__

Why: The fields command before sort in q26.ppl is inconsistent with q25.ppl and q27.ppl where sort comes before fields. While functionally it may work since SearchPhrase is retained, the ordering is semantically incorrect and inconsistent with the other similar queries.

Medium
Defer component initialization to correct lifecycle phase

The SchemaProvider and QueryPlanExecutor are initialized in loadExtensions, but
loadExtensions is called before createComponents, meaning the ClusterService and
other server components are not yet available. Schema building typically requires
cluster state which is only reliably available after createComponents. The
initialization of these components should be deferred to createComponents where the
full server context is available.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/BaseAnalyticsPlugin.java [77-85]

-SchemaProvider schemaProvider = clusterState ->
-    OpenSearchSchemaBuilder.buildSchema((ClusterState) clusterState);
-schemaProviderRef.set(schemaProvider);
-
+// Move schema and executor initialization to createComponents where
+// ClusterService and other server components are available.
+// In loadExtensions, only collect the plugins:
 if (executorPlugins.isEmpty() == false) {
-    QueryPlanExecutorPlugin executorPlugin = executorPlugins.get(0);
-    QueryPlanExecutor executor = executorPlugin.createExecutor(backEnds);
-    executorRef.set(executor);
+    // defer executor creation to createComponents
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about lifecycle ordering, but the SchemaProvider created here is a lambda that only calls buildSchema when invoked (lazily), not at loadExtensions time. The executor creation is also done here, which could be a concern if createExecutor requires server components, but this is speculative without seeing the implementation.

Low
General
Recursively unwrap exception cause for correct status mapping

The unwrapping logic only peels one layer of RuntimeException. If the root cause is
wrapped multiple times (e.g., RuntimeException → RuntimeException →
SyntaxCheckException), the correct status code will not be returned and the method
will fall through to INTERNAL_SERVER_ERROR. Consider unwrapping recursively until a
non-RuntimeException cause is found.

sandbox/plugins/analytics-frontend-ppl/src/main/java/org/opensearch/fe/action/RestUnifiedPPLAction.java [133-137]

 static RestStatus mapExceptionToStatus(Exception e) {
     Throwable cause = e;
     // Unwrap RuntimeException wrappers to find the root cause
-    if (e instanceof RuntimeException && e.getCause() != null) {
-        cause = e.getCause();
+    while (cause instanceof RuntimeException && cause.getCause() != null) {
+        cause = cause.getCause();
     }
Suggestion importance[1-10]: 6

__

Why: The single-level unwrap in mapExceptionToStatus may miss deeply nested causes, leading to incorrect INTERNAL_SERVER_ERROR responses for user errors. Changing to a while loop is a valid improvement, though multi-level wrapping may be rare in practice.

Low
Fix visitor deep-traversal to prevent double-visiting

The FunctionSupportVisitor only overrides visitCall, so non-call leaf nodes (like
RexInputRef, RexLiteral) return null from the visitor. The result == null || result
check treats null as supported, which is correct for leaves. However, if a RexCall
has an unsupported operator but its operands return null, the early-return logic in
visitCall correctly returns false. The issue is that RexVisitorImpl with deep=true
will visit children automatically before visitCall is invoked, potentially causing
double-visiting. The visitor should use deep=false and manually recurse to avoid
this.

sandbox/libs/analytics-framework/src/main/java/org/opensearch/analytics/backend/EngineCapabilities.java [51-57]

-public boolean supportsAllFunctions(RexNode expression) {
-    if (expression == null) {
+private class FunctionSupportVisitor extends RexVisitorImpl<Boolean> {
+    FunctionSupportVisitor() {
+        super(false); // manual recursion to avoid double-visiting
+    }
+
+    @Override
+    public Boolean visitCall(RexCall call) {
+        if (!supportedFunctions.contains(call.getOperator())) {
+            return false;
+        }
+        for (RexNode operand : call.getOperands()) {
+            Boolean childResult = operand.accept(this);
+            if (childResult != null && !childResult) {
+                return false;
+            }
+        }
         return true;
     }
-    Boolean result = expression.accept(new FunctionSupportVisitor());
-    return result == null || result;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that using deep=true in RexVisitorImpl combined with manual recursion in visitCall could cause double-visiting of operands. Switching to deep=false with manual recursion is a valid fix for correctness.

Low
Remove unused planner call in test override method

The overridden execute method calls mockPlanner.plan(mockLogicalPlan) but
mockLogicalPlan is never initialized with a meaningful value — it is just a raw
mock. The when(mockPlanner.plan(any(RelNode.class))).thenReturn(mockMixedPlan) stub
in setUp will match, but the result mixed is never used in the method body. This
means the planner interaction is not actually tested, making the
testResourceCleanupOnSuccess and testResourceCleanupOnFailure tests misleading. The
mixed variable should be used when calling compileAndPrepare, or the override should
be simplified to remove the unused planner call.

sandbox/plugins/analytics-frontend-ppl/src/test/java/fe/ppl/action/UnifiedQueryServiceTests.java [216-219]

 @Override
 public UnifiedPPLResponse execute(String pplText, ClusterState cs) {
     // Replicate the real execute logic but track context cleanup
-    RelNode mixed = mockPlanner.plan(mockLogicalPlan);
-    ...
+    try {
+        try (PreparedStatement statement = mockStatement) {
+            ResultSet rs = statement.executeQuery();
+            ResultSetMetaData metaData = rs.getMetaData();
+            int columnCount = metaData.getColumnCount();
+            List<String> columns = new ArrayList<>();
+            for (int i = 1; i <= columnCount; i++) {
+                columns.add(metaData.getColumnName(i));
+            }
+            List<Object[]> rows = new ArrayList<>();
+            while (rs.next()) {
+                Object[] row = new Object[columnCount];
+                for (int i = 1; i <= columnCount; i++) {
+                    row[i - 1] = rs.getObject(i);
+                }
+                rows.add(row);
+            }
+            return new UnifiedPPLResponse(columns, rows);
+        }
+    } catch (Exception e) {
+        if (e instanceof RuntimeException) throw (RuntimeException) e;
+        throw new RuntimeException(e.getMessage(), e);
+    } finally {
+        contextClosed.set(true);
+    }
 }
Suggestion importance[1-10]: 4

__

Why: The mixed variable from mockPlanner.plan(mockLogicalPlan) is computed but never used in the overridden execute method, making the planner interaction misleading. Removing it improves clarity, though it doesn't affect test correctness since the mock stub still matches.

Low
Document intentional single-input traversal limitation

The findBoundary method only recurses into nodes with exactly one input
(input.getInputs().size() == 1), which means it will fail to find a boundary node
through multi-input operators (e.g., joins). While this may be intentional for now,
it also skips recursion into nodes with zero inputs that are not boundary nodes,
which is fine. However, the restriction to single-input nodes is inconsistent with
replaceWithFragment, which iterates all inputs. If the intent is to only absorb
linear chains, this should be documented explicitly to avoid confusion.

sandbox/plugins/analytics-frontend-ppl/src/main/java/org/opensearch/fe/planner/rules/AbsorbRuleUtils.java [38-52]

+/**
+ * Walks down single-input chains to find an OpenSearchBoundaryTableScan.
+ * Only traverses nodes with exactly one input to ensure linear absorption chains.
+ * Multi-input nodes (e.g., joins) are not traversed.
+ */
 static OpenSearchBoundaryTableScan findBoundary(RelNode node) {
     for (RelNode rawInput : node.getInputs()) {
         RelNode input = unwrap(rawInput);
         if (input instanceof OpenSearchBoundaryTableScan) {
             return (OpenSearchBoundaryTableScan) input;
         }
+        // Only recurse into single-input nodes to maintain linear chain invariant
         if (input.getInputs().size() == 1) {
             OpenSearchBoundaryTableScan found = findBoundary(input);
             if (found != null) {
                 return found;
             }
         }
     }
     return null;
 }
Suggestion importance[1-10]: 1

__

Why: This suggestion only adds comments/documentation to existing code without changing any logic. The existing_code and improved_code are functionally identical, so this scores very low.

Low
Suggestions up to commit 3190e36
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fully unwrap nested exception cause chain

**The current code only unwraps one level of RuntimeException wrapping. If the root
cause is nested more than one level deep (e.g., RuntimeException wrapping another
RuntimeException wrapping ...

@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit e921691.

PathLineSeverityDescription
sandbox/plugins/analytics-frontend-ppl/build.gradle17mediumMultiple SNAPSHOT dependencies (unified-query-api, unified-query-ppl, unified-query-common, unified-query-core, unified-query-protocol at 3.6.0.0-SNAPSHOT) pulled from an OpenSearch CI snapshot repository. SNAPSHOT artifacts are mutable and can silently change between builds, creating a supply chain risk if the snapshot server or pipeline is compromised.
sandbox/plugins/analytics-frontend-ppl/build.gradle28lowJitPack repository included to resolve com.github.babbel transitive dependency. JitPack builds on demand from GitHub source, providing weaker supply chain guarantees than Maven Central.
sandbox/plugins/analytics-engine/build.gradle49lowBlanket suppression of -Werror, -Xlint, and -Xdoclint compiler flags across all JavaCompile tasks. While attributed to Calcite/Arrow compatibility, this disables security-relevant compiler warnings for all newly added code in the module.
sandbox/plugins/analytics-frontend-ppl/src/main/java/org/opensearch/fe/ppl/action/RestUnifiedPPLAction.java37lowNew REST endpoint POST /_plugins/_query_engine/_unified/ppl enables arbitrary PPL query execution against any reachable index. No authentication enforcement is visible at the handler level; security depends entirely on the OpenSearch security plugin being present and configured.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 0 | Medium: 1 | Low: 3


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@mch2 mch2 changed the title initial commit of extensible query engine plugin to sandbox initial commit of analytics engine plugin to sandbox Mar 2, 2026
@mch2
Copy link
Member Author

mch2 commented Mar 2, 2026

needs some clean up still - but updated the interfaces here @Bukhtawar.

server/src/main/java/org/opensearch/plugins/          ← Server-layer interfaces
├── QueryPlanExecutor.java                             │ execute(Object plan, Object context) → Iterable<Object[]>
├── QueryEnginePlugin.java                             │ createComponents(..., backEnds), getExecutor() → QueryPlanExecutor
├── AnalyticsFrontEndPlugin.java                       │ createComponents(..., QueryPlanExecutor)
└── AnalyticsBackEndPlugin.java                        │ marker interface

^ if we used SPI we could clean this up a bit - remove Object types, have front-end and back-end wire directly in.
Or... we could wire up by passing pluginsservice down to AnalyticsEnginePlugin and have it configure the fe/be plugins.

sandbox/plugins/analytics-engine/                      ← Engine plugin (Gradle: opensearchplugin)
├── build.gradle                                       │ calcite-core, calcite-linq4j, avatica-core, guava, + runtime deps
└── src/main/java/org/opensearch/fe/ppl/
    ├── AnalyticsEnginePlugin.java                     │ implements QueryEnginePlugin, ActionPlugin
    └── PlanExecutor.java                              │ implements QueryPlanExecutor (stub: logs fragment, returns empty)

sandbox/plugins/analytics-frontend-ppl/                ← PPL front-end plugin (Gradle: opensearchplugin)

and in future:

sandbox/plugins/analytics-frontend-dsl
sandbox/plugins/engine-datafusion
sandbox/plugins/engine-lucene

Copy link
Contributor

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mch2 for the changes. Maybe we need to see what is common across FE languages and how do we ensure there is no redundant logic across FE and Query planner plugin whilst maintaining single responsibility.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 4ae97ca.

PathLineSeverityDescription
sandbox/plugins/analytics-backend-datafusion/build.gradle22mediumRepository declaration adding JitPack (https://jitpack.io) as a dependency source. JitPack builds directly from GitHub commits without reproducible builds or security auditing, introducing supply chain risk for the okhttp-aws-signer transitive dependency.
sandbox/plugins/analytics-frontend-ppl/build.gradle33mediumRepository declaration adding an OpenSearch Snapshots CI server (https://ci.opensearch.org/ci/dbc/snapshots/maven/) as a dependency source. Snapshot builds are not release-audited and could introduce unverified artifacts into the build.
sandbox/libs/build.gradle12lowRemoval of the cross-library dependency enforcement block allows sandbox library subprojects to depend on arbitrary other library projects, weakening the isolation guardrails that existed before this change.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 2 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@andrross
Copy link
Member

andrross commented Mar 4, 2026

@mch2 Annoying high level comment, but how about some README.md files in each new component's root directory to give some context about what each thing does and how it fits in?

@mch2
Copy link
Member Author

mch2 commented Mar 4, 2026

@Bukhtawar For now i've separated this into front-end/back-end/executor, with a shared lib and plugin "hub" to wire in shared components. As for query planning, everything planning related in the ppl plugin is there to allow the UQP to build plans for our executor. The Executor itself will do its own planning rbo/cbo,scheduling that will be reused across front-ends

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 0084490.

PathLineSeverityDescription
sandbox/plugins/analytics-frontend-ppl/build.gradle37mediumSNAPSHOT dependency on org.opensearch.query:unified-query-* at version 3.6.0.0-SNAPSHOT sourced from an OpenSearch CI snapshot server. SNAPSHOT dependencies are mutable and can be silently replaced without a version bump, creating a supply chain vector if the snapshot server or artifact is compromised.
sandbox/plugins/analytics-frontend-ppl/build.gradle28mediumJitPack repository added as a dependency source for com.github.babbel. JitPack builds packages on demand directly from GitHub source; a compromise of the upstream GitHub repo would serve malicious artifacts without any version change.
sandbox/plugins/analytics-backend-datafusion/build.gradle22mediumRegisters a cargoBuild Exec task that runs 'cargo build --release' from a native/ subdirectory and loads the resulting binary as a JNI library via java.library.path. Native code is not included in this diff and cannot be audited; an unreviewed native Rust library executing inside the JVM can bypass all Java security controls.
sandbox/libs/build.gradle12lowRemoves the afterEvaluate block that enforced inter-library dependency restrictions for :libs projects. While the sandbox exclusion in forbidden-dependencies.gradle is a parallel change, removing this build-time guard weakens the enforcement layer that prevents unexpected cross-library dependencies.
sandbox/plugins/analytics-engine-executor/src/main/java/org/opensearch/analytics/engine/DefaultPlanExecutor.java50lowLogs the full query plan explanation (fragment.explain()) at INFO level, which may include index names, field names, filter conditions, and literal values from user queries. This could leak sensitive query content into log aggregation systems accessible to operators with lower privilege than data access requires.

The table above displays the top 10 most important findings.

Total: 5 | Critical: 0 | High: 0 | Medium: 3 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit fa4ebf8.

PathLineSeverityDescription
sandbox/plugins/analytics-frontend-ppl/src/test/java/fe/ppl/action/RestUnifiedPPLActionTests.java65lowTest assertions do not match the actual implementation values: testGetName() expects 'unified_ppl_action' but implementation returns 'ppl_action', and testRoutes() expects path '/_plugins/_query_engine/_unified/ppl' but implementation registers '/_plugins/_analytics_engine/ppl'. These tests would fail if executed, indicating stale test code copied from a different version.
sandbox/plugins/analytics-frontend-ppl/build.gradle37lowAdds a non-standard Maven repository pointing to OpenSearch CI snapshot server ('https://ci.opensearch.org/ci/dbc/snapshots/maven/') to resolve SNAPSHOT artifacts (sqlUnifiedQueryVersion = '3.6.0.0-SNAPSHOT'). Snapshot artifacts are mutable and could change between builds, creating a reproducibility and supply-chain risk.
sandbox/libs/build.gradle10lowRemoves the enforcement code that prevented sandbox libs from depending on other project libs. The comment warns against cross-dependencies but no runtime check remains, silently weakening the dependency graph constraint that was previously enforced.
gradle/forbidden-dependencies.gradle30lowAll projects under ':sandbox:' are now exempt from forbidden-dependency checks. This means sandbox code can pull in any transitive dependency without the standard security vetting that applies to the rest of the codebase.
sandbox/plugins/analytics-engine-executor/src/main/java/org/opensearch/analytics/engine/DefaultPlanExecutor.java49lowLogs the full logical query plan (fragment.explain()) at INFO level. This could expose sensitive query structure, table names, and filter conditions in production logs, representing an information disclosure risk.
sandbox/plugins/analytics-backend-datafusion/build.gradle22lowRegisters a 'cargoBuild' Exec task that runs 'cargo build --release' on a native Rust directory and wires it as a dependency of the test task. The Rust source files in the native directory are not included in this diff, making it impossible to verify what native code is compiled and loaded as a JNI library during tests.

The table above displays the top 10 most important findings.

Total: 6 | Critical: 0 | High: 0 | Medium: 0 | Low: 6


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

Persistent review updated to latest commit fa4ebf8

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

❌ Gradle check result for fa4ebf8:

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 5d9bc23.

PathLineSeverityDescription
test/framework/src/main/java/org/opensearch/bootstrap/BootstrapForTesting.java146highSecurity Manager default changed from 'true' (enabled) to 'false' (disabled). This single-line change is completely unrelated to the analytics engine feature being implemented, is buried in a 4000+ line diff, and silently disables Java Security Manager for ALL tests by default. This removes network, filesystem, and class-loading restrictions during tests — a pattern consistent with supply chain attack technique to lower defenses before introducing native (JNI/Rust) code that would otherwise be restricted. Legitimate Java 17+ migration would typically be handled via explicit deprecation handling, not silently inverting the default.
sandbox/plugins/analytics-engine/build.gradle32mediumAn external snapshot Maven repository ('https://ci.opensearch.org/ci/dbc/snapshots/maven/') is added without version pinning or checksum verification. Snapshot repositories serve mutable artifacts — any dependency resolved from this repo could change between builds. The SQL Unified Query API dependencies pulled from this repo (unified-query-api, unified-query-ppl, unified-query-common, unified-query-core, unified-query-protocol) use a SNAPSHOT version ('3.6.0.0-SNAPSHOT'), meaning artifacts could be replaced server-side. This creates a supply chain risk if the snapshot server or its artifacts are compromised.
plugins/engine-datafusion/target/.rustc_info.json1lowA Cargo build artifact containing a specific developer's local machine path ('/Users/handalm/.rustup/toolchains/stable-aarch64-apple-darwin') was committed to the repository. While not directly malicious, committing build artifacts from a developer's local machine is poor security hygiene, discloses internal developer identity/environment details, and indicates the repository is being modified on a personal machine rather than via a controlled CI/CD environment. The file should be in .gitignore.
gradle/forbidden-dependencies.gradle30lowThe forbidden-dependencies enforcement (which prevents unauthorized dependency inclusion) is now bypassed for all ':sandbox:' subprojects. While this may be intentional for experimental sandbox code, removing dependency enforcement for an entire project tree means any dependencies — including potentially malicious or vulnerable ones — can be freely added to sandbox plugins without triggering the existing guardrails. The original exemption comment explaining WHY fixtures were excluded was also removed, reducing auditability.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 1 | Medium: 1 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

mch2 added 3 commits March 13, 2026 10:41
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
@github-actions
Copy link
Contributor

✅ Gradle check result for 47567de: SUCCESS

@codecov
Copy link

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 53.84615% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.34%. Comparing base (22a8d9d) to head (90e55df).
⚠️ Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
...java/org/opensearch/analytics/AnalyticsPlugin.java 4.34% 22 Missing ⚠️
...arch/analytics/schema/OpenSearchSchemaBuilder.java 85.00% 3 Missing and 3 partials ⚠️
...org/opensearch/be/datafusion/DataFusionPlugin.java 0.00% 5 Missing ⚠️
...org/opensearch/be/datafusion/DataFusionBridge.java 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               main   #20697   +/-   ##
=========================================
  Coverage     73.33%   73.34%           
- Complexity    72306    72338   +32     
=========================================
  Files          5796     5801    +5     
  Lines        330263   330341   +78     
  Branches      47663    47672    +9     
=========================================
+ Hits         242189   242273   +84     
+ Misses        68660    68623   -37     
- Partials      19414    19445   +31     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

mch2 added 6 commits March 13, 2026 15:18
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
wraps this and schema in an engineContext provided to front-ends

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as the initial framework to start building this all out in the sandbox

Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
@github-actions
Copy link
Contributor

❌ Gradle check result for 90e55df: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 90e55df: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 90e55df: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

✅ Gradle check result for 90e55df: SUCCESS

@mch2 mch2 merged commit 7c055e3 into opensearch-project:main Mar 16, 2026
43 of 50 checks passed
shayush622 pushed a commit to shayush622/OpenSearch that referenced this pull request Mar 16, 2026
…ect#20697)

* initial commit of extensible query engine plugin to sandbox

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up build.gradle - update forbidden-dependencies to skip guava check in sandbox plugins,
calcite requires this dependency at compile time

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Rename plugin interfaces and default implementations.
Wire up a ppl front-end using UnifiedQueryAPI from sql plugin.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* refactor to plugin-plugin SPI

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add readmes and start some clean up.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* analyzer errors

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* move fe plugin into analytics plugin for testing only,  we will use sql plugin.
also remove "hub" plugin.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* spotless

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* more clean up

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fixing analyzer issues

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix javadoc

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix guava forbidden check

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix license check

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix javadoc warning on transitive dependency.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up build.gradle and fix weird javadoc issues with dependencies.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix calcite/guava dependencies

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix package name

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* remove EngineCapabilities, just use calcite's sqloperatortable.
wraps this and schema in an engineContext provided to front-ends

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* simplify unified IT to use params

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix guava NOTICE file to exactly match the file from grpc

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* javadoc fix

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Update sandbox/plugins/analytics-engine/README.md

Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: shayush622 <ayush5267@gmail.com>
shayush622 pushed a commit to shayush622/OpenSearch that referenced this pull request Mar 16, 2026
…ect#20697)

* initial commit of extensible query engine plugin to sandbox

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up build.gradle - update forbidden-dependencies to skip guava check in sandbox plugins,
calcite requires this dependency at compile time

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Rename plugin interfaces and default implementations.
Wire up a ppl front-end using UnifiedQueryAPI from sql plugin.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* refactor to plugin-plugin SPI

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add readmes and start some clean up.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* analyzer errors

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* move fe plugin into analytics plugin for testing only,  we will use sql plugin.
also remove "hub" plugin.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* spotless

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* more clean up

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fixing analyzer issues

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix javadoc

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix guava forbidden check

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix license check

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix javadoc warning on transitive dependency.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up build.gradle and fix weird javadoc issues with dependencies.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix calcite/guava dependencies

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix package name

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* remove EngineCapabilities, just use calcite's sqloperatortable.
wraps this and schema in an engineContext provided to front-ends

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* simplify unified IT to use params

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix guava NOTICE file to exactly match the file from grpc

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* javadoc fix

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Update sandbox/plugins/analytics-engine/README.md

Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: shayush622 <ayush5267@gmail.com>
shayush622 pushed a commit to shayush622/OpenSearch that referenced this pull request Mar 16, 2026
…ect#20697)

* initial commit of extensible query engine plugin to sandbox

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up build.gradle - update forbidden-dependencies to skip guava check in sandbox plugins,
calcite requires this dependency at compile time

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Rename plugin interfaces and default implementations.
Wire up a ppl front-end using UnifiedQueryAPI from sql plugin.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* refactor to plugin-plugin SPI

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add readmes and start some clean up.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* analyzer errors

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* move fe plugin into analytics plugin for testing only,  we will use sql plugin.
also remove "hub" plugin.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* spotless

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* more clean up

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fixing analyzer issues

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix javadoc

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix guava forbidden check

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix license check

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix javadoc warning on transitive dependency.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up build.gradle and fix weird javadoc issues with dependencies.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix calcite/guava dependencies

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix package name

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* remove EngineCapabilities, just use calcite's sqloperatortable.
wraps this and schema in an engineContext provided to front-ends

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* simplify unified IT to use params

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix guava NOTICE file to exactly match the file from grpc

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* javadoc fix

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Update sandbox/plugins/analytics-engine/README.md

Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: shayush622 <ayush5267@gmail.com>
shayush622 pushed a commit to shayush622/OpenSearch that referenced this pull request Mar 16, 2026
…ect#20697)

* initial commit of extensible query engine plugin to sandbox

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up build.gradle - update forbidden-dependencies to skip guava check in sandbox plugins,
calcite requires this dependency at compile time

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Rename plugin interfaces and default implementations.
Wire up a ppl front-end using UnifiedQueryAPI from sql plugin.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* refactor to plugin-plugin SPI

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add readmes and start some clean up.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* analyzer errors

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* move fe plugin into analytics plugin for testing only,  we will use sql plugin.
also remove "hub" plugin.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* spotless

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* more clean up

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fixing analyzer issues

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix javadoc

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix guava forbidden check

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix license check

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix javadoc warning on transitive dependency.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up build.gradle and fix weird javadoc issues with dependencies.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix calcite/guava dependencies

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix package name

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* remove EngineCapabilities, just use calcite's sqloperatortable.
wraps this and schema in an engineContext provided to front-ends

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* simplify unified IT to use params

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix guava NOTICE file to exactly match the file from grpc

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* javadoc fix

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Update sandbox/plugins/analytics-engine/README.md

Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: shayush622 <ayush5267@gmail.com>
shayush622 pushed a commit to shayush622/OpenSearch that referenced this pull request Mar 16, 2026
…ect#20697)

* initial commit of extensible query engine plugin to sandbox

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up build.gradle - update forbidden-dependencies to skip guava check in sandbox plugins,
calcite requires this dependency at compile time

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Rename plugin interfaces and default implementations.
Wire up a ppl front-end using UnifiedQueryAPI from sql plugin.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* refactor to plugin-plugin SPI

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* add readmes and start some clean up.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* analyzer errors

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* move fe plugin into analytics plugin for testing only,  we will use sql plugin.
also remove "hub" plugin.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* spotless

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* more clean up

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fixing analyzer issues

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix javadoc

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix guava forbidden check

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix license check

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix javadoc warning on transitive dependency.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* clean up build.gradle and fix weird javadoc issues with dependencies.

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix calcite/guava dependencies

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix package name

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* remove EngineCapabilities, just use calcite's sqloperatortable.
wraps this and schema in an engineContext provided to front-ends

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* simplify unified IT to use params

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* fix guava NOTICE file to exactly match the file from grpc

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* javadoc fix

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>

* Update sandbox/plugins/analytics-engine/README.md

Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <marc.handalian@gmail.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: shayush622 <ayush5267@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-changelog skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants