Wire analytics-engine as extendedPlugins dependency #5302

Draft
ahkcs wants to merge 5 commits into feature/mustang-ppl-integration from pr/mustang-plugin-wiring

ahkcs (Collaborator) commented Apr 1, 2026

Summary

  • Wire analytics-engine in as an extendedPlugins dependency so its classloader becomes the parent and provides the shared Calcite classes
  • Add bundlePlugin excludes to prevent jar hell between the parent and child classloaders
  • Add the analytics-engine plugin to all test clusters (integTest, yamlRestTest, security, JDBC, doctest)
  • Add commons-text to the analytics-engine ZIP for Calcite fuzzy matching support
  • Fix isAnalyticsIndex to use a lightweight parsing context (no cluster state needed)
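
To illustrate the jar hell concern behind the bundlePlugin excludes, here is a minimal sketch. It assumes jar hell means the same class entry being shipped by both the parent (analytics-engine) bundle and the child (SQL plugin) bundle. The `JarHellSketch` class, its methods, and the jar names are hypothetical, and in-memory maps stand in for real jar files. This is not OpenSearch's actual check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of a duplicate-class check across parent and child plugin bundles. */
final class JarHellSketch {

    /**
     * Maps each class entry that appears in more than one jar to the jars containing it.
     * Input is jar name -> class entries (a hypothetical stand-in for reading real jars).
     */
    static Map<String, List<String>> findDuplicates(Map<String, List<String>> jars) {
        Map<String, List<String>> owners = new TreeMap<>();
        for (Map.Entry<String, List<String>> jar : jars.entrySet()) {
            for (String classEntry : jar.getValue()) {
                owners.computeIfAbsent(classEntry, k -> new ArrayList<>()).add(jar.getKey());
            }
        }
        // Keep only entries owned by two or more jars.
        owners.values().removeIf(jarNames -> jarNames.size() < 2);
        return owners;
    }

    /** Returns a copy of the jar map without the excluded jars; the input is not modified. */
    static Map<String, List<String>> withoutJars(Map<String, List<String>> jars, Set<String> excluded) {
        Map<String, List<String>> kept = new LinkedHashMap<>(jars);
        kept.keySet().removeAll(excluded);
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> jars = new LinkedHashMap<>();
        // Hypothetical bundle contents: both plugins ship the same Calcite class.
        jars.put("analytics-engine/calcite-core.jar", List.of("org/apache/calcite/rel/RelNode.class"));
        jars.put("opensearch-sql/calcite-core.jar", List.of("org/apache/calcite/rel/RelNode.class"));
        jars.put("opensearch-sql/core.jar",
            List.of("org/opensearch/sql/calcite/utils/CalciteToolsHelper.class"));

        System.out.println("before excludes: " + findDuplicates(jars));
        Map<String, List<String>> bundled =
            withoutJars(jars, Set.of("opensearch-sql/calcite-core.jar"));
        System.out.println("after excludes:  " + findDuplicates(bundled));
    }
}
```

Excluding the child's copy leaves the parent as the only provider of the shared class, which is the effect the excludes are meant to have.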

Test plan

  • All existing integration tests pass with analytics-engine loaded
  • RestUnifiedQueryActionTest passes (routing logic)
  • Doctests pass (no ClassNotFoundException for LevenshteinDistance)
  • No jar hell errors on plugin startup
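
The LevenshteinDistance class named in the test plan comes from commons-text. As a rough illustration of the kind of fuzzy matching involved, the sketch below computes a plain edit distance and picks the closest candidate name. It is an independent, hypothetical implementation (`EditDistanceSketch`, `levenshtein`, `closest`, and the sample field names are all made up for this example), not the commons-text or Calcite code.

```java
import java.util.List;

/** Plain-Java edit distance, as a stand-in for a library LevenshteinDistance. */
final class EditDistanceSketch {

    /** Classic two-row dynamic programming Levenshtein distance. */
    static int levenshtein(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the candidate with the smallest distance (first wins on ties), or null if none. */
    static String closest(String input, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int distance = levenshtein(input, candidate);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical field names; a misspelled "hots" should map to "host".
        System.out.println(closest("hots", List.of("host", "status", "bytes")));
    }
}
```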

github-actions bot (Contributor) commented Apr 1, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 2c4c401.

| Path | Line | Severity | Description |
| --- | --- | --- | --- |
| libs/analytics-engine-3.6.0-SNAPSHOT.jar | 1 | high | Unverifiable binary JAR committed directly to the repository. SNAPSHOT artifacts bypass standard artifact signing and reproducibility guarantees. Cannot audit for malicious bytecode without decompilation. |
| libs/analytics-framework-3.6.0-SNAPSHOT.jar | 1 | high | Unverifiable binary JAR committed directly to the repository. SNAPSHOT artifacts bypass standard artifact signing and reproducibility guarantees. Cannot audit for malicious bytecode without decompilation. |
| libs/analytics-engine-3.6.0-SNAPSHOT.zip | 1 | high | Unverifiable binary ZIP (plugin bundle) committed directly to the repository. Contents cannot be audited and are deployed as a live OpenSearch plugin in all test clusters. |
| core/build.gradle | 66 | high | Dependency added from a local binary file outside of a package registry (files(...)) referencing a SNAPSHOT JAR in libs/. Bypasses normal dependency verification, checksums, and provenance tracking. |
| plugin/build.gradle | 58 | high | extendedPlugins now includes 'analytics-engine', making the analytics-engine classloader a trusted parent of the SQL plugin classloader. This grants analytics-engine code access to SQL plugin internals and elevates its privilege in the OpenSearch plugin security model. Combined with the unaudited binary ZIP, this is a significant trust escalation. |
| core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java | 399 | medium | Uses reflection with setAccessible(true) to access the private internalParameters field of CalcitePreparingStmt. Bypasses Java access controls to read/write internal Calcite state; if the field mapping is wrong or manipulated, it could allow injection of arbitrary execution parameters. |
| core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java | 423 | medium | Dynamically compiles and instantiates arbitrary Java source code (generated from Calcite query plans) using a reflection-invoked Janino compiler with a custom classloader. The compiled class is executed directly as a Bindable. Malicious input to the query plan could lead to code execution if Calcite code generation is compromised. |
| opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java | 154 | medium | Dynamically compiles arbitrary Java source (Rex expressions) via reflection-loaded Janino ClassBodyEvaluator with a custom classloader, then instantiates and executes the resulting class. Same risk profile as CalciteToolsHelper: if Rex code generation can be influenced by attacker-controlled input, arbitrary code execution is possible. |
| plugin/build.gradle | 201 | medium | Multiple runtime JARs explicitly excluded from the plugin bundle (calcite-core, guava, janino, jackson, etc.) because they are expected to be provided by the analytics-engine plugin. This creates a hard runtime dependency on the unaudited analytics-engine binary; if that binary is replaced or tampered with, the SQL plugin will silently use the attacker-supplied versions of these critical libraries. |
| plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java | 171 | low | Schema is now built from live cluster state (clusterService.state()) via OpenSearchSchemaBuilder from the unaudited analytics-framework JAR. If that library exfiltrates or logs cluster metadata (index names, mappings, settings), it would have access to full cluster state on every query context construction. |

The table above displays the top 10 most important findings.

Total: 10 | Critical: 0 | High: 5 | Medium: 4 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

ahkcs added 4 commits April 1, 2026 15:11
Step 1: Plugin wiring and dependency integration.

- Add analytics-engine as extendedPlugins in plugin/build.gradle
- Vendor analytics-framework JAR (interfaces) and analytics-engine
  ZIP (plugin) built from OpenSearch sandbox/3.6 branch
- Delete local QueryPlanExecutor interface, use upstream
  org.opensearch.analytics.exec.QueryPlanExecutor from JAR
- Replace StubSchemaProvider with OpenSearchSchemaBuilder which reads
  real index mappings from ClusterState
- Delete StubSchemaProvider (no longer needed)
- Exclude shared JARs (Calcite, Guava, commons-*, etc.) from SQL
  plugin bundle to avoid jar hell with analytics-engine classloader
- Load analytics-engine plugin in integTest and remoteCluster test
  clusters before opensearch-sql-plugin
- Create parquet_logs and parquet_metrics indices in ITs so
  OpenSearchSchemaBuilder can resolve the schema
- Update explain expected files for alphabetical field ordering

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Every test cluster that loads opensearch-sql-plugin needs the
analytics-engine plugin because SQL declares it as extendedPlugins.
Added to yamlRestTest, integTestWithSecurity, remoteIntegTestWithSecurity,
and integJdbcTest clusters.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
commons-text is needed by Calcite (parent classloader) for fuzzy matching
but was only in the SQL plugin (child classloader). Also use lightweight
parsing context in isAnalyticsIndex to avoid requiring cluster state.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs force-pushed the pr/mustang-plugin-wiring branch from 300fc86 to 62578cd on April 1, 2026 22:12
ahkcs added the PPL (Piped processing language) and enhancement (New feature or request) labels on Apr 1, 2026
Calcite's EnumerableInterpretable.getBindable() hardcodes
EnumerableInterpretable.class.getClassLoader() for Janino compilation.
When analytics-engine is the parent classloader via extendedPlugins,
this returns the parent classloader which cannot see SQL plugin classes,
causing CompileException for any Enumerable code generation.

Override implement() in OpenSearchCalcitePreparingStmt to use our own
compileWithPluginClassLoader() which does the same code generation but
uses CalciteToolsHelper.class.getClassLoader() (SQL plugin's child
classloader) so Janino can resolve both parent and child classes.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
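
The visibility problem described in this commit message can be reproduced with plain JDK classloaders. The sketch below is only an analogy and assumes Java 9 or later: the JDK platform classloader stands in for the analytics-engine (parent) loader, and the loader of the demo class stands in for the SQL plugin (child) loader. `ClassLoaderVisibilityDemo` and `canResolve` are hypothetical names, and no Calcite or Janino code is involved.

```java
/** Shows that a child loader resolves parent and child classes, while the parent sees only its own. */
final class ClassLoaderVisibilityDemo {

    /** Returns true if the given loader can resolve the class name (without initializing it). */
    static boolean canResolve(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the SQL plugin (child) loader: whatever loaded this demo class.
        ClassLoader child = ClassLoaderVisibilityDemo.class.getClassLoader();
        // Stand-in for the analytics-engine (parent) loader: the JDK platform loader.
        ClassLoader parent = ClassLoader.getPlatformClassLoader();

        String parentSideClass = "java.lang.String";
        String childSideClass = ClassLoaderVisibilityDemo.class.getName();

        System.out.println("parent sees parent-side class: " + canResolve(parent, parentSideClass));
        System.out.println("parent sees child-side class:  " + canResolve(parent, childSideClass));
        System.out.println("child sees parent-side class:  " + canResolve(child, parentSideClass));
        System.out.println("child sees child-side class:   " + canResolve(child, childSideClass));
    }
}
```

On a standard classpath launch, only the second check should fail. That mirrors why a compiler handed the parent loader cannot resolve SQL plugin classes, while one handed the child loader can resolve both sides.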
ahkcs force-pushed the pr/mustang-plugin-wiring branch from ca8bcf1 to 2c4c401 on April 1, 2026 23:24