[FEATURE] PPL query support for Analytics engine integration #5247

@dai-chen

Is your feature request related to a problem?

PPL queries currently only execute against Lucene-backed indices. With the Analytics engine providing Parquet-backed storage, PPL queries targeting non-Lucene indices need to be routed through the unified query pipeline to the Analytics engine for execution. The Unified Query API already supports PPL V3 Calcite-based RelNode generation, but the end-to-end integration — query routing, schema building, execution handoff, and response formatting — is not yet ready.

Technical requirements:

  • PPL queries against non-Lucene indices return correct results through the _plugins/_ppl endpoint
  • PPL explain API (_plugins/_ppl/_explain) returns the logical plan handed off to the Analytics engine
  • Default response format (JDBC JSON with schema, datarows, total, size, status) matches the existing PPL response format
  • Clear error messages distinguishing client errors (query parsing/planning in SQL plugin) from server errors (query optimization/distributed planning/execution from Analytics engine)
  • Observability: metrics and latency tracking across both SQL plugin (routing, parsing, planning) and Analytics engine (optimization, execution)
  • PPL queries on Lucene indices are unaffected (no regression)
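
To make the response-format requirement concrete, the sketch below assembles the five fields named above (schema, datarows, total, size, status) from rows shaped like the engine's Iterable<Object[]> output. It is only an illustration under stated assumptions: `JdbcEnvelopeSketch`, `Column`, and `toEnvelope` are hypothetical names rather than the plugin's actual formatter, and setting total and size to the returned row count and status to 200 on success is assumed here, not taken from the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the plugin's actual response formatter. */
final class JdbcEnvelopeSketch {

  /** One schema entry: column name plus type name. */
  static final class Column {
    final String name;
    final String type;

    Column(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  /** Builds a map holding the five fields the issue lists for the JDBC JSON format. */
  static Map<String, Object> toEnvelope(List<Column> columns, Iterable<Object[]> rows) {
    List<Map<String, String>> schema = new ArrayList<>();
    for (Column c : columns) {
      Map<String, String> entry = new LinkedHashMap<>();
      entry.put("name", c.name);
      entry.put("type", c.type);
      schema.add(entry);
    }

    List<List<Object>> datarows = new ArrayList<>();
    for (Object[] row : rows) {
      datarows.add(Arrays.asList(row)); // each Object[] becomes one JSON array
    }

    Map<String, Object> envelope = new LinkedHashMap<>();
    envelope.put("schema", schema);
    envelope.put("datarows", datarows);
    envelope.put("total", datarows.size()); // assumption: total equals rows returned
    envelope.put("size", datarows.size());
    envelope.put("status", 200); // assumption: 200 on success
    return envelope;
  }

  public static void main(String[] args) {
    List<Column> columns =
        Arrays.asList(new Column("host", "string"), new Column("cnt", "bigint"));
    List<Object[]> rows = Arrays.asList(new Object[] {"a", 3L}, new Object[] {"b", 5L});
    System.out.println(toEnvelope(columns, rows));
  }
}
```

Keeping the envelope as an ordered map leaves serialization to whatever JSON writer the plugin already uses, so the sketch only pins down field names and ordering.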

What solution would you like?

  1. Plugin wiring and dependency integration: Add analytics-engine as an extendedPlugins dependency, resolve Calcite jar conflicts (classloader sharing and bundlePlugin excludes), and wire SchemaBuilder and QueryPlanExecutor from the Analytics engine via Guice.
  2. Query routing and execution handoff: Add RestUnifiedQueryAction and AnalyticsExecutionEngine that detect non-Lucene indices, route PPL queries through UnifiedQueryPlanner.plan() → QueryPlanExecutor.execute(), and schedule execution on the sql-worker thread pool with security context propagation.
  3. Response formatting and explain support: Format Iterable<Object[]> results via the existing JdbcResponseFormatter; return the logical RelNode plan via _plugins/_ppl/_explain, and include the physical plan if the Analytics engine provides a QueryPlanExecutor.explain() API.
  4. Error handling and observability: Client vs server / SQL plugin vs Analytics engine error classification, query size limit enforcement, request/failure metrics, and planning/execution latency logging.
  5. Integration and regression tests: End-to-end ITs with analytics-engine plugin verifying PPL query, explain, response format, error handling, non-Lucene routing, and Lucene regression.
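
The routing, handoff, and error-classification steps above could fit together roughly as follows. This is a minimal sketch, not the proposed implementation: `PplRoutingSketch`, `Planner`, `PlanExecutor`, `ClientError`, and `ServerError` are hypothetical stand-ins for the plugin and Analytics engine types, a String stands in for the Calcite RelNode, detecting non-Lucene indices by a fixed set of names is an assumption, and security context propagation is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; these are not the real plugin or Analytics engine types. */
final class PplRoutingSketch {

  /** Assumed planner shape: PPL text in, logical plan out (a String stands in for a RelNode). */
  interface Planner {
    String plan(String ppl);
  }

  /** Assumed executor shape: logical plan in, rows out. */
  interface PlanExecutor {
    Iterable<Object[]> execute(String logicalPlan);
  }

  /** Failure before handoff (parsing or planning in the SQL plugin). */
  static final class ClientError extends RuntimeException {
    ClientError(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Failure after handoff (optimization or execution in the engine). */
  static final class ServerError extends RuntimeException {
    ServerError(String message, Throwable cause) {
      super(message, cause);
    }
  }

  private final Set<String> analyticsIndices; // assumption: routing by a known set of names
  private final Planner planner;
  private final PlanExecutor executor;
  private final ExecutorService worker; // stand-in for the sql-worker thread pool

  PplRoutingSketch(
      Set<String> analyticsIndices, Planner planner, PlanExecutor executor, ExecutorService worker) {
    this.analyticsIndices = analyticsIndices;
    this.planner = planner;
    this.executor = executor;
    this.worker = worker;
  }

  /** True when the query should leave the existing Lucene path. */
  boolean isAnalyticsIndex(String index) {
    return analyticsIndices.contains(index);
  }

  /** Plans on the calling thread, executes on the worker pool, classifies failures by stage. */
  List<Object[]> runOnAnalytics(String ppl) {
    final String logicalPlan;
    try {
      logicalPlan = planner.plan(ppl);
    } catch (RuntimeException e) {
      throw new ClientError("planning failed: " + e.getMessage(), e);
    }
    Future<List<Object[]>> result = worker.submit(() -> {
      List<Object[]> rows = new ArrayList<>();
      executor.execute(logicalPlan).forEach(rows::add);
      return rows;
    });
    try {
      return result.get();
    } catch (ExecutionException e) {
      throw new ServerError("execution failed: " + e.getCause().getMessage(), e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      throw new ServerError("interrupted while waiting for the engine", e);
    }
  }

  public static void main(String[] args) {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    try {
      PplRoutingSketch sketch = new PplRoutingSketch(
          Set.of("parquet_logs"),
          ppl -> "LogicalPlan(" + ppl + ")",
          plan -> Collections.singletonList(new Object[] {"a", 3L}),
          worker);
      if (sketch.isAnalyticsIndex("parquet_logs")) {
        System.out.println(sketch.runOnAnalytics("source=parquet_logs | head 1").size());
      }
    } finally {
      worker.shutdown();
    }
  }
}
```

Classifying by the stage where the exception is caught, rather than by exception type, keeps the client/server split aligned with the SQL plugin/Analytics engine boundary described in step 4; a real implementation would likely need finer distinctions.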

What alternatives have you considered?

See parent issue #5246 for the design comparison of Option A (Query Delegation), Option B (Unified Query Pipeline), and Option C (Calcite Schema Adapter).

Do you have any additional context?

  • PoC PR: dai-chen/sql-1#10 — validates end-to-end flow with real analytics-engine plugin in integration tests
  • Pending from Analytics engine team: stable API interfaces (SchemaBuilder, QueryPlanExecutor), plugin integration mechanism, and integration-ready build for testing
  • If a workable Analytics engine build is not available, we may create an analytics-engine stub with mock data to unblock integration work and testing

Metadata

Labels

PPL (Piped processing language), enhancement (New feature or request)

Projects

Status

Not Started
