[FEATURE] PPL query support for Analytics engine integration #5247
Description
Is your feature request related to a problem?
PPL queries currently only execute against Lucene-backed indices. With the Analytics engine providing Parquet-backed storage, PPL queries targeting non-Lucene indices need to be routed through the unified query pipeline to the Analytics engine for execution. The Unified Query API already supports PPL V3 Calcite-based RelNode generation, but the end-to-end integration — query routing, schema building, execution handoff, and response formatting — is not yet ready.
Technical requirements:
- PPL queries against non-Lucene indices return correct results through _plugins/_ppl endpoint
- PPL explain API (_plugins/_ppl/_explain) returns the logical plan handed off to the Analytics engine
- Default response format (JDBC JSON with schema, datarows, total, size, status) matches existing PPL response format
- Clear error messages distinguishing client errors (query parsing/planning in SQL plugin) from server errors (query optimization/distributed planning/execution from Analytics engine)
- Observability: metrics and latency tracking across both SQL plugin (routing, parsing, planning) and Analytics engine (optimization, execution)
- PPL queries on Lucene indices are unaffected (no regression)
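To make the response-format requirement concrete, here is a minimal sketch of turning rows of type Iterable<Object[]> into a JDBC-style JSON body with the five fields named above. It does not use the plugin's actual formatter. The class name JdbcJsonSketch, the Column helper, the fixed status value of 200, and the choice to set both total and size to the row count are assumptions made for illustration only.

```java
import java.util.List;

/** Illustrative only: a hand-rolled JDBC-style JSON writer, not the plugin's formatter. */
final class JdbcJsonSketch {

    /** Hypothetical column descriptor (name plus type label). */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Builds {"schema":[...],"datarows":[...],"total":n,"size":n,"status":200}. */
    static String format(List<Column> schema, Iterable<Object[]> rows) {
        StringBuilder sb = new StringBuilder("{\"schema\":[");
        for (int i = 0; i < schema.size(); i++) {
            if (i > 0) sb.append(',');
            Column c = schema.get(i);
            sb.append("{\"name\":").append(quote(c.name))
              .append(",\"type\":").append(quote(c.type)).append('}');
        }
        sb.append("],\"datarows\":[");
        int count = 0;
        for (Object[] row : rows) {
            if (count++ > 0) sb.append(',');
            sb.append('[');
            for (int j = 0; j < row.length; j++) {
                if (j > 0) sb.append(',');
                sb.append(value(row[j]));
            }
            sb.append(']');
        }
        // Assumption: total and size both equal the number of rows returned.
        sb.append("],\"total\":").append(count)
          .append(",\"size\":").append(count)
          .append(",\"status\":200}");
        return sb.toString();
    }

    /** Numbers and booleans are emitted bare, null as null, everything else as a string. */
    private static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    /** Escapes only backslashes and double quotes; control characters are not handled. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

A real implementation would go through the existing formatter and a proper JSON library; the sketch only shows the field layout that Analytics engine results would need to match.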
What solution would you like?
- Plugin wiring and dependency integration: Add analytics-engine as extendedPlugins dependency, resolve Calcite jar conflicts (classloader sharing and bundlePlugin excludes), wire SchemaBuilder and QueryPlanExecutor from the Analytics engine via Guice.
- Query routing and execution handoff: Add RestUnifiedQueryAction and AnalyticsExecutionEngine that detect non-Lucene indices, route PPL queries through UnifiedQueryPlanner.plan() → QueryPlanExecutor.execute(), and schedule execution on sql-worker thread pool with security context propagation.
- Response formatting and explain support: Format Iterable<Object[]> results via existing JdbcResponseFormatter; return logical RelNode plan via _plugins/_ppl/_explain, include physical plan if Analytics engine provides QueryPlanExecutor.explain() API.
- Error handling and observability: Client vs server / SQL plugin vs Analytics engine error classification, query size limit enforcement, request/failure metrics, and planning/execution latency logging.
- Integration and regression tests: End-to-end ITs with analytics-engine plugin verifying PPL query, explain, response format, error handling, non-Lucene routing, and Lucene regression.
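The routing and error-classification items above could be sketched roughly as follows. This is not the proposed RestUnifiedQueryAction or AnalyticsExecutionEngine: the Planner and Executor interfaces are hypothetical stand-ins for UnifiedQueryPlanner.plan() and QueryPlanExecutor.execute(), whose real signatures are still pending, and the index check is passed in as a predicate because the detection mechanism is not specified here. Thread pool scheduling and security context propagation are left out.

```java
import java.util.function.Predicate;

/** Illustrative only: routing plus client/server error classification. */
final class RoutingSketch {

    /** Hypothetical stand-in for UnifiedQueryPlanner.plan(); the plan type is left opaque. */
    interface Planner {
        Object plan(String query);
    }

    /** Hypothetical stand-in for QueryPlanExecutor.execute(). */
    interface Executor {
        Iterable<Object[]> execute(Object plan);
    }

    enum Outcome { LUCENE_PATH, OK, CLIENT_ERROR, SERVER_ERROR }

    static final class Result {
        final Outcome outcome;
        final Iterable<Object[]> rows; // non-null only when outcome is OK
        final String message;          // non-null only for errors

        Result(Outcome outcome, Iterable<Object[]> rows, String message) {
            this.outcome = outcome;
            this.rows = rows;
            this.message = message;
        }
    }

    static Result handle(String index, String query, Predicate<String> isNonLucene,
                         Planner planner, Executor executor) {
        // Lucene-backed indices keep the existing PPL path untouched.
        if (!isNonLucene.test(index)) {
            return new Result(Outcome.LUCENE_PATH, null, null);
        }
        Object plan;
        try {
            // Parsing and planning happen in the SQL plugin: failures are client errors.
            plan = planner.plan(query);
        } catch (RuntimeException e) {
            return new Result(Outcome.CLIENT_ERROR, null, e.getMessage());
        }
        try {
            // Optimization and execution happen in the Analytics engine: failures are server errors.
            return new Result(Outcome.OK, executor.execute(plan), null);
        } catch (RuntimeException e) {
            return new Result(Outcome.SERVER_ERROR, null, e.getMessage());
        }
    }
}
```

Splitting the two try blocks at the plan/execute boundary is one simple way to attribute a failure to the SQL plugin or to the Analytics engine; the same boundary is a natural place to record separate planning and execution latencies.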
What alternatives have you considered?
See parent issue #5246 for the design comparison of Option A (Query Delegation), Option B (Unified Query Pipeline), and Option C (Calcite Schema Adapter).
Do you have any additional context?
- PoC PR: dai-chen/sql-1#10 — validates end-to-end flow with real analytics-engine plugin in integration tests
- Pending from Analytics engine team: stable API interfaces (SchemaBuilder, QueryPlanExecutor), plugin integration mechanism, and integration-ready build for testing
- If a workable Analytics engine build is not available, we may create an analytics-engine stub with mock data to unblock integration work and testing