RFC: OpenSearch SQL/PPL Telemetry Integration #5300
Description
Problem Statement
OpenSearch SQL/PPL plugin has no integration with OpenSearch's core telemetry framework. There is no distributed tracing for query execution, making it difficult to diagnose latency issues across the parse → optimize → compile → execute pipeline. The existing metrics implementation uses a custom Metrics singleton with BasicCounter/RollingCounter exposed via /_plugins/_ppl/stats and /_plugins/_sql/stats, which is disconnected from the standard OpenSearch telemetry export pipeline (OTel SDK → OTLP → observability backends).
Goals
- P0: Add distributed tracing spans to SQL/PPL query execution pipeline
- P1: Migrate existing custom metrics to OpenSearch's `MetricsRegistry` (OTel-backed)
Non-Goals
- Changing the telemetry framework itself
- Adding telemetry to the sandbox analytics-engine (Calcite-based prototype)
Background
OpenSearch Telemetry Framework
OpenSearch provides a backend-agnostic telemetry framework:
- `libs/telemetry/` — interfaces: `Tracer`, `Span`, `SpanScope`, `MetricsRegistry`, `Counter`, `Histogram`
- `server/` — wiring: `TelemetryModule`, `TracerFactory`, `WrappedTracer`, `SpanBuilder`
- `plugins/telemetry-otel/` — OTel SDK implementation that exports via `BatchSpanProcessor` (traces) and `PeriodicMetricReader` (metrics)
Plugins access the framework by implementing TelemetryAwarePlugin, which injects Tracer and MetricsRegistry via createComponents().
Current SQL/PPL Metrics
The SQL plugin (opensearch-project/sql) uses a custom metrics system:
| Component | Description |
|---|---|
| `Metrics` singleton | Global metrics registry with `getInstance()` |
| `MetricName` enum | `REQ_TOTAL`, `REQ_COUNT_TOTAL`, `FAILED_REQ_COUNT_*`, `PPL_REQ_TOTAL`, `PPL_REQ_COUNT_TOTAL`, etc. |
| `BasicCounter` / `RollingCounter` | Custom counter implementations |
| `RestSQLStatsAction` / `RestPPLStatsAction` | Expose metrics via `/_plugins/_sql/stats` and `/_plugins/_ppl/stats` |
These metrics are not exported to OTel backends and cannot be correlated with OpenSearch server-level telemetry.
SQL/PPL Query Execution Pipeline
PPL text
→ Parse (PPLSyntaxParser → UnresolvedPlan AST)
→ Analyze (CalciteRelNodeVisitor → RelNode logical plan)
→ Push-down Optimize (PushDownPlanner → mixed plan with boundary nodes)
→ Compile (OpenSearchQueryCompiler → PreparedStatement)
→ Execute (PreparedStatement.executeQuery() → ResultSet)
→ Materialize (ResultSet → PPLResponse)
P0: Distributed Tracing
Interface Change
The SQL plugin class must implement TelemetryAwarePlugin:
```java
public class SQLPlugin extends Plugin implements ScriptPlugin,
    ActionPlugin, TelemetryAwarePlugin {

  private Tracer tracer;
  private MetricsRegistry metricsRegistry;

  @Override
  public Collection<Object> createComponents(
      ...,
      Tracer tracer,
      MetricsRegistry metricsRegistry
  ) {
    this.tracer = tracer;
    this.metricsRegistry = metricsRegistry;
    // pass to internal components
  }
}
```

Note: `TelemetryAwarePlugin` is annotated `@ExperimentalApi` — the interface may evolve.
Span Hierarchy
Root span + 6 child spans per query (7 total):
sql/ppl.query ← root span, SpanKind.SERVER (TransportAction.doExecute)
├── sql/ppl.parse ← INTERNAL: PPLSyntaxParser + AST build
├── sql/ppl.analyze ← INTERNAL: CalciteRelNodeVisitor → logical plan
├── sql/ppl.optimize ← INTERNAL: PushDownPlanner push-down rules
├── sql/ppl.compile ← INTERNAL: OpenSearchQueryCompiler → PreparedStatement
├── sql/ppl.execute ← INTERNAL: PreparedStatement.executeQuery()
│ └── opensearch.search ← child spans from OpenSearch search (automatic)
└── sql/ppl.materialize ← INTERNAL: ResultSet → response
Only the root span uses SpanKind.SERVER (it represents a received request). All phase spans use SpanKind.INTERNAL — they are in-process operations, not network boundaries.
Span Attributes
| Attribute | Example | Span |
|---|---|---|
| `db.query.text` | `source=logs \| where ... \| stats count() by host` (sanitized — literals/values stripped) | `sql/ppl.query` |
| `db.query.type` | `ppl` or `sql` | `sql/ppl.query` |
| `db.query.id` | `a1b2c3d4` | `sql/ppl.query` |
| `db.collection.name` | `logs` | `sql/ppl.query` |
| `db.query.datasource` | `prometheus`, `s3`, `opensearch` | `sql/ppl.query` |
| `ppl.plan.node_count` | `7` | `sql/ppl.optimize` |
| `ppl.plan.pushed_down` | `filter,aggregation` | `sql/ppl.optimize` |
| `ppl.execute.rows` | `1024` | `sql/ppl.materialize` |
| `ppl.cache.hit` | `true` / `false` | `sql/ppl.compile` |
| `error` | `true` | any span on failure |
| `error.type` | `SemanticCheckException` | any span on failure |
Attribute naming follows OTel semantic conventions for database spans where applicable (db.* prefix).
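The RFC does not specify how `db.query.text` is sanitized. A minimal sketch of what literal stripping could look like — the class name and regexes below are illustrative, not the plugin's actual implementation:

```java
// Hypothetical sanitizer for the db.query.text attribute: strips string and
// numeric literals so raw values never leave the cluster. Names are illustrative.
public final class QuerySanitizer {

    private QuerySanitizer() {}

    /** Replace quoted strings and standalone numeric literals with '?'. */
    public static String sanitize(String query) {
        if (query == null) {
            return "";
        }
        return query
            .replaceAll("'[^']*'", "?")               // single-quoted strings
            .replaceAll("\"[^\"]*\"", "?")            // double-quoted strings
            .replaceAll("\\b\\d+(\\.\\d+)?\\b", "?"); // int/decimal literals
    }
}
```

A regex pass like this keeps identifiers intact (`host1` is untouched) while masking values, which is usually enough to make the attribute safe for export.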
Implementation Pattern
Follow TransportSearchAction pattern:
```java
// TransportPPLQueryAction.doExecute()
Span span = tracer.startSpan(
    SpanCreationContext.server().name("sql/ppl.query")
        .attributes(Attributes.create()
            .addAttribute("db.query.type", "ppl")
            .addAttribute("db.query.text", sanitize(request.getQuery()))));
try (SpanScope scope = tracer.withSpanInScope(span)) {
  ActionListener<PPLQueryResponse> tracedListener =
      TraceableActionListener.create(listener, span, tracer);
  queryService.execute(request, tracedListener);
} catch (Exception e) {
  span.setError(e);
  span.endSpan();
  listener.onFailure(e);
}
```

Phase-level spans inside `UnifiedQueryService.execute()`:
```java
// Each phase follows this pattern
Span parseSpan = tracer.startSpan(
    SpanCreationContext.internal().name("sql/ppl.parse"));
try (SpanScope s = tracer.withSpanInScope(parseSpan)) {
  ast = parser.parse(query);
} catch (Exception e) {
  parseSpan.setError(e);
  throw e;
} finally {
  parseSpan.endSpan();
}
```

Graceful Degradation
When telemetry is disabled (default), Tracer is NoopTracer — all span operations are no-ops with near-zero overhead. No conditional checks needed in application code.
Feature Flag Dependency
The telemetry framework is gated behind FeatureFlags.TELEMETRY, which defaults to false. When disabled, TelemetryAwarePlugin.createComponents() is not called — the plugin falls back to the base Plugin.createComponents() path which does not receive Tracer or MetricsRegistry.
The SQL plugin must handle both paths:
- Implement both `Plugin.createComponents()` and `TelemetryAwarePlugin.createComponents()`
- When the telemetry flag is off, default to `NoopTracer` and `NoopMetricsRegistry`
- This ensures the plugin works regardless of the feature flag state
Prerequisite: the telemetry feature flag must be flipped to true by default in a target release for this work to be useful in production.
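A minimal sketch of the dual-path fallback described above. The field wiring and the `buildComponents` helper are illustrative; the no-op singletons are assumed to be the ones shipped in `libs/telemetry`:

```java
// Sketch only: default to no-op implementations so the plugin behaves the same
// whether or not the feature flag routed us through TelemetryAwarePlugin.
private Tracer tracer = NoopTracer.INSTANCE;                     // assumed no-op singleton
private MetricsRegistry metricsRegistry = NoopMetricsRegistry.INSTANCE;

@Override
public Collection<Object> createComponents(...) {
  // Feature flag off: base Plugin path, keep the no-op defaults.
  return buildComponents(tracer, metricsRegistry);               // hypothetical helper
}

@Override
public Collection<Object> createComponents(..., Tracer tracer, MetricsRegistry metricsRegistry) {
  // Feature flag on: TelemetryAwarePlugin path, real implementations injected.
  this.tracer = tracer;
  this.metricsRegistry = metricsRegistry;
  return buildComponents(tracer, metricsRegistry);
}
```

Funneling both overloads through one helper keeps the rest of the plugin unaware of which path was taken.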
API Stability Risk
TelemetryAwarePlugin is annotated @ExperimentalApi — the interface may change without semver guarantees within any release. Since the SQL plugin lives in a separate repository (opensearch-project/sql) with its own release cycle, a core OpenSearch patch release could break the SQL plugin's telemetry integration. Track the @ExperimentalApi → @PublicApi promotion timeline; defer P1 metrics migration until the interface stabilizes.
Async Span Propagation
Span context propagates via ThreadContextBasedTracerContextStorage on thread pool hops. Considerations:
- Cursor-based pagination: Each page fetch is a separate transport roundtrip. The root `sql/ppl.query` span covers only the first execution. Subsequent cursor fetches create new root spans linked by a `cursor.id` attribute — not one long-lived span.
- Thread pool hops: SQL/PPL execution must go through OpenSearch's `ThreadContext`-aware thread pools for automatic span propagation. Custom `CompletableFuture` chains or raw executors will silently lose span context.
- Cross-node push-down: Operations pushed to data nodes are traced by OpenSearch's transport-layer instrumentation automatically. The coordinating node's `sql/ppl.execute` span becomes the parent of the downstream `opensearch.search` spans.
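The cursor-fetch behavior can be sketched with the same `Tracer` API used above; `request.getCursorId()`, `cursorService`, and `fetchNextPage` are hypothetical names:

```java
// Sketch: each cursor fetch starts its own root span, correlated to the
// original query via the cursor.id attribute rather than kept open across
// transport roundtrips.
Span cursorSpan = tracer.startSpan(
    SpanCreationContext.server().name("sql/ppl.query")
        .attributes(Attributes.create()
            .addAttribute("db.query.type", "ppl")
            .addAttribute("cursor.id", request.getCursorId()))); // hypothetical accessor
try (SpanScope scope = tracer.withSpanInScope(cursorSpan)) {
  cursorService.fetchNextPage(request,
      TraceableActionListener.create(listener, cursorSpan, tracer));
}
```

Backends that support span links or attribute-based correlation can then stitch all pages of one logical query together by `cursor.id`.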
P1: Metrics Migration
Current → New Mapping
| Current (custom) | New (`MetricsRegistry`) | Type |
|---|---|---|
| `REQ_TOTAL` / `PPL_REQ_TOTAL` | `sql.query.total` / `ppl.query.total` | Counter |
| `REQ_COUNT_TOTAL` / `PPL_REQ_COUNT_TOTAL` | `sql.query.count` / `ppl.query.count` | Counter |
| `FAILED_REQ_COUNT_SYS` | `sql.query.error{type=system}` | Counter |
| `FAILED_REQ_COUNT_CUS` | `ppl.query.error{type=client}` | Counter |
| `FAILED_REQ_COUNT_CB` | `ppl.query.error{type=circuit_breaker}` | Counter |
| (new) | `ppl.query.latency` | Histogram |
| (new) | `ppl.query.parse.latency` | Histogram |
| (new) | `ppl.query.optimize.latency` | Histogram |
| (new) | `ppl.query.execute.latency` | Histogram |
Implementation
Create a dedicated metrics class following ClusterManagerMetrics pattern:
```java
public class PPLQueryMetrics {
  private final Counter queryTotal;
  private final Counter queryErrorTotal;
  private final Histogram queryLatency;
  private final Histogram parseLatency;
  private final Histogram executeLatency;

  public PPLQueryMetrics(MetricsRegistry metricsRegistry) {
    this.queryTotal = metricsRegistry.createCounter(
        "ppl.query.total", "Total PPL queries", "1");
    this.queryLatency = metricsRegistry.createHistogram(
        "ppl.query.latency", "PPL query latency", "ms");
    // ... remaining counters and histograms
  }

  public void recordQuery(long latencyMs, boolean success, String errorType) {
    queryTotal.add(1);
    queryLatency.record(latencyMs);
    if (!success) {
      queryErrorTotal.add(1, Tags.create().addTag("type", errorType));
    }
  }
}
```

Migration Strategy
- Add new `MetricsRegistry`-based metrics alongside existing custom metrics (dual-write)
- Deprecate the `/_plugins/_ppl/stats` and `/_plugins/_sql/stats` endpoints
- Remove the custom `Metrics` singleton in a future major version
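During the dual-write phase, each record site increments both systems. A sketch — the legacy call shape is assumed from the existing `Metrics` singleton API, and the three-argument `recordQuery` signature is hypothetical:

```java
// Dual-write sketch: keep the legacy singleton counters accurate while the
// new MetricsRegistry counters come online.
public void onQueryCompleted(long latencyMs, boolean success, String errorType) {
  // Legacy path (feeds /_plugins/_ppl/stats); call shape assumed from the
  // plugin's existing Metrics singleton.
  Metrics.getInstance().getNumericalMetric(MetricName.PPL_REQ_TOTAL).increment();
  // New path (exported via OTel when metrics are enabled).
  pplQueryMetrics.recordQuery(latencyMs, success, errorType); // hypothetical signature
}
```

Keeping both writes in one method makes Phase 3's consistency check (legacy stats vs. OTel counters) straightforward, and leaves a single site to delete in Phase 5.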
Backward Compatibility
- The `/_plugins/_ppl/stats` and `/_plugins/_sql/stats` endpoints continue to work during the dual-write phase
- New metrics are exported via the standard telemetry pipeline (OTel → OTLP) when `telemetry.feature.metrics.enabled=true`
- No behavior change when telemetry is disabled — `MetricsRegistry` returns `NoopCounter` / `NoopHistogram`
Performance
Seven spans per query; at 10K QPS that is 70K spans/sec. Mitigations:
- When telemetry is disabled (the default): `NoopTracer` gives near-zero overhead, and `TraceableActionListener.create()` short-circuits when `tracer.isRecording() == false`
- When enabled: overhead is bounded by the OTel SDK's `BatchSpanProcessor` (async, non-blocking). The sampling rate (`telemetry.tracer.sampler.probability`, default 1%) limits actual export volume
- Benchmark requirement: measure p50/p99 query latency with telemetry enabled vs disabled before merging Phase 2 (phase-level spans). Acceptable overhead target: <2% p99 regression
Testing Strategy
- Unit tests: verify spans are created with correct names, attributes, and parent-child relationships using `MockTracer` / `MockMetricsRegistry`
- Integration tests: verify end-to-end span export with the `telemetry-otel` plugin and `LoggingSpanExporter`
telemetry-otelplugin andLoggingSpanExporter - NoopTracer path: verify no NPEs or behavioral changes when telemetry feature flag is off
- Performance: benchmark query latency with/without telemetry on http_logs workload before each rollout phase
Rollout Plan
| Phase | Scope | Gate to Next Phase |
|---|---|---|
| Phase 1 | `TelemetryAwarePlugin` interface + top-level `sql/ppl.query` span | Unit tests pass, NoopTracer path verified, no p99 regression |
| Phase 2 | Phase-level child spans (parse, analyze, optimize, compile, execute, materialize) | Benchmark: <2% p99 latency regression with telemetry enabled on http_logs workload |
| Phase 3 | Migrate counters to `MetricsRegistry` (dual-write with existing custom metrics) | Existing `/_plugins/_ppl/stats` values match new OTel counters in integration test |
| Phase 4 | Add latency histograms per phase | Dashboard prototype confirms histograms produce actionable percentiles |
| Phase 5 | Deprecate custom stats endpoints, remove `Metrics` singleton | One major version deprecation notice; removal in next major version |