RFC: OpenSearch SQL/PPL Telemetry Integration #5300

@penghuo


Problem Statement

The OpenSearch SQL/PPL plugin has no integration with OpenSearch's core telemetry framework. There is no distributed tracing for query execution, making it difficult to diagnose latency issues across the parse → optimize → compile → execute pipeline. The existing metrics implementation uses a custom Metrics singleton with BasicCounter/RollingCounter exposed via /_plugins/_ppl/stats and /_plugins/_sql/stats, which is disconnected from the standard OpenSearch telemetry export pipeline (OTel SDK → OTLP → observability backends).

Goals

  • P0: Add distributed tracing spans to SQL/PPL query execution pipeline
  • P1: Migrate existing custom metrics to OpenSearch's MetricsRegistry (OTel-backed)

Non-Goals

  • Changing the telemetry framework itself
  • Adding telemetry to the sandbox analytics-engine (Calcite-based prototype)

Background

OpenSearch Telemetry Framework

OpenSearch provides a backend-agnostic telemetry framework:

  • libs/telemetry/ — interfaces: Tracer, Span, SpanScope, MetricsRegistry, Counter, Histogram
  • server/ — wiring: TelemetryModule, TracerFactory, WrappedTracer, SpanBuilder
  • plugins/telemetry-otel/ — OTel SDK implementation that exports via BatchSpanProcessor (traces) and PeriodicMetricReader (metrics)

Plugins access the framework by implementing TelemetryAwarePlugin, which injects Tracer and MetricsRegistry via createComponents().

Current SQL/PPL Metrics

The SQL plugin (opensearch-project/sql) uses a custom metrics system:

| Component | Description |
|---|---|
| `Metrics` singleton | Global metrics registry accessed via `getInstance()` |
| `MetricName` enum | `REQ_TOTAL`, `REQ_COUNT_TOTAL`, `FAILED_REQ_COUNT_*`, `PPL_REQ_TOTAL`, `PPL_REQ_COUNT_TOTAL`, etc. |
| `BasicCounter` / `RollingCounter` | Custom counter implementations |
| `RestSQLStatsAction` / `RestPPLStatsAction` | Expose metrics via `/_plugins/_sql/stats` and `/_plugins/_ppl/stats` |

These metrics are not exported to OTel backends and cannot be correlated with OpenSearch server-level telemetry.

SQL/PPL Query Execution Pipeline

PPL text
  → Parse (PPLSyntaxParser → UnresolvedPlan AST)
  → Analyze (CalciteRelNodeVisitor → RelNode logical plan)
  → Push-down Optimize (PushDownPlanner → mixed plan with boundary nodes)
  → Compile (OpenSearchQueryCompiler → PreparedStatement)
  → Execute (PreparedStatement.executeQuery() → ResultSet)
  → Materialize (ResultSet → PPLResponse)

P0: Distributed Tracing

Interface Change

The SQL plugin class must implement TelemetryAwarePlugin:

```java
public class SQLPlugin extends Plugin implements ScriptPlugin,
    ActionPlugin, TelemetryAwarePlugin {

    private Tracer tracer;
    private MetricsRegistry metricsRegistry;

    @Override
    public Collection<Object> createComponents(
        ...,
        Tracer tracer,
        MetricsRegistry metricsRegistry
    ) {
        this.tracer = tracer;
        this.metricsRegistry = metricsRegistry;
        // pass to internal components
    }
}
```

Note: TelemetryAwarePlugin is annotated @ExperimentalApi — the interface may evolve.

Span Hierarchy

Root span + 6 child spans per query (7 total):

sql/ppl.query                          ← root span, SpanKind.SERVER (TransportAction.doExecute)
  ├── sql/ppl.parse                    ← INTERNAL: PPLSyntaxParser + AST build
  ├── sql/ppl.analyze                  ← INTERNAL: CalciteRelNodeVisitor → logical plan
  ├── sql/ppl.optimize                 ← INTERNAL: PushDownPlanner push-down rules
  ├── sql/ppl.compile                  ← INTERNAL: OpenSearchQueryCompiler → PreparedStatement
  ├── sql/ppl.execute                  ← INTERNAL: PreparedStatement.executeQuery()
  │     └── opensearch.search          ← child spans from OpenSearch search (automatic)
  └── sql/ppl.materialize              ← INTERNAL: ResultSet → response

Only the root span uses SpanKind.SERVER (it represents a received request). All phase spans use SpanKind.INTERNAL — they are in-process operations, not network boundaries.

Span Attributes

| Attribute | Example | Span |
|---|---|---|
| `db.query.text` | `source=logs \| where ... \| stats count() by host` (sanitized — literals/values stripped) | `sql/ppl.query` |
| `db.query.type` | `ppl` or `sql` | `sql/ppl.query` |
| `db.query.id` | `a1b2c3d4` | `sql/ppl.query` |
| `db.collection.name` | `logs` | `sql/ppl.query` |
| `db.query.datasource` | `prometheus`, `s3`, `opensearch` | `sql/ppl.query` |
| `ppl.plan.node_count` | `7` | `sql/ppl.optimize` |
| `ppl.plan.pushed_down` | `filter,aggregation` | `sql/ppl.optimize` |
| `ppl.execute.rows` | `1024` | `sql/ppl.materialize` |
| `ppl.cache.hit` | `true` / `false` | `sql/ppl.compile` |
| `error` | `true` | any span on failure |
| `error.type` | `SemanticCheckException` | any span on failure |

Attribute naming follows OTel semantic conventions for database spans where applicable (db.* prefix).
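The `db.query.text` attribute must never carry raw user values. A minimal sketch of what such a sanitizer could look like, as a plain regex pass over the query string — the class name, patterns, and approach are illustrative assumptions; the real plugin would more likely rebuild the text from the parsed AST with literals replaced:

```java
import java.util.regex.Pattern;

// Hypothetical sanitizer for the db.query.text attribute: strips string and
// numeric literals so the recorded query keeps its shape but not its data.
public class QuerySanitizer {
    private static final Pattern STRING_LITERAL = Pattern.compile("'[^']*'|\"[^\"]*\"");
    private static final Pattern NUMBER_LITERAL = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");

    static String sanitize(String query) {
        // Replace quoted strings first so their digits are not matched below.
        String s = STRING_LITERAL.matcher(query).replaceAll("?");
        return NUMBER_LITERAL.matcher(s).replaceAll("?");
    }

    public static void main(String[] args) {
        System.out.println(sanitize(
            "source=logs | where status = 500 and host = 'web-1' | stats count() by host"));
        // prints: source=logs | where status = ? and host = ? | stats count() by host
    }
}
```

A regex pass like this is lossy (it cannot distinguish identifiers containing digits from literals), which is why AST-level sanitization is the safer long-term choice.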

Implementation Pattern

Follow TransportSearchAction pattern:

```java
// TransportPPLQueryAction.doExecute()
Span span = tracer.startSpan(
    SpanCreationContext.server().name("sql/ppl.query")
        .attributes(Attributes.create()
            .addAttribute("db.query.type", "ppl")
            .addAttribute("db.query.text", sanitize(request.getQuery())))
);
try (SpanScope scope = tracer.withSpanInScope(span)) {
    ActionListener<PPLQueryResponse> tracedListener =
        TraceableActionListener.create(listener, span, tracer);
    queryService.execute(request, tracedListener);
} catch (Exception e) {
    span.setError(e);
    span.endSpan();
    listener.onFailure(e);
}
```

Phase-level spans inside UnifiedQueryService.execute():

// Each phase follows this pattern
```java
// Each phase follows this pattern
Span parseSpan = tracer.startSpan(
    SpanCreationContext.internal().name("sql/ppl.parse"));
try (SpanScope s = tracer.withSpanInScope(parseSpan)) {
    ast = parser.parse(query);
} catch (Exception e) {
    parseSpan.setError(e);
    throw e;
} finally {
    parseSpan.endSpan();
}
```

Graceful Degradation

When telemetry is disabled (default), Tracer is NoopTracer — all span operations are no-ops with near-zero overhead. No conditional checks needed in application code.

Feature Flag Dependency

The telemetry framework is gated behind FeatureFlags.TELEMETRY, which defaults to false. When disabled, TelemetryAwarePlugin.createComponents() is not called — the plugin falls back to the base Plugin.createComponents() path which does not receive Tracer or MetricsRegistry.

The SQL plugin must handle both paths:

  • Implement both Plugin.createComponents() and TelemetryAwarePlugin.createComponents()
  • When telemetry flag is off, default to NoopTracer and NoopMetricsRegistry
  • This ensures the plugin works regardless of the feature flag state
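The fallback described above is a null-object default: the tracer field starts as a no-op and is only replaced when the telemetry-aware path runs. A minimal self-contained sketch of that pattern (stand-in types, not the real plugin interfaces):

```java
// Sketch of the dual createComponents path: when the telemetry feature flag
// is off, only the base overload runs and the tracer stays a no-op.
public class DualPathPlugin {
    interface Tracer { boolean isNoop(); }
    static final Tracer NOOP_TRACER = () -> true;

    private Tracer tracer = NOOP_TRACER; // safe default when the flag is off

    // Base Plugin.createComponents() path (telemetry flag off)
    void createComponents() { /* tracer remains NOOP_TRACER */ }

    // TelemetryAwarePlugin.createComponents() path (telemetry flag on)
    void createComponents(Tracer injected) { this.tracer = injected; }

    Tracer tracer() { return tracer; }

    public static void main(String[] args) {
        DualPathPlugin off = new DualPathPlugin();
        off.createComponents();
        DualPathPlugin on = new DualPathPlugin();
        on.createComponents(() -> false);
        System.out.println(off.tracer().isNoop() + " " + on.tracer().isNoop());
        // prints: true false
    }
}
```

Because downstream components only ever see the `Tracer` interface, no call site needs to check the feature flag.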

Prerequisite: the telemetry feature flag must be flipped to true by default in a target release for this work to be useful in production.

API Stability Risk

TelemetryAwarePlugin is annotated @ExperimentalApi — the interface may change without semver guarantees in any release. Since the SQL plugin lives in a separate repository (opensearch-project/sql) with its own release cycle, a core OpenSearch patch release could break the SQL plugin's telemetry integration. Track the @ExperimentalApi → @PublicApi promotion timeline; defer the P1 metrics migration until the interface stabilizes.

Async Span Propagation

Span context propagates via ThreadContextBasedTracerContextStorage on thread pool hops. Considerations:

  • Cursor-based pagination: Each page fetch is a separate transport roundtrip. The root sql/ppl.query span covers only the first execution. Subsequent cursor fetches create new root spans linked by a cursor.id attribute — not one long-lived span.
  • Thread pool hops: SQL/PPL execution must go through OpenSearch's ThreadContext-aware thread pools for automatic span propagation. Custom CompletableFuture chains or raw executors will silently lose span context.
  • Cross-node push-down: Operations pushed to data nodes are traced by OpenSearch's transport-layer instrumentation automatically. The coordinating node's sql/ppl.execute span becomes the parent of the downstream opensearch.search spans.

P1: Metrics Migration

Current → New Mapping

| Current (custom) | New (MetricsRegistry) | Type |
|---|---|---|
| `REQ_TOTAL` / `PPL_REQ_TOTAL` | `sql.query.total` / `ppl.query.total` | Counter |
| `REQ_COUNT_TOTAL` / `PPL_REQ_COUNT_TOTAL` | `sql.query.count` / `ppl.query.count` | Counter |
| `FAILED_REQ_COUNT_SYS` | `sql.query.error{type=system}` | Counter |
| `FAILED_REQ_COUNT_CUS` | `ppl.query.error{type=client}` | Counter |
| `FAILED_REQ_COUNT_CB` | `ppl.query.error{type=circuit_breaker}` | Counter |
| (new) | `ppl.query.latency` | Histogram |
| (new) | `ppl.query.parse.latency` | Histogram |
| (new) | `ppl.query.optimize.latency` | Histogram |
| (new) | `ppl.query.execute.latency` | Histogram |

Implementation

Create a dedicated metrics class following ClusterManagerMetrics pattern:

```java
public class PPLQueryMetrics {
    private final Counter queryTotal;
    private final Counter queryErrorTotal;
    private final Histogram queryLatency;
    private final Histogram parseLatency;
    private final Histogram executeLatency;

    public PPLQueryMetrics(MetricsRegistry metricsRegistry) {
        this.queryTotal = metricsRegistry.createCounter(
            "ppl.query.total", "Total PPL queries", "1");
        this.queryLatency = metricsRegistry.createHistogram(
            "ppl.query.latency", "PPL query latency", "ms");
        // ...
    }

    public void recordQuery(long latencyMs, boolean success, String errorType) {
        queryTotal.add(1);
        queryLatency.record(latencyMs);
        if (!success) {
            queryErrorTotal.add(1, Tags.create().addTag("type", errorType));
        }
    }
}
```

Migration Strategy

  1. Add new MetricsRegistry-based metrics alongside existing custom metrics (dual-write)
  2. Deprecate /_plugins/_ppl/stats and /_plugins/_sql/stats endpoints
  3. Remove custom Metrics singleton in a future major version
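The dual-write step in the strategy above can be sketched as follows, with stand-in counter types (an `AtomicLong` standing in for the plugin's `BasicCounter`, a functional interface standing in for the MetricsRegistry `Counter`); the real wiring would increment the existing `Metrics` singleton and the injected OTel counter from the same call site:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of dual-write: every query increments both the legacy counter
// (backing /_plugins/_ppl/stats) and the new OTel-backed counter.
public class DualWriteMetrics {
    static class LegacyCounter {
        final AtomicLong v = new AtomicLong();
        void increment() { v.incrementAndGet(); }
    }
    interface OtelCounter { void add(double value); }

    final LegacyCounter legacyPplReqTotal = new LegacyCounter();
    long otelTotal; // exposed so the demo can read what the OTel stub received
    final OtelCounter pplQueryTotal = value -> otelTotal += (long) value;

    void onQuery() {
        legacyPplReqTotal.increment(); // keeps /_plugins/_ppl/stats consistent
        pplQueryTotal.add(1);          // feeds the OTLP export pipeline
    }

    public static void main(String[] args) {
        DualWriteMetrics m = new DualWriteMetrics();
        for (int i = 0; i < 3; i++) m.onQuery();
        System.out.println(m.legacyPplReqTotal.v.get() + " " + m.otelTotal);
        // prints: 3 3
    }
}
```

Keeping both writes in one method is what makes Phase 3's gate (legacy stats matching the OTel counters) checkable in an integration test.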

Backward Compatibility

  • The /_plugins/_ppl/stats and /_plugins/_sql/stats endpoints continue to work during dual-write phase
  • New metrics are exported via the standard telemetry pipeline (OTel → OTLP) when telemetry.feature.metrics.enabled=true
  • No behavior change when telemetry is disabled — MetricsRegistry returns NoopCounter/NoopHistogram

Performance

Seven spans per query; at 10K QPS that is 70K spans created per second. Mitigations:

  • When telemetry disabled (default): NoopTracer — near-zero overhead, TraceableActionListener.create() short-circuits when tracer.isRecording() == false
  • When enabled: overhead is bounded by OTel SDK's BatchSpanProcessor (async, non-blocking). Sampling rate (telemetry.tracer.sampler.probability, default 1%) limits actual export volume
  • Benchmark requirement: measure p50/p99 query latency with telemetry enabled vs disabled before merging Phase 2 (phase-level spans). Acceptable overhead target: <2% p99 regression

Testing Strategy

  • Unit tests: verify spans are created with correct names, attributes, and parent-child relationships using MockTracer / MockMetricsRegistry
  • Integration tests: verify end-to-end span export with telemetry-otel plugin and LoggingSpanExporter
  • NoopTracer path: verify no NPEs or behavioral changes when telemetry feature flag is off
  • Performance: benchmark query latency with/without telemetry on http_logs workload before each rollout phase

Rollout Plan

| Phase | Scope | Gate to Next Phase |
|---|---|---|
| Phase 1 | `TelemetryAwarePlugin` interface + top-level `sql/ppl.query` span | Unit tests pass, NoopTracer path verified, no p99 regression |
| Phase 2 | Phase-level child spans (parse, analyze, optimize, compile, execute, materialize) | Benchmark: <2% p99 latency regression with telemetry enabled on http_logs workload |
| Phase 3 | Migrate counters to MetricsRegistry (dual-write with existing custom metrics) | Existing `/_plugins/_ppl/stats` values match new OTel counters in integration test |
| Phase 4 | Add latency histograms per phase | Dashboard prototype confirms histograms produce actionable percentiles |
| Phase 5 | Deprecate custom stats endpoints, remove `Metrics` singleton | Deprecation notice for one major version; removal in the next major version |
