Skip to content

Conversation

@bramwelt
Copy link
Contributor

This pull request introduces comprehensive OpenTelemetry (OTel) support to the lfx-v2-query-service, enabling distributed tracing, metrics, and log correlation for improved observability. It adds OTel configuration options to the Helm chart and application, instruments the HTTP server, and enhances logging with trace context. The most significant changes are grouped below.

OpenTelemetry integration and configuration:

  • Added a new utility package (pkg/utils/otel.go) that provides OTel configuration via environment variables, sets up OTel SDK (tracing, metrics, logs), and configures exporters and propagators. This enables flexible OTel setup based on deployment needs.
  • Updated the Helm chart (values.yaml and deployment.yaml) to allow configuring OTel environment variables such as service name, protocol, endpoint, exporters, and propagators, making OTel setup declarative and deployment-friendly. [1] [2]
  • Bumped chart version to 0.4.10 to reflect these configuration changes.

Application instrumentation and startup:

  • Modified cmd/main.go to initialize the OTel SDK on startup using the new utility, ensuring that all telemetry is exported as configured and that shutdown is handled gracefully. [1] [2]
  • Wrapped the HTTP server handler in OTel instrumentation (otelhttp.NewHandler) to automatically generate trace spans for incoming HTTP requests. [1] [2]

Logging improvements for trace correlation:

  • Enhanced the logging system to inject trace and span IDs into log records using the slog-otel handler and a new helper in pkg/logging/context.go, allowing logs to be correlated with distributed traces. [1] [2] [3]

Dependency updates:

  • Added and updated dependencies in go.mod to include OTel SDKs, exporters, and the slog-otel handler, ensuring all new functionality is supported.

These changes collectively provide robust observability for the service, making it easier to monitor, debug, and analyze system behavior in production environments.

Integrate slog-otel to automatically include trace_id and span_id
from OpenTelemetry context in log output, enabling correlation
between logs and distributed traces.

Changes:
- pkg/log/log.go: Wrap JSON handler with slogotel.OtelHandler
- pkg/utils/otel_test.go: Fix TestNewPropagator to pass OTelConfig
- charts/: Bump version to 0.4.10
- go.mod: Add github.com/remychantenay/slog-otel v1.3.4

Issue: LFXV2-609

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Signed-off-by: Trevor Bramwell <[email protected]>
@bramwelt bramwelt requested a review from a team as a code owner January 14, 2026 15:17
Copilot AI review requested due to automatic review settings January 14, 2026 15:17
@coderabbitai
Copy link

coderabbitai bot commented Jan 14, 2026

Walkthrough

This PR introduces OpenTelemetry SDK integration to the query service. Changes include a Helm chart version bump, new OTEL environment variable configuration in the deployment template and values, Go dependency additions for OpenTelemetry packages, core OTEL setup utilities with support for traces, metrics, and logs, HTTP request instrumentation, and logging enhancements to inject trace context.

Changes

Cohort / File(s) Change Summary
Helm Chart Infrastructure
charts/lfx-v2-query-service/Chart.yaml, charts/lfx-v2-query-service/templates/deployment.yaml, charts/lfx-v2-query-service/values.yaml
Chart version bumped from 0.4.9 to 0.4.10. Deployment template extended with extraEnv injection and conditional OTEL environment variables (service name, version, exporter protocol, endpoint, security, trace/metric/log exporters, propagators, sample ratio). Values file adds app.extraEnv array and app.otel configuration block with OpenTelemetry settings and defaults.
Go Module Dependencies
go.mod
Extensive dependency updates: testify upgraded v1.10.0 → v1.11.1; crypto shifted to v0.44.0. Added OpenTelemetry core packages (otel, otel/sdk, otel/trace, otel/metric, otel/log), instrumentation (otelhttp), OTLP exporters (otlptrace, otlpmetric, otlplog) with gRPC/HTTP transports, and Jaeger propagator. Multiple indirect dependencies added/updated for supporting libraries (grpc, protobuf, backoff, xxhash, httpsnoop, stdr, grpc-gateway, etc.).
Core OpenTelemetry Utilities
pkg/utils/otel.go, pkg/utils/otel_test.go
New otel.go file (391 lines) introduces OTelConfig struct, environment-based configuration parsing (OTelConfigFromEnv), and comprehensive SDK setup functions (SetupOTelSDK, SetupOTelSDKWithConfig) that initialize resource, propagators, and optional providers (traces, metrics, logs) with OTLP exporters, batching, and sampling. Private factories for each provider type with gRPC/HTTP protocol support. Matching test suite (227 lines) validates defaults, custom values, unsupported protocols, SDK initialization, resource creation, propagator composition, and constant values.
HTTP and Application Integration
cmd/http.go, cmd/main.go
HTTP handler wrapped with otelhttp.NewHandler for request tracing. Main startup sequence adds OTEL config retrieval from environment, SDK initialization immediately after context creation, deferred shutdown with error logging.
Logging Enhancement
pkg/log/log.go, pkg/logging/context.go
log.go imports slog-otel and wraps JSON handler with OtelHandler to attach trace_id and span_id from context. New context.go file (37 lines) exports LogAttrsFromContext() and LogWithContext() functions to extract and inject OpenTelemetry span context into slog attributes.

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant OTel as OTEL SDK
    participant Providers as Providers<br/>(Trace/Metric/Log)
    participant Exporter as OTLP Exporter
    participant Handler as HTTP Handler

    App->>OTel: SetupOTelSDKWithConfig(ctx, config)
    OTel->>OTel: Create Resource<br/>(service name/version)
    OTel->>OTel: Create Propagator<br/>(tracecontext, baggage, jaeger)
    OTel->>Providers: Initialize TracerProvider<br/>(with OTLP exporter)
    OTel->>Providers: Initialize MeterProvider<br/>(with OTLP exporter)
    OTel->>Providers: Initialize LoggerProvider<br/>(with OTLP exporter)
    Providers->>Exporter: Configure OTLP transport<br/>(gRPC/HTTP endpoint)
    OTel-->>App: Return shutdown function
    
    App->>App: Start HTTP server
    App->>Handler: Wrap with otelhttp.NewHandler()
    
    App->>Handler: Incoming HTTP request
    Handler->>Providers: Extract/inject span context
    Handler->>Providers: Record trace data
    Handler-->>App: HTTP response
    Providers->>Exporter: Batch send traces/metrics/logs
    
    App->>OTel: Call shutdown function
    Providers->>Exporter: Flush pending data
    OTel-->>App: Shutdown complete
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'LFXV2-609 Add OpenTelemetry tracing support' accurately and concisely summarizes the main change: adding OpenTelemetry instrumentation to the service.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining all major components including OTel configuration, SDK initialization, HTTP instrumentation, logging enhancements, and dependency updates.
Linked Issues check ✅ Passed All primary objectives from LFXV2-609 are met: OTel libraries added to go.mod, environment-based configuration implemented, trace context propagation enabled via propagators, HTTP instrumentation added, and logging enhanced with trace/span ID correlation.
Out of Scope Changes check ✅ Passed All code changes are directly aligned with LFXV2-609 objectives: OTel setup, Helm chart configuration, HTTP instrumentation, and logging enhancements. Chart version bump is a necessary supporting change.
Docstring Coverage ✅ Passed Docstring coverage is 95.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bramwelt/otel-implementation

🧹 Recent nitpick comments
pkg/logging/context.go (1)

28-37: Consider reusing LogAttrsFromContext for DRY.

The span context extraction logic is duplicated. You could simplify LogWithContext by reusing the existing function:

♻️ Optional refactor
 func LogWithContext(ctx context.Context, logger *slog.Logger) *slog.Logger {
-	spanCtx := trace.SpanContextFromContext(ctx)
-	if !spanCtx.IsValid() {
+	attrs := LogAttrsFromContext(ctx)
+	if attrs == nil {
 		return logger
 	}
-	return logger.With(
-		slog.String("trace_id", spanCtx.TraceID().String()),
-		slog.String("span_id", spanCtx.SpanID().String()),
-	)
+	// Convert []slog.Attr to []any for With()
+	args := make([]any, len(attrs))
+	for i, attr := range attrs {
+		args[i] = attr
+	}
+	return logger.With(args...)
 }

That said, the current approach avoids the slice conversion overhead, so keeping it as-is is also reasonable.

pkg/log/log.go (1)

104-109: Handler chaining is correct.

The wrapping order ensures both OTEL trace context (trace_id, span_id) and custom context attributes are added to log records.

Note: The exported helpers in pkg/logging/context.go (LogWithContext, LogAttrsFromContext) both extract trace_id and span_id from context, duplicating what OtelHandler already injects. If these are used alongside the default logger with OtelHandler active, duplicate trace attributes will be added to logs. Consider documenting the intended usage:

  • Use the default logger when OtelHandler is active (automatic trace injection)
  • Use LogWithContext or LogAttrsFromContext only when a custom logger without OtelHandler is needed
pkg/utils/otel.go (1)

284-302: Unsupported protocol values silently fall back to gRPC.

If cfg.Protocol is neither "http" nor "grpc" (e.g., a typo like "grpcs"), the code silently uses gRPC. This is consistent across all three provider functions but may mask configuration errors.

Consider logging a warning when the protocol isn't explicitly "grpc" but the gRPC path is taken, or validate the protocol value in OTelConfigFromEnv.

🔧 Optional: Add protocol validation in OTelConfigFromEnv
 	protocol := os.Getenv("OTEL_EXPORTER_OTLP_PROTOCOL")
 	if protocol == "" {
 		protocol = OTelProtocolGRPC
+	} else if protocol != OTelProtocolGRPC && protocol != OTelProtocolHTTP {
+		slog.Warn("unsupported OTEL_EXPORTER_OTLP_PROTOCOL, defaulting to grpc",
+			"provided-value", protocol)
+		protocol = OTelProtocolGRPC
 	}
pkg/utils/otel_test.go (2)

13-43: Missing assertion for Propagators default value.

The test verifies defaults for all config fields except Propagators, which defaults to "tracecontext,baggage".

✅ Add missing Propagators assertion
 	if cfg.LogsExporter != OTelExporterNone {
 		t.Errorf("expected default LogsExporter %q, got %q", OTelExporterNone, cfg.LogsExporter)
 	}
+	if cfg.Propagators != "tracecontext,baggage" {
+		t.Errorf("expected default Propagators 'tracecontext,baggage', got %q", cfg.Propagators)
+	}
 }

47-87: Missing test for OTEL_PROPAGATORS environment variable.

The test sets and verifies all OTEL environment variables except OTEL_PROPAGATORS.

✅ Add OTEL_PROPAGATORS test
 	t.Setenv("OTEL_LOGS_EXPORTER", "otlp")
+	t.Setenv("OTEL_PROPAGATORS", "tracecontext,jaeger")

 	cfg := OTelConfigFromEnv()
 	// ... existing assertions ...
 	if cfg.LogsExporter != OTelExporterOTLP {
 		t.Errorf("expected LogsExporter %q, got %q", OTelExporterOTLP, cfg.LogsExporter)
 	}
+	if cfg.Propagators != "tracecontext,jaeger" {
+		t.Errorf("expected Propagators 'tracecontext,jaeger', got %q", cfg.Propagators)
+	}
cmd/main.go (1)

52-65: Add timeout to OTel shutdown context.

The deferred shutdown uses context.Background() which has no deadline. If the OTLP collector is unreachable during shutdown, this could block indefinitely and delay process termination.

⏱️ Add shutdown timeout
 	// Handle shutdown properly so nothing leaks.
 	defer func() {
-		if shutdownErr := otelShutdown(context.Background()); shutdownErr != nil {
+		shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		defer cancel()
+		if shutdownErr := otelShutdown(shutdownCtx); shutdownErr != nil {
 			slog.ErrorContext(ctx, "error shutting down OpenTelemetry SDK", "error", shutdownErr)
 		}
 	}()

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 963b99e and 47beafa.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
  • charts/lfx-v2-query-service/Chart.yaml
  • charts/lfx-v2-query-service/templates/deployment.yaml
  • charts/lfx-v2-query-service/values.yaml
  • cmd/http.go
  • cmd/main.go
  • go.mod
  • pkg/log/log.go
  • pkg/logging/context.go
  • pkg/utils/otel.go
  • pkg/utils/otel_test.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*_test.go

📄 CodeRabbit inference engine (CLAUDE.md)

Name Go test files with the *_test.go suffix and keep them alongside implementation files

Files:

  • pkg/utils/otel_test.go
cmd/main.go

📄 CodeRabbit inference engine (CLAUDE.md)

Inject concrete implementations via dependency injection in cmd/main.go

Files:

  • cmd/main.go
🧬 Code graph analysis (2)
pkg/utils/otel_test.go (1)
pkg/utils/otel.go (7)
  • OTelConfigFromEnv (84-161)
  • OTelProtocolGRPC (34-34)
  • OTelExporterNone (41-41)
  • OTelProtocolHTTP (36-36)
  • OTelExporterOTLP (39-39)
  • OTelConfig (45-80)
  • SetupOTelSDKWithConfig (171-239)
cmd/main.go (1)
pkg/utils/otel.go (2)
  • OTelConfigFromEnv (84-161)
  • SetupOTelSDKWithConfig (171-239)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Upload results
🔇 Additional comments (21)
charts/lfx-v2-query-service/Chart.yaml (1)

8-8: LGTM!

Chart version bump to 0.4.10 appropriately reflects the new OpenTelemetry configuration capabilities being added.

charts/lfx-v2-query-service/values.yaml (1)

36-50: Well-structured OTEL configuration with sensible defaults.

Good choices:

  • Exporters default to "none" (opt-in)
  • insecure: "false" is secure by default
  • Common propagators included

The empty serviceName and endpoint defaults mean OTEL won't export data without explicit configuration, which is appropriate for production safety.

charts/lfx-v2-query-service/templates/deployment.yaml (2)

34-36: LGTM!

Correctly uses with to conditionally inject extra environment variables with proper indentation.


37-76: OTEL environment variable configuration looks good.

The conditional blocks will emit env vars when defaults are set (e.g., protocol: "grpc", tracesExporter: "none"), which provides explicit configuration to the application. This is appropriate since the application code can interpret values like "none" to disable specific exporters.

pkg/logging/context.go (1)

15-24: LGTM!

Clean implementation that correctly validates the span context before extracting trace IDs. Returning nil for invalid spans is appropriate.

pkg/log/log.go (1)

11-12: slog-otel library is suitable for production use.

The github.com/remychantenay/slog-otel library (v1.3.4) is published, documented, and in active use across multiple production projects. The library provides well-defined OpenTelemetry integration for structured logging. Maintenance is moderate—last update ~8 months ago with occasional activity—so monitor upstream for security updates and compatibility changes, but this does not prevent adoption.

pkg/utils/otel.go (6)

32-42: LGTM!

Constants are well-defined with clear documentation. The protocol and exporter options align with standard OTLP configuration patterns.


44-80: LGTM!

The OTelConfig struct is comprehensive and well-documented with environment variable mappings and default values in the comments. This provides good self-documentation for operators.


171-239: LGTM!

The setup function follows good patterns:

  • Accumulates shutdown functions and joins errors on cleanup
  • Handles partial initialization failures by cleaning up already-initialized components
  • Sets the propagator regardless of exporter configuration (correct for context propagation)

The shutdown closure correctly captures and clears shutdownFuncs to prevent double-shutdown.


241-251: LGTM!

The resource creation correctly merges the default resource (which includes SDK and host info) with service-specific attributes using semantic conventions.


253-277: LGTM!

The propagator configuration handles edge cases well:

  • Trims whitespace for flexible input parsing
  • Falls back to W3C defaults (tracecontext + baggage) when no valid propagators are configured
  • Warns but doesn't fail on unknown propagator names

136-147: No action needed. The slog.With(...).Debug("OTelConfig") pattern is valid and will emit a log message with all the attached attributes. This is the correct way to log with key-value pairs using the Go slog package.

Likely an incorrect or invalid review comment.

pkg/utils/otel_test.go (5)

89-99: LGTM!

This test correctly documents the current behavior where unsupported protocol values pass through unchanged (and will default to gRPC in the provider functions).


101-131: LGTM!

Good test for the disabled-exporters scenario, which is the default configuration. This verifies the SDK can be initialized and shut down cleanly when no exporters are active.


133-176: LGTM!

Good table-driven test covering basic, empty version, and special character cases. The test correctly verifies the service.name attribute is set.


178-210: LGTM!

The test correctly verifies that the composite propagator includes the expected W3C Trace Context and Baggage fields.


212-227: LGTM!

Good contract test ensuring the exported constants maintain their expected values for API compatibility.

cmd/http.go (2)

18-19: LGTM!

Import for OpenTelemetry HTTP instrumentation.


78-79: HTTP instrumentation correctly applied.

The otelhttp.NewHandler wrapper is placed at the outermost layer (after other middleware), which is correct for capturing the full request lifecycle including middleware processing.

Note: The static operation name "query-service" means all spans will share this name. The otelhttp handler automatically adds HTTP attributes (method, route, status) to distinguish requests. If more granular span names are desired later, consider using otelhttp.WithSpanNameFormatter.

cmd/main.go (1)

20-21: LGTM!

Import for the new OTel utilities package.

go.mod (1)

13-28: OpenTelemetry dependencies are well-structured.

The dependency set includes all necessary OTel components: core SDK, exporters (traces/metrics/logs), HTTP instrumentation, and the Jaeger propagator. The slog-otel library enables log-trace correlation.

The exporter versions differ (traces: v1.35.0, metrics: v1.36.0, logs: v0.15.0) which is expected as modules within the OpenTelemetry Go repository are released independently. Log exporters remain experimental (v0.x), while trace and metric exporters are stable (v1.x).

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces comprehensive OpenTelemetry (OTel) support to the lfx-v2-query-service, enabling distributed tracing, metrics, and log correlation. The implementation includes a new OTel utility package, HTTP instrumentation, enhanced logging with trace context, and declarative Helm chart configuration.

Changes:

  • Added OTel SDK setup with configurable exporters for traces, metrics, and logs
  • Instrumented HTTP server with automatic span generation for incoming requests
  • Enhanced logging to inject trace and span IDs for correlation with distributed traces

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/utils/otel.go New utility package providing OTel configuration and SDK initialization with support for OTLP exporters
pkg/utils/otel_test.go Comprehensive test coverage for OTel configuration parsing, SDK setup, and provider functions
pkg/logging/context.go New helpers to extract trace context from context and add to log attributes
pkg/log/log.go Wrapped logging handler with slog-otel to automatically inject trace context into logs
cmd/main.go Initialize OTel SDK on startup with proper shutdown handling
cmd/http.go Wrap HTTP handler with OTel instrumentation for automatic trace span generation
charts/lfx-v2-query-service/values.yaml Added OTel configuration options with sensible defaults
charts/lfx-v2-query-service/templates/deployment.yaml Inject OTel environment variables into deployment based on Helm values
charts/lfx-v2-query-service/Chart.yaml Bump chart version to 0.4.10
go.mod, go.sum Added OTel SDK dependencies and updated related packages

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +15 to +24
func LogAttrsFromContext(ctx context.Context) []slog.Attr {
spanCtx := trace.SpanContextFromContext(ctx)
if !spanCtx.IsValid() {
return nil
}
return []slog.Attr{
slog.String("trace_id", spanCtx.TraceID().String()),
slog.String("span_id", spanCtx.SpanID().String()),
}
}
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogAttrsFromContext function lacks test coverage. Given that other packages in this repository have comprehensive test coverage (e.g., pkg/utils/otel_test.go, pkg/paging/codec_test.go), this function should also have tests covering scenarios like valid span context, invalid span context, and nil context.

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +37
func LogWithContext(ctx context.Context, logger *slog.Logger) *slog.Logger {
spanCtx := trace.SpanContextFromContext(ctx)
if !spanCtx.IsValid() {
return logger
}
return logger.With(
slog.String("trace_id", spanCtx.TraceID().String()),
slog.String("span_id", spanCtx.SpanID().String()),
)
}
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogWithContext function lacks test coverage. Tests should verify that trace context is correctly added to the logger when a valid span context exists, and that the original logger is returned unchanged when no valid span context is present.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants