Contributor

Copilot AI commented Oct 2, 2025

  • Add OpenTelemetry dependencies to go.mod
  • Create tracing configuration structure in config package
  • Implement tracing initialization and provider setup in observability package
  • Add span utilities and context propagation helpers
  • Instrument request headers with trace context extraction
  • Add tracing to jailbreak detection
  • Add tracing to cache operations
  • Add comprehensive tracing to classification operations
  • Add tracing to PII detection
  • Add tracing to routing decisions and backend selection
  • Add tracing to system prompt injection
  • Update configuration files with tracing settings
  • Add unit tests for tracing functionality
  • Add example configuration for production deployment
  • Update documentation with tracing examples and usage guide
  • Fix broken documentation link (configuration.md)
  • Run go mod tidy to properly organize dependencies
  • Fix markdown lint errors and remove unnecessary file
  • Fix OTLP exporter to prevent test panics
  • Fix StartSpan to handle nil context

Summary

Fixed StartSpan to handle nil context gracefully:

  • Added nil check in StartSpan to use context.Background() when ctx is nil
  • This prevents panic when RequestContext.TraceContext is not initialized
  • Added test case TestStartSpanWithNilContext to ensure proper handling
  • All observability tests pass successfully
Original prompt

This section details the original issue you should resolve.

<issue_title>Distributed Tracing Support for Fine-Grained Observability</issue_title>
<issue_description>### Is your feature request related to a problem? Please describe.

Currently, vLLM Semantic Router provides basic observability through Prometheus metrics and structured logging. However, these approaches have limitations when it comes to understanding the complete request lifecycle across distributed components:

  • Limited Request Context: Metrics provide aggregated data but lack per-request visibility into the routing decision flow
  • Difficult Root Cause Analysis: When issues occur (e.g., high latency, routing errors), it's challenging to trace the exact path a request took through classification, routing, security checks, and backend selection
  • No Cross-Service Correlation: As the system integrates with vLLM engines and other components, there's no unified way to correlate traces across service boundaries
  • Missing Fine-Grained Timing: While we have overall latency metrics, we lack detailed breakdowns of time spent in each processing stage (classification, PII detection, jailbreak detection, cache lookup, model selection, etc.)

This becomes especially problematic when:

  • Debugging production issues where specific requests fail or perform poorly
  • Optimizing the routing pipeline by identifying bottlenecks
  • Understanding the impact of different routing strategies on end-to-end latency
  • Integrating with the broader vLLM Production Stack for unified observability

Describe the solution you'd like

Implement comprehensive distributed tracing support using industry-standard OpenTelemetry instrumentation, leveraging either:

  1. OpenInference (https://github.com/Arize-ai/openinference) - Specialized for LLM observability with semantic conventions for AI/ML workloads
  2. OpenLLMetry (https://github.com/traceloop/openllmetry) - Purpose-built for LLM application tracing with automatic instrumentation

Key Implementation Requirements:

1. Core Tracing Infrastructure

  • Integrate OpenTelemetry SDK for Go (main router service)
  • Add trace context propagation through HTTP headers and gRPC metadata
  • Support multiple trace exporters (OTLP, Jaeger, Zipkin)
  • Configure sampling strategies (always-on for development, probabilistic for production)
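
The header-based propagation requested above can be sketched with the standard library alone. This is a minimal illustration that assumes the W3C `traceparent` header format (version `00` only); the type and function names (`TraceParent`, `ParseTraceParent`, `Extract`, `Inject`) are hypothetical and are not taken from the PR, which relies on the OpenTelemetry propagators instead.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net/http"
	"strings"
)

// TraceParent holds the fields of a W3C "traceparent" header value.
// Hypothetical type for illustration only.
type TraceParent struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
	Sampled bool   // lowest bit of the trace-flags byte
}

// validID reports whether s is n lowercase hex characters and not all zeros.
func validID(s string, n int) bool {
	if len(s) != n || s != strings.ToLower(s) {
		return false
	}
	if _, err := hex.DecodeString(s); err != nil {
		return false
	}
	return strings.Trim(s, "0") != ""
}

// ParseTraceParent parses a version-00 traceparent value.
// Other versions are rejected here to keep the sketch short.
func ParseTraceParent(v string) (TraceParent, bool) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, false
	}
	if !validID(parts[1], 32) || !validID(parts[2], 16) {
		return TraceParent{}, false
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil || len(flags) != 1 {
		return TraceParent{}, false
	}
	return TraceParent{TraceID: parts[1], SpanID: parts[2], Sampled: flags[0]&0x01 == 0x01}, true
}

// FormatTraceParent renders the header value for an outgoing request.
func FormatTraceParent(tp TraceParent) string {
	flags := "00"
	if tp.Sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", tp.TraceID, tp.SpanID, flags)
}

// Extract reads trace context from incoming request headers.
func Extract(h http.Header) (TraceParent, bool) {
	return ParseTraceParent(h.Get("traceparent"))
}

// Inject writes trace context onto outgoing request headers.
// A real propagator would inject the ID of the current (child) span.
func Inject(tp TraceParent, h http.Header) {
	h.Set("traceparent", FormatTraceParent(tp))
}

func main() {
	in := http.Header{}
	in.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	tp, ok := Extract(in)
	fmt.Println(ok, tp.TraceID, tp.Sampled)

	out := http.Header{}
	Inject(tp, out)
	fmt.Println(out.Get("traceparent"))
}
```

Splitting parsing from header access keeps the same logic reusable for gRPC metadata, which is also a string-keyed map.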

2. Instrumentation Points

Instrument the following critical paths with spans:

Request Processing Pipeline:

  • semantic_router.request.received - Entry point span
  • semantic_router.classification - Category classification with model name and confidence
  • semantic_router.security.pii_detection - PII detection with results
  • semantic_router.security.jailbreak_detection - Jailbreak detection with results
  • semantic_router.cache.lookup - Semantic cache operations
  • semantic_router.routing.decision - Model selection logic with reasoning
  • semantic_router.backend.selection - Endpoint selection
  • semantic_router.upstream.request - Forwarding to vLLM backend
  • semantic_router.response.processing - Response handling

Span Attributes (following OpenInference conventions):

  • Request metadata: request_id, user_id, session_id
  • Model information: model.name, model.provider, model.version
  • Classification: category.name, category.confidence, classifier.type
  • Routing: routing.strategy, routing.reason, original_model, selected_model
  • Security: pii.detected, jailbreak.detected, security.action
  • Performance: token.count.prompt, token.count.completion, cache.hit
  • Reasoning: reasoning.enabled, reasoning.effort, reasoning.family
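
To make the span and attribute lists above concrete, here is a small standard-library sketch of one request producing a root span with two child spans. The span names and attribute keys come from the lists above; the `Span` and `Recorder` types and the `RecordRequest` function are hypothetical stand-ins for the OpenTelemetry SDK, and the model name is a placeholder.

```go
package main

import (
	"fmt"
	"time"
)

// Span names taken from the issue's instrumentation list.
const (
	SpanRequestReceived = "semantic_router.request.received"
	SpanClassification  = "semantic_router.classification"
	SpanRoutingDecision = "semantic_router.routing.decision"
)

// Span is a hypothetical in-memory span record, not the OpenTelemetry type.
type Span struct {
	Name   string
	Parent string // name of the parent span, empty for the root
	Attrs  map[string]any
	Start  time.Time
	End    time.Time
}

// Finish marks the span as ended.
func (s *Span) Finish() { s.End = time.Now() }

// Recorder collects spans in start order.
type Recorder struct{ Spans []*Span }

// Start creates a span and registers it with the recorder.
func (r *Recorder) Start(name, parent string) *Span {
	s := &Span{Name: name, Parent: parent, Attrs: map[string]any{}, Start: time.Now()}
	r.Spans = append(r.Spans, s)
	return s
}

// RecordRequest emits the root span plus classification and routing children.
func RecordRequest(r *Recorder, category string, confidence float64, original, selected string) {
	root := r.Start(SpanRequestReceived, "")
	defer root.Finish()

	cls := r.Start(SpanClassification, root.Name)
	cls.Attrs["category.name"] = category
	cls.Attrs["category.confidence"] = confidence
	cls.Finish()

	route := r.Start(SpanRoutingDecision, root.Name)
	route.Attrs["original_model"] = original
	route.Attrs["selected_model"] = selected
	route.Finish()
}

func main() {
	r := &Recorder{}
	RecordRequest(r, "math", 0.92, "auto", "example-model")
	for _, s := range r.Spans {
		fmt.Printf("%s parent=%q attrs=%v took=%s\n", s.Name, s.Parent, s.Attrs, s.End.Sub(s.Start))
	}
}
```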

3. Integration with vLLM Production Stack

  • Propagate trace context to vLLM engine requests
  • Correlate router traces with vLLM engine traces
  • Support unified trace visualization across the full stack
  • Enable end-to-end latency analysis from router to model inference

4. Configuration

Add tracing configuration to config.yaml:

observability:
  tracing:
    enabled: true
    provider: "opentelemetry"  # or "openinference", "openllmetry"
    exporter:
      type: "otlp"  # otlp, jaeger, zipkin, stdout
      endpoint: "localhost:4317"
      insecure: true
    sampling:
      type: "probabilistic"  # always_on, always_off, probabilistic
      rate: 0.1  # 10% sampling for production
    resource:
      service_name: "vllm-semantic-router"
      service_version: "v0.1.0"
      deployment_environment: "production"
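
As a rough illustration of how the `sampling` block above could drive a per-trace decision, the sketch below maps the three `type` values to a yes/no answer. The probabilistic branch compares the low 8 bytes of the trace ID against a threshold derived from `rate`, so every service seeing the same trace ID reaches the same decision. The `SamplingConfig` type and `ShouldSample` function are hypothetical; in the actual router the OpenTelemetry SDK samplers would do this work.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// SamplingConfig mirrors the observability.tracing.sampling block.
// Hypothetical type for illustration.
type SamplingConfig struct {
	Type string  // "always_on", "always_off", or "probabilistic"
	Rate float64 // fraction of traces to keep when Type is "probabilistic"
}

// ShouldSample decides deterministically from the trace ID, so the same
// trace is either kept or dropped everywhere it is seen.
func ShouldSample(cfg SamplingConfig, traceIDHex string) bool {
	switch cfg.Type {
	case "always_on":
		return true
	case "always_off":
		return false
	case "probabilistic":
		id, err := hex.DecodeString(traceIDHex)
		if err != nil || len(id) != 16 {
			return false // malformed trace ID: do not sample
		}
		if cfg.Rate >= 1 {
			return true
		}
		if cfg.Rate <= 0 {
			return false
		}
		// Map the rate onto the 63-bit range and compare the trace ID's
		// low 8 bytes (shifted to 63 bits) against that bound.
		bound := uint64(cfg.Rate * (1 << 63))
		return binary.BigEndian.Uint64(id[8:16])>>1 < bound
	default:
		return false // unknown sampler type: fail closed
	}
}

func main() {
	cfg := SamplingConfig{Type: "probabilistic", Rate: 0.1}
	for _, id := range []string{
		"4bf92f3577b34da60000000000000000",
		"4bf92f3577b34da6ffffffffffffffff",
	} {
		fmt.Println(id, ShouldSample(cfg, id))
	}
}
```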

5. Visualization and Analysis

  • Provide example Jaeger/Grafana Tempo configurations
  • Create sample trace queries for common debugging scenarios
  • Document trace analysis workflows for performance optimization
  • Include dashboard templates for trace-based metrics

6. Performance Considerations

  • Minimize tracing overhead (< 1% latency impact)
  • Use async span export to avoid blocking request processing
  • Implement efficient span batching
  • Su...
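
The async export and batching points above can be sketched as a bounded queue drained by a background goroutine. This is only an illustration under simplifying assumptions: spans are plain strings, there is no timer-based flush, and a full queue drops the span rather than blocking the request path. `BatchExporter` and its methods are hypothetical names, not the OpenTelemetry batch span processor.

```go
package main

import "fmt"

// BatchExporter buffers spans and hands them to an export function in batches.
// Hypothetical type for illustration.
type BatchExporter struct {
	in   chan string
	done chan struct{}
}

// NewBatchExporter starts a background goroutine that groups queued spans
// into batches of batchSize and passes each batch to export.
func NewBatchExporter(batchSize, queueSize int, export func([]string)) *BatchExporter {
	b := &BatchExporter{in: make(chan string, queueSize), done: make(chan struct{})}
	go func() {
		defer close(b.done)
		batch := make([]string, 0, batchSize)
		flush := func() {
			if len(batch) > 0 {
				export(batch)
				batch = make([]string, 0, batchSize)
			}
		}
		for span := range b.in {
			batch = append(batch, span)
			if len(batch) == batchSize {
				flush()
			}
		}
		flush() // export whatever is left when the queue is closed
	}()
	return b
}

// Enqueue never blocks the caller; it reports false if the queue is full
// and the span was dropped.
func (b *BatchExporter) Enqueue(span string) bool {
	select {
	case b.in <- span:
		return true
	default:
		return false
	}
}

// Close stops accepting spans and waits for the final batch to be exported.
// Enqueue must not be called after Close.
func (b *BatchExporter) Close() {
	close(b.in)
	<-b.done
}

func main() {
	exp := NewBatchExporter(2, 10, func(batch []string) {
		fmt.Println("exporting batch:", batch)
	})
	for _, name := range []string{"classification", "pii_detection", "cache.lookup"} {
		exp.Enqueue(name)
	}
	exp.Close()
}
```

Dropping on a full queue trades completeness for bounded latency, which matches the stated goal of keeping tracing overhead off the request path.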
Fixes #311


@netlify

netlify bot commented Oct 2, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit ebb269f
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68dfef28a8b91f0008ae64da
😎 Deploy Preview https://deploy-preview-322--vllm-semantic-router.netlify.app

Copilot AI changed the title from "[WIP] Distributed Tracing Support for Fine-Grained Observability" to "Add OpenTelemetry Distributed Tracing for Fine-Grained Observability" Oct 2, 2025
Copilot AI requested a review from rootfs October 2, 2025 23:18
Copilot finished work on behalf of rootfs October 2, 2025 23:18
@github-actions

github-actions bot commented Oct 2, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 config

Owners: @rootfs
Files changed:

  • config/config.development.yaml
  • config/config.production.yaml
  • config/config.yaml

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/docker-compose.tracing.yaml
  • deploy/tracing/README.md

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/observability/propagation.go
  • src/semantic-router/pkg/observability/tracing.go
  • src/semantic-router/pkg/observability/tracing_test.go
  • src/semantic-router/cmd/main.go
  • src/semantic-router/go.mod
  • src/semantic-router/go.sum
  • src/semantic-router/pkg/config/config.go
  • src/semantic-router/pkg/extproc/request_handler.go

📁 website

Owners: @Xunzhuo
Files changed:

  • website/docs/tutorials/observability/distributed-tracing.md
  • website/docs/tutorials/observability/tracing-quickstart.md

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • README.md


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs
Collaborator

rootfs commented Oct 2, 2025

@copilot fix the doc errors:

7:18:06 PM: Exhaustive list of all broken links found:
7:18:06 PM: - Broken link on source page path = /docs/tutorials/observability/tracing-quickstart:
7:18:06 PM: -> linking to ../../api/configuration.md (resolved as: /docs/api/configuration.md)

Contributor Author

Copilot AI commented Oct 3, 2025

@copilot fix the doc errors:

7:18:06 PM: Exhaustive list of all broken links found:
...

Fixed in commit 57b733b. Changed the broken link from ../../api/configuration.md to ../../installation/configuration.md which is the correct location of the configuration documentation.

Copilot finished work on behalf of rootfs October 3, 2025 00:02

require (
github.com/beorn7/perks v1.0.1 // indirect
github.com/cenkalti/backoff/v5 v5.0.3 // indirect
Collaborator

ERROR: go.mod or go.sum files are not tidy in src/semantic-router. Please run 'go mod tidy' in src/semantic-router directory and commit the changes.

Collaborator

@copilot ERROR: go.mod or go.sum files are not tidy in src/semantic-router. Please run 'go mod tidy' in src/semantic-router directory and commit the changes.

Contributor Author

Fixed in commit c054384. Ran go mod tidy which properly organized the dependencies - moved OpenTelemetry packages from indirect to direct dependencies and removed unused imports.

@@ -0,0 +1,110 @@
# Quick Start: Distributed Tracing
Collaborator

Contributor Author

Fixed in commit 9cde50a. Added blank lines between list items and code blocks to comply with markdown linting rules (MD031/blanks-around-fences).

@@ -0,0 +1,257 @@
# Distributed Tracing Feature Summary
Collaborator

@copilot this file is not needed

Contributor Author

Removed in commit 9cde50a.

Copilot AI requested a review from rootfs October 3, 2025 14:03
Copilot finished work on behalf of rootfs October 3, 2025 14:03
Copilot finished work on behalf of rootfs October 3, 2025 14:07
@rootfs
Collaborator

rootfs commented Oct 3, 2025

Contributor Author

Copilot AI commented Oct 3, 2025

@copilot test panicked in createspan: https://github.com/vllm-project/semantic-router/actions/runs/18224466353/job/51892070297?pr=322

Fixed in commit 06643c0. The OTLP exporter was using grpc.WithBlock() which caused blocking behavior and panics when the collector was unavailable. Removed the blocking dial option to allow asynchronous connection, preventing test failures in CI environments.

Copilot finished work on behalf of rootfs October 3, 2025 14:23
@rootfs
Collaborator

rootfs commented Oct 3, 2025

@copilot the panic stack is:
• [PANICKED] [0.000 seconds]
Request Processing handleResponseBody with category-specific system prompt [It] should replace existing system prompt with category-specific one
/home/runner/work/semantic-router/semantic-router/src/semantic-router/pkg/extproc/request_processing_test.go:518

[PANICKED] Test Panicked
In [It] at: /opt/hostedtoolcache/go/1.24.7/x64/src/context/context.go:720 @ 10/03/25 14:29:32.659

cannot create context from nil parent

Full Stack Trace
context.WithValue({0x0?, 0x0?}, {0x1000fc0?, 0x13fe1b8?}, {0x121e540?, 0xc00063df20?})
/opt/hostedtoolcache/go/1.24.7/x64/src/context/context.go:720 +0x14d
go.opentelemetry.io/otel/trace.ContextWithSpan(...)
/home/runner/go/pkg/mod/go.opentelemetry.io/otel/[email protected]/context.go:14
go.opentelemetry.io/otel/internal/global.(*tracer).newSpan(0xc0004e0af0, {0x0, 0x0}, 0x816089?, {0x1282c34?, 0x126f2ad?}, {0x0?, 0x0?, 0x0?})
/home/runner/go/pkg/mod/go.opentelemetry.io/[email protected]/internal/global/trace.go:186 +0x1de
go.opentelemetry.io/otel/internal/global.(*tracer).Start(0x126f2ad?, {0x0?, 0x0?}, {0x1282c34?, 0x1a?}, {0x0?, 0x0?, 0x0?})
/home/runner/go/pkg/mod/go.opentelemetry.io/[email protected]/internal/global/trace.go:150 +0xad
github.com/vllm-project/semantic-router/src/semantic-router/pkg/observability.StartSpan({0x0, 0x0}, {0x1282c34, 0x1e}, {0x0, 0x0, 0x0})
/home/runner/work/semantic-router/semantic-router/src/semantic-router/pkg/observability/tracing.go:155 +0xca
github.com/vllm-project/semantic-router/src/semantic-router/pkg/extproc.(*OpenAIRouter).handleModelRouting(0xc0004e0eb0, 0xc000218408, {0xc00059118a, 0x4}, {0xc0003a91e0, 0x1e}, {0xc000362830, 0x1, 0x1}, 0xc0006f5e80)
/home/runner/work/semantic-router/semantic-router/src/semantic-router/pkg/extproc/request_handler.go:564 +0x248
github.com/vllm-project/semantic-router/src/semantic-router/pkg/extproc.(*OpenAIRouter).handleRequestBody(0xc0004e0eb0, 0xc000089e68, 0xc0006f5e80)
/home/runner/work/semantic-router/semantic-router/src/semantic-router/pkg/extproc/request_handler.go:408 +0x5e5
github.com/vllm-project/semantic-router/src/semantic-router/pkg/extproc.(*OpenAIRouter).HandleRequestBody(...)
/home/runner/work/semantic-router/semantic-router/src/semantic-router/pkg/extproc/testing_helpers_test.go:16
github.com/vllm-project/semantic-router/src/semantic-router/pkg/extproc_test.init.func5.5.4.3()
/home/runner/work/semantic-router/semantic-router/src/semantic-router/pkg/extproc/request_processing_test.go:542 +0x256

Contributor Author

Copilot AI commented Oct 3, 2025

@copilot the panic stack is:
• [PANICKED] [0.000 seconds]
Request Processing handleResponseBody with category-specific system prompt [It] should replace existing system prompt with category-speci...

Fixed in commit 8da86ab. The panic was caused by StartSpan being called with a nil context (when RequestContext.TraceContext was not initialized). Added a nil check to use context.Background() when ctx is nil, preventing the "cannot create context from nil parent" panic. Also added TestStartSpanWithNilContext test case to ensure proper handling.
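
For readers following the stack trace, the failure and the guard can be reproduced with the standard library alone. The sketch below is not the PR's `StartSpan` (which wraps the OpenTelemetry tracer); it is a simplified stand-in with a hypothetical `spanKey` type and a reduced signature, showing that `context.WithValue` panics on a nil parent and that substituting `context.Background()` avoids it.

```go
package main

import (
	"context"
	"fmt"
)

// spanKey is a hypothetical context key used only in this sketch.
type spanKey struct{}

// unguarded reproduces the failure mode: context.WithValue panics when the
// parent is nil. The panic is recovered so its message can be returned.
func unguarded(ctx context.Context, name string) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	_ = context.WithValue(ctx, spanKey{}, name)
	return ""
}

// StartSpan is a simplified stand-in for the router's helper: it falls back
// to context.Background() when the caller passes a nil context.
func StartSpan(ctx context.Context, name string) context.Context {
	if ctx == nil {
		ctx = context.Background()
	}
	return context.WithValue(ctx, spanKey{}, name)
}

// SpanNameFrom returns the span name stored by StartSpan, or "" if none.
func SpanNameFrom(ctx context.Context) string {
	name, _ := ctx.Value(spanKey{}).(string)
	return name
}

func main() {
	fmt.Println("unguarded:", unguarded(nil, "semantic_router.routing.decision"))

	ctx := StartSpan(nil, "semantic_router.routing.decision")
	fmt.Println("guarded:", SpanNameFrom(ctx))
}
```

The guard keeps callers such as test fixtures, which may leave the request's trace context unset, from crashing the request path; initializing the context at construction time would be a complementary fix.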

Copilot finished work on behalf of rootfs October 3, 2025 15:25
Signed-off-by: Huamin Chen <[email protected]>
@rootfs rootfs marked this pull request as ready for review October 3, 2025 15:43
@rootfs rootfs merged commit ed22126 into main Oct 3, 2025
9 checks passed
@rootfs rootfs deleted the copilot/fix-1eb484f9-9a0f-4baa-82c1-63f02a7ebecf branch October 3, 2025 15:48
Aias00 pushed a commit to Aias00/semantic-router that referenced this pull request Oct 4, 2025
…llm-project#322)

* Initial plan

* Add OpenTelemetry tracing infrastructure and basic instrumentation

Co-authored-by: rootfs <[email protected]>

* Add comprehensive tracing instrumentation and tests

Co-authored-by: rootfs <[email protected]>

* Add comprehensive tracing documentation and deployment examples

Co-authored-by: rootfs <[email protected]>

* Update README and add feature summary documentation

Co-authored-by: rootfs <[email protected]>

* Fix broken documentation link in tracing quickstart guide

Co-authored-by: rootfs <[email protected]>

* Run go mod tidy to fix go.mod and go.sum

Co-authored-by: rootfs <[email protected]>

* Fix markdown lint errors and remove TRACING_FEATURE.md

Co-authored-by: rootfs <[email protected]>

* Fix OTLP exporter to connect asynchronously to prevent test panics

Co-authored-by: rootfs <[email protected]>

* Fix StartSpan to handle nil context gracefully

Co-authored-by: rootfs <[email protected]>

* fix lint error

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
Signed-off-by: liuhy <[email protected]>
Development

Successfully merging this pull request may close these issues.

Distributed Tracing Support for Fine-Grained Observability

4 participants