
Conversation


@Copilot Copilot AI commented Oct 3, 2025

Overview

This PR adds comprehensive OpenTelemetry (OTEL) distributed tracing integration examples to demonstrate end-to-end observability from client applications through the semantic router to vLLM backends. This addresses the requirement for practical examples showing how to implement distributed tracing across the entire LLM inference pipeline.

What's Added

Complete Python Example (examples/distributed-tracing/)

A fully functional example demonstrating:

Auto-instrumentation of OpenAI Python Client:

from openai import OpenAI
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Auto-instrument for automatic trace header injection
RequestsInstrumentor().instrument()
OpenAIInstrumentor().instrument()

# Point client to semantic router
client = OpenAI(base_url="http://semantic-router:8000/v1")

# Make request - trace context flows automatically
response = client.chat.completions.create(
    model="auto",  # Triggers semantic routing
    messages=[{"role": "user", "content": "What is quantum computing?"}]
)
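
Auto-instrumentation alone only creates spans; for them to show up anywhere, a tracer provider with an OTLP exporter has to be configured before the calls above. A minimal sketch, assuming the Jaeger container from the compose stack below is listening on the default OTLP gRPC port (the endpoint and service name are example values):

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Batch spans and ship them to the local collector over OTLP/gRPC
provider = TracerProvider(
    resource=Resource.create({"service.name": "tracing-example-client"})
)
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
    )
)
trace.set_tracer_provider(provider)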

Three practical scenarios:

  1. Auto-routing with model selection
  2. Math/reasoning queries
  3. Streaming responses (sketched below)
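
For scenario 3, no extra tracing code is needed once instrumentation is in place; a minimal sketch reusing the client configured above (whether the instrumentation keeps the request span open until the stream is drained is an assumption worth verifying against the example script):

# Streaming request; trace headers are injected exactly as in the
# non-streaming case
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain entropy in one paragraph."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)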

Docker Compose Stack

Complete tracing stack with the following components (a condensed sketch follows the list):

  • Jaeger all-in-one for trace collection and visualization
  • Semantic Router with tracing enabled
  • Optional vLLM backend template (commented out for flexibility)
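
The committed docker-compose.yml is authoritative; purely as a rough sketch of the stack's shape (the router image name and config mount path here are hypothetical placeholders):

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP gRPC ingest
  semantic-router:
    image: semantic-router:latest  # hypothetical placeholder image
    volumes:
      - ./router-config.yaml:/app/config.yaml  # hypothetical mount path
    ports:
      - "8000:8000"
    depends_on:
      - jaeger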

Configuration Files

  • router-config.yaml: Example router configuration with OTLP exporter (shape sketched after this list)
  • requirements.txt: All necessary Python dependencies
  • README.md: Comprehensive 339-line guide with setup, troubleshooting, and production recommendations
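
To illustrate the shape of the router configuration only, an OTLP exporter section in such a file typically looks something like the following; all key names here are hypothetical and the committed router-config.yaml is the reference:

observability:
  tracing:
    enabled: true
    exporter:
      type: otlp
      endpoint: jaeger:4317
      insecure: true
    sampling:
      type: always_on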

Documentation Updates

Enhanced website/docs/tutorials/observability/distributed-tracing.md with:

  • End-to-end tracing example section
  • Trace context flow diagrams showing how traceparent headers propagate
  • Benefits breakdown for different personas (Developers, Operations, Product Teams)
  • Links to working examples in the repository

Trace Context Flow

Client Application
    ↓ (HTTP headers: traceparent, tracestate)
Semantic Router ExtProc
    ↓ (Extract trace context from headers)
Processing Spans
    ├─ semantic_router.classification (45ms) [category=science]
    ├─ semantic_router.cache.lookup (3ms) [cache_miss=true]
    ├─ semantic_router.security.pii_detection (2ms) [pii_detected=false]
    └─ semantic_router.routing.decision (23ms) [selected=llama-3.1-70b]
    ↓ (Inject trace context into upstream headers)
vLLM Backend Request
    ↓ (HTTP headers: traceparent, tracestate)
vLLM Processing (if OTEL-enabled)
    └─ vllm.generate (2.0s) [tokens=156]
    ↓
OTLP Collector / Jaeger
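
The extract and inject steps in the diagram map onto OpenTelemetry's propagation API. An illustrative Python sketch of the pattern (not the router's actual implementation; plain header dicts stand in for real request objects):

from opentelemetry import trace
from opentelemetry.propagate import extract, inject

def route_with_context(incoming_headers: dict) -> dict:
    # Parse the W3C traceparent/tracestate headers into a trace context
    ctx = extract(incoming_headers)
    tracer = trace.get_tracer("semantic_router")
    # Spans started under this context appear as children of the client's trace
    with tracer.start_as_current_span("semantic_router.routing.decision", context=ctx):
        upstream_headers: dict = {}
        inject(upstream_headers)  # writes traceparent/tracestate for the vLLM request
        return upstream_headers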

Quick Start

cd examples/distributed-tracing
docker-compose up -d
pip install -r requirements.txt
python openai_client_tracing.py
# Open http://localhost:16686 to view traces in Jaeger UI

Benefits

For Developers:

  • End-to-end visibility from application to vLLM with complete request traces
  • Performance debugging with detailed timing breakdowns for each operation
  • Error correlation across service boundaries with distributed context

For Operations:

  • SLA monitoring with distributed latency tracking across the stack
  • Capacity planning based on actual usage patterns and routing decisions
  • Incident response with complete request traces for root cause analysis

For Product Teams:

  • User experience insights with real performance data
  • A/B testing of routing strategies with trace correlation
  • Quality metrics tied to specific routing decisions and model selections

Validation

  • ✅ Python syntax validated with py_compile
  • ✅ YAML files validated with yamllint
  • ✅ Docker Compose configuration tested
  • ✅ Documentation formatting verified

Files Changed

  • examples/distributed-tracing/openai_client_tracing.py (NEW)
  • examples/distributed-tracing/requirements.txt (NEW)
  • examples/distributed-tracing/docker-compose.yml (NEW)
  • examples/distributed-tracing/router-config.yaml (NEW)
  • examples/distributed-tracing/README.md (NEW)
  • website/docs/tutorials/observability/distributed-tracing.md (UPDATED)

Total: 6 files changed, 815 insertions(+), 6 deletions(-)

Closes #[issue_number]

Original prompt

This section describes the original issue to resolve.

Issue title: Add OpenTelemetry (OTEL) distributed tracing integration

Requirement

Add OpenTelemetry (OTEL) distributed tracing integration example to illustrate end-to-end observability from client applications through the router to vLLM backends. This will provide comprehensive visibility into request flows, routing decisions, performance bottlenecks, and error propagation across the entire LLM inference pipeline.

Motivation

Currently, semantic-router lacks distributed tracing capabilities, making it difficult to:

  • Debug performance issues across the application → semantic-router → vLLM chain
  • Monitor routing decisions and their impact on latency/quality
  • Correlate errors between different components in the stack
  • Optimize model selection based on end-to-end performance data
  • Track cache hit/miss patterns in relation to overall request performance
  • Measure Time-to-First-Token (TTFT) and completion latencies in context

OpenAI Python Client

from openai import OpenAI
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Auto-instrument for automatic trace header injection
RequestsInstrumentor().instrument()
OpenAIInstrumentor().instrument()

client = OpenAI(base_url="http://semantic-router:8000/v1")
response = client.chat.completions.create(
    model="auto",  # Triggers semantic routing
    messages=[{"role": "user", "content": "What is quantum computing?"}]
)

Trace Context Flow

Application Request
    ↓ (HTTP headers: traceparent, tracestate)
Semantic Router ExtProc
    ↓ (Extract trace context)
Processing Spans (classification, routing, etc.)
    ↓ (Inject trace context)
vLLM Backend Request
    ↓ (HTTP headers: traceparent, tracestate)
vLLM Processing (if OTEL-enabled)
    ↓
OTLP Collector / Jaeger

Persona

For Developers

  • End-to-end visibility from application to vLLM
  • Performance debugging with detailed timing breakdowns
  • Error correlation across service boundaries
  • Routing decision analysis with context

For Operations

  • SLA monitoring with distributed latency tracking
  • Capacity planning based on actual usage patterns
  • Incident response with complete request traces
  • Cost optimization through routing efficiency analysis

For Product Teams

  • User experience insights with real performance data
  • A/B testing of routing strategies with trace correlation
  • Quality metrics tied to specific routing decisions

Example Trace Visualization

Trace: user-query-quantum-computing (2.3s total)
├── app.chat_completion (2.3s)
│   └── HTTP POST /v1/chat/completions (2.2s)
│       ├── extproc.process_request (45ms)
│       │   ├── extproc.handle_request_headers (2ms)
│       │   └── extproc.handle_request_body (43ms)
│       │       ├── classification.classify_intent (15ms) [category=science]
│       │       ├── cache.lookup (3ms) [cache_miss=true]
│       │       ├── security.check_pii (2ms) [pii_detected=false]
│       │       └── routing.select_model (23ms) [selected=llama-3.1-70b]
│       └── vllm.chat_completion (2.1s)
│           ├── vllm.process_request (50ms)
│           ├── vllm.generate_tokens (2.0s) [tokens=156]
│           └── vllm.format_response (5ms)


Comments on the Issue

Fixes #328


@Copilot Copilot AI assigned Copilot and rootfs Oct 3, 2025

netlify bot commented Oct 3, 2025

Deploy Preview for vllm-semantic-router ready!

  • Latest commit: 11535b4
  • Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68e01157e28e440008ebf016
  • Deploy Preview: https://deploy-preview-329--vllm-semantic-router.netlify.app

@Copilot Copilot AI changed the title [WIP] Add OpenTelemetry (OTEL) distributed tracing integration Add OpenTelemetry distributed tracing integration examples with OpenAI client Oct 3, 2025
Copilot finished work on behalf of rootfs October 3, 2025 18:13
@Copilot Copilot AI requested a review from rootfs October 3, 2025 18:13

rootfs commented Oct 3, 2025

@Xunzhuo @JaredforReal @yuluo-yx can you review it?

@rootfs rootfs requested a review from Copilot October 3, 2025 19:15

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds comprehensive OpenTelemetry (OTEL) distributed tracing integration examples to demonstrate end-to-end observability from client applications through the semantic router to vLLM backends. The implementation provides practical, working examples showing how to implement distributed tracing across the entire LLM inference pipeline.

  • Complete Python example with auto-instrumentation of OpenAI client and automatic trace context propagation
  • Full Docker Compose stack with Jaeger for trace collection and visualization
  • Enhanced documentation with detailed setup, troubleshooting, and production deployment guidance

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Files reviewed:

  • website/docs/tutorials/observability/distributed-tracing.md: enhanced documentation with end-to-end tracing examples and trace flow diagrams
  • examples/distributed-tracing/router-config.yaml: example router configuration with OTLP exporter and sampling settings
  • examples/distributed-tracing/requirements.txt: Python dependencies for OpenTelemetry instrumentation
  • examples/distributed-tracing/openai_client_tracing.py: complete Python example demonstrating auto-instrumentation and trace propagation
  • examples/distributed-tracing/docker-compose.yml: Docker Compose stack with Jaeger and semantic router
  • examples/distributed-tracing/README.md: comprehensive 339-line guide with setup, troubleshooting, and production recommendations


github-actions bot commented Oct 3, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • examples/distributed-tracing/README.md
  • examples/distributed-tracing/docker-compose.yml
  • examples/distributed-tracing/openai_client_tracing.py
  • examples/distributed-tracing/requirements.txt
  • examples/distributed-tracing/router-config.yaml

📁 website

Owners: @Xunzhuo
Files changed:

  • website/docs/tutorials/observability/distributed-tracing.md


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.
