
Conversation


@Copilot Copilot AI commented Oct 3, 2025

Overview

This PR adds comprehensive OpenTelemetry (OTEL) distributed tracing integration examples to demonstrate end-to-end observability from client applications through the semantic router to vLLM backends. This addresses the requirement for practical examples showing how to implement distributed tracing across the entire LLM inference pipeline.

What's Added

Complete Python Example (examples/distributed-tracing/)

A fully functional example demonstrating:

Auto-instrumentation of OpenAI Python Client:

from openai import OpenAI
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Auto-instrument for automatic trace header injection
RequestsInstrumentor().instrument()
OpenAIInstrumentor().instrument()

# Point client to semantic router
client = OpenAI(base_url="http://semantic-router:8000/v1")

# Make request - trace context flows automatically
response = client.chat.completions.create(
    model="auto",  # Triggers semantic routing
    messages=[{"role": "user", "content": "What is quantum computing?"}]
)
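
Auto-instrumentation alone only creates spans; for them to show up anywhere, a tracer provider with an OTLP exporter has to be configured before the calls above. A minimal sketch, assuming the Jaeger container from the compose stack below is listening on the default OTLP gRPC port (the endpoint and service name are example values):

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Batch spans and ship them to the local collector over OTLP/gRPC
provider = TracerProvider(
    resource=Resource.create({"service.name": "tracing-example-client"})
)
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
    )
)
trace.set_tracer_provider(provider)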

Three practical scenarios:

  1. Auto-routing with model selection
  2. Math/reasoning queries
  3. Streaming responses (sketched below)
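
For scenario 3, no extra tracing code is needed once instrumentation is in place; a minimal sketch reusing the client configured above (whether the instrumentation keeps the request span open until the stream is drained is an assumption worth verifying against the example script):

# Streaming request; trace headers are injected exactly as in the
# non-streaming case
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain entropy in one paragraph."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)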

Docker Compose Stack

Complete tracing stack with the following components (a condensed sketch follows the list):

  • Jaeger all-in-one for trace collection and visualization
  • Semantic Router with tracing enabled
  • Optional vLLM backend template (commented out for flexibility)
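
The committed docker-compose.yml is authoritative; purely as a rough sketch of the stack's shape (the router image name and config mount path here are hypothetical placeholders):

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP gRPC ingest
  semantic-router:
    image: semantic-router:latest  # hypothetical placeholder image
    volumes:
      - ./router-config.yaml:/app/config.yaml  # hypothetical mount path
    ports:
      - "8000:8000"
    depends_on:
      - jaeger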

Configuration Files

  • router-config.yaml: Example router configuration with OTLP exporter (shape sketched after this list)
  • requirements.txt: All necessary Python dependencies
  • README.md: Comprehensive 339-line guide with setup, troubleshooting, and production recommendations
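
To illustrate the shape of the router configuration only, an OTLP exporter section in such a file typically looks something like the following; all key names here are hypothetical and the committed router-config.yaml is the reference:

observability:
  tracing:
    enabled: true
    exporter:
      type: otlp
      endpoint: jaeger:4317
      insecure: true
    sampling:
      type: always_on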

Documentation Updates

Enhanced website/docs/tutorials/observability/distributed-tracing.md with:

  • End-to-end tracing example section
  • Trace context flow diagrams showing how traceparent headers propagate
  • Benefits breakdown for different personas (Developers, Operations, Product Teams)
  • Links to working examples in the repository

Trace Context Flow

Client Application
    ↓ (HTTP headers: traceparent, tracestate)
Semantic Router ExtProc
    ↓ (Extract trace context from headers)
Processing Spans
    ├─ semantic_router.classification (45ms) [category=science]
    ├─ semantic_router.cache.lookup (3ms) [cache_miss=true]
    ├─ semantic_router.security.pii_detection (2ms) [pii_detected=false]
    └─ semantic_router.routing.decision (23ms) [selected=llama-3.1-70b]
    ↓ (Inject trace context into upstream headers)
vLLM Backend Request
    ↓ (HTTP headers: traceparent, tracestate)
vLLM Processing (if OTEL-enabled)
    └─ vllm.generate (2.0s) [tokens=156]
    ↓
OTLP Collector / Jaeger
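
The extract and inject steps in the diagram map onto OpenTelemetry's propagation API. An illustrative Python sketch of the pattern (not the router's actual implementation; plain header dicts stand in for real request objects):

from opentelemetry import trace
from opentelemetry.propagate import extract, inject

def route_with_context(incoming_headers: dict) -> dict:
    # Parse the W3C traceparent/tracestate headers into a trace context
    ctx = extract(incoming_headers)
    tracer = trace.get_tracer("semantic_router")
    # Spans started under this context appear as children of the client's trace
    with tracer.start_as_current_span("semantic_router.routing.decision", context=ctx):
        upstream_headers: dict = {}
        inject(upstream_headers)  # writes traceparent/tracestate for the vLLM request
        return upstream_headers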

Quick Start

cd examples/distributed-tracing
docker-compose up -d
pip install -r requirements.txt
python openai_client_tracing.py
# Open http://localhost:16686 to view traces in Jaeger UI

Benefits

For Developers:

  • End-to-end visibility from application to vLLM with complete request traces
  • Performance debugging with detailed timing breakdowns for each operation
  • Error correlation across service boundaries with distributed context

For Operations:

  • SLA monitoring with distributed latency tracking across the stack
  • Capacity planning based on actual usage patterns and routing decisions
  • Incident response with complete request traces for root cause analysis

For Product Teams:

  • User experience insights with real performance data
  • A/B testing of routing strategies with trace correlation
  • Quality metrics tied to specific routing decisions and model selections

Validation

  • ✅ Python syntax validated with py_compile
  • ✅ YAML files validated with yamllint
  • ✅ Docker Compose configuration tested
  • ✅ Documentation formatting verified

Files Changed

  • examples/distributed-tracing/openai_client_tracing.py (NEW)
  • examples/distributed-tracing/requirements.txt (NEW)
  • examples/distributed-tracing/docker-compose.yml (NEW)
  • examples/distributed-tracing/router-config.yaml (NEW)
  • examples/distributed-tracing/README.md (NEW)
  • website/docs/tutorials/observability/distributed-tracing.md (UPDATED)

Total: 6 files changed, 815 insertions(+), 6 deletions(-)

Closes #[issue_number]

Original prompt

This section describes the original issue to resolve.

Issue title: Add OpenTelemetry (OTEL) distributed tracing integration

Requirement

Add OpenTelemetry (OTEL) distributed tracing integration example to illustrate end-to-end observability from client applications through the router to vLLM backends. This will provide comprehensive visibility into request flows, routing decisions, performance bottlenecks, and error propagation across the entire LLM inference pipeline.

Motivation

Currently, semantic-router lacks distributed tracing capabilities, making it difficult to:

  • Debug performance issues across the application → semantic-router → vLLM chain
  • Monitor routing decisions and their impact on latency/quality
  • Correlate errors between different components in the stack
  • Optimize model selection based on end-to-end performance data
  • Track cache hit/miss patterns in relation to overall request performance
  • Measure Time-to-First-Token (TTFT) and completion latencies in context

OpenAI Python Client

from openai import OpenAI
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Auto-instrument for automatic trace header injection
RequestsInstrumentor().instrument()
OpenAIInstrumentor().instrument()

client = OpenAI(base_url="http://semantic-router:8000/v1")
response = client.chat.completions.create(
    model="auto",  # Triggers semantic routing
    messages=[{"role": "user", "content": "What is quantum computing?"}]
)

Trace Context Flow

Application Request
    ↓ (HTTP headers: traceparent, tracestate)
Semantic Router ExtProc
    ↓ (Extract trace context)
Processing Spans (classification, routing, etc.)
    ↓ (Inject trace context)
vLLM Backend Request
    ↓ (HTTP headers: traceparent, tracestate)
vLLM Processing (if OTEL-enabled)
    ↓
OTLP Collector / Jaeger

Persona

For Developers

  • End-to-end visibility from application to vLLM
  • Performance debugging with detailed timing breakdowns
  • Error correlation across service boundaries
  • Routing decision analysis with context

For Operations

  • SLA monitoring with distributed latency tracking
  • Capacity planning based on actual usage patterns
  • Incident response with complete request traces
  • Cost optimization through routing efficiency analysis

For Product Teams

  • User experience insights with real performance data
  • A/B testing of routing strategies with trace correlation
  • Quality metrics tied to specific routing decisions

Example Trace Visualization

Trace: user-query-quantum-computing (2.3s total)
├── app.chat_completion (2.3s)
│   └── HTTP POST /v1/chat/completions (2.2s)
│       ├── extproc.process_request (45ms)
│       │   ├── extproc.handle_request_headers (2ms)
│       │   └── extproc.handle_request_body (43ms)
│       │       ├── classification.classify_intent (15ms) [category=science]
│       │       ├── cache.lookup (3ms) [cache_miss=true]
│       │       ├── security.check_pii (2ms) [pii_detected=false]
│       │       └── routing.select_model (23ms) [selected=llama-3.1-70b]
│       └── vllm.chat_completion (2.1s)
│           ├── vllm.process_request (50ms)
│           ├── vllm.generate_tokens (2.0s) [tokens=156]
│           └── vllm.format_response (5ms)


Comments on the Issue

Fixes #328


@Copilot Copilot AI assigned Copilot and rootfs Oct 3, 2025

netlify bot commented Oct 3, 2025

Deploy Preview for vllm-semantic-router ready!

  • Latest commit: 11535b4
  • Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68e01157e28e440008ebf016
  • Deploy Preview: https://deploy-preview-329--vllm-semantic-router.netlify.app

@Copilot Copilot AI changed the title [WIP] Add OpenTelemetry (OTEL) distributed tracing integration Add OpenTelemetry distributed tracing integration examples with OpenAI client Oct 3, 2025
Copilot finished work on behalf of rootfs October 3, 2025 18:13
@Copilot Copilot AI requested a review from rootfs October 3, 2025 18:13

rootfs commented Oct 3, 2025

@Xunzhuo @JaredforReal @yuluo-yx can you review it?

@rootfs rootfs requested a review from Copilot October 3, 2025 19:15

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds comprehensive OpenTelemetry (OTEL) distributed tracing integration examples to demonstrate end-to-end observability from client applications through the semantic router to vLLM backends. The implementation provides practical, working examples showing how to implement distributed tracing across the entire LLM inference pipeline.

  • Complete Python example with auto-instrumentation of OpenAI client and automatic trace context propagation
  • Full Docker Compose stack with Jaeger for trace collection and visualization
  • Enhanced documentation with detailed setup, troubleshooting, and production deployment guidance

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Files reviewed:

  • website/docs/tutorials/observability/distributed-tracing.md: enhanced documentation with end-to-end tracing examples and trace flow diagrams
  • examples/distributed-tracing/router-config.yaml: example router configuration with OTLP exporter and sampling settings
  • examples/distributed-tracing/requirements.txt: Python dependencies for OpenTelemetry instrumentation
  • examples/distributed-tracing/openai_client_tracing.py: complete Python example demonstrating auto-instrumentation and trace propagation
  • examples/distributed-tracing/docker-compose.yml: Docker Compose stack with Jaeger and semantic router
  • examples/distributed-tracing/README.md: comprehensive 339-line guide with setup, troubleshooting, and production recommendations


github-actions bot commented Oct 3, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • examples/distributed-tracing/README.md
  • examples/distributed-tracing/docker-compose.yml
  • examples/distributed-tracing/openai_client_tracing.py
  • examples/distributed-tracing/requirements.txt
  • examples/distributed-tracing/router-config.yaml

📁 website

Owners: @Xunzhuo
Files changed:

  • website/docs/tutorials/observability/distributed-tracing.md


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.
