Requirement
Add an OpenTelemetry (OTEL) distributed tracing integration example to illustrate end-to-end observability from client applications through the router to vLLM backends. This will provide comprehensive visibility into request flows, routing decisions, performance bottlenecks, and error propagation across the entire LLM inference pipeline.
Motivation
Currently, semantic-router lacks distributed tracing capabilities, making it difficult to:
- Debug performance issues across the application → semantic-router → vLLM chain
- Monitor routing decisions and their impact on latency/quality
- Correlate errors between different components in the stack
- Optimize model selection based on end-to-end performance data
- Track cache hit/miss patterns in relation to overall request performance
- Measure Time-to-First-Token (TTFT) and completion latencies in context
OpenAI Python Client
from openai import OpenAI
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Auto-instrument for automatic trace header injection.
# The OpenAI v1 client uses httpx under the hood, so instrument it as well;
# requests instrumentation covers any direct requests-based calls.
RequestsInstrumentor().instrument()
HTTPXClientInstrumentor().instrument()
OpenAIInstrumentor().instrument()

client = OpenAI(
    base_url="http://semantic-router:8000",
    api_key="EMPTY",  # placeholder; the router handles auth, if any
)
response = client.chat.completions.create(
    model="auto",  # Triggers semantic routing
    messages=[{"role": "user", "content": "What is quantum computing?"}],
)
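For the auto-instrumentation above to actually export spans, the client application also needs a configured tracer provider and exporter. A minimal sketch, assuming an OTLP collector reachable at otel-collector:4317 (the endpoint and service name are placeholders, not part of this proposal):

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Identify this client application in the trace backend (Jaeger, Tempo, etc.)
resource = Resource.create({"service.name": "llm-client-app"})

provider = TracerProvider(resource=resource)
# Batch spans and ship them to the collector over OTLP/gRPC
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)

Setting the global provider before calling the instrumentors keeps all client, requests/httpx, and OpenAI spans on the same export pipeline.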
Trace Context Flow
Application Request
↓ (HTTP headers: traceparent, tracestate)
Semantic Router ExtProc
↓ (Extract trace context)
Processing Spans (classification, routing, etc.)
↓ (Inject trace context)
vLLM Backend Request
↓ (HTTP headers: traceparent, tracestate)
vLLM Processing (if OTEL-enabled)
↓
OTLP Collector / Jaeger
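The router-side steps in this flow map directly onto the standard W3C propagation API: extract the incoming traceparent/tracestate headers, open child spans for the processing work, and re-inject the context on the upstream request. A rough Python sketch of that shape (span names, carrier dicts, and the function signature are illustrative only, not existing semantic-router code):

from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer("semantic-router.extproc")

def process_request(incoming_headers: dict, body: bytes) -> dict:
    # Continue the caller's trace from the traceparent/tracestate headers
    parent_ctx = extract(incoming_headers)

    with tracer.start_as_current_span("extproc.process_request", context=parent_ctx) as span:
        span.set_attribute("request.body_size", len(body))
        # ... classification, cache lookup, PII check, model selection ...

        upstream_headers = dict(incoming_headers)
        # Re-inject the (now deeper) trace context for the vLLM backend hop
        inject(upstream_headers)
        return upstream_headers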
Personas
For Developers
- End-to-end visibility from application to vLLM
- Performance debugging with detailed timing breakdowns
- Error correlation across service boundaries
- Routing decision analysis with context
For Operations
- SLA monitoring with distributed latency tracking
- Capacity planning based on actual usage patterns
- Incident response with complete request traces
- Cost optimization through routing efficiency analysis
For Product Teams
- User experience insights with real performance data
- A/B testing of routing strategies with trace correlation
- Quality metrics tied to specific routing decisions
Example Trace Visualization
Trace: user-query-quantum-computing (2.3s total)
├── app.chat_completion (2.3s)
│ └── HTTP POST /v1/chat/completions (2.2s)
│ ├── extproc.process_request (45ms)
│ │ ├── extproc.handle_request_headers (2ms)
│ │ └── extproc.handle_request_body (43ms)
│ │ ├── classification.classify_intent (15ms) [category=science]
│ │ ├── cache.lookup (3ms) [cache_miss=true]
│ │ ├── security.check_pii (2ms) [pii_detected=false]
│ │ └── routing.select_model (23ms) [selected=llama-3.1-70b]
│ └── vllm.chat_completion (2.1s)
│ ├── vllm.process_request (50ms)
│ ├── vllm.generate_tokens (2.0s) [tokens=156]
│ └── vllm.format_response (5ms)
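Inside the router, the processing spans shown above would be nested child spans with routing metadata attached as span attributes. A hypothetical sketch of how they could be produced (span names and attribute keys mirror the visualization; classify, cache_lookup, and select_model are placeholder helpers, not existing APIs):

from opentelemetry import trace

tracer = trace.get_tracer("semantic-router.extproc")

def handle_request_body(request):
    with tracer.start_as_current_span("extproc.handle_request_body"):
        with tracer.start_as_current_span("classification.classify_intent") as span:
            category = classify(request)          # hypothetical helper
            span.set_attribute("category", category)

        with tracer.start_as_current_span("cache.lookup") as span:
            hit = cache_lookup(request)           # hypothetical helper
            span.set_attribute("cache_miss", not hit)

        with tracer.start_as_current_span("routing.select_model") as span:
            model = select_model(category)        # hypothetical helper
            span.set_attribute("selected", model)
        return model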