From c8eba526508734aab23021e3037162a4a2c752ff Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 3 Oct 2025 17:58:47 +0000 Subject: [PATCH 1/2] Initial plan From 11535b4deedd5bf3b70689bf42ab285622d7a42a Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 3 Oct 2025 18:09:23 +0000 Subject: [PATCH 2/2] Add OpenTelemetry distributed tracing integration examples Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com> --- examples/distributed-tracing/README.md | 339 ++++++++++++++++++ .../distributed-tracing/docker-compose.yml | 66 ++++ .../openai_client_tracing.py | 202 +++++++++++ examples/distributed-tracing/requirements.txt | 19 + .../distributed-tracing/router-config.yaml | 74 ++++ .../observability/distributed-tracing.md | 121 ++++++- 6 files changed, 815 insertions(+), 6 deletions(-) create mode 100644 examples/distributed-tracing/README.md create mode 100644 examples/distributed-tracing/docker-compose.yml create mode 100644 examples/distributed-tracing/openai_client_tracing.py create mode 100644 examples/distributed-tracing/requirements.txt create mode 100644 examples/distributed-tracing/router-config.yaml diff --git a/examples/distributed-tracing/README.md b/examples/distributed-tracing/README.md new file mode 100644 index 00000000..4b32736a --- /dev/null +++ b/examples/distributed-tracing/README.md @@ -0,0 +1,339 @@ +# OpenTelemetry Distributed Tracing Examples + +This directory contains examples demonstrating end-to-end distributed tracing with vLLM Semantic Router using OpenTelemetry. + +## Overview + +These examples show how to: + +- **Auto-instrument OpenAI Python client** for automatic trace creation +- **Propagate trace context** from client → router → vLLM backends +- **Visualize request flows** in Jaeger UI +- **Debug performance issues** with detailed span timings +- **Correlate errors** across service boundaries + +## Architecture + +``` +┌─────────────────┐ traceparent ┌──────────────────┐ traceparent ┌────────────────┐ +│ OpenAI Client │ ──────────────────> │ Semantic Router │ ──────────────────> │ vLLM Backend │ +│ (Python App) │ HTTP Headers │ (ExtProc) │ HTTP Headers │ (Optional) │ +└─────────────────┘ └──────────────────┘ └────────────────┘ + │ │ │ + │ │ │ + └────────────────────────────────────────┴────────────────────────────────────────┘ + │ + ▼ + ┌──────────────────┐ + │ Jaeger/Tempo │ + │ (OTLP Collector)│ + └──────────────────┘ +``` + +## Files + +- **`openai_client_tracing.py`** - Python example with OpenAI client auto-instrumentation +- **`docker-compose.yml`** - Complete tracing stack with Jaeger +- **`router-config.yaml`** - Router configuration with tracing enabled +- **`requirements.txt`** - Python dependencies for the example +- **`README.md`** - This file + +## Quick Start + +### Prerequisites + +1. **Docker and Docker Compose** installed +2. **Python 3.8+** installed +3. 
**Semantic Router image** built (or use from registry) + +### Step 1: Start the Tracing Stack + +Start Jaeger and the semantic router with tracing enabled: + +```bash +# From this directory +docker-compose up -d +``` + +This starts: +- **Jaeger** on ports 16686 (UI) and 4317 (OTLP gRPC) +- **Semantic Router** on port 8000 with tracing enabled + +Verify services are running: + +```bash +docker-compose ps +``` + +### Step 2: Install Python Dependencies + +Install the required Python packages: + +```bash +pip install -r requirements.txt +``` + +### Step 3: Run the Example + +Run the Python example that makes requests to the router: + +```bash +python openai_client_tracing.py +``` + +The example will: +1. Initialize OpenTelemetry tracing +2. Auto-instrument the OpenAI client +3. Make several example requests with different query types +4. Send traces to Jaeger + +### Step 4: View Traces in Jaeger + +Open the Jaeger UI in your browser: + +``` +http://localhost:16686 +``` + +1. Select **Service**: `openai-client-example` +2. Click **Find Traces** +3. Click on a trace to see the detailed timeline + +You should see traces with spans from: +- `openai-client-example` (the Python client) +- `vllm-semantic-router` (the router processing) +- Individual operations (classification, routing, etc.) + +## Example Trace Visualization + +A typical trace will show: + +``` +Trace: example_1_auto_routing (2.3s total) +├── openai.chat.completions.create (2.3s) +│ └── HTTP POST /v1/chat/completions (2.2s) +│ ├── semantic_router.request.received (2.2s) +│ │ ├── semantic_router.classification (45ms) [category=science] +│ │ ├── semantic_router.cache.lookup (3ms) [cache_miss=true] +│ │ ├── semantic_router.routing.decision (23ms) [selected=llama-3.1-70b] +│ │ └── semantic_router.upstream.request (2.1s) +│ │ └── vllm.generate (2.0s) [tokens=156] +``` + +## Configuration Options + +### Environment Variables + +The example supports these environment variables: + +- **`SEMANTIC_ROUTER_URL`** - Router URL (default: `http://localhost:8000`) +- **`OTLP_ENDPOINT`** - OTLP collector endpoint (default: `http://localhost:4317`) +- **`OPENAI_API_KEY`** - API key (default: `dummy-key-for-local-testing`) + +Example with custom configuration: + +```bash +export SEMANTIC_ROUTER_URL="http://my-router:8000" +export OTLP_ENDPOINT="http://tempo:4317" +python openai_client_tracing.py +``` + +### Router Configuration + +Edit `router-config.yaml` to customize tracing behavior: + +```yaml +observability: + tracing: + enabled: true + exporter: + type: "otlp" + endpoint: "jaeger:4317" + sampling: + type: "always_on" # or "probabilistic" + rate: 1.0 # sample rate for probabilistic +``` + +**Sampling strategies:** + +- **`always_on`** - Sample 100% of requests (development/debugging) +- **`always_off`** - Disable sampling (emergency) +- **`probabilistic`** - Sample a percentage (production) + +**Production recommendation:** +```yaml +sampling: + type: "probabilistic" + rate: 0.1 # Sample 10% of requests +``` + +## Advanced Usage + +### Custom Spans + +Add custom spans to track specific operations: + +```python +from opentelemetry import trace + +tracer = trace.get_tracer(__name__) + +with tracer.start_as_current_span("my_operation") as span: + span.set_attribute("custom.attribute", "value") + # Your code here +``` + +### Multiple Backends + +Configure the router to route to different vLLM backends and see how traces flow: + +```yaml +models: + - name: "llama-3.1-8b" + category: "general" + endpoints: + - name: "backend-1" + address: 
"http://vllm-1:8000" + + - name: "llama-3.1-70b" + category: "reasoning" + endpoints: + - name: "backend-2" + address: "http://vllm-2:8000" +``` + +### Alternative Tracing Backends + +#### Grafana Tempo + +Replace Jaeger with Tempo in `docker-compose.yml`: + +```yaml +services: + tempo: + image: grafana/tempo:latest + ports: + - "4317:4317" + - "3200:3200" + command: ["-config.file=/etc/tempo.yaml"] +``` + +Update router config: + +```yaml +exporter: + endpoint: "tempo:4317" +``` + +#### Datadog + +Use Datadog OTLP endpoint: + +```yaml +exporter: + type: "otlp" + endpoint: "https://otlp.datadoghq.com" + insecure: false +``` + +## Troubleshooting + +### Traces Not Appearing + +1. **Check services are running:** + ```bash + docker-compose ps + ``` + +2. **Check router logs for tracing initialization:** + ```bash + docker-compose logs semantic-router | grep -i tracing + ``` + +3. **Verify OTLP endpoint connectivity:** + ```bash + telnet localhost 4317 + ``` + +4. **Check sampling rate:** + - Ensure `always_on` for development + - Increase `rate` if using probabilistic sampling + +### Connection Refused Errors + +If the Python example can't connect to the router: + +```bash +# Check router is accessible +curl http://localhost:8000/health + +# Check Docker network +docker network inspect distributed-tracing_tracing-network +``` + +### High Memory Usage + +If you see high memory usage: + +1. **Reduce sampling rate:** + ```yaml + sampling: + type: "probabilistic" + rate: 0.01 # 1% sampling + ``` + +2. **Check batch export settings** in the application code + +## Best Practices + +1. **Start with stdout exporter** to verify tracing works before using OTLP +2. **Use probabilistic sampling** in production (10% is a good starting point) +3. **Add meaningful attributes** to spans for debugging +4. **Monitor exporter health** and track export failures +5. **Correlate traces with logs** using the same service name +6. **Set appropriate timeout values** for span export + +## Production Deployment + +For production, consider: + +1. **Use TLS** for OTLP endpoint: + ```yaml + exporter: + endpoint: "otlp-collector.prod.svc:4317" + insecure: false + ``` + +2. **Tune sampling rate** based on traffic: + - High traffic: 0.01-0.1 (1-10%) + - Medium traffic: 0.1-0.5 (10-50%) + - Low traffic: 0.5-1.0 (50-100%) + +3. **Use a dedicated OTLP collector** (not Jaeger directly) + +4. **Set resource limits** in Kubernetes: + ```yaml + resources: + limits: + memory: "2Gi" + cpu: "1000m" + ``` + +## Additional Resources + +- [OpenTelemetry Python Documentation](https://opentelemetry.io/docs/instrumentation/python/) +- [OpenAI Instrumentation](https://github.com/traceloop/openllmetry) +- [Jaeger Documentation](https://www.jaegertracing.io/docs/) +- [vLLM Semantic Router Tracing Guide](../../website/docs/tutorials/observability/distributed-tracing.md) +- [W3C Trace Context](https://www.w3.org/TR/trace-context/) + +## Support + +For questions or issues: +- Create an issue in the repository +- Join the `#semantic-router` channel in vLLM Slack +- Check the [troubleshooting guide](../../website/docs/troubleshooting/) + +## License + +Apache 2.0 - See LICENSE file in the repository root. 
diff --git a/examples/distributed-tracing/docker-compose.yml b/examples/distributed-tracing/docker-compose.yml new file mode 100644 index 00000000..e37760a2 --- /dev/null +++ b/examples/distributed-tracing/docker-compose.yml @@ -0,0 +1,66 @@ +version: '3.8' + +services: + # Jaeger all-in-one for tracing + jaeger: + image: jaegertracing/all-in-one:latest + container_name: jaeger-tracing + ports: + # Jaeger UI + - "16686:16686" + # OTLP gRPC receiver + - "4317:4317" + # OTLP HTTP receiver + - "4318:4318" + # Jaeger Thrift HTTP + - "14268:14268" + # Jaeger Thrift compact + - "6831:6831/udp" + environment: + - COLLECTOR_OTLP_ENABLED=true + - LOG_LEVEL=debug + networks: + - tracing-network + + # Semantic Router with tracing enabled + semantic-router: + image: vllm-semantic-router:latest + container_name: semantic-router + ports: + - "8000:8000" + volumes: + - ./router-config.yaml:/config/config.yaml + environment: + - CONFIG_PATH=/config/config.yaml + networks: + - tracing-network + depends_on: + - jaeger + + # Example vLLM backend (optional - requires vLLM image) + # Uncomment if you have a vLLM backend to test with + # vllm-backend: + # image: vllm/vllm-openai:latest + # container_name: vllm-backend + # ports: + # - "8001:8000" + # command: + # - --model + # - facebook/opt-125m + # - --max-model-len + # - "2048" + # environment: + # - CUDA_VISIBLE_DEVICES=0 + # networks: + # - tracing-network + # deploy: + # resources: + # reservations: + # devices: + # - driver: nvidia + # count: 1 + # capabilities: [gpu] + +networks: + tracing-network: + driver: bridge diff --git a/examples/distributed-tracing/openai_client_tracing.py b/examples/distributed-tracing/openai_client_tracing.py new file mode 100644 index 00000000..087345ef --- /dev/null +++ b/examples/distributed-tracing/openai_client_tracing.py @@ -0,0 +1,202 @@ +#!/usr/bin/env python3 +""" +OpenTelemetry Distributed Tracing with OpenAI Client and Semantic Router + +This example demonstrates end-to-end distributed tracing from a client application +through the semantic router to vLLM backends using OpenTelemetry. + +The example shows: +1. Auto-instrumentation of OpenAI Python client +2. Auto-instrumentation of HTTP requests library +3. Trace context propagation across service boundaries +4. Span attributes for debugging and analysis + +Prerequisites: +- Semantic Router running with tracing enabled +- Jaeger or another OTLP collector running +- Python packages: openai, opentelemetry-* (see requirements.txt) + +Signed-off-by: GitHub Copilot +""" + +import os +import sys +from typing import Optional + +from openai import OpenAI +from opentelemetry import trace +from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter +from opentelemetry.instrumentation.openai import OpenAIInstrumentor +from opentelemetry.instrumentation.requests import RequestsInstrumentor +from opentelemetry.sdk.resources import SERVICE_NAME, Resource +from opentelemetry.sdk.trace import TracerProvider +from opentelemetry.sdk.trace.export import BatchSpanProcessor + + +def setup_tracing( + service_name: str = "openai-client-example", + otlp_endpoint: str = "http://localhost:4317", +) -> None: + """ + Initialize OpenTelemetry tracing with OTLP exporter. 
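    The exporter speaks OTLP over gRPC, so otlp_endpoint must point at an
    OTLP gRPC receiver (port 4317 in the bundled docker-compose stack).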
+ + Args: + service_name: Name of the service for trace identification + otlp_endpoint: OTLP collector endpoint (e.g., Jaeger, Tempo) + """ + # Create a resource that identifies this service + resource = Resource(attributes={SERVICE_NAME: service_name}) + + # Create tracer provider with the resource + provider = TracerProvider(resource=resource) + + # Create OTLP exporter that sends spans to the collector + otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True) + + # Add batch processor for efficient span export + provider.add_span_processor(BatchSpanProcessor(otlp_exporter)) + + # Set the global tracer provider + trace.set_tracer_provider(provider) + + # Auto-instrument OpenAI client for automatic span creation + OpenAIInstrumentor().instrument() + + # Auto-instrument requests library for HTTP trace header injection + RequestsInstrumentor().instrument() + + print(f"✅ OpenTelemetry tracing initialized") + print(f" Service: {service_name}") + print(f" OTLP Endpoint: {otlp_endpoint}") + + +def main(): + """ + Main example demonstrating distributed tracing with semantic router. + """ + # Configuration from environment variables with defaults + router_url = os.getenv("SEMANTIC_ROUTER_URL", "http://localhost:8000") + otlp_endpoint = os.getenv("OTLP_ENDPOINT", "http://localhost:4317") + api_key = os.getenv("OPENAI_API_KEY", "dummy-key-for-local-testing") + + print("=" * 80) + print("OpenTelemetry Distributed Tracing Example") + print("=" * 80) + print(f"Router URL: {router_url}") + print(f"OTLP Endpoint: {otlp_endpoint}") + print() + + # Setup OpenTelemetry tracing + setup_tracing( + service_name="openai-client-example", otlp_endpoint=otlp_endpoint + ) + + # Create OpenAI client pointing to semantic router + client = OpenAI( + base_url=f"{router_url}/v1", + api_key=api_key, + ) + + # Get tracer for creating custom spans + tracer = trace.get_tracer(__name__) + + try: + # Example 1: Simple completion with auto-routing + print("\n📝 Example 1: Auto-routing (model='auto')") + print("-" * 80) + + with tracer.start_as_current_span("example_1_auto_routing") as span: + # Add custom attributes to the span + span.set_attribute("example.type", "auto_routing") + span.set_attribute("example.number", 1) + + response = client.chat.completions.create( + model="auto", # Triggers semantic routing + messages=[ + { + "role": "user", + "content": "What is quantum computing? 
Explain in simple terms.", + } + ], + max_tokens=150, + ) + + print(f"Model used: {response.model}") + print(f"Response: {response.choices[0].message.content[:200]}...") + + # Example 2: Math/reasoning query + print("\n📝 Example 2: Math/Reasoning Query") + print("-" * 80) + + with tracer.start_as_current_span("example_2_math_reasoning") as span: + span.set_attribute("example.type", "math_reasoning") + span.set_attribute("example.number", 2) + + response = client.chat.completions.create( + model="auto", + messages=[ + { + "role": "user", + "content": "Calculate the compound interest on $10,000 at 5% annual rate over 3 years.", + } + ], + max_tokens=150, + ) + + print(f"Model used: {response.model}") + print(f"Response: {response.choices[0].message.content[:200]}...") + + # Example 3: Streaming response + print("\n📝 Example 3: Streaming Response") + print("-" * 80) + + with tracer.start_as_current_span("example_3_streaming") as span: + span.set_attribute("example.type", "streaming") + span.set_attribute("example.number", 3) + + stream = client.chat.completions.create( + model="auto", + messages=[ + { + "role": "user", + "content": "Write a haiku about distributed tracing.", + } + ], + max_tokens=50, + stream=True, + ) + + print("Streaming response: ", end="", flush=True) + for chunk in stream: + if chunk.choices[0].delta.content is not None: + print(chunk.choices[0].delta.content, end="", flush=True) + print() + + print("\n" + "=" * 80) + print("✅ Examples completed successfully!") + print("=" * 80) + print("\n📊 View traces in Jaeger UI:") + print(" http://localhost:16686") + print("\n🔍 Search for service: 'openai-client-example'") + print(" You should see traces with spans from:") + print(" - openai-client-example (this application)") + print(" - vllm-semantic-router (the router)") + print(" - Any configured vLLM backends (if instrumented)") + + except Exception as e: + print(f"\n❌ Error: {e}", file=sys.stderr) + print( + "\nTroubleshooting:", + file=sys.stderr, + ) + print(f" 1. Ensure semantic router is running at {router_url}", file=sys.stderr) + print(f" 2. Ensure OTLP collector is running at {otlp_endpoint}", file=sys.stderr) + print( + " 3. Check router config has tracing enabled", + file=sys.stderr, + ) + sys.exit(1) + + +if __name__ == "__main__": + main() diff --git a/examples/distributed-tracing/requirements.txt b/examples/distributed-tracing/requirements.txt new file mode 100644 index 00000000..bd5e67a2 --- /dev/null +++ b/examples/distributed-tracing/requirements.txt @@ -0,0 +1,19 @@ +# OpenTelemetry Distributed Tracing Example Requirements +# These packages are needed to run the distributed tracing examples + +# OpenAI Python SDK +openai>=1.0.0 + +# OpenTelemetry Core +opentelemetry-api>=1.20.0 +opentelemetry-sdk>=1.20.0 + +# OpenTelemetry OTLP Exporter (for Jaeger, Tempo, etc.) +opentelemetry-exporter-otlp-proto-grpc>=1.20.0 + +# OpenTelemetry Instrumentations +opentelemetry-instrumentation-openai>=0.20.0 +opentelemetry-instrumentation-requests>=0.41b0 + +# HTTP client +requests>=2.28.0 diff --git a/examples/distributed-tracing/router-config.yaml b/examples/distributed-tracing/router-config.yaml new file mode 100644 index 00000000..eef66af2 --- /dev/null +++ b/examples/distributed-tracing/router-config.yaml @@ -0,0 +1,74 @@ +# Example Router Configuration with OpenTelemetry Tracing Enabled +# +# This configuration demonstrates how to enable distributed tracing +# in the semantic router for end-to-end observability. 
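#
# NOTE: The exporter endpoint below assumes the docker-compose stack in this
# directory, where Jaeger's OTLP gRPC receiver is reachable at jaeger:4317.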
+ +# Observability configuration +observability: + tracing: + # Enable distributed tracing + enabled: true + + # OpenTelemetry provider + provider: "opentelemetry" + + # OTLP exporter configuration + exporter: + # Export to OTLP collector (Jaeger in this example) + type: "otlp" + + # Jaeger OTLP gRPC endpoint + endpoint: "jaeger:4317" + + # Use insecure connection (for local development) + # In production, set to false and configure TLS + insecure: true + + # Sampling configuration + sampling: + # Sample all requests for development/debugging + # In production, use "probabilistic" with a lower rate + type: "always_on" + + # For production, use probabilistic sampling: + # type: "probabilistic" + # rate: 0.1 # Sample 10% of requests + + # Resource attributes for service identification + resource: + service_name: "vllm-semantic-router" + service_version: "v0.1.0" + deployment_environment: "development" + +# Model routing configuration (example) +models: + - name: "llama-3.1-8b" + category: "general" + endpoints: + - name: "vllm-backend-1" + address: "http://vllm-backend:8000" + weight: 1.0 + + - name: "llama-3.1-70b" + category: "reasoning" + endpoints: + - name: "vllm-backend-2" + address: "http://vllm-backend:8000" + weight: 1.0 + +# Classification configuration +classification: + enabled: true + # You can add more classification settings here + +# Security features (optional) +security: + pii_detection: + enabled: false + jailbreak_detection: + enabled: false + +# Caching (optional) +cache: + enabled: false + # cache configuration here diff --git a/website/docs/tutorials/observability/distributed-tracing.md b/website/docs/tutorials/observability/distributed-tracing.md index a0e47612..7c740d51 100644 --- a/website/docs/tutorials/observability/distributed-tracing.md +++ b/website/docs/tutorials/observability/distributed-tracing.md @@ -499,16 +499,125 @@ sampling: ## Integration with vLLM Stack +### End-to-End Tracing Example + +A complete end-to-end tracing example is available in the repository at [`examples/distributed-tracing/`](https://github.com/vllm-project/semantic-router/tree/main/examples/distributed-tracing). 
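
Under the hood, propagation relies on the W3C `traceparent` header that the instrumented client injects into every outgoing request. A minimal sketch of that mechanism, assuming the OpenTelemetry SDK's default W3C TraceContext propagator (illustration only, not part of the example code):

```python
from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

headers = {}
with tracer.start_as_current_span("demo"):
    # Adds e.g. {'traceparent': '00-<32-hex trace id>-<16-hex span id>-01'}
    inject(headers)
print(headers)
```

The router reads this header, creates its own child spans, and re-injects the context into the upstream request to the vLLM backend.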
+ +The example demonstrates: + +**Client Application (Python with OpenAI SDK)** + +```python +from openai import OpenAI +from opentelemetry.instrumentation.openai import OpenAIInstrumentor +from opentelemetry.instrumentation.requests import RequestsInstrumentor + +# Auto-instrument for automatic trace header injection +RequestsInstrumentor().instrument() +OpenAIInstrumentor().instrument() + +# Point client to semantic router +client = OpenAI(base_url="http://semantic-router:8000/v1") + +# Make request - trace context flows automatically +response = client.chat.completions.create( + model="auto", # Triggers semantic routing + messages=[{"role": "user", "content": "What is quantum computing?"}] +) +``` + +**Trace Context Flow** + +``` +Client Application + ↓ (HTTP headers: traceparent, tracestate) +Semantic Router ExtProc + ↓ (Extract trace context from headers) +Processing Spans + ├─ semantic_router.classification (45ms) [category=science] + ├─ semantic_router.cache.lookup (3ms) [cache_miss=true] + ├─ semantic_router.security.pii_detection (2ms) [pii_detected=false] + └─ semantic_router.routing.decision (23ms) [selected=llama-3.1-70b] + ↓ (Inject trace context into upstream headers) +vLLM Backend Request + ↓ (HTTP headers: traceparent, tracestate) +vLLM Processing (if OTEL-enabled) + └─ vllm.generate (2.0s) [tokens=156] + ↓ +OTLP Collector / Jaeger +``` + +**Running the Example** + +1. **Start the tracing stack:** + +```bash +cd examples/distributed-tracing +docker-compose up -d +``` + +2. **Install dependencies:** + +```bash +pip install -r requirements.txt +``` + +3. **Run the example:** + +```bash +python openai_client_tracing.py +``` + +4. **View traces in Jaeger UI:** + +Open http://localhost:16686 and search for service `openai-client-example`. + +**Example Trace Visualization** + +You'll see a complete trace showing the request flow: + +``` +Trace: user-query-quantum-computing (2.3s total) +├── app.chat_completion (2.3s) +│ └── HTTP POST /v1/chat/completions (2.2s) +│ ├── semantic_router.request.received (2.2s) +│ │ ├── semantic_router.classification (45ms) [category=science] +│ │ ├── semantic_router.cache.lookup (3ms) [cache_miss=true] +│ │ ├── semantic_router.security.pii_detection (2ms) [pii_detected=false] +│ │ └── semantic_router.routing.decision (23ms) [selected=llama-3.1-70b] +│ └── semantic_router.upstream.request (2.1s) +│ └── vllm.chat_completion (2.1s) [tokens=156] +``` + +**Benefits for Different Personas** + +*For Developers:* +- **End-to-end visibility** from application to vLLM +- **Performance debugging** with detailed timing breakdowns +- **Error correlation** across service boundaries +- **Routing decision analysis** with full context + +*For Operations:* +- **SLA monitoring** with distributed latency tracking +- **Capacity planning** based on actual usage patterns +- **Incident response** with complete request traces +- **Cost optimization** through routing efficiency analysis + +*For Product Teams:* +- **User experience insights** with real performance data +- **A/B testing** of routing strategies with trace correlation +- **Quality metrics** tied to specific routing decisions + +See the [complete example README](https://github.com/vllm-project/semantic-router/tree/main/examples/distributed-tracing/README.md) for more details. + ### Future Enhancements The tracing implementation is designed to support future integration with vLLM backends: -1. **Trace context propagation** to vLLM -2. **Correlated spans** across router and engine -3. 
**End-to-end latency** analysis -4. **Token-level timing** from vLLM - -Stay tuned for updates on vLLM integration! +1. **Trace context propagation** to vLLM (already supported via HTTP headers) +2. **Native vLLM OTEL support** for correlated spans +3. **Token-level timing** from vLLM generation +4. **Structured logging** correlation with trace IDs ## References