From c8eba526508734aab23021e3037162a4a2c752ff Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 3 Oct 2025 17:58:47 +0000 Subject: [PATCH 1/2] Initial plan From 11535b4deedd5bf3b70689bf42ab285622d7a42a Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 3 Oct 2025 18:09:23 +0000 Subject: [PATCH 2/2] Add OpenTelemetry distributed tracing integration examples Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com> --- examples/distributed-tracing/README.md | 339 ++++++++++++++++++ .../distributed-tracing/docker-compose.yml | 66 ++++ .../openai_client_tracing.py | 202 +++++++++++ examples/distributed-tracing/requirements.txt | 19 + .../distributed-tracing/router-config.yaml | 74 ++++ .../observability/distributed-tracing.md | 121 ++++++- 6 files changed, 815 insertions(+), 6 deletions(-) create mode 100644 examples/distributed-tracing/README.md create mode 100644 examples/distributed-tracing/docker-compose.yml create mode 100644 examples/distributed-tracing/openai_client_tracing.py create mode 100644 examples/distributed-tracing/requirements.txt create mode 100644 examples/distributed-tracing/router-config.yaml diff --git a/examples/distributed-tracing/README.md b/examples/distributed-tracing/README.md new file mode 100644 index 00000000..4b32736a --- /dev/null +++ b/examples/distributed-tracing/README.md @@ -0,0 +1,339 @@ +# OpenTelemetry Distributed Tracing Examples + +This directory contains examples demonstrating end-to-end distributed tracing with vLLM Semantic Router using OpenTelemetry. + +## Overview + +These examples show how to: + +- **Auto-instrument OpenAI Python client** for automatic trace creation +- **Propagate trace context** from client → router → vLLM backends +- **Visualize request flows** in Jaeger UI +- **Debug performance issues** with detailed span timings +- **Correlate errors** across service boundaries + +## Architecture + +``` +┌─────────────────┐ traceparent ┌──────────────────┐ traceparent ┌────────────────┐ +│ OpenAI Client │ ──────────────────> │ Semantic Router │ ──────────────────> │ vLLM Backend │ +│ (Python App) │ HTTP Headers │ (ExtProc) │ HTTP Headers │ (Optional) │ +└─────────────────┘ └──────────────────┘ └────────────────┘ + │ │ │ + │ │ │ + └────────────────────────────────────────┴────────────────────────────────────────┘ + │ + ▼ + ┌──────────────────┐ + │ Jaeger/Tempo │ + │ (OTLP Collector)│ + └──────────────────┘ +``` + +## Files + +- **`openai_client_tracing.py`** - Python example with OpenAI client auto-instrumentation +- **`docker-compose.yml`** - Complete tracing stack with Jaeger +- **`router-config.yaml`** - Router configuration with tracing enabled +- **`requirements.txt`** - Python dependencies for the example +- **`README.md`** - This file + +## Quick Start + +### Prerequisites + +1. **Docker and Docker Compose** installed +2. **Python 3.8+** installed +3. 
**Semantic Router image** built (or use from registry) + +### Step 1: Start the Tracing Stack + +Start Jaeger and the semantic router with tracing enabled: + +```bash +# From this directory +docker-compose up -d +``` + +This starts: +- **Jaeger** on ports 16686 (UI) and 4317 (OTLP gRPC) +- **Semantic Router** on port 8000 with tracing enabled + +Verify services are running: + +```bash +docker-compose ps +``` + +### Step 2: Install Python Dependencies + +Install the required Python packages: + +```bash +pip install -r requirements.txt +``` + +### Step 3: Run the Example + +Run the Python example that makes requests to the router: + +```bash +python openai_client_tracing.py +``` + +The example will: +1. Initialize OpenTelemetry tracing +2. Auto-instrument the OpenAI client +3. Make several example requests with different query types +4. Send traces to Jaeger + +### Step 4: View Traces in Jaeger + +Open the Jaeger UI in your browser: + +``` +http://localhost:16686 +``` + +1. Select **Service**: `openai-client-example` +2. Click **Find Traces** +3. Click on a trace to see the detailed timeline + +You should see traces with spans from: +- `openai-client-example` (the Python client) +- `vllm-semantic-router` (the router processing) +- Individual operations (classification, routing, etc.) + +## Example Trace Visualization + +A typical trace will show: + +``` +Trace: example_1_auto_routing (2.3s total) +├── openai.chat.completions.create (2.3s) +│ └── HTTP POST /v1/chat/completions (2.2s) +│ ├── semantic_router.request.received (2.2s) +│ │ ├── semantic_router.classification (45ms) [category=science] +│ │ ├── semantic_router.cache.lookup (3ms) [cache_miss=true] +│ │ ├── semantic_router.routing.decision (23ms) [selected=llama-3.1-70b] +│ │ └── semantic_router.upstream.request (2.1s) +│ │ └── vllm.generate (2.0s) [tokens=156] +``` + +## Configuration Options + +### Environment Variables + +The example supports these environment variables: + +- **`SEMANTIC_ROUTER_URL`** - Router URL (default: `http://localhost:8000`) +- **`OTLP_ENDPOINT`** - OTLP collector endpoint (default: `http://localhost:4317`) +- **`OPENAI_API_KEY`** - API key (default: `dummy-key-for-local-testing`) + +Example with custom configuration: + +```bash +export SEMANTIC_ROUTER_URL="http://my-router:8000" +export OTLP_ENDPOINT="http://tempo:4317" +python openai_client_tracing.py +``` + +### Router Configuration + +Edit `router-config.yaml` to customize tracing behavior: + +```yaml +observability: + tracing: + enabled: true + exporter: + type: "otlp" + endpoint: "jaeger:4317" + sampling: + type: "always_on" # or "probabilistic" + rate: 1.0 # sample rate for probabilistic +``` + +**Sampling strategies:** + +- **`always_on`** - Sample 100% of requests (development/debugging) +- **`always_off`** - Disable sampling (emergency) +- **`probabilistic`** - Sample a percentage (production) + +**Production recommendation:** +```yaml +sampling: + type: "probabilistic" + rate: 0.1 # Sample 10% of requests +``` + +## Advanced Usage + +### Custom Spans + +Add custom spans to track specific operations: + +```python +from opentelemetry import trace + +tracer = trace.get_tracer(__name__) + +with tracer.start_as_current_span("my_operation") as span: + span.set_attribute("custom.attribute", "value") + # Your code here +``` + +### Multiple Backends + +Configure the router to route to different vLLM backends and see how traces flow: + +```yaml +models: + - name: "llama-3.1-8b" + category: "general" + endpoints: + - name: "backend-1" + address: 
"http://vllm-1:8000" + + - name: "llama-3.1-70b" + category: "reasoning" + endpoints: + - name: "backend-2" + address: "http://vllm-2:8000" +``` + +### Alternative Tracing Backends + +#### Grafana Tempo + +Replace Jaeger with Tempo in `docker-compose.yml`: + +```yaml +services: + tempo: + image: grafana/tempo:latest + ports: + - "4317:4317" + - "3200:3200" + command: ["-config.file=/etc/tempo.yaml"] +``` + +Update router config: + +```yaml +exporter: + endpoint: "tempo:4317" +``` + +#### Datadog + +Use Datadog OTLP endpoint: + +```yaml +exporter: + type: "otlp" + endpoint: "https://otlp.datadoghq.com" + insecure: false +``` + +## Troubleshooting + +### Traces Not Appearing + +1. **Check services are running:** + ```bash + docker-compose ps + ``` + +2. **Check router logs for tracing initialization:** + ```bash + docker-compose logs semantic-router | grep -i tracing + ``` + +3. **Verify OTLP endpoint connectivity:** + ```bash + telnet localhost 4317 + ``` + +4. **Check sampling rate:** + - Ensure `always_on` for development + - Increase `rate` if using probabilistic sampling + +### Connection Refused Errors + +If the Python example can't connect to the router: + +```bash +# Check router is accessible +curl http://localhost:8000/health + +# Check Docker network +docker network inspect distributed-tracing_tracing-network +``` + +### High Memory Usage + +If you see high memory usage: + +1. **Reduce sampling rate:** + ```yaml + sampling: + type: "probabilistic" + rate: 0.01 # 1% sampling + ``` + +2. **Check batch export settings** in the application code + +## Best Practices + +1. **Start with stdout exporter** to verify tracing works before using OTLP +2. **Use probabilistic sampling** in production (10% is a good starting point) +3. **Add meaningful attributes** to spans for debugging +4. **Monitor exporter health** and track export failures +5. **Correlate traces with logs** using the same service name +6. **Set appropriate timeout values** for span export + +## Production Deployment + +For production, consider: + +1. **Use TLS** for OTLP endpoint: + ```yaml + exporter: + endpoint: "otlp-collector.prod.svc:4317" + insecure: false + ``` + +2. **Tune sampling rate** based on traffic: + - High traffic: 0.01-0.1 (1-10%) + - Medium traffic: 0.1-0.5 (10-50%) + - Low traffic: 0.5-1.0 (50-100%) + +3. **Use a dedicated OTLP collector** (not Jaeger directly) + +4. **Set resource limits** in Kubernetes: + ```yaml + resources: + limits: + memory: "2Gi" + cpu: "1000m" + ``` + +## Additional Resources + +- [OpenTelemetry Python Documentation](https://opentelemetry.io/docs/instrumentation/python/) +- [OpenAI Instrumentation](https://github.com/traceloop/openllmetry) +- [Jaeger Documentation](https://www.jaegertracing.io/docs/) +- [vLLM Semantic Router Tracing Guide](../../website/docs/tutorials/observability/distributed-tracing.md) +- [W3C Trace Context](https://www.w3.org/TR/trace-context/) + +## Support + +For questions or issues: +- Create an issue in the repository +- Join the `#semantic-router` channel in vLLM Slack +- Check the [troubleshooting guide](../../website/docs/troubleshooting/) + +## License + +Apache 2.0 - See LICENSE file in the repository root. 
diff --git a/examples/distributed-tracing/docker-compose.yml b/examples/distributed-tracing/docker-compose.yml new file mode 100644 index 00000000..e37760a2 --- /dev/null +++ b/examples/distributed-tracing/docker-compose.yml @@ -0,0 +1,66 @@ +version: '3.8' + +services: + # Jaeger all-in-one for tracing + jaeger: + image: jaegertracing/all-in-one:latest + container_name: jaeger-tracing + ports: + # Jaeger UI + - "16686:16686" + # OTLP gRPC receiver + - "4317:4317" + # OTLP HTTP receiver + - "4318:4318" + # Jaeger Thrift HTTP + - "14268:14268" + # Jaeger Thrift compact + - "6831:6831/udp" + environment: + - COLLECTOR_OTLP_ENABLED=true + - LOG_LEVEL=debug + networks: + - tracing-network + + # Semantic Router with tracing enabled + semantic-router: + image: vllm-semantic-router:latest + container_name: semantic-router + ports: + - "8000:8000" + volumes: + - ./router-config.yaml:/config/config.yaml + environment: + - CONFIG_PATH=/config/config.yaml + networks: + - tracing-network + depends_on: + - jaeger + + # Example vLLM backend (optional - requires vLLM image) + # Uncomment if you have a vLLM backend to test with + # vllm-backend: + # image: vllm/vllm-openai:latest + # container_name: vllm-backend + # ports: + # - "8001:8000" + # command: + # - --model + # - facebook/opt-125m + # - --max-model-len + # - "2048" + # environment: + # - CUDA_VISIBLE_DEVICES=0 + # networks: + # - tracing-network + # deploy: + # resources: + # reservations: + # devices: + # - driver: nvidia + # count: 1 + # capabilities: [gpu] + +networks: + tracing-network: + driver: bridge diff --git a/examples/distributed-tracing/openai_client_tracing.py b/examples/distributed-tracing/openai_client_tracing.py new file mode 100644 index 00000000..087345ef --- /dev/null +++ b/examples/distributed-tracing/openai_client_tracing.py @@ -0,0 +1,202 @@ +#!/usr/bin/env python3 +""" +OpenTelemetry Distributed Tracing with OpenAI Client and Semantic Router + +This example demonstrates end-to-end distributed tracing from a client application +through the semantic router to vLLM backends using OpenTelemetry. + +The example shows: +1. Auto-instrumentation of OpenAI Python client +2. Auto-instrumentation of HTTP requests library +3. Trace context propagation across service boundaries +4. Span attributes for debugging and analysis + +Prerequisites: +- Semantic Router running with tracing enabled +- Jaeger or another OTLP collector running +- Python packages: openai, opentelemetry-* (see requirements.txt) + +Signed-off-by: GitHub Copilot +""" + +import os +import sys +from typing import Optional + +from openai import OpenAI +from opentelemetry import trace +from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter +from opentelemetry.instrumentation.openai import OpenAIInstrumentor +from opentelemetry.instrumentation.requests import RequestsInstrumentor +from opentelemetry.sdk.resources import SERVICE_NAME, Resource +from opentelemetry.sdk.trace import TracerProvider +from opentelemetry.sdk.trace.export import BatchSpanProcessor + + +def setup_tracing( + service_name: str = "openai-client-example", + otlp_endpoint: str = "http://localhost:4317", +) -> None: + """ + Initialize OpenTelemetry tracing with OTLP exporter. 
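    The exporter speaks OTLP over gRPC, so otlp_endpoint must point at an
    OTLP gRPC receiver (port 4317 in the bundled docker-compose stack).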
+ + Args: + service_name: Name of the service for trace identification + otlp_endpoint: OTLP collector endpoint (e.g., Jaeger, Tempo) + """ + # Create a resource that identifies this service + resource = Resource(attributes={SERVICE_NAME: service_name}) + + # Create tracer provider with the resource + provider = TracerProvider(resource=resource) + + # Create OTLP exporter that sends spans to the collector + otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True) + + # Add batch processor for efficient span export + provider.add_span_processor(BatchSpanProcessor(otlp_exporter)) + + # Set the global tracer provider + trace.set_tracer_provider(provider) + + # Auto-instrument OpenAI client for automatic span creation + OpenAIInstrumentor().instrument() + + # Auto-instrument requests library for HTTP trace header injection + RequestsInstrumentor().instrument() + + print(f"✅ OpenTelemetry tracing initialized") + print(f" Service: {service_name}") + print(f" OTLP Endpoint: {otlp_endpoint}") + + +def main(): + """ + Main example demonstrating distributed tracing with semantic router. + """ + # Configuration from environment variables with defaults + router_url = os.getenv("SEMANTIC_ROUTER_URL", "http://localhost:8000") + otlp_endpoint = os.getenv("OTLP_ENDPOINT", "http://localhost:4317") + api_key = os.getenv("OPENAI_API_KEY", "dummy-key-for-local-testing") + + print("=" * 80) + print("OpenTelemetry Distributed Tracing Example") + print("=" * 80) + print(f"Router URL: {router_url}") + print(f"OTLP Endpoint: {otlp_endpoint}") + print() + + # Setup OpenTelemetry tracing + setup_tracing( + service_name="openai-client-example", otlp_endpoint=otlp_endpoint + ) + + # Create OpenAI client pointing to semantic router + client = OpenAI( + base_url=f"{router_url}/v1", + api_key=api_key, + ) + + # Get tracer for creating custom spans + tracer = trace.get_tracer(__name__) + + try: + # Example 1: Simple completion with auto-routing + print("\n📝 Example 1: Auto-routing (model='auto')") + print("-" * 80) + + with tracer.start_as_current_span("example_1_auto_routing") as span: + # Add custom attributes to the span + span.set_attribute("example.type", "auto_routing") + span.set_attribute("example.number", 1) + + response = client.chat.completions.create( + model="auto", # Triggers semantic routing + messages=[ + { + "role": "user", + "content": "What is quantum computing? 
Explain in simple terms.", + } + ], + max_tokens=150, + ) + + print(f"Model used: {response.model}") + print(f"Response: {response.choices[0].message.content[:200]}...") + + # Example 2: Math/reasoning query + print("\n📝 Example 2: Math/Reasoning Query") + print("-" * 80) + + with tracer.start_as_current_span("example_2_math_reasoning") as span: + span.set_attribute("example.type", "math_reasoning") + span.set_attribute("example.number", 2) + + response = client.chat.completions.create( + model="auto", + messages=[ + { + "role": "user", + "content": "Calculate the compound interest on $10,000 at 5% annual rate over 3 years.", + } + ], + max_tokens=150, + ) + + print(f"Model used: {response.model}") + print(f"Response: {response.choices[0].message.content[:200]}...") + + # Example 3: Streaming response + print("\n📝 Example 3: Streaming Response") + print("-" * 80) + + with tracer.start_as_current_span("example_3_streaming") as span: + span.set_attribute("example.type", "streaming") + span.set_attribute("example.number", 3) + + stream = client.chat.completions.create( + model="auto", + messages=[ + { + "role": "user", + "content": "Write a haiku about distributed tracing.", + } + ], + max_tokens=50, + stream=True, + ) + + print("Streaming response: ", end="", flush=True) + for chunk in stream: + if chunk.choices[0].delta.content is not None: + print(chunk.choices[0].delta.content, end="", flush=True) + print() + + print("\n" + "=" * 80) + print("✅ Examples completed successfully!") + print("=" * 80) + print("\n📊 View traces in Jaeger UI:") + print(" http://localhost:16686") + print("\n🔍 Search for service: 'openai-client-example'") + print(" You should see traces with spans from:") + print(" - openai-client-example (this application)") + print(" - vllm-semantic-router (the router)") + print(" - Any configured vLLM backends (if instrumented)") + + except Exception as e: + print(f"\n❌ Error: {e}", file=sys.stderr) + print( + "\nTroubleshooting:", + file=sys.stderr, + ) + print(f" 1. Ensure semantic router is running at {router_url}", file=sys.stderr) + print(f" 2. Ensure OTLP collector is running at {otlp_endpoint}", file=sys.stderr) + print( + " 3. Check router config has tracing enabled", + file=sys.stderr, + ) + sys.exit(1) + + +if __name__ == "__main__": + main() diff --git a/examples/distributed-tracing/requirements.txt b/examples/distributed-tracing/requirements.txt new file mode 100644 index 00000000..bd5e67a2 --- /dev/null +++ b/examples/distributed-tracing/requirements.txt @@ -0,0 +1,19 @@ +# OpenTelemetry Distributed Tracing Example Requirements +# These packages are needed to run the distributed tracing examples + +# OpenAI Python SDK +openai>=1.0.0 + +# OpenTelemetry Core +opentelemetry-api>=1.20.0 +opentelemetry-sdk>=1.20.0 + +# OpenTelemetry OTLP Exporter (for Jaeger, Tempo, etc.) +opentelemetry-exporter-otlp-proto-grpc>=1.20.0 + +# OpenTelemetry Instrumentations +opentelemetry-instrumentation-openai>=0.20.0 +opentelemetry-instrumentation-requests>=0.41b0 + +# HTTP client +requests>=2.28.0 diff --git a/examples/distributed-tracing/router-config.yaml b/examples/distributed-tracing/router-config.yaml new file mode 100644 index 00000000..eef66af2 --- /dev/null +++ b/examples/distributed-tracing/router-config.yaml @@ -0,0 +1,74 @@ +# Example Router Configuration with OpenTelemetry Tracing Enabled +# +# This configuration demonstrates how to enable distributed tracing +# in the semantic router for end-to-end observability. 
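#
# NOTE: The exporter endpoint below assumes the docker-compose stack in this
# directory, where Jaeger's OTLP gRPC receiver is reachable at jaeger:4317.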
+ +# Observability configuration +observability: + tracing: + # Enable distributed tracing + enabled: true + + # OpenTelemetry provider + provider: "opentelemetry" + + # OTLP exporter configuration + exporter: + # Export to OTLP collector (Jaeger in this example) + type: "otlp" + + # Jaeger OTLP gRPC endpoint + endpoint: "jaeger:4317" + + # Use insecure connection (for local development) + # In production, set to false and configure TLS + insecure: true + + # Sampling configuration + sampling: + # Sample all requests for development/debugging + # In production, use "probabilistic" with a lower rate + type: "always_on" + + # For production, use probabilistic sampling: + # type: "probabilistic" + # rate: 0.1 # Sample 10% of requests + + # Resource attributes for service identification + resource: + service_name: "vllm-semantic-router" + service_version: "v0.1.0" + deployment_environment: "development" + +# Model routing configuration (example) +models: + - name: "llama-3.1-8b" + category: "general" + endpoints: + - name: "vllm-backend-1" + address: "http://vllm-backend:8000" + weight: 1.0 + + - name: "llama-3.1-70b" + category: "reasoning" + endpoints: + - name: "vllm-backend-2" + address: "http://vllm-backend:8000" + weight: 1.0 + +# Classification configuration +classification: + enabled: true + # You can add more classification settings here + +# Security features (optional) +security: + pii_detection: + enabled: false + jailbreak_detection: + enabled: false + +# Caching (optional) +cache: + enabled: false + # cache configuration here diff --git a/website/docs/tutorials/observability/distributed-tracing.md b/website/docs/tutorials/observability/distributed-tracing.md index a0e47612..7c740d51 100644 --- a/website/docs/tutorials/observability/distributed-tracing.md +++ b/website/docs/tutorials/observability/distributed-tracing.md @@ -499,16 +499,125 @@ sampling: ## Integration with vLLM Stack +### End-to-End Tracing Example + +A complete end-to-end tracing example is available in the repository at [`examples/distributed-tracing/`](https://github.com/vllm-project/semantic-router/tree/main/examples/distributed-tracing). 
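
Under the hood, propagation relies on the W3C `traceparent` header that the instrumented client injects into every outgoing request. A minimal sketch of that mechanism, assuming the OpenTelemetry SDK's default W3C TraceContext propagator (illustration only, not part of the example code):

```python
from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

headers = {}
with tracer.start_as_current_span("demo"):
    # Adds e.g. {'traceparent': '00-<32-hex trace id>-<16-hex span id>-01'}
    inject(headers)
print(headers)
```

The router reads this header, creates its own child spans, and re-injects the context into the upstream request to the vLLM backend.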
+ +The example demonstrates: + +**Client Application (Python with OpenAI SDK)** + +```python +from openai import OpenAI +from opentelemetry.instrumentation.openai import OpenAIInstrumentor +from opentelemetry.instrumentation.requests import RequestsInstrumentor + +# Auto-instrument for automatic trace header injection +RequestsInstrumentor().instrument() +OpenAIInstrumentor().instrument() + +# Point client to semantic router +client = OpenAI(base_url="http://semantic-router:8000/v1") + +# Make request - trace context flows automatically +response = client.chat.completions.create( + model="auto", # Triggers semantic routing + messages=[{"role": "user", "content": "What is quantum computing?"}] +) +``` + +**Trace Context Flow** + +``` +Client Application + ↓ (HTTP headers: traceparent, tracestate) +Semantic Router ExtProc + ↓ (Extract trace context from headers) +Processing Spans + ├─ semantic_router.classification (45ms) [category=science] + ├─ semantic_router.cache.lookup (3ms) [cache_miss=true] + ├─ semantic_router.security.pii_detection (2ms) [pii_detected=false] + └─ semantic_router.routing.decision (23ms) [selected=llama-3.1-70b] + ↓ (Inject trace context into upstream headers) +vLLM Backend Request + ↓ (HTTP headers: traceparent, tracestate) +vLLM Processing (if OTEL-enabled) + └─ vllm.generate (2.0s) [tokens=156] + ↓ +OTLP Collector / Jaeger +``` + +**Running the Example** + +1. **Start the tracing stack:** + +```bash +cd examples/distributed-tracing +docker-compose up -d +``` + +2. **Install dependencies:** + +```bash +pip install -r requirements.txt +``` + +3. **Run the example:** + +```bash +python openai_client_tracing.py +``` + +4. **View traces in Jaeger UI:** + +Open http://localhost:16686 and search for service `openai-client-example`. + +**Example Trace Visualization** + +You'll see a complete trace showing the request flow: + +``` +Trace: user-query-quantum-computing (2.3s total) +├── app.chat_completion (2.3s) +│ └── HTTP POST /v1/chat/completions (2.2s) +│ ├── semantic_router.request.received (2.2s) +│ │ ├── semantic_router.classification (45ms) [category=science] +│ │ ├── semantic_router.cache.lookup (3ms) [cache_miss=true] +│ │ ├── semantic_router.security.pii_detection (2ms) [pii_detected=false] +│ │ └── semantic_router.routing.decision (23ms) [selected=llama-3.1-70b] +│ └── semantic_router.upstream.request (2.1s) +│ └── vllm.chat_completion (2.1s) [tokens=156] +``` + +**Benefits for Different Personas** + +*For Developers:* +- **End-to-end visibility** from application to vLLM +- **Performance debugging** with detailed timing breakdowns +- **Error correlation** across service boundaries +- **Routing decision analysis** with full context + +*For Operations:* +- **SLA monitoring** with distributed latency tracking +- **Capacity planning** based on actual usage patterns +- **Incident response** with complete request traces +- **Cost optimization** through routing efficiency analysis + +*For Product Teams:* +- **User experience insights** with real performance data +- **A/B testing** of routing strategies with trace correlation +- **Quality metrics** tied to specific routing decisions + +See the [complete example README](https://github.com/vllm-project/semantic-router/tree/main/examples/distributed-tracing/README.md) for more details. + ### Future Enhancements The tracing implementation is designed to support future integration with vLLM backends: -1. **Trace context propagation** to vLLM -2. **Correlated spans** across router and engine -3. 
**End-to-end latency** analysis -4. **Token-level timing** from vLLM - -Stay tuned for updates on vLLM integration! +1. **Trace context propagation** to vLLM (already supported via HTTP headers) +2. **Native vLLM OTEL support** for correlated spans +3. **Token-level timing** from vLLM generation +4. **Structured logging** correlation with trace IDs ## References