# OpenTelemetry Distributed Tracing Examples

This directory contains examples demonstrating end-to-end distributed tracing with vLLM Semantic Router using OpenTelemetry.

## Overview

These examples show how to:

- **Auto-instrument OpenAI Python client** for automatic trace creation
- **Propagate trace context** from client → router → vLLM backends
- **Visualize request flows** in Jaeger UI
- **Debug performance issues** with detailed span timings
- **Correlate errors** across service boundaries

## Architecture

```
┌─────────────────┐     traceparent      ┌──────────────────┐     traceparent      ┌────────────────┐
│  OpenAI Client  │ ─────────────────>   │ Semantic Router  │ ─────────────────>   │  vLLM Backend  │
│  (Python App)   │    HTTP Headers      │    (ExtProc)     │    HTTP Headers      │   (Optional)   │
└─────────────────┘                      └──────────────────┘                      └────────────────┘
         │                                        │                                         │
         │                                        │                                         │
         └────────────────────────────────────────┴─────────────────────────────────────────┘
                                         ┌──────────────────┐
                                         │   Jaeger/Tempo   │
                                         │ (OTLP Collector) │
                                         └──────────────────┘
```
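
The `traceparent` headers shown above follow the [W3C Trace Context](https://www.w3.org/TR/trace-context/) format. With the auto-instrumented OpenAI client used in this example, header injection happens automatically; the sketch below shows the equivalent manual injection for a plain HTTP client (the URL, model name, and request body are illustrative, and a `TracerProvider` is assumed to already be configured as in Step 3):

```python
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("manual_request"):
    headers = {}
    inject(headers)  # adds a traceparent header, e.g. 00-<trace-id>-<span-id>-01
    response = requests.post(
        "http://localhost:8000/v1/chat/completions",  # the router from this example
        headers=headers,
        json={"model": "auto", "messages": [{"role": "user", "content": "Hello"}]},
    )
```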

## Files

- **`openai_client_tracing.py`** - Python example with OpenAI client auto-instrumentation
- **`docker-compose.yml`** - Complete tracing stack with Jaeger
- **`router-config.yaml`** - Router configuration with tracing enabled
- **`requirements.txt`** - Python dependencies for the example
- **`README.md`** - This file

## Quick Start

### Prerequisites

1. **Docker and Docker Compose** installed
2. **Python 3.8+** installed
3. **Semantic Router image** built locally (or pulled from a registry)

### Step 1: Start the Tracing Stack

Start Jaeger and the semantic router with tracing enabled:

```bash
# From this directory
docker-compose up -d
```

This starts:
- **Jaeger** on ports 16686 (UI) and 4317 (OTLP gRPC)
- **Semantic Router** on port 8000 with tracing enabled
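
For reference, the Jaeger half of such a stack looks roughly like the following (a sketch only; the actual `docker-compose.yml` in this directory is the source of truth):

```yaml
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true   # enable the OTLP receiver
    ports:
      - "16686:16686"   # Jaeger UI
      - "4317:4317"     # OTLP gRPC ingest
```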

Verify services are running:

```bash
docker-compose ps
```

### Step 2: Install Python Dependencies

Install the required Python packages:

```bash
pip install -r requirements.txt
```

### Step 3: Run the Example

Run the Python example that makes requests to the router:

```bash
python openai_client_tracing.py
```

The example will perform the following steps (sketched in code after the list):
1. Initialize OpenTelemetry tracing
2. Auto-instrument the OpenAI client
3. Make several example requests with different query types
4. Send traces to Jaeger
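
A minimal sketch of those steps, assuming `requirements.txt` pulls in the OpenTelemetry SDK, the OTLP gRPC exporter, and the `opentelemetry-instrumentation-openai` package from OpenLLMetry (the actual script may differ in details):

```python
import os

from openai import OpenAI
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# 1. Initialize OpenTelemetry tracing with an OTLP exporter pointed at Jaeger.
provider = TracerProvider(
    resource=Resource.create({"service.name": "openai-client-example"})
)
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint=os.getenv("OTLP_ENDPOINT", "http://localhost:4317"))
    )
)
trace.set_tracer_provider(provider)

# 2. Auto-instrument the OpenAI client so each request creates spans and
#    carries the traceparent header to the router.
OpenAIInstrumentor().instrument()

# 3. Make requests through the semantic router.
client = OpenAI(
    base_url=os.getenv("SEMANTIC_ROUTER_URL", "http://localhost:8000") + "/v1",
    api_key=os.getenv("OPENAI_API_KEY", "dummy-key-for-local-testing"),
)
response = client.chat.completions.create(
    model="auto",  # illustrative; the router picks the backend model
    messages=[{"role": "user", "content": "Explain quantum entanglement briefly."}],
)
print(response.choices[0].message.content)
```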

### Step 4: View Traces in Jaeger

Open the Jaeger UI in your browser:

```
http://localhost:16686
```

1. Select **Service**: `openai-client-example`
2. Click **Find Traces**
3. Click on a trace to see the detailed timeline

You should see traces with spans from:
- `openai-client-example` (the Python client)
- `vllm-semantic-router` (the router processing)
- Individual operations (classification, routing, etc.)

## Example Trace Visualization

A typical trace will show:

```
Trace: example_1_auto_routing (2.3s total)
├── openai.chat.completions.create (2.3s)
│   └── HTTP POST /v1/chat/completions (2.2s)
│       ├── semantic_router.request.received (2.2s)
│       │   ├── semantic_router.classification (45ms) [category=science]
│       │   ├── semantic_router.cache.lookup (3ms) [cache_miss=true]
│       │   ├── semantic_router.routing.decision (23ms) [selected=llama-3.1-70b]
│       │   └── semantic_router.upstream.request (2.1s)
│       │       └── vllm.generate (2.0s) [tokens=156]
```

## Configuration Options

### Environment Variables

The example supports these environment variables:

- **`SEMANTIC_ROUTER_URL`** - Router URL (default: `http://localhost:8000`)
- **`OTLP_ENDPOINT`** - OTLP collector endpoint (default: `http://localhost:4317`)
- **`OPENAI_API_KEY`** - API key (default: `dummy-key-for-local-testing`)

Example with custom configuration:

```bash
export SEMANTIC_ROUTER_URL="http://my-router:8000"
export OTLP_ENDPOINT="http://tempo:4317"
python openai_client_tracing.py
```

### Router Configuration

Edit `router-config.yaml` to customize tracing behavior:

```yaml
observability:
  tracing:
    enabled: true
    exporter:
      type: "otlp"
      endpoint: "jaeger:4317"
    sampling:
      type: "always_on" # or "probabilistic"
      rate: 1.0 # sample rate for probabilistic
```

**Sampling strategies:**

- **`always_on`** - Sample 100% of requests (development/debugging)
- **`always_off`** - Disable sampling (emergency)
- **`probabilistic`** - Sample a percentage (production)

**Production recommendation:**
```yaml
sampling:
  type: "probabilistic"
  rate: 0.1 # Sample 10% of requests
```

## Advanced Usage

### Custom Spans

Add custom spans to track specific operations:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("my_operation") as span:
    span.set_attribute("custom.attribute", "value")
    # Your code here
```
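
To correlate errors across service boundaries (one of the goals listed in the overview), record exceptions and set the span status as well; a short sketch using the standard span API (`risky_call` is a placeholder):

```python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("my_operation") as span:
    try:
        risky_call()  # placeholder for your own code
    except Exception as exc:
        span.record_exception(exc)  # attaches the exception as a span event
        span.set_status(Status(StatusCode.ERROR, str(exc)))
        raise
```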

### Multiple Backends

Configure the router to route to different vLLM backends and see how traces flow:

```yaml
models:
  - name: "llama-3.1-8b"
    category: "general"
    endpoints:
      - name: "backend-1"
        address: "http://vllm-1:8000"

  - name: "llama-3.1-70b"
    category: "reasoning"
    endpoints:
      - name: "backend-2"
        address: "http://vllm-2:8000"
```

### Alternative Tracing Backends

#### Grafana Tempo

Replace Jaeger with Tempo in `docker-compose.yml`:

```yaml
services:
  tempo:
    image: grafana/tempo:latest
    ports:
      - "4317:4317"
      - "3200:3200"
    command: ["-config.file=/etc/tempo.yaml"]
```

Update router config:

```yaml
exporter:
  endpoint: "tempo:4317"
```

#### Datadog

Use Datadog OTLP endpoint:

```yaml
exporter:
  type: "otlp"
  endpoint: "https://otlp.datadoghq.com"
  insecure: false
```

## Troubleshooting

### Traces Not Appearing

1. **Check services are running:**

   ```bash
   docker-compose ps
   ```

2. **Check router logs for tracing initialization:**

   ```bash
   docker-compose logs semantic-router | grep -i tracing
   ```

3. **Verify OTLP endpoint connectivity:**

   ```bash
   telnet localhost 4317
   ```

4. **Check sampling rate:**
   - Ensure `always_on` for development
   - Increase `rate` if using probabilistic sampling

### Connection Refused Errors

If the Python example can't connect to the router:

```bash
# Check router is accessible
curl http://localhost:8000/health

# Check Docker network
docker network inspect distributed-tracing_tracing-network
```

### High Memory Usage

If you see high memory usage:

1. **Reduce sampling rate:**

   ```yaml
   sampling:
     type: "probabilistic"
     rate: 0.01 # 1% sampling
   ```

2. **Check batch export settings** in the application code (tunable parameters are sketched below)
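
For the second point, the defaults are usually fine, but the batch processor can be tuned; a sketch using the standard `BatchSpanProcessor` parameters from the OpenTelemetry Python SDK (values are illustrative):

```python
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Smaller queue and batch sizes reduce memory held by pending spans,
# at the cost of dropping spans earlier under heavy load.
processor = BatchSpanProcessor(
    OTLPSpanExporter(endpoint="http://localhost:4317"),
    max_queue_size=512,          # default 2048
    max_export_batch_size=128,   # default 512
    schedule_delay_millis=2000,  # export every 2s instead of the 5s default
)
```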

## Best Practices

1. **Start with the stdout exporter** to verify tracing works before using OTLP (see the sketch after this list)
2. **Use probabilistic sampling** in production (10% is a good starting point)
3. **Add meaningful attributes** to spans for debugging
4. **Monitor exporter health** and track export failures
5. **Correlate traces with logs** using the same service name
6. **Set appropriate timeout values** for span export
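
For the first point, a quick way to confirm spans are being produced at all is to wire the client to the console exporter before switching to OTLP; a minimal sketch:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print every finished span to stdout; once spans show up here,
# swap in the OTLP exporter as shown in the earlier examples.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
```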

## Production Deployment

For production, consider:

1. **Use TLS** for OTLP endpoint:

   ```yaml
   exporter:
     endpoint: "otlp-collector.prod.svc:4317"
     insecure: false
   ```

2. **Tune sampling rate** based on traffic:
   - High traffic: 0.01-0.1 (1-10%)
   - Medium traffic: 0.1-0.5 (10-50%)
   - Low traffic: 0.5-1.0 (50-100%)

3. **Use a dedicated OTLP collector** (not Jaeger directly)

4. **Set resource limits** in Kubernetes:

   ```yaml
   resources:
     limits:
       memory: "2Gi"
       cpu: "1000m"
   ```

## Additional Resources

- [OpenTelemetry Python Documentation](https://opentelemetry.io/docs/instrumentation/python/)
- [OpenAI Instrumentation](https://github.com/traceloop/openllmetry)
- [Jaeger Documentation](https://www.jaegertracing.io/docs/)
- [vLLM Semantic Router Tracing Guide](../../website/docs/tutorials/observability/distributed-tracing.md)
- [W3C Trace Context](https://www.w3.org/TR/trace-context/)

## Support

For questions or issues:
- Create an issue in the repository
- Join the `#semantic-router` channel in vLLM Slack
- Check the [troubleshooting guide](../../website/docs/troubleshooting/)

## License

Apache 2.0 - See LICENSE file in the repository root.