|
| 1 | +# Distributed Tracing Feature Summary |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This feature implements comprehensive distributed tracing support for vLLM Semantic Router using OpenTelemetry, providing enterprise-grade observability for the request processing pipeline. |
| 6 | + |
| 7 | +## Implementation Details |
| 8 | + |
| 9 | +### Components Added |
| 10 | + |
| 11 | +#### 1. Core Tracing Infrastructure (`pkg/observability/`) |
| 12 | +- **tracing.go**: OpenTelemetry SDK integration |
| 13 | + - Tracer provider initialization with OTLP and stdout exporters |
| 14 | + - Configurable sampling strategies (always_on, always_off, probabilistic) |
| 15 | + - Graceful shutdown handling |
| 16 | + - Resource attributes with service metadata |
| 17 | + |
| 18 | +- **propagation.go**: W3C Trace Context propagation |
| 19 | + - Header injection for upstream requests |
| 20 | + - Header extraction from incoming requests |
| 21 | + - Context management utilities |
| 22 | + |
| 23 | +- **tracing_test.go**: Comprehensive unit tests |
| 24 | + - Configuration validation |
| 25 | + - Span creation and attribute setting |
| 26 | + - Context propagation |
| 27 | + - Error recording |
| 28 | + |
| 29 | +#### 2. Configuration (`pkg/config/`) |
| 30 | +- Extended RouterConfig with ObservabilityConfig |
| 31 | +- TracingConfig structure with exporter, sampling, and resource settings |
| 32 | +- Environment-specific configuration examples |
| 33 | + |
| 34 | +#### 3. Instrumentation (`pkg/extproc/`) |
| 35 | +- Request headers span with trace context extraction |
| 36 | +- Classification operation spans with timing |
| 37 | +- PII detection spans with detection results |
| 38 | +- Jailbreak detection spans with security actions |
| 39 | +- Cache lookup spans with hit/miss status |
| 40 | +- Routing decision spans with model selection reasoning |
| 41 | +- Backend selection spans with endpoint information |
| 42 | +- System prompt injection spans |
| 43 | + |
| 44 | +### Span Hierarchy |
| 45 | + |
| 46 | +``` |
| 47 | +semantic_router.request.received (root) |
| 48 | +├─ semantic_router.classification |
| 49 | +├─ semantic_router.security.pii_detection |
| 50 | +├─ semantic_router.security.jailbreak_detection |
| 51 | +├─ semantic_router.cache.lookup |
| 52 | +├─ semantic_router.routing.decision |
| 53 | +├─ semantic_router.backend.selection |
| 54 | +├─ semantic_router.system_prompt.injection |
| 55 | +└─ semantic_router.upstream.request |
| 56 | +``` |
| 57 | + |
| 58 | +### Span Attributes |
| 59 | + |
| 60 | +Following OpenInference semantic conventions: |
| 61 | + |
| 62 | +**Request Metadata:** |
| 63 | +- `request.id`, `user.id`, `session.id` |
| 64 | +- `http.method`, `http.path` |
| 65 | + |
| 66 | +**Model Information:** |
| 67 | +- `model.name`, `model.provider`, `model.version` |
| 68 | +- `routing.original_model`, `routing.selected_model` |
| 69 | + |
| 70 | +**Classification:** |
| 71 | +- `category.name`, `category.confidence` |
| 72 | +- `classifier.type`, `classification.time_ms` |
| 73 | + |
| 74 | +**Security:** |
| 75 | +- `pii.detected`, `pii.types`, `pii.detection_time_ms` |
| 76 | +- `jailbreak.detected`, `jailbreak.type`, `security.action` |
| 77 | + |
| 78 | +**Routing:** |
| 79 | +- `routing.strategy`, `routing.reason` |
| 80 | +- `reasoning.enabled`, `reasoning.effort`, `reasoning.family` |
| 81 | + |
| 82 | +**Performance:** |
| 83 | +- `cache.hit`, `cache.lookup_time_ms` |
| 84 | +- `processing.time_ms` |
| 85 | + |
| 86 | +## Configuration |
| 87 | + |
| 88 | +### Minimal (Development) |
| 89 | +```yaml |
| 90 | +observability: |
| 91 | + tracing: |
| 92 | + enabled: true |
| 93 | + exporter: |
| 94 | + type: "stdout" |
| 95 | + sampling: |
| 96 | + type: "always_on" |
| 97 | + resource: |
| 98 | + service_name: "vllm-semantic-router" |
| 99 | +``` |
| 100 | +
|
| 101 | +### Production (OTLP) |
| 102 | +```yaml |
| 103 | +observability: |
| 104 | + tracing: |
| 105 | + enabled: true |
| 106 | + exporter: |
| 107 | + type: "otlp" |
| 108 | + endpoint: "jaeger:4317" |
| 109 | + insecure: false |
| 110 | + sampling: |
| 111 | + type: "probabilistic" |
| 112 | + rate: 0.1 |
| 113 | + resource: |
| 114 | + service_name: "vllm-semantic-router" |
| 115 | + service_version: "v0.1.0" |
| 116 | + deployment_environment: "production" |
| 117 | +``` |
| 118 | +
|
| 119 | +## Performance Impact |
| 120 | +
|
| 121 | +- **Always-on sampling**: ~1-2% latency increase |
| 122 | +- **10% probabilistic**: ~0.1-0.2% latency increase |
| 123 | +- **Async export**: No blocking on span export |
| 124 | +- **Batch processing**: Reduced network overhead |
| 125 | +
|
| 126 | +## Integration Points |
| 127 | +
|
| 128 | +### Current |
| 129 | +- HTTP/gRPC header propagation |
| 130 | +- Structured logging correlation |
| 131 | +- Prometheus metrics alignment |
| 132 | +
|
| 133 | +### Future (vLLM Stack) |
| 134 | +- Trace context forwarding to vLLM backends |
| 135 | +- End-to-end latency tracking |
| 136 | +- Token-level timing correlation |
| 137 | +- Unified observability dashboard |
| 138 | +
|
| 139 | +## Files Changed/Added |
| 140 | +
|
| 141 | +### Core Implementation |
| 142 | +- `src/semantic-router/pkg/observability/tracing.go` (new) |
| 143 | +- `src/semantic-router/pkg/observability/propagation.go` (new) |
| 144 | +- `src/semantic-router/pkg/observability/tracing_test.go` (new) |
| 145 | +- `src/semantic-router/pkg/config/config.go` (modified) |
| 146 | +- `src/semantic-router/pkg/extproc/request_handler.go` (modified) |
| 147 | +- `src/semantic-router/cmd/main.go` (modified) |
| 148 | + |
| 149 | +### Dependencies |
| 150 | +- `src/semantic-router/go.mod` (updated) |
| 151 | +- `src/semantic-router/go.sum` (updated) |
| 152 | + |
| 153 | +### Configuration Examples |
| 154 | +- `config/config.yaml` (updated) |
| 155 | +- `config/config.production.yaml` (new) |
| 156 | +- `config/config.development.yaml` (new) |
| 157 | + |
| 158 | +### Documentation |
| 159 | +- `website/docs/tutorials/observability/distributed-tracing.md` (new) |
| 160 | +- `website/docs/tutorials/observability/tracing-quickstart.md` (new) |
| 161 | +- `README.md` (updated) |
| 162 | + |
| 163 | +### Deployment |
| 164 | +- `deploy/docker-compose.tracing.yaml` (new) |
| 165 | +- `deploy/tracing/README.md` (new) |
| 166 | + |
| 167 | +## Testing |
| 168 | + |
| 169 | +All tests pass with coverage: |
| 170 | +```bash |
| 171 | +cd src/semantic-router |
| 172 | +go test -v ./pkg/observability |
| 173 | +# PASS: All tracing tests |
| 174 | +``` |
| 175 | + |
| 176 | +Test coverage includes: |
| 177 | +- Configuration validation |
| 178 | +- Span creation and lifecycle |
| 179 | +- Attribute setting |
| 180 | +- Error recording |
| 181 | +- Context propagation |
| 182 | +- Noop tracer fallback |
| 183 | + |
| 184 | +## Usage Examples |
| 185 | + |
| 186 | +### Enable stdout tracing |
| 187 | +```bash |
| 188 | +# Update config.yaml |
| 189 | +observability: |
| 190 | + tracing: |
| 191 | + enabled: true |
| 192 | + exporter: |
| 193 | + type: "stdout" |
| 194 | +
|
| 195 | +# Start router |
| 196 | +./semantic-router --config config.yaml |
| 197 | +
|
| 198 | +# Send request - traces printed to console |
| 199 | +``` |
| 200 | + |
| 201 | +### Deploy with Jaeger |
| 202 | +```bash |
| 203 | +# Start Jaeger |
| 204 | +docker run -d -p 4317:4317 -p 16686:16686 \ |
| 205 | + jaegertracing/all-in-one |
| 206 | +
|
| 207 | +# Configure router for OTLP |
| 208 | +# Start router |
| 209 | +# View traces at http://localhost:16686 |
| 210 | +``` |
| 211 | + |
| 212 | +## Benefits |
| 213 | + |
| 214 | +1. **Enhanced Debugging** |
| 215 | + - Trace individual request flows |
| 216 | + - Identify failure points quickly |
| 217 | + - Understand complex routing logic |
| 218 | + |
| 219 | +2. **Performance Optimization** |
| 220 | + - Pinpoint bottlenecks with millisecond precision |
| 221 | + - Compare operation timings |
| 222 | + - Analyze cache effectiveness |
| 223 | + |
| 224 | +3. **Security Monitoring** |
| 225 | + - Track PII detection operations |
| 226 | + - Monitor jailbreak attempts |
| 227 | + - Audit security decisions |
| 228 | + |
| 229 | +4. **Production Readiness** |
| 230 | + - Industry-standard observability |
| 231 | + - Integration with existing tools |
| 232 | + - Minimal performance overhead |
| 233 | + |
| 234 | +## Next Steps |
| 235 | + |
| 236 | +1. **vLLM Integration** |
| 237 | + - Forward trace context to vLLM backends |
| 238 | + - Correlate router and engine spans |
| 239 | + - End-to-end latency tracking |
| 240 | + |
| 241 | +2. **Advanced Features** |
| 242 | + - Custom exporters (Datadog, New Relic) |
| 243 | + - Dynamic sampling rate adjustment |
| 244 | + - Trace-based alerting |
| 245 | + - SLO tracking |
| 246 | + |
| 247 | +3. **Visualization** |
| 248 | + - Pre-built Grafana dashboards |
| 249 | + - Trace-to-metrics correlation |
| 250 | + - Custom trace queries |
| 251 | + |
| 252 | +## References |
| 253 | + |
| 254 | +- [OpenTelemetry Documentation](https://opentelemetry.io/docs/) |
| 255 | +- [OpenInference Semantic Conventions](https://github.com/Arize-ai/openinference) |
| 256 | +- [Jaeger Documentation](https://www.jaegertracing.io/docs/) |
| 257 | +- [W3C Trace Context Specification](https://www.w3.org/TR/trace-context/) |
0 commit comments