Add OpenTelemetry Distributed Tracing for Fine-Grained Observability #322
✅ Deploy Preview for vllm-semantic-router ready!
Co-authored-by: rootfs <[email protected]>
👥 vLLM Semantic Team Notification: The following members have been identified for the changed files in this PR and have been automatically assigned: 📁
@copilot fix the doc errors: 7:18:06 PM: Exhaustive list of all broken links found:
require (
    github.com/beorn7/perks v1.0.1 // indirect
    github.com/cenkalti/backoff/v5 v5.0.3 // indirect
ERROR: go.mod or go.sum files are not tidy in src/semantic-router. Please run 'go mod tidy' in src/semantic-router directory and commit the changes.
@copilot ERROR: go.mod or go.sum files are not tidy in src/semantic-router. Please run 'go mod tidy' in src/semantic-router directory and commit the changes.
Fixed in commit c054384. Ran go mod tidy which properly organized the dependencies - moved OpenTelemetry packages from indirect to direct dependencies and removed unused imports.
@@ -0,0 +1,110 @@
# Quick Start: Distributed Tracing
Fixed in commit 9cde50a. Added blank lines between list items and code blocks to comply with markdown linting rules (MD031/blanks-around-fences).
TRACING_FEATURE.md
Outdated
@@ -0,0 +1,257 @@
# Distributed Tracing Feature Summary
@copilot this file is not needed
Removed in commit 9cde50a.
@copilot test panicked in createspan: https://github.com/vllm-project/semantic-router/actions/runs/18224466353/job/51892070297?pr=322
Fixed in commit 06643c0. The OTLP exporter was using |
@copilot the panic stack is: [PANICKED] Test Panicked: cannot create context from nil parent (Full Stack Trace)
Fixed in commit 8da86ab. The panic was caused by StartSpan being called with a nil context (when RequestContext.TraceContext was not initialized). Added a nil check to use context.Background() when ctx is nil, preventing the "cannot create context from nil parent" panic. Also added TestStartSpanWithNilContext test case to ensure proper handling.
Signed-off-by: Huamin Chen <[email protected]>
…llm-project#322)

* Initial plan
* Add OpenTelemetry tracing infrastructure and basic instrumentation
  Co-authored-by: rootfs <[email protected]>
* Add comprehensive tracing instrumentation and tests
  Co-authored-by: rootfs <[email protected]>
* Add comprehensive tracing documentation and deployment examples
  Co-authored-by: rootfs <[email protected]>
* Update README and add feature summary documentation
  Co-authored-by: rootfs <[email protected]>
* Fix broken documentation link in tracing quickstart guide
  Co-authored-by: rootfs <[email protected]>
* Run go mod tidy to fix go.mod and go.sum
  Co-authored-by: rootfs <[email protected]>
* Fix markdown lint errors and remove TRACING_FEATURE.md
  Co-authored-by: rootfs <[email protected]>
* Fix OTLP exporter to connect asynchronously to prevent test panics
  Co-authored-by: rootfs <[email protected]>
* Fix StartSpan to handle nil context gracefully
  Co-authored-by: rootfs <[email protected]>
* fix lint error
  Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
Signed-off-by: liuhy <[email protected]>

Summary
Fixed StartSpan to handle nil context gracefully:
Original prompt
This section details the original issue you should resolve
<issue_title>Distributed Tracing Support for Fine-Grained Observability</issue_title>
<issue_description>### Is your feature request related to a problem? Please describe.
Currently, vLLM Semantic Router provides basic observability through Prometheus metrics and structured logging. However, these approaches have limitations when it comes to understanding the complete request lifecycle across distributed components:
This becomes especially problematic when:
### Describe the solution you'd like
Implement comprehensive distributed tracing support using industry-standard OpenTelemetry instrumentation, leveraging either:
Key Implementation Requirements:
1. Core Tracing Infrastructure
2. Instrumentation Points
Instrument the following critical paths with spans:
Request Processing Pipeline:
- `semantic_router.request.received` - Entry point span
- `semantic_router.classification` - Category classification with model name and confidence
- `semantic_router.security.pii_detection` - PII detection with results
- `semantic_router.security.jailbreak_detection` - Jailbreak detection with results
- `semantic_router.cache.lookup` - Semantic cache operations
- `semantic_router.routing.decision` - Model selection logic with reasoning
- `semantic_router.backend.selection` - Endpoint selection
- `semantic_router.upstream.request` - Forwarding to vLLM backend
- `semantic_router.response.processing` - Response handling

Span Attributes (following OpenInference conventions):

- `request_id`, `user_id`, `session_id`
- `model.name`, `model.provider`, `model.version`
- `category.name`, `category.confidence`, `classifier.type`
- `routing.strategy`, `routing.reason`, `original_model`, `selected_model`
- `pii.detected`, `jailbreak.detected`, `security.action`
- `token.count.prompt`, `token.count.completion`, `cache.hit`
- `reasoning.enabled`, `reasoning.effort`, `reasoning.family`

3. Integration with vLLM Production Stack
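As a rough sketch of how the span names and attributes listed under Instrumentation Points fit together, the example below records a few pipeline stages as children of the entry-point span. It is an assumption-laden illustration: `Span`, `Recorder`, and `TraceRequest` are hypothetical in-memory stand-ins rather than the OpenTelemetry SDK, and the attribute values are made up. Only the span names and attribute keys come from the issue text.

```go
package main

import (
	"fmt"
	"time"
)

// Span names taken from the issue's Request Processing Pipeline list.
const (
	SpanRequestReceived = "semantic_router.request.received"
	SpanClassification  = "semantic_router.classification"
	SpanCacheLookup     = "semantic_router.cache.lookup"
	SpanRoutingDecision = "semantic_router.routing.decision"
)

// Span is a hypothetical in-memory record; a real implementation would use
// the tracing SDK's span type instead.
type Span struct {
	Name       string
	Parent     string // name of the parent span, "" for the root
	Attributes map[string]any
	Start, End time.Time
}

// Recorder collects finished spans in completion order.
type Recorder struct{ Spans []*Span }

// Start opens a span under the given parent.
func (r *Recorder) Start(name, parent string) *Span {
	return &Span{Name: name, Parent: parent, Attributes: map[string]any{}, Start: time.Now()}
}

// Finish stamps the end time and stores the span.
func (r *Recorder) Finish(s *Span) {
	s.End = time.Now()
	r.Spans = append(r.Spans, s)
}

// TraceRequest walks three pipeline stages under one root span.
// All attribute values here are illustrative placeholders.
func TraceRequest(r *Recorder, requestID string) {
	root := r.Start(SpanRequestReceived, "")
	root.Attributes["request_id"] = requestID

	cls := r.Start(SpanClassification, root.Name)
	cls.Attributes["category.name"] = "math"
	cls.Attributes["category.confidence"] = 0.92
	r.Finish(cls)

	cache := r.Start(SpanCacheLookup, root.Name)
	cache.Attributes["cache.hit"] = false
	r.Finish(cache)

	route := r.Start(SpanRoutingDecision, root.Name)
	route.Attributes["routing.reason"] = "category match"
	route.Attributes["selected_model"] = "example-model"
	r.Finish(route)

	// The root finishes last, so it appears last in the recorder.
	r.Finish(root)
}

func main() {
	rec := &Recorder{}
	TraceRequest(rec, "req-123")
	for _, s := range rec.Spans {
		fmt.Printf("%s parent=%q attrs=%v\n", s.Name, s.Parent, s.Attributes)
	}
}
```

Keeping the span names as constants in one place makes it easier to keep dashboards and queries in sync with the instrumentation.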
4. Configuration
Add tracing configuration to `config.yaml`:

5. Visualization and Analysis
6. Performance Considerations