feat: Modern Dashboard MVP #388
Conversation
✅ Deploy Preview for vllm-semantic-router ready!
Focusing on the o11y stack for the Local and Docker Compose paths at the moment. This still needs more iterations, so I’ll keep it as a draft for now. Apologies for the complexity and the number of files affected — I’m working hard to refine it and make the solution as elegant as possible. I’d really appreciate any suggestions or feedback from the community!
Cool! I think maybe the first priority is to make installing and configuring vLLM-SR easy? And then observability, like embedding the Grafana dashboard and Jaeger tracing.
@Xunzhuo Thanks! Agreed — install and config are key. I’ve worked on observability for a while and made it easier with a quick MVP dashboard. Config in particular is complex and requires more careful iteration, but I’ll keep working on it.
Cool, I need to point out that the dashboard for vLLM-SR is not something like a Grafana dashboard; the goal is to build the admin console for managing vLLM-SR.
But no worries, keep moving forward, nice work!
Got it!
Take this as an example :) URL: https://www.demo.litellm.ai/ui
Inspiring! This will completely change the UX.
Yep, keep doing your magic 🪄
Can you share a screenshot?
@Xunzhuo, I have a little problem using openwebui-pipe in the Dashboard. I can’t find a + or import button right here.
The screenshot in the PR description is updated.
You need to install the Open WebUI pipelines service and add the pipeline address.
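For reference, a minimal sketch of that setup (the image tag, port, and default API key are taken from the Open WebUI Pipelines README at the time of writing; double-check them against the current docs):
docker run -d -p 9099:9099 \
  --add-host=host.docker.internal:host-gateway \
  -v pipelines:/app/pipelines \
  --name pipelines --restart always \
  ghcr.io/open-webui/pipelines:main
Then, in Open WebUI, go to Admin Panel > Settings > Connections and add an OpenAI-style connection pointing at http://localhost:9099 (default API key: 0p3n-w3bu!). The pipe should show up after that.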
Will work on it.
Can you check the logs of semantic-router?
(base) jared@Jared:~/vllm-project/semantic-router$ curl -v http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hi! What is 2 + 2?"}],
    "stream": false
  }'
* Uses proxy env variable no_proxy == '172.31.*,172.30.*,172.29.*,172.28.*,172.27.*,172.26.*,172.25.*,172.24.*,172.23.*,172.22.*,172.21.*,172.20.*,172.19.*,172.18.*,172.17.*,172.16.*,10.*,192.168.*,127.*,localhost,<local>'
* Trying 127.0.0.1:11434...
* Connected to localhost (127.0.0.1) port 11434 (#0)
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:11434
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 94
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< date: Sat, 11 Oct 2025 08:46:49 GMT
< server: uvicorn
< content-length: 597
< content-type: application/json
<
* Connection #0 to host localhost left intact
{"id":"chatcmpl-f0db6df8b3de4bad92f80f38a81a4a9d","object":"chat.completion","created":1760172414,"model":"phi4","choices":[{"index":0,"message":{"role":"assistant","content":"2 + 2 equals 4.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":17,"total_tokens":26,"completion_tokens":9,"prompt_tokens_details":null},"prompt_logprobs":null," (base) jared@Jared:~/vllm-project/semantic-router$ docker exec -it semantic-router sh -c 'curl -sS http://172.17.0.1:11434/health' docker exec -it semantic-router sh -c 'curl -sS http://172.17.0.1:11434/health'
curl: (7) Failed to connect to 172.17.0.1 port 11434: Connection refused (base) jared@Jared:~/vllm-project/semantic-router$ curl -v http://localhost:8801/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"Hi! What is 2 + 2?"}],"stream":false}'
* Uses proxy env variable no_proxy == '172.31.*,172.30.*,172.29.*,172.28.*,172.27.*,172.26.*,172.25.*,172.24.*,172.23.*,172.22.*,172.21.*,172.20.*,172.19.*,172.18.*,172.17.*,172.16.*,10.*,192.168.*,127.*,localhost,<local>'
* Trying 127.0.0.1:8801...
* Connected to localhost (127.0.0.1) port 8801 (#0)
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:8801
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 91
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< content-length: 167
< content-type: text/plain
< date: Sat, 11 Oct 2025 08:47:39 GMT
< server: envoy
<
* Connection #0 to host localhost left intact
upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed con
(base) jared@Jared:~/vllm-project/semantic-router$ docker logs semantic-router 2>&1 | tail -30
{"level":"info","ts":"2025-10-11T08:40:03.065611652Z","caller":"observability/logging.go:140","msg":"Starting insecure LLM Router ExtProc server on port 50051..."}
{"level":"info","ts":"2025-10-11T08:40:03.065725968Z","caller":"observability/logging.go:140","msg":"Found global classification service on attempt 1/5"}
{"level":"info","ts":"2025-10-11T08:40:03.066670389Z","caller":"observability/logging.go:140","msg":"System prompt configuration endpoints disabled for security"}
{"level":"info","ts":"2025-10-11T08:40:03.066646981Z","caller":"observability/logging.go:136","msg":"config_watcher_error","stage":"create_watcher","error":"too many open files","event":"config_watcher_error"}
{"level":"info","ts":"2025-10-11T08:40:03.066725679Z","caller":"observability/logging.go:140","msg":"Classification API server listening on port 8080"}
{"level":"info","ts":"2025-10-11T08:40:24.532769936Z","caller":"observability/logging.go:140","msg":"Started processing a new request"}
{"level":"info","ts":"2025-10-11T08:40:24.536033144Z","caller":"observability/logging.go:140","msg":"Received request headers"}
{"level":"info","ts":"2025-10-11T08:40:24.537751317Z","caller":"observability/logging.go:140","msg":"Received request body {\n \"model\": \"auto\",\n \"messages\": [{\"role\": \"user\", \"content\": \"Hi! What is 2 + 2?\"}],\n \"stream\": false\n }"}
{"level":"info","ts":"2025-10-11T08:40:24.538240988Z","caller":"observability/logging.go:140","msg":"Original model: auto"}
{"level":"info","ts":"2025-10-11T08:40:24.83565221Z","caller":"observability/logging.go:140","msg":"Jailbreak classification result: {0 0.9999995}"}
{"level":"info","ts":"2025-10-11T08:40:24.835747565Z","caller":"observability/logging.go:140","msg":"BENIGN: 'benign' (confidence: 1.000, threshold: 0.700)"}
{"level":"info","ts":"2025-10-11T08:40:24.835761373Z","caller":"observability/logging.go:140","msg":"No jailbreak detected in request content"}
{"level":"info","ts":"2025-10-11T08:40:25.402714953Z","caller":"observability/logging.go:140","msg":"Using Auto Model Selection"}
{"level":"info","ts":"2025-10-11T08:40:25.715911914Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9579"}
{"level":"info","ts":"2025-10-11T08:40:25.715991793Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math)"}
{"level":"info","ts":"2025-10-11T08:40:25.716009374Z","caller":"observability/logging.go:140","msg":"Selected model phi4 for category math with score 0.6000"}
{"level":"info","ts":"2025-10-11T08:40:25.988612972Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9579"}
{"level":"info","ts":"2025-10-11T08:40:25.98871329Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math)"}
{"level":"info","ts":"2025-10-11T08:40:25.988725742Z","caller":"observability/logging.go:140","msg":"No PII policy found for model phi4, allowing request"}
{"level":"info","ts":"2025-10-11T08:40:25.988733235Z","caller":"observability/logging.go:140","msg":"Routing to model: phi4"}
{"level":"info","ts":"2025-10-11T08:40:26.234778194Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9579, entropy_available=true"}
{"level":"info","ts":"2025-10-11T08:40:26.234937043Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math), reasoning_decision: use=true, confidence=0.910, reason=very_low_uncertainty_trust_classification"}
{"level":"info","ts":"2025-10-11T08:40:26.234957925Z","caller":"observability/logging.go:140","msg":"Entropy-based reasoning decision: category='math', confidence=0.958, use_reasoning=true, reason=very_low_uncertainty_trust_classification, strategy=trust_top_category"}
{"level":"info","ts":"2025-10-11T08:40:26.234977547Z","caller":"observability/logging.go:140","msg":"Top predicted categories: [{math 0.9578654} {chemistry 0.011327983} {psychology 0.007727393}]"}
{"level":"info","ts":"2025-10-11T08:40:26.234989112Z","caller":"observability/logging.go:140","msg":"Entropy-based reasoning decision for this query: true on [phi4] model (confidence: 0.910, reason: very_low_uncertainty_trust_classification)"}
{"level":"info","ts":"2025-10-11T08:40:26.235015167Z","caller":"observability/logging.go:140","msg":"Selected endpoint address: 172.17.0.1:11434 for model: phi4"}
{"level":"info","ts":"2025-10-11T08:40:26.238834164Z","caller":"observability/logging.go:140","msg":"No reasoning support for model: phi4 (no reasoning family configured)"}
{"level":"info","ts":"2025-10-11T08:40:26.238939029Z","caller":"observability/logging.go:140","msg":"Use new model: phi4"}
{"level":"info","ts":"2025-10-11T08:40:26.239001262Z","caller":"observability/logging.go:136","msg":"routing_decision","reasoning_effort":"high","event":"routing_decision","request_id":"4bc3c450-065b-4cbe-bc21-9888ea6bb84f","selected_model":"phi4","category":"math","selected_endpoint":"172.17.0.1:11434","routing_latency_ms":1701,"reason_code":"auto_routing","original_model":"auto","reasoning_enabled":true}
{"level":"info","ts":"2025-10-11T08:40:26.24164372Z","caller":"observability/logging.go:140","msg":"Stream ended gracefully"} I can access the vLLM port directly, but I cannot do so through semantic-router in Docker Compose. |
Nope, it is probably a Docker network issue. As your debugging shows, the container cannot access 172.17.0.1:11434; make sure the Docker network is configured properly.
Try to make sure the container can access the external IP, e.g. by running curl manually from inside the container. Once that passes, Envoy should be able to reach it as well.
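A minimal sketch of that check (the container name and upstream address come from the logs above; the 0.0.0.0 bind and the extra_hosts snippet are assumptions meant to illustrate common fixes, not this repo’s actual compose file):
# 1. Verify the upstream is reachable from inside the semantic-router container
docker exec -it semantic-router sh -c 'curl -sS http://172.17.0.1:11434/health'
# 2. If the connection is refused, check on the host whether the server only listens on loopback
ss -ltnp | grep 11434    # 127.0.0.1:11434 means containers cannot reach it via 172.17.0.1
# 3. Either start the upstream bound to 0.0.0.0, or map the host gateway into the container:
#    services:
#      semantic-router:
#        extra_hosts:
#          - "host.docker.internal:host-gateway"
#    ...and point the model endpoint at host.docker.internal:11434 instead of 172.17.0.1.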
Configuration Viewer implemented. The MVP version based on Docker Compose is roughly finished. I’ll update the troubleshooting docs when I figure out the Docker network problem.
Wonderful!
Can we separate the configuration into different sub-pages?
Then you will get a lot of cool columns on the left side.
Does Observability here mean Monitoring?
Nope, it is one related configuration; Monitoring is the dashboard for Grafana and tracing (later).
In this PR the separated configuration is read-only; later we need to support editing it.
Got it now.
Let us 🚀🚀
Thanks! 😃
@JaredforReal do you have Slack?
Yes.
@JaredforReal this is really cool! Thanks for making this happen so quickly! As a follow-up, would you mind adding an auth factory to support additional auth methods?
@rootfs got it.
Sorry to be chatty :D It would be great to be able to customize the system prompt injection in the UI too.
(screenshot of the system prompt field in Open WebUI)
Yes @JaredforReal, the injection you saw in Open WebUI is a user-defined system prompt. What vLLM-SR offers is automatic injection based on the domain and intent, and it is configurable; that is what I mentioned above: all of the configuration should be editable in the console, not just viewable.
In the config:
# Categories with system prompts for different domains
categories:
  - name: math
    description: "Mathematical queries, calculations, and problem solving"
    system_prompt: "You are a mathematics expert. Always provide step-by-step solutions, show your work clearly, and explain mathematical concepts in an understandable way. When solving equations, break down each step and explain the reasoning behind it."
    model_scores:
      - model: openai/gpt-oss-20b
        score: 0.9
        use_reasoning: true
  - name: computer science
    description: "Programming, algorithms, software engineering, and technical topics"
    system_prompt: "You are a computer science expert with deep knowledge of algorithms, data structures, programming languages, and software engineering best practices. Provide clear, practical solutions with well-commented code examples when helpful. Always consider performance, readability, and maintainability."
    model_scores:
      - model: openai/gpt-oss-20b
        score: 0.8
        use_reasoning: true

I got what you said. It’s really a good idea, I will work on it! @Xunzhuo @rootfs
What type of PR is this?
feat: Modern Dashboard MVP
What this PR does / why we need it:
make docker-compose-up to start the full stack (semantic-router + envoy + grafana + prometheus + dashboard + openwebui).
Which issue(s) this PR fixes:
Fixes #325
Current Progress: Not functional yet.
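For reference, a minimal docker-compose sketch of the stack described above (service names, images, and port mappings are illustrative assumptions, not this PR’s actual compose file; 50051/8080 come from the router logs and 8801 from the Envoy curl test earlier in the thread):
services:
  semantic-router:
    image: ghcr.io/vllm-project/semantic-router:latest   # assumed image name
    ports: ["50051:50051", "8080:8080"]                   # ExtProc gRPC + Classification API
  envoy:
    image: envoyproxy/envoy:v1.31-latest
    ports: ["8801:8801"]                                  # OpenAI-compatible entrypoint
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
  dashboard:
    build: ./dashboard                                    # assumed; built from this PR
  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["3001:8080"]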


