Conversation

@JaredforReal (Collaborator) commented Oct 10, 2025

What type of PR is this?
feat: Modern Dashboard MVP

What this PR does / why we need it:

  • React + TypeScript SPA frontend providing a modern UI with dark/light theme.
  • Uses iframes to integrate Grafana data panels and the Open WebUI Playground (sketched below).
  • make docker-compose-up starts the full stack (semantic-router + envoy + grafana + prometheus + dashboard + openwebui).
  • Config Viewer MVP.
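
As a rough illustration of the iframe integration, a minimal React component might look like the sketch below. The component name, dashboard path, and default base URL are illustrative assumptions, not the PR's actual code; kiosk and theme are standard Grafana URL parameters.

import { useMemo } from 'react'

// Hypothetical sketch: embed a Grafana panel via iframe.
interface GrafanaPanelProps {
  baseUrl: string       // e.g. "http://localhost:3000" (assumed Grafana address)
  dashboardPath: string // e.g. "/d/abc123/router-metrics" (illustrative path)
  theme: 'light' | 'dark'
}

export function GrafanaPanel({ baseUrl, dashboardPath, theme }: GrafanaPanelProps) {
  // kiosk hides Grafana's own chrome; theme keeps the panel in sync with the dashboard's theme toggle
  const src = useMemo(
    () => `${baseUrl}${dashboardPath}?kiosk&theme=${theme}`,
    [baseUrl, dashboardPath, theme],
  )
  return (
    <iframe
      src={src}
      title="Grafana panel"
      style={{ width: '100%', height: '100%', border: 'none' }}
    />
  )
}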

Which issue(s) this PR fixes:
Fixes #325

Current Progress: Not functional yet.
[Screenshots: dashboard UI]

netlify bot commented Oct 10, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: bdae576
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68ea3baf2aee1e0008a7a50a
😎 Deploy Preview: https://deploy-preview-388--vllm-semantic-router.netlify.app

github-actions bot commented Oct 10, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • dashboard/.dockerignore
  • dashboard/README.md
  • dashboard/backend/.gitkeep
  • dashboard/backend/Dockerfile
  • dashboard/backend/go.mod
  • dashboard/backend/go.sum
  • dashboard/backend/main.go
  • dashboard/deploy/kubernetes/.gitkeep
  • dashboard/deploy/kubernetes/deployment.yaml
  • dashboard/deploy/local/.gitkeep
  • dashboard/frontend/index.html
  • dashboard/frontend/package-lock.json
  • dashboard/frontend/package.json
  • dashboard/frontend/public/vllm.png
  • dashboard/frontend/src/App.tsx
  • dashboard/frontend/src/components/ConfigNav.module.css
  • dashboard/frontend/src/components/ConfigNav.tsx
  • dashboard/frontend/src/components/Layout.module.css
  • dashboard/frontend/src/components/Layout.tsx
  • dashboard/frontend/src/index.css
  • dashboard/frontend/src/main.tsx
  • dashboard/frontend/src/pages/ConfigPage.module.css
  • dashboard/frontend/src/pages/ConfigPage.tsx
  • dashboard/frontend/src/pages/MonitoringPage.module.css
  • dashboard/frontend/src/pages/MonitoringPage.tsx
  • dashboard/frontend/src/pages/PlaygroundPage.module.css
  • dashboard/frontend/src/pages/PlaygroundPage.tsx
  • dashboard/frontend/src/vite-env.d.ts
  • dashboard/frontend/tsconfig.json
  • dashboard/frontend/tsconfig.node.json
  • dashboard/frontend/vite.config.ts
  • .gitignore
  • Dockerfile.extproc

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/docker-compose/docker-compose.yml

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/linter.mk
  • tools/openwebui-pipe/vllm_semantic_router_pipe.py


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@JaredforReal (Collaborator, Author) commented Oct 10, 2025

Focusing on the o11y stack for Local and Docker Compose paths at the moment. This still needs more iterations, so I’ll keep it as a draft for now. Apologies for the complexity and the number of files affected — I’m working hard to refine it and make the solution as elegant as possible. I’d really appreciate any suggestions or feedback from the community!

@Xunzhuo (Member) commented Oct 10, 2025

coooooool! I think maybe the first priority is to make the install and configuration of vLLM-SR easy to manage? And then observability, like embedding the Grafana dashboard and Jaeger tracing.

Xunzhuo added this to the v0.1 milestone on Oct 10, 2025.
@JaredforReal (Collaborator, Author) commented Oct 10, 2025

coooooool! I think maybe the first priority is to make the install and configuration of vLLM-SR easy to manage? And then observability, like embedding the Grafana dashboard and Jaeger tracing.

@Xunzhuo Thanks! Agreed, install and config are key. I've been working on observability for a while and made it easier with a quick MVP dashboard. Config in particular is complex and requires more careful iteration, but I'll keep working on it.

@Xunzhuo (Member) commented Oct 10, 2025

Cool. I need to point out that the dashboard for vLLM-SR is not something like a Grafana dashboard; the goal is to build an admin console for managing vLLM-SR.

@Xunzhuo (Member) commented Oct 10, 2025

but no worries, keep moving forward, nice work!

@JaredforReal (Collaborator, Author)

Got u!

@Xunzhuo (Member) commented Oct 10, 2025

take this as an example : )

url: https://www.demo.litellm.ai/ui
user: admin
pwd: sk-1234

@JaredforReal (Collaborator, Author)

take this as an example : )

url: https://www.demo.litellm.ai/ui user: admin pwd: sk-1234

Inspiring! This will completely change the UX.

@Xunzhuo (Member) commented Oct 10, 2025

Yep, keep doing your magic 🪄

JaredforReal marked this pull request as ready for review on October 11, 2025 05:26.
@Xunzhuo (Member) commented Oct 11, 2025

can u share a screenshot?

@JaredforReal (Collaborator, Author)

[Screenshot]

@Xunzhuo, I have a small problem using openwebui-pipe in the Dashboard: I can't find a + or import button here.

@JaredforReal (Collaborator, Author)

can u share a screenshot?

The screenshot in the PR description has been updated.

@Xunzhuo (Member) commented Oct 11, 2025

u need to install the Open WebUI pipeline and add the pipeline address

@JaredforReal (Collaborator, Author)

will work on it

JaredforReal marked this pull request as draft on October 11, 2025 05:37.
@Xunzhuo (Member) commented Oct 11, 2025

can you check the logs of semantic-router?

@JaredforReal (Collaborator, Author) commented Oct 11, 2025

(base) jared@Jared:~/vllm-project/semantic-router$ curl -v http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hi! What is 2 + 2?"}],
    "stream": false
  }'
* Uses proxy env variable no_proxy == '172.31.*,172.30.*,172.29.*,172.28.*,172.27.*,172.26.*,172.25.*,172.24.*,172.23.*,172.22.*,172.21.*,172.20.*,172.19.*,172.18.*,172.17.*,172.16.*,10.*,192.168.*,127.*,localhost,<local>'
*   Trying 127.0.0.1:11434...
* Connected to localhost (127.0.0.1) port 11434 (#0)
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:11434
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 94
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< date: Sat, 11 Oct 2025 08:46:49 GMT
< server: uvicorn
< content-length: 597
< content-type: application/json
< 
* Connection #0 to host localhost left intact
{"id":"chatcmpl-f0db6df8b3de4bad92f80f38a81a4a9d","object":"chat.completion","created":1760172414,"model":"phi4","choices":[{"index":0,"message":{"role":"assistant","content":"2 + 2 equals 4.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":17,"total_tokens":26,"completion_tokens":9,"prompt_tokens_details":null},"prompt_logprobs":null,"
(base) jared@Jared:~/vllm-project/semantic-router$ docker exec -it semantic-router sh -c 'curl -sS http://172.17.0.1:11434/health'
curl: (7) Failed to connect to 172.17.0.1 port 11434: Connection refused
(base) jared@Jared:~/vllm-project/semantic-router$ curl -v http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hi! What is 2 + 2?"}],"stream":false}'
* Uses proxy env variable no_proxy == '172.31.*,172.30.*,172.29.*,172.28.*,172.27.*,172.26.*,172.25.*,172.24.*,172.23.*,172.22.*,172.21.*,172.20.*,172.19.*,172.18.*,172.17.*,172.16.*,10.*,192.168.*,127.*,localhost,<local>'
*   Trying 127.0.0.1:8801...
* Connected to localhost (127.0.0.1) port 8801 (#0)
> POST /v1/chat/completions HTTP/1.1
> Host: localhost:8801
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 91
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< content-length: 167
< content-type: text/plain
< date: Sat, 11 Oct 2025 08:47:39 GMT
< server: envoy
< 
* Connection #0 to host localhost left intact
upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed con
(base) jared@Jared:~/vllm-project/semantic-router$ docker logs semantic-router 2>&1 | tail -30
{"level":"info","ts":"2025-10-11T08:40:03.065611652Z","caller":"observability/logging.go:140","msg":"Starting insecure LLM Router ExtProc server on port 50051..."}
{"level":"info","ts":"2025-10-11T08:40:03.065725968Z","caller":"observability/logging.go:140","msg":"Found global classification service on attempt 1/5"}
{"level":"info","ts":"2025-10-11T08:40:03.066670389Z","caller":"observability/logging.go:140","msg":"System prompt configuration endpoints disabled for security"}
{"level":"info","ts":"2025-10-11T08:40:03.066646981Z","caller":"observability/logging.go:136","msg":"config_watcher_error","stage":"create_watcher","error":"too many open files","event":"config_watcher_error"}
{"level":"info","ts":"2025-10-11T08:40:03.066725679Z","caller":"observability/logging.go:140","msg":"Classification API server listening on port 8080"}
{"level":"info","ts":"2025-10-11T08:40:24.532769936Z","caller":"observability/logging.go:140","msg":"Started processing a new request"}
{"level":"info","ts":"2025-10-11T08:40:24.536033144Z","caller":"observability/logging.go:140","msg":"Received request headers"}
{"level":"info","ts":"2025-10-11T08:40:24.537751317Z","caller":"observability/logging.go:140","msg":"Received request body {\n    \"model\": \"auto\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Hi! What is 2 + 2?\"}],\n    \"stream\": false\n  }"}
{"level":"info","ts":"2025-10-11T08:40:24.538240988Z","caller":"observability/logging.go:140","msg":"Original model: auto"}
{"level":"info","ts":"2025-10-11T08:40:24.83565221Z","caller":"observability/logging.go:140","msg":"Jailbreak classification result: {0 0.9999995}"}
{"level":"info","ts":"2025-10-11T08:40:24.835747565Z","caller":"observability/logging.go:140","msg":"BENIGN: 'benign' (confidence: 1.000, threshold: 0.700)"}
{"level":"info","ts":"2025-10-11T08:40:24.835761373Z","caller":"observability/logging.go:140","msg":"No jailbreak detected in request content"}
{"level":"info","ts":"2025-10-11T08:40:25.402714953Z","caller":"observability/logging.go:140","msg":"Using Auto Model Selection"}
{"level":"info","ts":"2025-10-11T08:40:25.715911914Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9579"}
{"level":"info","ts":"2025-10-11T08:40:25.715991793Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math)"}
{"level":"info","ts":"2025-10-11T08:40:25.716009374Z","caller":"observability/logging.go:140","msg":"Selected model phi4 for category math with score 0.6000"}
{"level":"info","ts":"2025-10-11T08:40:25.988612972Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9579"}
{"level":"info","ts":"2025-10-11T08:40:25.98871329Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math)"}
{"level":"info","ts":"2025-10-11T08:40:25.988725742Z","caller":"observability/logging.go:140","msg":"No PII policy found for model phi4, allowing request"}
{"level":"info","ts":"2025-10-11T08:40:25.988733235Z","caller":"observability/logging.go:140","msg":"Routing to model: phi4"}
{"level":"info","ts":"2025-10-11T08:40:26.234778194Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9579, entropy_available=true"}
{"level":"info","ts":"2025-10-11T08:40:26.234937043Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math), reasoning_decision: use=true, confidence=0.910, reason=very_low_uncertainty_trust_classification"}
{"level":"info","ts":"2025-10-11T08:40:26.234957925Z","caller":"observability/logging.go:140","msg":"Entropy-based reasoning decision: category='math', confidence=0.958, use_reasoning=true, reason=very_low_uncertainty_trust_classification, strategy=trust_top_category"}
{"level":"info","ts":"2025-10-11T08:40:26.234977547Z","caller":"observability/logging.go:140","msg":"Top predicted categories: [{math 0.9578654} {chemistry 0.011327983} {psychology 0.007727393}]"}
{"level":"info","ts":"2025-10-11T08:40:26.234989112Z","caller":"observability/logging.go:140","msg":"Entropy-based reasoning decision for this query: true on [phi4] model (confidence: 0.910, reason: very_low_uncertainty_trust_classification)"}
{"level":"info","ts":"2025-10-11T08:40:26.235015167Z","caller":"observability/logging.go:140","msg":"Selected endpoint address: 172.17.0.1:11434 for model: phi4"}
{"level":"info","ts":"2025-10-11T08:40:26.238834164Z","caller":"observability/logging.go:140","msg":"No reasoning support for model: phi4 (no reasoning family configured)"}
{"level":"info","ts":"2025-10-11T08:40:26.238939029Z","caller":"observability/logging.go:140","msg":"Use new model: phi4"}
{"level":"info","ts":"2025-10-11T08:40:26.239001262Z","caller":"observability/logging.go:136","msg":"routing_decision","reasoning_effort":"high","event":"routing_decision","request_id":"4bc3c450-065b-4cbe-bc21-9888ea6bb84f","selected_model":"phi4","category":"math","selected_endpoint":"172.17.0.1:11434","routing_latency_ms":1701,"reason_code":"auto_routing","original_model":"auto","reasoning_enabled":true}
{"level":"info","ts":"2025-10-11T08:40:26.24164372Z","caller":"observability/logging.go:140","msg":"Stream ended gracefully"}

I can access the vLLM port directly, but I cannot reach it through semantic-router in Docker Compose.
I'm using phi4 and Qwen3-0.6B; is it a reasoning-family problem?

@Xunzhuo (Member) commented Oct 11, 2025

Nope, it's probably a Docker network issue. As your debugging shows, you cannot access 172.17.0.1:11434; make sure the Docker network is configured properly.

@Xunzhuo (Member) commented Oct 11, 2025

Try to make sure the container can access the external IP, e.g. by manually running curl from inside it. Once that passes, Envoy should be able to reach it as well. One common fix is sketched below.
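
For reference, one common fix on Linux is to map host.docker.internal to the host gateway in the compose file and point the endpoint at that name instead of 172.17.0.1. This is a sketch against deploy/docker-compose/docker-compose.yml, assuming the model server listens on the host's port 11434; the service stanza shown is illustrative:

services:
  semantic-router:
    # ...image, ports, and volumes as already defined in the compose file...
    extra_hosts:
      # Docker 20.10+: resolve host.docker.internal to the host's gateway IP
      - "host.docker.internal:host-gateway"

The endpoint address in the router config would then be host.docker.internal:11434, which resolves from inside the container regardless of which bridge network it joins.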

@JaredforReal (Collaborator, Author)

Configuration Viewer implemented. The Docker Compose-based MVP is roughly finished. I'll update the troubleshooting docs once I figure out the Docker network problem.

@Xunzhuo (Member) commented Oct 11, 2025

wonderful!

@JaredforReal (Collaborator, Author)

It can be merged now. I will create a follow-up issue to ask for suggestions, show what I am struggling with, and discuss future plans.
WDYT? @Xunzhuo @rootfs

JaredforReal marked this pull request as ready for review on October 11, 2025 10:34.
@Xunzhuo (Member) commented Oct 11, 2025

Can we separate the configuration into different sub-pages? (A hypothetical route sketch follows the list.)

  1. Models:
    1. user-defined models
    2. user-defined endpoints
  2. Prompt Guard:
    1. configuration for the PII ModernBERT
    2. configuration for the jailbreak ModernBERT
  3. Similarity Cache:
    1. configuration for the similarity BERT
  4. Intelligent Routing:
    1. configuration for categories, including whether to enable reasoning and whether to inject a system prompt
    2. reasoning family
    3. configuration for the classifier BERT model (in-tree, out-of-tree)
  5. Tools Selection:
    1. configuration for tools
    2. tools DB
  6. Observability
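
A hypothetical React Router layout for these sub-pages might look like the sketch below; all paths, component names, and the Stub placeholder are illustrative, not the PR's actual code:

import { Routes, Route } from 'react-router-dom'

// Placeholder page; each real page would render the read-only view for its section.
const Stub = ({ title }: { title: string }) => <div>{title}</div>

export function ConfigRoutes() {
  return (
    <Routes>
      <Route path="/config/models" element={<Stub title="Models" />} />
      <Route path="/config/prompt-guard" element={<Stub title="Prompt Guard" />} />
      <Route path="/config/similarity-cache" element={<Stub title="Similarity Cache" />} />
      <Route path="/config/intelligent-routing" element={<Stub title="Intelligent Routing" />} />
      <Route path="/config/tools" element={<Stub title="Tools Selection" />} />
      <Route path="/config/observability" element={<Stub title="Observability" />} />
    </Routes>
  )
}

Each route then maps naturally to a column in the left-hand navigation.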

@Xunzhuo (Member) commented Oct 11, 2025

Then you will get a lot of cool columns on the left side.

@JaredforReal (Collaborator, Author)

Does Observability mean Monitoring?

@Xunzhuo (Member) commented Oct 11, 2025

Nope, it is a related configuration; Monitoring is the dashboard for Grafana and tracing (later).

@Xunzhuo (Member) commented Oct 11, 2025

In this PR the separated configuration is read-only; later, we need to support editing it.

@JaredforReal (Collaborator, Author)

Nope, it is a related configuration; Monitoring is the dashboard for Grafana and tracing (later).

Got it now.

@JaredforReal (Collaborator, Author)

[Screenshot: PixPin_2025-10-11_19-10-35]

@Xunzhuo (Member) left a comment

Let us 🚀🚀

Xunzhuo merged commit ed81750 into vllm-project:main on Oct 11, 2025. 16 of 17 checks passed.
@JaredforReal (Collaborator, Author) commented Oct 11, 2025

Thanks! 😃

@Xunzhuo (Member) commented Oct 11, 2025

@JaredforReal do u have slack?

@JaredforReal (Collaborator, Author)

yes

@rootfs (Collaborator) commented Oct 11, 2025

@JaredforReal this is really cool! Thanks for making this happen so quickly! As a follow-up, would you mind adding an auth factory to support additional auth?
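
To sketch the shape of such a factory: the dashboard backend is Go, but for consistency with the other sketches here this is TypeScript, and the 'basic'/'bearer' strategies, names, and signatures are assumptions, not the repo's API.

// Hypothetical auth factory: map a configured scheme to a verifier function.
type AuthVerifier = (authorizationHeader: string | undefined) => boolean

function makeBasicAuth(user: string, pass: string): AuthVerifier {
  // Precompute the expected "Basic base64(user:pass)" header value
  const expected = 'Basic ' + Buffer.from(`${user}:${pass}`).toString('base64')
  return (header) => header === expected
}

function makeBearerAuth(token: string): AuthVerifier {
  return (header) => header === `Bearer ${token}`
}

// New schemes (OIDC, API keys, ...) can be added here without touching call sites.
export function authFactory(kind: 'basic' | 'bearer', opts: Record<string, string>): AuthVerifier {
  switch (kind) {
    case 'basic':
      return makeBasicAuth(opts.user, opts.pass)
    case 'bearer':
      return makeBearerAuth(opts.token)
  }
}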

@JaredforReal (Collaborator, Author)

@rootfs got u

@rootfs (Collaborator) commented Oct 11, 2025

sorry to be chatty :D

would be great to customize the system prompt injection on the UI too.

@JaredforReal (Collaborator, Author)

[Screenshot] We can inject a system prompt in Open WebUI. Do we need another one? @rootfs

@Xunzhuo (Member) commented Oct 11, 2025

Yes @JaredforReal. The injection you saw in Open WebUI is a user-defined system prompt; what vLLM-SR offers is automatic injection based on domain and intent, and it is configurable. That is what I mentioned above: all of the configuration should be editable in the console, not just viewable.

@JaredforReal (Collaborator, Author) commented Oct 11, 2025

In config/examples/system_prompt_example.yaml:

# Categories with system prompts for different domains
categories:
  - name: math
    description: "Mathematical queries, calculations, and problem solving"
    system_prompt: "You are a mathematics expert. Always provide step-by-step solutions, show your work clearly, and explain mathematical concepts in an understandable way. When solving equations, break down each step and explain the reasoning behind it."
    model_scores:
      - model: openai/gpt-oss-20b
        score: 0.9
        use_reasoning: true

  - name: computer science
    description: "Programming, algorithms, software engineering, and technical topics"
    system_prompt: "You are a computer science expert with deep knowledge of algorithms, data structures, programming languages, and software engineering best practices. Provide clear, practical solutions with well-commented code examples when helpful. Always consider performance, readability, and maintainability."
    model_scores:
      - model: openai/gpt-oss-20b
        score: 0.8
        use_reasoning: true

I got what you said. It's a really good idea; I'll work on it! @Xunzhuo @rootfs

joyful-ii-V-I pushed a commit to joyful-ii-V-I/semantic-router that referenced this pull request Oct 13, 2025
JaredforReal deleted the dashboard branch on October 14, 2025 05:22.