- [2025/10/16] We launched the [vLLM Semantic Router YouTube Channel](https://www.youtube.com/@vLLMSemanticRouter) ✨.
- [2025/10/15] We announced the [vLLM Semantic Router Dashboard](https://www.youtube.com/watch?v=E2IirN8PsFw) 🚀.
- [2025/10/12] Our paper [When to Reason: Semantic Router for vLLM](https://arxiv.org/abs/2510.08731) was accepted at NeurIPS 2025 MLForSys 🧠.
- [2025/10/08] We announced an integration with the [vLLM Production Stack](https://github.com/vllm-project/production-stack) team 👋.
- [2025/10/01] We added support for deploying on [Kubernetes](https://vllm-semantic-router.com/docs/installation/kubernetes/) 🌊.
- [2025/09/15] We reached 1000 stars on GitHub! 🔥
- [2025/09/01] We officially released the project: [vLLM Semantic Router: Next Phase in LLM inference](https://blog.vllm.ai/2025/09/11/semantic-router.html) 🚀.
For detailed installation and configuration instructions, see the [Complete Documentation](https://vllm-semantic-router.com/docs/installation/).
### What This Starts By Default
`make docker-compose-up` now launches the full stack, including a lightweight local OpenAI-compatible model server powered by **llm-katan** (serving the small model `Qwen/Qwen3-0.6B` under the alias `qwen3`). The semantic router is configured out of the box to route classification and default generation requests to this local endpoint. This gives you an entirely self-contained experience (no external API keys required) while still letting you add remote or larger models later.
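
Once the stack is up, you can smoke-test it with a standard OpenAI-style request against the `qwen3` alias. A minimal sketch, assuming the OpenAI-compatible entry point is published on `localhost:8801` (the port is an assumption; check the ports in your `docker-compose.yml`):

```bash
# Hedged smoke test: port 8801 is an assumption, verify it in docker-compose.yml.
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```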
### Core Mode (Without Local Model)
If you only want the core semantic-router + Envoy + observability stack (and will point to external OpenAI-compatible endpoints yourself):
```bash
make docker-compose-up-core
```
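
In core mode you supply the upstream yourself: any OpenAI-compatible server works. As one illustrative sketch (model name and port are placeholders, not project defaults), a vLLM server can play that role:

```bash
# Illustrative only: run any OpenAI-compatible server as the external endpoint.
# The model and port below are placeholders, not defaults of this project.
vllm serve Qwen/Qwen3-0.6B --port 8000
```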
### Prerequisite Model Download (Speeds Up First Run)
The existing model bootstrap targets now also pre-download the small llm-katan model, so the first `make docker-compose-up` avoids an on-demand Hugging Face fetch.
Minimal set (fast):
```bash
make models-download-minimal
```
Full set:
```bash
make models-download
```
Both create a stamp file once `Qwen/Qwen3-0.6B` is present to keep subsequent runs idempotent.
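
The stamp-file pattern works roughly as follows. This is a sketch of the idea, not the actual Makefile recipe; the stamp path and download directory are assumptions:

```bash
# Sketch of the idempotency pattern; paths here are assumptions, not the real targets.
STAMP=models/.qwen3-0.6b.stamp
if [ ! -f "$STAMP" ]; then
  huggingface-cli download Qwen/Qwen3-0.6B --local-dir models/Qwen3-0.6B
  touch "$STAMP"  # later runs see the stamp and skip the download
fi
```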
## Documentation 📖
For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit: