
Commit aab22ff
docs: update architecture and add req flow (#562)
Signed-off-by: bitliu <[email protected]>
1 parent 2176373

File tree: 4 files changed, +13 −18 lines

README.md

Lines changed: 13 additions & 18 deletions
```diff
@@ -16,6 +16,7 @@
 
 *Latest News* 🔥
 
+- [2025/10/26] We reached 2000 stars on GitHub! 🔥
 - [2025/10/21] We announced the [2025 Q4 Roadmap: Journey to Iris](https://vllm-semantic-router.com/blog/q4-roadmap-iris) 📅.
 - [2025/10/16] We established the [vLLM Semantic Router Youtube Channel](https://www.youtube.com/@vLLMSemanticRouter) ✨.
 - [2025/10/15] We announced the [vLLM Semantic Router Dashboard](https://www.youtube.com/watch?v=E2IirN8PsFw) 🚀.
```
```diff
@@ -25,13 +26,6 @@
 - [2025/09/15] We reached 1000 stars on GitHub! 🔥
 - [2025/09/01] We released the project officially: [vLLM Semantic Router: Next Phase in LLM inference](https://blog.vllm.ai/2025/09/11/semantic-router.html) 🚀.
 
-
-<!-- <details>
-<summary>Previous News 🔥</summary>
-
-
-</details> -->
-
 ---
 
 ## Innovations ✨
```
```diff
@@ -44,30 +38,36 @@
 
 A **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models from a defined pool based on **Semantic Understanding** of the request's intent (Complexity, Task, Tools).
 
-This is achieved using BERT classification. Conceptually similar to Mixture-of-Experts (MoE), which lives *within* a model, this system selects the best *entire model* for the nature of the task.
+![](./website/static/img/mom-overview.png)
+
+Conceptually similar to Mixture-of-Experts (MoE), which lives *within* a model, this system selects the best *entire model* for the nature of the task.
 
 As such, the overall inference accuracy is improved by using a pool of models that are better suited for different types of tasks:
 
 ![Model Accuracy](./website/static/img/category_accuracies.png)
 
-The screenshot below shows the LLM Router dashboard in Grafana.
-
-![LLM Router Dashboard](./website/static/img/grafana_screenshot.png)
-
 The router is implemented in two ways:
 
 - Golang (with Rust FFI based on the [candle](https://github.com/huggingface/candle) Rust ML framework)
 - Python
 Benchmarking will be conducted to determine the best implementation.
 
+#### Request Flow
+
+![architecture](./website/static/img/flow.png)
+
 #### Auto-Selection of Tools
 
 Select the tools to use based on the prompt, avoiding tools that are not relevant to the prompt, so as to reduce the number of prompt tokens and improve tool-selection accuracy by the LLM.
 
-#### Category-Specific System Prompts
+#### Domain Aware System Prompts
 
 Automatically inject specialized system prompts based on query classification, ensuring optimal model behavior for different domains (math, coding, business, etc.) without manual prompt engineering.
 
+#### Domain Aware Similarity Caching ⚡️
+
+Cache the semantic representation of the prompt so as to reduce the number of prompt tokens and improve the overall inference latency.
+
 ### Enterprise Security 🔒
 
 #### PII detection
```
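The routing idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: a keyword heuristic stands in for the BERT classifier, and the category names and model pool entries are hypothetical.

```python
# Sketch of Mixture-of-Models routing: classify the request's intent,
# then pick the best *entire model* from a pool. All names are hypothetical.

MODEL_POOL = {
    "math": "qwen2.5-math-7b",
    "coding": "deepseek-coder-6.7b",
    "general": "llama-3.1-8b-instruct",
}

def classify(prompt: str) -> str:
    """Stand-in for semantic classification; the router uses a BERT model."""
    lowered = prompt.lower()
    if any(k in lowered for k in ("integral", "solve", "equation")):
        return "math"
    if any(k in lowered for k in ("python", "function", "bug")):
        return "coding"
    return "general"

def route(prompt: str) -> str:
    """MoE-style selection, but across whole models rather than experts."""
    return MODEL_POOL[classify(prompt)]

print(route("Solve the equation x^2 - 4 = 0"))   # qwen2.5-math-7b
print(route("Why is my Python function slow?"))  # deepseek-coder-6.7b
```

A production router would replace `classify` with the trained classifier and forward the original OpenAI API request to the selected backend unchanged.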
```diff
@@ -78,10 +78,6 @@ Detect PII in the prompt, avoiding sending PII to the LLM so as to protect the p
 
 Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving. Can be configured globally or at the category level for fine-grained security control.
 
-### Similarity Caching ⚡️
-
-Cache the semantic representation of the prompt so as to reduce the number of prompt tokens and improve the overall inference latency.
-
 ### Distributed Tracing 🔍
 
 Comprehensive observability with OpenTelemetry distributed tracing provides fine-grained visibility into the request processing pipeline.
```
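The similarity-caching feature mentioned in this diff can be sketched as follows. This is a hedged illustration, not the router's code: a toy bag-of-words vector stands in for the real semantic embedding, and the 0.9 threshold is an assumed parameter.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; the real router uses a semantic model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class SimilarityCache:
    """Serve a cached answer when a new prompt is semantically close enough."""

    def __init__(self, threshold=0.9):  # threshold value is an assumption
        self.threshold = threshold
        self.entries = []  # (embedding, cached answer) pairs

    def get(self, prompt):
        vec = embed(prompt)
        for cached_vec, answer in self.entries:
            if cosine(vec, cached_vec) >= self.threshold:
                return answer  # cache hit: the LLM call is skipped entirely
        return None

    def put(self, prompt, answer):
        self.entries.append((embed(prompt), answer))

cache = SimilarityCache()
cache.put("what is the capital of france", "Paris")
print(cache.get("what is the capital of france ?"))  # near-duplicate: Paris
print(cache.get("how do I bake sourdough bread"))    # unrelated: None
```

Because lookups match on semantic closeness rather than exact strings, paraphrased repeats of a prompt avoid a second inference call, which is where the token and latency savings come from.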
```diff
@@ -128,7 +124,6 @@ The documentation includes:
 - **[Model Training](https://vllm-semantic-router.com/docs/training/training-overview/)** - How classification models work
 - **[API Reference](https://vllm-semantic-router.com/docs/api/router/)** - Complete API documentation
 - **[Dashboard](https://vllm-semantic-router.com/docs/overview/dashboard)** - vLLM Semantic Router Dashboard
-- **[Distributed Tracing](https://vllm-semantic-router.com/docs/tutorials/observability/distributed-tracing/)** - Observability and debugging guide
 
 ## Community 👋
 
```
website/static/img/flow.png
