|
15 | 15 |
|
16 | 16 | ## Overview |
17 | 17 |
|
| 18 | +```mermaid |
| 19 | +graph TB |
| 20 | + Client[Client Request] --> Router[vLLM Semantic Router] |
| 21 | + |
| 22 | + subgraph "Intent Understanding" |
| 23 | + direction LR |
| 24 | + PII[PII Detector] |
| 25 | + Jailbreak[Jailbreak Guard] |
| 26 | + Category[Category Classifier] |
| 27 | + Cache[Semantic Cache] |
| 28 | + end |
| 29 | + |
| 30 | + Router --> PII |
| 31 | + Router --> Jailbreak |
| 32 | + Router --> Category |
| 33 | + Router --> Cache |
| 34 | + |
| 35 | + PII --> Decision{Security Check} |
| 36 | + Jailbreak --> Decision |
| 37 | + Decision -->|Block| Block[Block Request] |
| 38 | + Decision -->|Pass| Category |
| 39 | + Category --> Models[Route to Specialized Model] |
| 40 | + Cache -->|Hit| FastResponse[Return Cached Response] |
| 41 | + |
| 42 | + Models --> Math[Math Model] |
| 43 | + Models --> Creative[Creative Model] |
| 44 | + Models --> Code[Code Model] |
| 45 | + Models --> General[General Model] |
| 46 | +``` |
| 47 | + |
18 | 48 | ### Auto-Selection of Models |
19 | 49 |
|
20 | | -A **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models from a defined pool based on **Semantic Understanding** of the request's intent. |
| 50 | +A **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models from a defined pool based on **Semantic Understanding** of the request's intent (Complexity, Task, Tools). |
21 | 51 |
|
22 | 52 | This is achieved using BERT classification. While conceptually similar to Mixture-of-Experts (MoE), which operates *within* a single model, this system selects the best *entire model* for the task at hand. |
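For a concrete picture of what gets routed, the sketch below sends a standard OpenAI-compatible chat completion request through the Envoy endpoint that the test targets later in this README use (`http://localhost:8801`). The `/v1/chat/completions` path and the `auto` placeholder model name are assumptions; the names actually accepted depend on `config/config.yaml`.

```bash
# Minimal sketch of a routed request (assumed endpoint path and placeholder model name).
# The router classifies the prompt's intent and forwards it to a suitable backend model.
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Solve 3x + 5 = 20 for x."}
    ]
  }'
```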
23 | 53 |
|
24 | 54 | As a result, overall inference accuracy improves because each request is served by a model from the pool that is better suited to its type of task: |
25 | 55 |
|
26 | 56 |  |
27 | 57 |
|
28 | | -The detailed design doc can be found [here](https://docs.google.com/document/d/1BwwRxdf74GuCdG1veSApzMRMJhXeUWcw0wH9YRAmgGw/edit?usp=sharing). |
29 | | - |
30 | 58 | The screenshot below shows the LLM Router dashboard in Grafana. |
31 | 59 |
|
32 | 60 |  |
@@ -61,96 +89,3 @@ The documentation includes: |
61 | 89 | - **[System Architecture](https://llm-semantic-router.readthedocs.io/en/latest/architecture/system-architecture/)** - Technical deep dive |
62 | 90 | - **[Model Training](https://llm-semantic-router.readthedocs.io/en/latest/training/training-overview/)** - How classification models work |
63 | 91 | - **[API Reference](https://llm-semantic-router.readthedocs.io/en/latest/api/router/)** - Complete API documentation |
64 | | - |
65 | | -## Quick Usage |
66 | | - |
67 | | -### Prerequisites |
68 | | - |
69 | | -- Rust |
70 | | -- Envoy |
71 | | -- Hugging Face CLI (`huggingface-cli`) (see the quick check below) |
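As an optional sanity check, assuming the standard command names for each tool, you can confirm that the prerequisites are on your `PATH` before continuing:

```bash
# Optional sanity check that the prerequisites are installed and on PATH.
cargo --version          # Rust toolchain
envoy --version          # Envoy proxy
huggingface-cli --help > /dev/null && echo "huggingface-cli OK"
```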
72 | | - |
73 | | -### Run the Envoy Proxy |
74 | | - |
75 | | -Envoy listens for incoming requests and forwards them to the Semantic Router through its ExtProc filter. |
76 | | -```bash |
77 | | -make run-envoy |
78 | | -``` |
79 | | - |
80 | | -### Download the Models |
81 | | - |
82 | | -```bash |
83 | | -make download-models |
84 | | -``` |
85 | | - |
86 | | -### Run the Semantic Router (Go Implementation) |
87 | | - |
88 | | -This builds the Rust binding and the Go router, then starts the ExtProc gRPC server that Envoy communicates with. |
89 | | -```bash |
90 | | -make run-router |
91 | | -``` |
92 | | - |
93 | | -Once both Envoy and the router are running, you can test the routing logic using predefined prompts: |
94 | | - |
95 | | -```bash |
96 | | -# Test the tools auto-selection |
97 | | -make test-tools |
98 | | - |
99 | | -# Test the auto-selection of model |
100 | | -make test-prompt |
101 | | - |
102 | | -# Test the prompt guard |
103 | | -make test-prompt-guard |
104 | | - |
105 | | -# Test the PII detection |
106 | | -make test-pii |
107 | | -``` |
108 | | - |
109 | | -This will send curl requests simulating different types of user prompts (Math, Creative Writing, General) to the Envoy endpoint (`http://localhost:8801`). The router should direct these to the appropriate backend model configured in `config/config.yaml`. |
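To probe a routing decision by hand, a hypothetical check like the one below sends a single prompt and prints the `model` field of the response, which should name the backend that served it. This assumes an OpenAI-style response body, `jq` on your `PATH`, and the same `auto` placeholder model name as in the Auto-Selection sketch above.

```bash
# Hypothetical manual probe: which backend model answered this prompt?
# Assumes an OpenAI-style response schema and jq installed.
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Write a short poem about routers."}]}' \
  | jq -r '.model'
```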
110 | | - |
111 | | -## Testing |
112 | | - |
113 | | -A comprehensive test suite is available to validate the functionality of the Semantic Router. The tests follow the data flow through the system, from client request to routing decision. |
114 | | - |
115 | | -### Prerequisites |
116 | | - |
117 | | -Install test dependencies: |
118 | | -```bash |
119 | | -pip install -r tests/requirements.txt |
120 | | -``` |
121 | | - |
122 | | -### Running Tests |
123 | | - |
124 | | -Make sure both the Envoy proxy and Router are running: |
125 | | -```bash |
126 | | -make run-envoy # In one terminal |
127 | | -make run-router # In another terminal |
128 | | -``` |
129 | | -### Running e2e Tests |
130 | | -Run all tests in sequence: |
131 | | -```bash |
132 | | -python e2e-tests/run_all_tests.py |
133 | | -``` |
134 | | - |
135 | | -Run a specific test: |
136 | | -```bash |
137 | | -python e2e-tests/00-client-request-test.py |
138 | | -``` |
139 | | - |
140 | | -Run only tests matching a pattern: |
141 | | -```bash |
142 | | -python e2e-tests/run_all_tests.py --pattern "0*-*.py" |
143 | | -``` |
144 | | - |
145 | | -Check whether the services are running, without executing any tests: |
146 | | -```bash |
147 | | -python e2e-tests/run_all_tests.py --check-only |
148 | | -``` |
149 | | - |
150 | | -The test suite includes: |
151 | | -- Basic client request tests |
152 | | -- Envoy ExtProc interaction tests |
153 | | -- Router classification tests |
154 | | -- Semantic cache tests |
155 | | -- Category-specific tests |
156 | | -- Metrics validation tests |