|
15 | 15 |
|
16 | 16 | ## Overview |
17 | 17 |
|
| 18 | +```mermaid |
| 19 | +graph TB |
| 20 | + Client[Client Request] --> Router[vLLM Semantic Router] |
| 21 | + |
| 22 | + subgraph "Intent Understanding" |
| 23 | + direction LR |
| 24 | + PII[PII Detector] |
| 25 | + Jailbreak[Jailbreak Guard] |
| 26 | + Category[Category Classifier] |
| 27 | + Cache[Semantic Cache] |
| 28 | + end |
| 29 | + |
| 30 | + Router --> PII |
| 31 | + Router --> Jailbreak |
| 32 | + Router --> Category |
| 33 | + Router --> Cache |
| 34 | + |
| 35 | + PII --> Decision{Security Check} |
| 36 | + Jailbreak --> Decision |
| 37 | + Decision -->|Block| Block[Block Request] |
| 38 | + Decision -->|Pass| Category |
| 39 | + Category --> Models[Route to Specialized Model] |
| 40 | + Cache -->|Hit| FastResponse[Return Cached Response] |
| 41 | + |
| 42 | + Models --> Math[Math Model] |
| 43 | + Models --> Creative[Creative Model] |
| 44 | + Models --> Code[Code Model] |
| 45 | + Models --> General[General Model] |
| 46 | +``` |
| 47 | + |
18 | 48 | ### Auto-Selection of Models |
19 | 49 |
|
20 | | -A **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models from a defined pool based on **Semantic Understanding** of the request's intent. |
| 50 | +A **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models from a defined pool based on **Semantic Understanding** of the request's intent (Complexity, Task, Tools). |
21 | 51 |
|
22 | 52 | This is achieved using BERT classification. While conceptually similar to Mixture-of-Experts (MoE), which operates *within* a single model, this system selects the best *entire model* for the task at hand. |
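For a concrete picture of what gets routed, the sketch below sends a standard OpenAI-compatible chat completion request through the Envoy endpoint that the test targets later in this README use (`http://localhost:8801`). The `/v1/chat/completions` path and the `auto` placeholder model name are assumptions; the names actually accepted depend on `config/config.yaml`.

```bash
# Minimal sketch of a routed request (assumed endpoint path and placeholder model name).
# The router classifies the prompt's intent and forwards it to a suitable backend model.
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Solve 3x + 5 = 20 for x."}
    ]
  }'
```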
23 | 53 |
|
24 | 54 | As a result, overall inference accuracy improves because each request is served by a model from the pool that is better suited to its type of task: |
25 | 55 |
|
26 | 56 |  |
27 | 57 |
|
28 | | -The detailed design doc can be found [here](https://docs.google.com/document/d/1BwwRxdf74GuCdG1veSApzMRMJhXeUWcw0wH9YRAmgGw/edit?usp=sharing). |
29 | | - |
30 | 58 | The screenshot below shows the LLM Router dashboard in Grafana. |
31 | 59 |
|
32 | 60 |  |
@@ -61,96 +89,3 @@ The documentation includes: |
61 | 89 | - **[System Architecture](https://llm-semantic-router.readthedocs.io/en/latest/architecture/system-architecture/)** - Technical deep dive |
62 | 90 | - **[Model Training](https://llm-semantic-router.readthedocs.io/en/latest/training/training-overview/)** - How classification models work |
63 | 91 | - **[API Reference](https://llm-semantic-router.readthedocs.io/en/latest/api/router/)** - Complete API documentation |
64 | | - |
65 | | -## Quick Usage |
66 | | - |
67 | | -### Prerequisites |
68 | | - |
69 | | -- Rust |
70 | | -- Envoy |
71 | | -- Hugging Face CLI (`huggingface-cli`) (see the quick check below) |
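As an optional sanity check, assuming the standard command names for each tool, you can confirm that the prerequisites are on your `PATH` before continuing:

```bash
# Optional sanity check that the prerequisites are installed and on PATH.
cargo --version          # Rust toolchain
envoy --version          # Envoy proxy
huggingface-cli --help > /dev/null && echo "huggingface-cli OK"
```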
72 | | - |
73 | | -### Run the Envoy Proxy |
74 | | - |
75 | | -Envoy listens for incoming requests and forwards them to the Semantic Router through its ExtProc filter. |
76 | | -```bash |
77 | | -make run-envoy |
78 | | -``` |
79 | | - |
80 | | -### Download the Models |
81 | | - |
82 | | -```bash |
83 | | -make download-models |
84 | | -``` |
85 | | - |
86 | | -### Run the Semantic Router (Go Implementation) |
87 | | - |
88 | | -This builds the Rust binding and the Go router, then starts the ExtProc gRPC server that Envoy communicates with. |
89 | | -```bash |
90 | | -make run-router |
91 | | -``` |
92 | | - |
93 | | -Once both Envoy and the router are running, you can test the routing logic using predefined prompts: |
94 | | - |
95 | | -```bash |
96 | | -# Test the tools auto-selection |
97 | | -make test-tools |
98 | | - |
99 | | -# Test the auto-selection of model |
100 | | -make test-prompt |
101 | | - |
102 | | -# Test the prompt guard |
103 | | -make test-prompt-guard |
104 | | - |
105 | | -# Test the PII detection |
106 | | -make test-pii |
107 | | -``` |
108 | | - |
109 | | -This will send curl requests simulating different types of user prompts (Math, Creative Writing, General) to the Envoy endpoint (`http://localhost:8801`). The router should direct these to the appropriate backend model configured in `config/config.yaml`. |
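To probe a routing decision by hand, a hypothetical check like the one below sends a single prompt and prints the `model` field of the response, which should name the backend that served it. This assumes an OpenAI-style response body, `jq` on your `PATH`, and the same `auto` placeholder model name as in the Auto-Selection sketch above.

```bash
# Hypothetical manual probe: which backend model answered this prompt?
# Assumes an OpenAI-style response schema and jq installed.
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Write a short poem about routers."}]}' \
  | jq -r '.model'
```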
110 | | - |
111 | | -## Testing |
112 | | - |
113 | | -A comprehensive test suite is available to validate the functionality of the Semantic Router. The tests follow the data flow through the system, from client request to routing decision. |
114 | | - |
115 | | -### Prerequisites |
116 | | - |
117 | | -Install test dependencies: |
118 | | -```bash |
119 | | -pip install -r tests/requirements.txt |
120 | | -``` |
121 | | - |
122 | | -### Running Tests |
123 | | - |
124 | | -Make sure both the Envoy proxy and Router are running: |
125 | | -```bash |
126 | | -make run-envoy # In one terminal |
127 | | -make run-router # In another terminal |
128 | | -``` |
129 | | -### Running e2e Tests |
130 | | -Run all tests in sequence: |
131 | | -```bash |
132 | | -python e2e-tests/run_all_tests.py |
133 | | -``` |
134 | | - |
135 | | -Run a specific test: |
136 | | -```bash |
137 | | -python e2e-tests/00-client-request-test.py |
138 | | -``` |
139 | | - |
140 | | -Run only tests matching a pattern: |
141 | | -```bash |
142 | | -python e2e-tests/run_all_tests.py --pattern "0*-*.py" |
143 | | -``` |
144 | | - |
145 | | -Check whether the services are running, without executing any tests: |
146 | | -```bash |
147 | | -python e2e-tests/run_all_tests.py --check-only |
148 | | -``` |
149 | | - |
150 | | -The test suite includes: |
151 | | -- Basic client request tests |
152 | | -- Envoy ExtProc interaction tests |
153 | | -- Router classification tests |
154 | | -- Semantic cache tests |
155 | | -- Category-specific tests |
156 | | -- Metrics validation tests |