# LoRA Routing

This guide shows how to enable intent-aware LoRA (Low-Rank Adaptation) routing in the Semantic Router:

- Minimal configuration for LoRA routing
- vLLM server setup with LoRA adapters
- Example request/response showing automatic LoRA selection
- Verification steps

## Prerequisites

- A running vLLM server with LoRA support enabled
- LoRA adapter files (fine-tuned for specific domains)
- Envoy + the router (see the [Start the router](../../getting-started/quickstart.md) section)

## 1. Start vLLM with LoRA Adapters

First, start your vLLM server with LoRA support enabled:

```bash
vllm serve meta-llama/Llama-2-7b-hf \
  --enable-lora \
  --lora-modules \
    technical-lora=/path/to/technical-adapter \
    medical-lora=/path/to/medical-adapter \
    legal-lora=/path/to/legal-adapter \
  --host 0.0.0.0 \
  --port 8000
```

**Key flags**:

- `--enable-lora`: Enables LoRA adapter support
- `--lora-modules`: Registers LoRA adapters with their names and paths, using the format `adapter-name=/path/to/adapter`

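Before wiring up the router, you can confirm the adapters were registered by querying vLLM's OpenAI-compatible model listing (a quick check, assuming the server is reachable on `localhost:8000`; `jq` is optional):

```bash
# The registered LoRA adapters (technical-lora, medical-lora,
# legal-lora) should appear alongside the base model.
curl -s http://localhost:8000/v1/models | jq '.data[].id'
```
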
## 2. Minimal Configuration

Put this in `config/config.yaml` (or merge it into your existing config):

```yaml
# Category classifier (required for intent detection)
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
    use_cpu: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"

# vLLM endpoint hosting your base model + LoRA adapters
vllm_endpoints:
  - name: "vllm-primary"
    address: "127.0.0.1"
    port: 8000
    weight: 1

# Define base model and available LoRA adapters
model_config:
  "llama2-7b":
    reasoning_family: "llama2"
    preferred_endpoints: ["vllm-primary"]
    # IMPORTANT: Define all available LoRA adapters here
    loras:
      - name: "technical-lora"
        description: "Optimized for programming and technical questions"
      - name: "medical-lora"
        description: "Specialized for medical and healthcare domain"
      - name: "legal-lora"
        description: "Fine-tuned for legal questions"

# Default model for fallback
default_model: "llama2-7b"

# Categories with LoRA routing
categories:
  - name: "technical"
    description: "Programming, software engineering, and technical questions"
    system_prompt: "You are an expert software engineer."
    model_scores:
      - model: "llama2-7b"           # Base model name
        lora_name: "technical-lora"  # LoRA adapter to use
        score: 1.0
        use_reasoning: true
        reasoning_effort: "medium"

  - name: "medical"
    description: "Medical and healthcare questions"
    system_prompt: "You are a medical expert."
    model_scores:
      - model: "llama2-7b"
        lora_name: "medical-lora"    # Different LoRA for medical
        score: 1.0
        use_reasoning: true
        reasoning_effort: "high"

  - name: "legal"
    description: "Legal questions and law-related topics"
    system_prompt: "You are a legal expert."
    model_scores:
      - model: "llama2-7b"
        lora_name: "legal-lora"      # Different LoRA for legal
        score: 1.0
        use_reasoning: true
        reasoning_effort: "high"

  - name: "general"
    description: "General questions"
    system_prompt: "You are a helpful assistant."
    model_scores:
      - model: "llama2-7b"           # No lora_name = uses base model
        score: 0.8
        use_reasoning: false
```

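Every `lora_name` referenced by a category must also be declared under `model_config.loras`, so it's worth cross-checking the two lists before starting the router. A quick sanity check (a sketch, assuming mikefarah's `yq` v4 is installed; the `general` category prints `null` because it sets no `lora_name`):

```bash
# Adapter names the categories reference...
yq '.categories[].model_scores[].lora_name' config/config.yaml
# ...should each appear among the adapters declared on the model.
yq '.model_config."llama2-7b".loras[].name' config/config.yaml
```
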
## 3. How It Works

```mermaid
graph TB
    A[User Query] --> B[Semantic Router]
    B --> C[Category Classifier]

    C --> D{Classified Category}
    D -->|Technical| E[technical-lora]
    D -->|Medical| F[medical-lora]
    D -->|Legal| G[legal-lora]
    D -->|General| H[llama2-7b base]

    E --> I[vLLM Server]
    F --> I
    G --> I
    H --> I

    I --> J[Response]
```

**Flow**:

1. User sends a query to the router
2. Category classifier detects the intent (e.g., "technical")
3. Router looks up the best `ModelScore` for that category
4. If `lora_name` is specified, it becomes the final model name (see the sketch below)
5. Request is sent to vLLM with `model="technical-lora"`
6. vLLM routes to the appropriate LoRA adapter
7. Response is returned to the user

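The substitution in step 4 is the heart of the mechanism. Here is a minimal Python sketch of that lookup (illustrative only; this is not the router's actual source, and the field names simply mirror the YAML above):

```python
# Illustrative sketch of step 4: the highest-scoring ModelScore wins,
# and its lora_name (when present) replaces the base model name in
# the request forwarded to vLLM.
def resolve_model_name(category: dict) -> str:
    best = max(category["model_scores"], key=lambda s: s["score"])
    return best.get("lora_name") or best["model"]

technical = {"model_scores": [
    {"model": "llama2-7b", "lora_name": "technical-lora", "score": 1.0},
]}
general = {"model_scores": [{"model": "llama2-7b", "score": 0.8}]}

assert resolve_model_name(technical) == "technical-lora"  # LoRA routed
assert resolve_model_name(general) == "llama2-7b"         # base model
```
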
## 4. Example Request/Response

### Technical Query (Routes to technical-lora)

**Request**:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2-7b",
    "messages": [
      {
        "role": "user",
        "content": "How do I implement a binary search tree in Python?"
      }
    ]
  }'
```

**Router Logs** (showing LoRA selection):

```log
[INFO] Classified query into category: technical (confidence: 0.92)
[INFO] Selected model for category 'technical': technical-lora (score: 1.0)
[DEBUG] Using LoRA adapter 'technical-lora' for base model 'llama2-7b'
[INFO] Forwarding request to vLLM with model='technical-lora'
```

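The reply follows the OpenAI chat completions schema. Assuming the router passes vLLM's response through unmodified (an assumption about your deployment), the `model` field is a convenient confirmation of which adapter served the request. An abridged, illustrative response:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "technical-lora",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A binary search tree keeps keys ordered so that..."
      },
      "finish_reason": "stop"
    }
  ]
}
```
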
### Medical Query (Routes to medical-lora)

**Request**:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2-7b",
    "messages": [
      {
        "role": "user",
        "content": "What are the symptoms of type 2 diabetes?"
      }
    ]
  }'
```

**Router Logs**:

```log
[INFO] Classified query into category: medical (confidence: 0.89)
[INFO] Selected model for category 'medical': medical-lora (score: 1.0)
[DEBUG] Using LoRA adapter 'medical-lora' for base model 'llama2-7b'
[INFO] Forwarding request to vLLM with model='medical-lora'
```

## 5. Verification

### Check LoRA Adapter Selection

Enable debug logging to see which LoRA adapter is selected:

```bash
# In your router startup command, add:
--log-level debug
```

Look for log lines like:

```log
[DEBUG] Using LoRA adapter 'technical-lora' for base model 'llama2-7b'
```

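To watch selections live, you can filter the debug output (a sketch; the binary name and flags below are placeholders for however you launch the router in your deployment):

```bash
# Placeholder invocation: substitute your actual router command.
# Filters the debug stream down to the LoRA selection lines.
./semantic-router --config config/config.yaml --log-level debug 2>&1 \
  | grep --line-buffered "Using LoRA adapter"
```
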
### Test Category Classification

Send test queries and verify they're classified correctly:

```bash
# Technical query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2-7b", "messages": [{"role": "user", "content": "Explain async/await in JavaScript"}]}'

# Medical query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2-7b", "messages": [{"role": "user", "content": "What causes high blood pressure?"}]}'
```

Check the router logs to confirm the correct LoRA adapter is selected for each query.

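To sweep several prompts in one go, a small loop works (a sketch; the prompts are examples and the router is assumed to listen on `localhost:8080`):

```bash
# Fire one request per prompt, discarding the responses; watch the
# router logs to see which adapter each query is routed to.
for q in \
  "Explain async/await in JavaScript" \
  "What causes high blood pressure?" \
  "Is a verbal contract enforceable?"; do
  curl -s -X POST http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"llama2-7b\", \"messages\": [{\"role\": \"user\", \"content\": \"$q\"}]}" \
    > /dev/null
done
```
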
## Benefits

- **Domain Expertise**: Each LoRA adapter is fine-tuned for a specific domain
- **Cost Efficiency**: All adapters share the base model's weights, keeping memory usage low
- **Easy A/B Testing**: Compare adapter versions by adjusting scores
- **Flexible Deployment**: Add or remove adapters without restarting the router (see the hot-load sketch below)
- **Automatic Selection**: Users don't need to know which adapter to use

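On the vLLM side, adapters can even be hot-loaded at runtime if the server was started with `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True` (the `finance-lora` name and path below are hypothetical):

```bash
# Load a new adapter into the running vLLM server without a restart;
# add a matching entry under model_config.loras so the router can use it.
curl -X POST http://localhost:8000/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "finance-lora", "lora_path": "/path/to/finance-adapter"}'
```
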
## Next Steps

- See the [complete LoRA routing example](https://github.com/vllm-project/semantic-router/blob/main/config/intelligent-routing/in-tree/lora_routing_example.yaml)
- Learn about [category configuration](../../overview/categories/configuration.md#lora_name-optional)
- Explore [reasoning routing](./reasoning.md) to combine LoRA adapters with reasoning control
