Commit 90c20c3 (parent 95e0055)

docs(tutorial): add minimal LoRA routing guide

Signed-off-by: bitliu <[email protected]>

2 files changed: +178 −0
# LoRA Routing

This guide shows how to enable intent-aware LoRA (Low-Rank Adaptation) routing in the Semantic Router:

- Minimal configuration for LoRA routing
- vLLM server setup with LoRA adapters
- Example request/response showing automatic LoRA selection
- Verification steps

## Prerequisites

- A running vLLM server with LoRA support enabled
- LoRA adapter files (fine-tuned for specific domains)
- Envoy + the router (see the [Start the router](../../getting-started/quickstart.md) section)
## 1. Start vLLM with LoRA Adapters

First, start your vLLM server with LoRA support enabled:

```bash
vllm serve meta-llama/Llama-2-7b-hf \
  --enable-lora \
  --lora-modules \
    technical-lora=/path/to/technical-adapter \
    medical-lora=/path/to/medical-adapter \
    legal-lora=/path/to/legal-adapter \
  --host 0.0.0.0 \
  --port 8000
```

**Key flags**:

- `--enable-lora`: Enables LoRA adapter support
- `--lora-modules`: Registers LoRA adapters with their names and paths
  - Format: `adapter-name=/path/to/adapter`
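Once the server is up, you can confirm the adapters registered: vLLM's OpenAI-compatible `/v1/models` endpoint lists each `--lora-modules` entry alongside the base model. The host and port below assume the command above:

```shell
# List served models; registered LoRA adapters should appear alongside
# the base model. Falls back to a message if the server is not running.
curl -s http://localhost:8000/v1/models || echo "vLLM not reachable on :8000"
```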
## 2. Minimal Configuration

Put this in `config/config.yaml` (or merge it into your existing config):
```yaml
# Category classifier (required for intent detection)
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
    use_cpu: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"

# vLLM endpoint hosting your base model + LoRA adapters
vllm_endpoints:
  - name: "vllm-primary"
    address: "127.0.0.1"
    port: 8000
    weight: 1

# Define base model and available LoRA adapters
model_config:
  "llama2-7b":
    reasoning_family: "llama2"
    preferred_endpoints: ["vllm-primary"]
    # IMPORTANT: Define all available LoRA adapters here
    loras:
      - name: "technical-lora"
        description: "Optimized for programming and technical questions"
      - name: "medical-lora"
        description: "Specialized for medical and healthcare domain"
      - name: "legal-lora"
        description: "Fine-tuned for legal questions"

# Default model for fallback
default_model: "llama2-7b"

# Categories with LoRA routing
categories:
  - name: "technical"
    description: "Programming, software engineering, and technical questions"
    system_prompt: "You are an expert software engineer."
    model_scores:
      - model: "llama2-7b"          # Base model name
        lora_name: "technical-lora" # LoRA adapter to use
        score: 1.0
        use_reasoning: true
        reasoning_effort: "medium"

  - name: "medical"
    description: "Medical and healthcare questions"
    system_prompt: "You are a medical expert."
    model_scores:
      - model: "llama2-7b"
        lora_name: "medical-lora"   # Different LoRA for medical
        score: 1.0
        use_reasoning: true
        reasoning_effort: "high"

  - name: "legal"
    description: "Legal questions and law-related topics"
    system_prompt: "You are a legal expert."
    model_scores:
      - model: "llama2-7b"
        lora_name: "legal-lora"     # Different LoRA for legal
        score: 1.0
        use_reasoning: true
        reasoning_effort: "high"

  - name: "general"
    description: "General questions"
    system_prompt: "You are a helpful assistant."
    model_scores:
      - model: "llama2-7b"          # No lora_name = uses base model
        score: 0.8
        use_reasoning: false
```
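One easy misconfiguration is referencing a `lora_name` in a category that was never declared under `model_config.loras` (or never passed to `--lora-modules`). A quick shell sketch of that cross-check, with the adapter lists from the example config written inline:

```shell
# Cross-check: every lora_name referenced by a category must also be a
# declared adapter. Both lists are copied from the example config above.
declared="technical-lora medical-lora legal-lora"   # model_config loras
referenced="technical-lora medical-lora legal-lora" # categories lora_name values

for name in $referenced; do
  case " $declared " in
    *" $name "*) echo "ok: $name" ;;
    *)           echo "MISSING: $name is not declared" ;;
  esac
done
```

If any line prints `MISSING`, add the adapter under `loras` and to the vLLM `--lora-modules` flag before routing to it.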
## 3. How It Works

```mermaid
graph TB
    A[User Query] --> B[Semantic Router]
    B --> C[Category Classifier]

    C --> D{Classified Category}
    D -->|Technical| E[technical-lora]
    D -->|Medical| F[medical-lora]
    D -->|Legal| G[legal-lora]
    D -->|General| H[llama2-7b base]

    E --> I[vLLM Server]
    F --> I
    G --> I
    H --> I

    I --> J[Response]
```

**Flow**:

1. User sends a query to the router
2. The category classifier detects the intent (e.g., "technical")
3. The router looks up the best `ModelScore` for that category
4. If `lora_name` is specified, it becomes the final model name
5. The request is sent to vLLM with `model="technical-lora"`
6. vLLM routes it to the matching LoRA adapter
7. The response is returned to the user
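Steps 4 and 5 amount to a simple name substitution, which the router performs internally against the parsed config. A hypothetical shell sketch of the lookup, mirroring the categories defined above:

```shell
# Sketch of steps 4-5: the model name sent to vLLM is the category's
# lora_name when one is set, otherwise the base model name.
resolve_model() {
  case "$1" in
    technical) echo "technical-lora" ;;   # lora_name set
    medical)   echo "medical-lora"   ;;
    legal)     echo "legal-lora"     ;;
    *)         echo "llama2-7b"      ;;   # no lora_name: base model
  esac
}

resolve_model technical   # prints technical-lora
resolve_model general     # prints llama2-7b
```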
### Test Domain-Aware LoRA Routing

Send test queries and verify they're classified correctly:

```bash
# Technical query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "MoM", "messages": [{"role": "user", "content": "Explain async/await in JavaScript"}]}'

# Medical query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "MoM", "messages": [{"role": "user", "content": "What causes high blood pressure?"}]}'
```

Check the router logs to confirm the correct LoRA adapter is selected for each query.
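If a query does not land on the adapter you expect, it helps to take the router out of the loop: vLLM accepts a registered adapter name directly in the `model` field. The endpoint and adapter name below assume the setup from step 1:

```shell
# Call the technical adapter directly on vLLM, bypassing the router,
# to separate classification issues from serving issues.
curl -s -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "technical-lora", "messages": [{"role": "user", "content": "Explain async/await in JavaScript"}]}' \
  || echo "vLLM not reachable on :8000"
```

A valid completion here means the adapter is served correctly, so a misrouted query points at classification or config rather than vLLM.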
## Benefits

- **Domain Expertise**: Each LoRA adapter is fine-tuned for a specific domain
- **Cost Efficiency**: Adapters share the base model's weights, lowering memory usage
- **Easy A/B Testing**: Compare adapter versions by adjusting scores
- **Flexible Deployment**: Add or remove adapters without restarting the router
- **Automatic Selection**: Users don't need to know which adapter to use
## Next Steps

- See the [complete LoRA routing example](https://github.com/vllm-project/semantic-router/blob/main/config/intelligent-routing/in-tree/lora_routing_example.yaml)
- Learn about [category configuration](../../overview/categories/configuration.md#lora_name-optional)
- Explore [reasoning routing](./reasoning.md) to combine it with LoRA adapters
