Commit 95e0055 (1 parent: 4fbee46)

feat(router): add intent-aware LoRA routing support

- Add LoRAAdapter struct to define available LoRA adapters per model
- Add lora_name field to ModelScore for specifying LoRA adapter
- Implement validation to ensure lora_name references defined LoRAs
- Update model selection logic to use LoRA name when specified
- Add comprehensive example configuration and documentation
- Update README to reflect LoRA adapter routing capability

This enables the semantic router to route requests to different LoRA adapters based on classified intent/category, allowing domain-specific fine-tuned models to be selected automatically.

Fixes: #545
Signed-off-by: bitliu <[email protected]>

File tree

6 files changed: +283 −9 lines

README.md

Lines changed: 2 additions & 6 deletions

```diff
@@ -35,9 +35,9 @@
 ### Intelligent Routing 🧠
 
-#### Auto-Reasoning and Auto-Selection of Models
+#### Auto-Selection of Models and LoRA Adapters
 
-An **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models from a defined pool based on **Semantic Understanding** of the request's intent (Complexity, Task, Tools).
+An **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models or LoRA adapters from a defined pool based on **Semantic Understanding** of the request's intent (Complexity, Task, Tools).
 
 ![mom-overview](./website/static/img/mom-overview.png)
 
@@ -79,10 +79,6 @@ Detect PII in the prompt, avoiding sending PII to the LLM so as to protect the p
 
 Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving. Can be configured globally or at the category level for fine-grained security control.
 
-### Distributed Tracing 🔍
-
-Comprehensive observability with OpenTelemetry distributed tracing provides fine-grained visibility into the request processing pipeline.
-
 ### vLLM Semantic Router Dashboard 💬
 
 Watch the quick demo of the dashboard below:
```
Lines changed: 116 additions & 0 deletions (new file)

```yaml
# Example configuration for Intent-Aware LoRA Routing
# This demonstrates how to use the lora_name field to route requests to different
# LoRA adapters based on the classified intent/category.
#
# Prerequisites:
# 1. vLLM server must be started with --enable-lora flag
# 2. LoRA adapters must be registered at server startup using --lora-modules
#    Example: vllm serve meta-llama/Llama-2-7b-hf \
#               --enable-lora \
#               --lora-modules technical-lora=/path/to/technical-adapter \
#                              medical-lora=/path/to/medical-adapter \
#                              legal-lora=/path/to/legal-adapter
#
# How it works:
# - When a request is classified into a category (e.g., "technical")
# - The router selects the best ModelScore for that category
# - If the ModelScore has a lora_name specified, that name is used as the final model name
# - The request is sent to vLLM with model="technical-lora" instead of model="llama2-7b"
# - vLLM automatically routes to the appropriate LoRA adapter

bert_model:
  model_id: models/all-MiniLM-L12-v2
  threshold: 0.6
  use_cpu: true

# vLLM Endpoints Configuration
vllm_endpoints:
  - name: "vllm-primary"
    address: "172.28.0.20"
    port: 8002
    weight: 1

# Base model configuration
# IMPORTANT: LoRA adapters must be defined here before they can be referenced in model_scores
model_config:
  "llama2-7b":
    reasoning_family: "llama2"
    preferred_endpoints: ["vllm-primary"]
    pii_policy:
      allow_by_default: true
    # Define available LoRA adapters for this model
    # These names must match the LoRA modules registered with vLLM at startup
    loras:
      - name: "technical-lora"
        description: "Optimized for programming and technical questions"
      - name: "medical-lora"
        description: "Specialized for medical and healthcare domain"
      - name: "legal-lora"
        description: "Fine-tuned for legal questions and law-related topics"

# Classifier configuration
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
    use_cpu: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"

# Categories with LoRA routing
categories:
  - name: technical
    description: "Programming, software engineering, and technical questions"
    system_prompt: "You are an expert software engineer with deep knowledge of programming languages, algorithms, system design, and best practices. Provide clear, accurate technical guidance with code examples when appropriate."
    model_scores:
      - model: llama2-7b             # Base model name (for endpoint selection and PII policy)
        lora_name: technical-lora    # LoRA adapter name (used as final model name in request)
        score: 1.0
        use_reasoning: true
        reasoning_effort: medium

  - name: medical
    description: "Medical and healthcare questions"
    system_prompt: "You are a medical expert with comprehensive knowledge of anatomy, physiology, diseases, treatments, and healthcare practices. Provide accurate medical information while emphasizing that responses are for educational purposes only and not a substitute for professional medical advice."
    model_scores:
      - model: llama2-7b
        lora_name: medical-lora      # Different LoRA adapter for medical domain
        score: 1.0
        use_reasoning: true
        reasoning_effort: high

  - name: legal
    description: "Legal questions and law-related topics"
    system_prompt: "You are a legal expert with knowledge of legal principles, case law, and statutory interpretation. Provide accurate legal information while clearly stating that responses are for informational purposes only and do not constitute legal advice."
    model_scores:
      - model: llama2-7b
        lora_name: legal-lora        # Different LoRA adapter for legal domain
        score: 1.0
        use_reasoning: true
        reasoning_effort: high

  - name: general
    description: "General questions that don't fit specific domains"
    system_prompt: "You are a helpful AI assistant with broad knowledge across many topics. Provide clear, accurate, and helpful responses."
    model_scores:
      - model: llama2-7b             # No lora_name specified - uses base model
        score: 0.8
        use_reasoning: false

# Default model for fallback
default_model: llama2-7b

# Benefits of LoRA Routing:
# 1. Domain-Specific Expertise: Each LoRA adapter is fine-tuned for specific domains
# 2. Cost Efficiency: Share base model weights across adapters, reducing memory footprint
# 3. Easy A/B Testing: Gradually roll out new adapters by adjusting scores
# 4. Flexible Deployment: Add/remove adapters without restarting the router
# 5. Performance: vLLM efficiently serves multiple LoRA adapters with minimal overhead
#
# Use Cases:
# - Multi-domain chatbots (technical support, medical advice, legal information)
# - Task-specific optimization (code generation, summarization, translation)
# - Language-specific adapters for multilingual systems
# - Customer-specific adapters for personalized experiences
# - Version testing (compare different adapter versions)
```
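With this configuration, the only thing that changes in the outgoing OpenAI-style request is the `model` field. A minimal standalone sketch of that substitution (simplified types; `resolveModel` is a hypothetical helper, not the router's actual API):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatMessage and chatRequest model the subset of an OpenAI chat
// completion body that matters for LoRA routing.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// resolveModel returns the LoRA adapter name when one is configured,
// otherwise the base model name.
func resolveModel(baseModel, loraName string) string {
	if loraName != "" {
		return loraName
	}
	return baseModel
}

func main() {
	// A request classified as "technical" wins the technical-lora ModelScore,
	// so vLLM sees model="technical-lora" rather than model="llama2-7b".
	body, _ := json.Marshal(chatRequest{
		Model:    resolveModel("llama2-7b", "technical-lora"),
		Messages: []chatMessage{{Role: "user", Content: "Explain goroutines"}},
	})
	fmt.Println(string(body))
	// {"model":"technical-lora","messages":[{"role":"user","content":"Explain goroutines"}]}
}
```

Because vLLM registers each `--lora-modules` entry under its adapter name, no other part of the request needs to change.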

src/semantic-router/pkg/classification/classifier.go

Lines changed: 16 additions & 2 deletions

```diff
@@ -984,7 +984,15 @@ func (c *Classifier) selectBestModelInternal(cat *config.Category, modelFilter f
 		if modelFilter != nil && !modelFilter(model) {
 			return
 		}
-		c.updateBestModel(modelScore.Score, model, &bestScore, &bestModel)
+		// Use LoRA name if specified, otherwise use the base model name
+		// This enables intent-aware LoRA routing where the final model name
+		// in the request becomes the LoRA adapter name
+		finalModelName := model
+		if modelScore.LoRAName != "" {
+			finalModelName = modelScore.LoRAName
+			logging.Debugf("Using LoRA adapter '%s' for base model '%s'", finalModelName, model)
+		}
+		c.updateBestModel(modelScore.Score, finalModelName, &bestScore, &bestModel)
 	})
 
 	return bestModel, bestScore
@@ -1024,13 +1032,19 @@ func (c *Classifier) SelectBestModelFromList(candidateModels []string, categoryN
 }
 
 // GetModelsForCategory returns all models that are configured for the given category
+// If a ModelScore has a LoRAName specified, the LoRA name is returned instead of the base model name
 func (c *Classifier) GetModelsForCategory(categoryName string) []string {
 	var models []string
 
 	for _, category := range c.Config.Categories {
 		if strings.EqualFold(category.Name, categoryName) {
 			for _, modelScore := range category.ModelScores {
-				models = append(models, modelScore.Model)
+				// Use LoRA name if specified, otherwise use the base model name
+				if modelScore.LoRAName != "" {
+					models = append(models, modelScore.LoRAName)
+				} else {
+					models = append(models, modelScore.Model)
+				}
 			}
 			break
 		}
```
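The selection change above can be condensed into a self-contained sketch: pick the highest-scoring entry, then substitute the LoRA name if one is set. (Simplified types; the real logic lives in `selectBestModelInternal` with filters and callbacks.)

```go
package main

import "fmt"

// ModelScore mirrors the routing-relevant fields of the config struct.
type ModelScore struct {
	Model    string
	LoRAName string
	Score    float64
}

// selectBestModel returns the name of the highest-scoring candidate,
// preferring the LoRA adapter name over the base model name when set.
func selectBestModel(scores []ModelScore) (string, float64) {
	bestModel, bestScore := "", -1.0
	for _, ms := range scores {
		if ms.Score > bestScore {
			bestScore = ms.Score
			if ms.LoRAName != "" {
				bestModel = ms.LoRAName
			} else {
				bestModel = ms.Model
			}
		}
	}
	return bestModel, bestScore
}

func main() {
	model, score := selectBestModel([]ModelScore{
		{Model: "llama2-7b", LoRAName: "technical-lora", Score: 1.0},
		{Model: "llama2-7b", Score: 0.8},
	})
	fmt.Println(model, score) // technical-lora 1
}
```

Note that scoring is unchanged: the LoRA name only replaces the *name* reported for the winning entry, so endpoint selection and PII policy still key off the base model.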

src/semantic-router/pkg/config/config.go

Lines changed: 17 additions & 0 deletions

```diff
@@ -373,6 +373,18 @@ type ModelParams struct {
 	// Reasoning family for this model (e.g., "deepseek", "qwen3", "gpt-oss")
 	// If empty, the model doesn't support reasoning mode
 	ReasoningFamily string `yaml:"reasoning_family,omitempty"`
+
+	// LoRA adapters available for this model
+	// These must be registered with vLLM using --lora-modules flag
+	LoRAs []LoRAAdapter `yaml:"loras,omitempty"`
+}
+
+// LoRAAdapter represents a LoRA adapter configuration for a model
+type LoRAAdapter struct {
+	// Name of the LoRA adapter (must match the name registered with vLLM)
+	Name string `yaml:"name"`
+	// Description of what this LoRA adapter is optimized for
+	Description string `yaml:"description,omitempty"`
 }
 
 // ReasoningFamilyConfig defines how a reasoning family handles reasoning mode
@@ -426,6 +438,11 @@ type Category struct {
 type ModelScore struct {
 	Model string  `yaml:"model"`
 	Score float64 `yaml:"score"`
+	// Optional LoRA adapter name - when specified, this LoRA adapter name will be used
+	// as the final model name in requests instead of the base model name.
+	// This enables intent-aware LoRA routing where different LoRA adapters can be
+	// selected based on the classified category.
+	LoRAName string `yaml:"lora_name,omitempty"`
 	// Reasoning mode control on Model Level
 	ModelReasoningControl `yaml:",inline"`
 }
```
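Because `ModelReasoningControl` is embedded with `yaml:",inline"`, its fields sit at the same level as `model` and `score` in the YAML. A `model_scores` entry therefore maps onto the struct like this (annotation comments are mine, not part of the config schema):

```yaml
model_scores:
  - model: llama2-7b          # ModelScore.Model
    lora_name: medical-lora   # ModelScore.LoRAName; omit to use the base model
    score: 1.0                # ModelScore.Score
    use_reasoning: true       # from the inlined ModelReasoningControl
```

The `omitempty` tag on `LoRAName` keeps existing configs valid: entries without `lora_name` behave exactly as before.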

src/semantic-router/pkg/config/validator.go

Lines changed: 35 additions & 0 deletions

```diff
@@ -102,6 +102,13 @@ func validateConfigStructure(cfg *RouterConfig) error {
 			if modelScore.UseReasoning == nil {
 				return fmt.Errorf("category '%s', model '%s': missing required field 'use_reasoning'", category.Name, modelScore.Model)
 			}
+
+			// Validate LoRA name if specified
+			if modelScore.LoRAName != "" {
+				if err := validateLoRAName(cfg, modelScore.Model, modelScore.LoRAName); err != nil {
+					return fmt.Errorf("category '%s', model '%s': %w", category.Name, modelScore.Model, err)
+				}
+			}
 		}
 	}
 
@@ -112,3 +119,31 @@ func validateConfigStructure(cfg *RouterConfig) error {
 
 	return nil
 }
+
+// validateLoRAName checks if the specified LoRA name is defined in the model's configuration
+func validateLoRAName(cfg *RouterConfig, modelName string, loraName string) error {
+	// Check if the model exists in model_config
+	modelParams, exists := cfg.ModelConfig[modelName]
+	if !exists {
+		return fmt.Errorf("lora_name '%s' specified but model '%s' is not defined in model_config", loraName, modelName)
+	}
+
+	// Check if the model has any LoRAs defined
+	if len(modelParams.LoRAs) == 0 {
+		return fmt.Errorf("lora_name '%s' specified but model '%s' has no loras defined in model_config", loraName, modelName)
+	}
+
+	// Check if the specified LoRA name exists in the model's LoRA list
+	for _, lora := range modelParams.LoRAs {
+		if lora.Name == loraName {
+			return nil // Valid LoRA name found
+		}
+	}
+
+	// LoRA name not found, provide helpful error message
+	availableLoRAs := make([]string, len(modelParams.LoRAs))
+	for i, lora := range modelParams.LoRAs {
+		availableLoRAs[i] = lora.Name
+	}
+	return fmt.Errorf("lora_name '%s' is not defined in model '%s' loras. Available LoRAs: %v", loraName, modelName, availableLoRAs)
+}
```
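The validator enforces three rules: the referenced model must exist in `model_config`, it must declare at least one LoRA, and the `lora_name` must match one of them. A runnable sketch of that logic with simplified types (the real function takes a `*RouterConfig`):

```go
package main

import "fmt"

// LoRAAdapter and ModelParams are trimmed to the fields the check needs.
type LoRAAdapter struct {
	Name string
}

type ModelParams struct {
	LoRAs []LoRAAdapter
}

// validateLoRAName returns nil when loraName is declared under modelName,
// and a descriptive error (listing the available adapters) otherwise.
func validateLoRAName(modelConfig map[string]ModelParams, modelName, loraName string) error {
	params, exists := modelConfig[modelName]
	if !exists {
		return fmt.Errorf("lora_name '%s' specified but model '%s' is not defined in model_config", loraName, modelName)
	}
	if len(params.LoRAs) == 0 {
		return fmt.Errorf("lora_name '%s' specified but model '%s' has no loras defined in model_config", loraName, modelName)
	}
	available := make([]string, 0, len(params.LoRAs))
	for _, lora := range params.LoRAs {
		if lora.Name == loraName {
			return nil // valid LoRA name found
		}
		available = append(available, lora.Name)
	}
	return fmt.Errorf("lora_name '%s' is not defined in model '%s' loras. Available LoRAs: %v", loraName, modelName, available)
}

func main() {
	cfg := map[string]ModelParams{
		"llama2-7b": {LoRAs: []LoRAAdapter{{Name: "technical-lora"}, {Name: "medical-lora"}}},
	}
	fmt.Println(validateLoRAName(cfg, "llama2-7b", "technical-lora")) // <nil>
	fmt.Println(validateLoRAName(cfg, "llama2-7b", "legal-lora"))
}
```

Running validation at config load time catches adapter-name typos before any request is routed, rather than surfacing them as vLLM "model not found" errors at runtime.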

website/docs/overview/categories/configuration.md

Lines changed: 97 additions & 1 deletion

````diff
@@ -223,6 +223,62 @@ categories:
 - **0.4-0.5**: Adequate capability
 - **0.0-0.3**: Poor capability, avoid if possible
 
+#### `lora_name` (Optional)
+
+- **Type**: String
+- **Description**: LoRA adapter name to use for this model
+- **Purpose**: Enable intent-aware LoRA routing
+- **Validation**: Must be defined in the model's `loras` list in `model_config`
+
+When specified, the `lora_name` becomes the final model name in requests to vLLM, enabling automatic routing to LoRA adapters based on classified intent.
+
+```yaml
+# First, define available LoRA adapters in model_config
+model_config:
+  "llama2-7b":
+    reasoning_family: "llama2"
+    preferred_endpoints: ["vllm-primary"]
+    loras:
+      - name: "technical-lora"
+        description: "Optimized for technical questions"
+      - name: "medical-lora"
+        description: "Specialized for medical domain"
+
+# Then reference them in categories
+categories:
+  - name: "technical"
+    model_scores:
+      - model: "llama2-7b"            # Base model (for endpoint selection)
+        lora_name: "technical-lora"   # LoRA adapter name (final model name)
+        score: 1.0
+```
+
+**How LoRA Routing Works**:
+
+1. LoRA adapters are defined in `model_config` under the base model
+2. Request is classified into a category (e.g., "technical")
+3. Router selects the best `ModelScore` for that category
+4. Configuration validator ensures `lora_name` is defined in model's `loras` list
+5. If `lora_name` is specified, it replaces the base model name
+6. Request is sent to vLLM with `model="technical-lora"`
+7. vLLM automatically routes to the appropriate LoRA adapter
+
+**Prerequisites**:
+
+- vLLM server must be started with `--enable-lora` flag
+- LoRA adapters must be registered using `--lora-modules` parameter
+- LoRA names must be defined in `model_config` before use in `model_scores`
+
+**Benefits**:
+
+- **Domain Expertise**: Fine-tuned adapters for specific domains
+- **Cost Efficiency**: Share base model weights across adapters
+- **Easy A/B Testing**: Compare adapter versions by adjusting scores
+- **Flexible Deployment**: Add/remove adapters without router restart
+- **Configuration Validation**: Prevents typos and missing LoRA definitions
+
+See [LoRA Routing Example](https://github.com/vllm-project/semantic-router/blob/main/config/intelligent-routing/in-tree/lora_routing_example.yaml) for complete configuration.
+
 ## Complete Configuration Examples
 
 ### Example 1: STEM Category (Reasoning Enabled)
@@ -261,7 +317,47 @@
         score: 0.2
 ```
 
-### Example 3: Security-Focused Configuration (Jailbreak Protection)
+### Example 3: Intent-Aware LoRA Routing
+
+```yaml
+# Define LoRA adapters in model_config first
+model_config:
+  "llama2-7b":
+    reasoning_family: "llama2"
+    preferred_endpoints: ["vllm-primary"]
+    loras:
+      - name: "technical-lora"
+        description: "Optimized for technical questions"
+      - name: "medical-lora"
+        description: "Specialized for medical domain"
+
+# Then reference them in categories
+categories:
+  - name: "technical"
+    description: "Programming and technical questions"
+    model_scores:
+      - model: "llama2-7b"
+        lora_name: "technical-lora"  # Routes to technical LoRA adapter
+        score: 1.0
+        use_reasoning: true
+
+  - name: "medical"
+    description: "Medical and healthcare questions"
+    model_scores:
+      - model: "llama2-7b"
+        lora_name: "medical-lora"  # Routes to medical LoRA adapter
+        score: 1.0
+        use_reasoning: true
+
+  - name: "general"
+    description: "General questions"
+    model_scores:
+      - model: "llama2-7b"  # No lora_name - uses base model
+        score: 0.8
+        use_reasoning: false
+```
+
+### Example 4: Security-Focused Configuration (Jailbreak Protection)
 
 ```yaml
 categories:
````
