Merged
8 changes: 2 additions & 6 deletions README.md
@@ -35,9 +35,9 @@

### Intelligent Routing 🧠

#### Auto-Reasoning and Auto-Selection of Models
#### Auto-Selection of Models and LoRA Adapters

A **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models from a defined pool based on **Semantic Understanding** of the request's intent (Complexity, Task, Tools).
A **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models or LoRA adapters from a defined pool based on **Semantic Understanding** of the request's intent (Complexity, Task, Tools).

![mom-overview](./website/static/img/mom-overview.png)

@@ -79,10 +79,6 @@ Detect PII in the prompt, avoiding sending PII to the LLM so as to protect the p

Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving. Can be configured globally or at the category level for fine-grained security control.

### Distributed Tracing 🔍

Comprehensive observability with OpenTelemetry distributed tracing provides fine-grained visibility into the request processing pipeline.

### vLLM Semantic Router Dashboard 💬

Watch the quick demo of the dashboard below:
116 changes: 116 additions & 0 deletions config/intelligent-routing/in-tree/lora_routing.yaml
@@ -0,0 +1,116 @@
# Example configuration for Intent-Aware LoRA Routing
# This demonstrates how to use the lora_name field to route requests to different
# LoRA adapters based on the classified intent/category.
#
# Prerequisites:
# 1. vLLM server must be started with --enable-lora flag
# 2. LoRA adapters must be registered at server startup using --lora-modules
#    Example: vllm serve meta-llama/Llama-2-7b-hf \
#      --enable-lora \
#      --lora-modules technical-lora=/path/to/technical-adapter \
#                     medical-lora=/path/to/medical-adapter \
#                     legal-lora=/path/to/legal-adapter
#
# How it works:
# - When a request is classified into a category (e.g., "technical")
# - The router selects the best ModelScore for that category
# - If the ModelScore has a lora_name specified, that name is used as the final model name
# - The request is sent to vLLM with model="technical-lora" instead of model="llama2-7b"
# - vLLM automatically routes to the appropriate LoRA adapter

bert_model:
  model_id: models/all-MiniLM-L12-v2
  threshold: 0.6
  use_cpu: true

# vLLM Endpoints Configuration
vllm_endpoints:
  - name: "vllm-primary"
    address: "172.28.0.20"
    port: 8002
    weight: 1

# Base model configuration
# IMPORTANT: LoRA adapters must be defined here before they can be referenced in model_scores
model_config:
  "llama2-7b":
    reasoning_family: "llama2"
    preferred_endpoints: ["vllm-primary"]
    pii_policy:
      allow_by_default: true
    # Define available LoRA adapters for this model
    # These names must match the LoRA modules registered with vLLM at startup
    loras:
      - name: "technical-lora"
        description: "Optimized for programming and technical questions"
      - name: "medical-lora"
        description: "Specialized for medical and healthcare domain"
      - name: "legal-lora"
        description: "Fine-tuned for legal questions and law-related topics"

# Classifier configuration
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
    use_cpu: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"

# Categories with LoRA routing
categories:
  - name: technical
    description: "Programming, software engineering, and technical questions"
    system_prompt: "You are an expert software engineer with deep knowledge of programming languages, algorithms, system design, and best practices. Provide clear, accurate technical guidance with code examples when appropriate."
    model_scores:
      - model: llama2-7b # Base model name (for endpoint selection and PII policy)
        lora_name: technical-lora # LoRA adapter name (used as final model name in request)
        score: 1.0
        use_reasoning: true
        reasoning_effort: medium

  - name: medical
    description: "Medical and healthcare questions"
    system_prompt: "You are a medical expert with comprehensive knowledge of anatomy, physiology, diseases, treatments, and healthcare practices. Provide accurate medical information while emphasizing that responses are for educational purposes only and not a substitute for professional medical advice."
    model_scores:
      - model: llama2-7b
        lora_name: medical-lora # Different LoRA adapter for medical domain
        score: 1.0
        use_reasoning: true
        reasoning_effort: high

  - name: legal
    description: "Legal questions and law-related topics"
    system_prompt: "You are a legal expert with knowledge of legal principles, case law, and statutory interpretation. Provide accurate legal information while clearly stating that responses are for informational purposes only and do not constitute legal advice."
    model_scores:
      - model: llama2-7b
        lora_name: legal-lora # Different LoRA adapter for legal domain
        score: 1.0
        use_reasoning: true
        reasoning_effort: high

  - name: general
    description: "General questions that don't fit specific domains"
    system_prompt: "You are a helpful AI assistant with broad knowledge across many topics. Provide clear, accurate, and helpful responses."
    model_scores:
      - model: llama2-7b # No lora_name specified - uses base model
        score: 0.8
        use_reasoning: false

# Default model for fallback
default_model: llama2-7b

# Benefits of LoRA Routing:
# 1. Domain-Specific Expertise: Each LoRA adapter is fine-tuned for specific domains
# 2. Cost Efficiency: Share base model weights across adapters, reducing memory footprint
# 3. Easy A/B Testing: Gradually roll out new adapters by adjusting scores
# 4. Flexible Deployment: Add/remove adapters without restarting the router
# 5. Performance: vLLM efficiently serves multiple LoRA adapters with minimal overhead
#
# Use Cases:
# - Multi-domain chatbots (technical support, medical advice, legal information)
# - Task-specific optimization (code generation, summarization, translation)
# - Language-specific adapters for multilingual systems
# - Customer-specific adapters for personalized experiences
# - Version testing (compare different adapter versions)

18 changes: 16 additions & 2 deletions src/semantic-router/pkg/classification/classifier.go
@@ -984,7 +984,15 @@ func (c *Classifier) selectBestModelInternal(cat *config.Category, modelFilter f
		if modelFilter != nil && !modelFilter(model) {
			return
		}
		// Use LoRA name if specified, otherwise use the base model name
		// This enables intent-aware LoRA routing where the final model name
		// in the request becomes the LoRA adapter name
		finalModelName := model
		if modelScore.LoRAName != "" {
			finalModelName = modelScore.LoRAName
			logging.Debugf("Using LoRA adapter '%s' for base model '%s'", finalModelName, model)
		}
		c.updateBestModel(modelScore.Score, finalModelName, &bestScore, &bestModel)
	})

	return bestModel, bestScore
@@ -1024,13 +1032,19 @@ func (c *Classifier) SelectBestModelFromList(candidateModels []string, categoryN
}

// GetModelsForCategory returns all models that are configured for the given category
// If a ModelScore has a LoRAName specified, the LoRA name is returned instead of the base model name
func (c *Classifier) GetModelsForCategory(categoryName string) []string {
	var models []string

	for _, category := range c.Config.Categories {
		if strings.EqualFold(category.Name, categoryName) {
			for _, modelScore := range category.ModelScores {
				models = append(models, modelScore.Model)
				// Use LoRA name if specified, otherwise use the base model name
				if modelScore.LoRAName != "" {
					models = append(models, modelScore.LoRAName)
				} else {
					models = append(models, modelScore.Model)
				}
			}
			break
		}
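Both hunks above apply the same substitution rule: when a `ModelScore` carries a LoRA adapter name, that name wins over the base model name. A minimal, self-contained sketch of that rule (the `resolveModelName` helper below is illustrative only and not part of this change):

```go
package main

import "fmt"

// resolveModelName mirrors the selection rule in the diff above: if a
// ModelScore names a LoRA adapter, that name becomes the final model name
// sent downstream; otherwise the base model name is kept.
func resolveModelName(baseModel, loraName string) string {
	if loraName != "" {
		return loraName
	}
	return baseModel
}

func main() {
	fmt.Println(resolveModelName("llama2-7b", "technical-lora")) // technical-lora
	fmt.Println(resolveModelName("llama2-7b", ""))               // llama2-7b
}
```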
17 changes: 17 additions & 0 deletions src/semantic-router/pkg/config/config.go
@@ -373,6 +373,18 @@ type ModelParams struct {
	// Reasoning family for this model (e.g., "deepseek", "qwen3", "gpt-oss")
	// If empty, the model doesn't support reasoning mode
	ReasoningFamily string `yaml:"reasoning_family,omitempty"`

	// LoRA adapters available for this model
	// These must be registered with vLLM using --lora-modules flag
	LoRAs []LoRAAdapter `yaml:"loras,omitempty"`
}

// LoRAAdapter represents a LoRA adapter configuration for a model
type LoRAAdapter struct {
	// Name of the LoRA adapter (must match the name registered with vLLM)
	Name string `yaml:"name"`
	// Description of what this LoRA adapter is optimized for
	Description string `yaml:"description,omitempty"`
}

// ReasoningFamilyConfig defines how a reasoning family handles reasoning mode
@@ -426,6 +438,11 @@ type Category struct {
type ModelScore struct {
	Model    string  `yaml:"model"`
	Score    float64 `yaml:"score"`
	// Optional LoRA adapter name - when specified, this LoRA adapter name will be used
	// as the final model name in requests instead of the base model name.
	// This enables intent-aware LoRA routing where different LoRA adapters can be
	// selected based on the classified category.
	LoRAName string `yaml:"lora_name,omitempty"`
	// Reasoning mode control on Model Level
	ModelReasoningControl `yaml:",inline"`
}
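The `yaml` struct tags above show how a `model_scores` entry from `lora_routing.yaml` lands in these fields. A small decoding sketch, assuming a standard YAML library such as `gopkg.in/yaml.v3` (the PR does not show which decoder the config loader uses) and a trimmed copy of the struct above:

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// Trimmed copy of ModelScore, just enough to decode the new field.
type modelScore struct {
	Model    string  `yaml:"model"`
	Score    float64 `yaml:"score"`
	LoRAName string  `yaml:"lora_name,omitempty"`
}

const snippet = `
model: llama2-7b
score: 1.0
lora_name: technical-lora
`

func main() {
	var ms modelScore
	if err := yaml.Unmarshal([]byte(snippet), &ms); err != nil {
		panic(err)
	}
	fmt.Printf("base=%s lora=%s score=%.1f\n", ms.Model, ms.LoRAName, ms.Score)
	// Output: base=llama2-7b lora=technical-lora score=1.0
}
```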
35 changes: 35 additions & 0 deletions src/semantic-router/pkg/config/validator.go
@@ -102,6 +102,13 @@ func validateConfigStructure(cfg *RouterConfig) error {
			if modelScore.UseReasoning == nil {
				return fmt.Errorf("category '%s', model '%s': missing required field 'use_reasoning'", category.Name, modelScore.Model)
			}

			// Validate LoRA name if specified
			if modelScore.LoRAName != "" {
				if err := validateLoRAName(cfg, modelScore.Model, modelScore.LoRAName); err != nil {
					return fmt.Errorf("category '%s', model '%s': %w", category.Name, modelScore.Model, err)
				}
			}
		}
	}

@@ -112,3 +119,31 @@ func validateConfigStructure(cfg *RouterConfig) error {

	return nil
}

// validateLoRAName checks if the specified LoRA name is defined in the model's configuration
func validateLoRAName(cfg *RouterConfig, modelName string, loraName string) error {
	// Check if the model exists in model_config
	modelParams, exists := cfg.ModelConfig[modelName]
	if !exists {
		return fmt.Errorf("lora_name '%s' specified but model '%s' is not defined in model_config", loraName, modelName)
	}

	// Check if the model has any LoRAs defined
	if len(modelParams.LoRAs) == 0 {
		return fmt.Errorf("lora_name '%s' specified but model '%s' has no loras defined in model_config", loraName, modelName)
	}

	// Check if the specified LoRA name exists in the model's LoRA list
	for _, lora := range modelParams.LoRAs {
		if lora.Name == loraName {
			return nil // Valid LoRA name found
		}
	}

	// LoRA name not found, provide helpful error message
	availableLoRAs := make([]string, len(modelParams.LoRAs))
	for i, lora := range modelParams.LoRAs {
		availableLoRAs[i] = lora.Name
	}
	return fmt.Errorf("lora_name '%s' is not defined in model '%s' loras. Available LoRAs: %v", loraName, modelName, availableLoRAs)
}
98 changes: 97 additions & 1 deletion website/docs/overview/categories/configuration.md
@@ -223,6 +223,62 @@
- **0.4-0.5**: Adequate capability
- **0.0-0.3**: Poor capability, avoid if possible

#### `lora_name` (Optional)

- **Type**: String
- **Description**: LoRA adapter name to use for this model
- **Purpose**: Enable intent-aware LoRA routing
- **Validation**: Must be defined in the model's `loras` list in `model_config`

When specified, the `lora_name` becomes the final model name in requests to vLLM, enabling automatic routing to LoRA adapters based on classified intent.

```yaml
# First, define available LoRA adapters in model_config
model_config:
  "llama2-7b":
    reasoning_family: "llama2"
    preferred_endpoints: ["vllm-primary"]
    loras:
      - name: "technical-lora"
        description: "Optimized for technical questions"
      - name: "medical-lora"
        description: "Specialized for medical domain"

# Then reference them in categories
categories:
  - name: "technical"
    model_scores:
      - model: "llama2-7b" # Base model (for endpoint selection)
        lora_name: "technical-lora" # LoRA adapter name (final model name)
        score: 1.0
```

**How LoRA Routing Works**:

1. LoRA adapters are defined in `model_config` under the base model
2. A request is classified into a category (e.g., "technical")
3. The router selects the best `ModelScore` for that category
4. The configuration validator ensures `lora_name` is defined in the model's `loras` list
5. If `lora_name` is specified, it replaces the base model name
6. The request is sent to vLLM with `model="technical-lora"` (see the sketch after this list)
7. vLLM automatically routes to the appropriate LoRA adapter
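
Step 6 is worth seeing concretely: the body forwarded to vLLM names the LoRA adapter, not the base model. A minimal sketch of that final payload, assuming an OpenAI-style chat completions body (the router's actual request construction is not shown here):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

func main() {
	// After classification into "technical", the router rewrites the model
	// field to the LoRA adapter name before forwarding to vLLM.
	req := chatRequest{
		Model: "technical-lora", // was "llama2-7b" in the incoming request
		Messages: []chatMessage{
			{Role: "user", Content: "How do I reverse a linked list in Go?"},
		},
	}
	body, err := json.MarshalIndent(req, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```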

**Prerequisites**:

- The vLLM server must be started with the `--enable-lora` flag
- LoRA adapters must be registered using the `--lora-modules` parameter
- LoRA names must be defined in `model_config` before they are used in `model_scores`

**Benefits**:

- **Domain Expertise**: Fine-tuned adapters for specific domains
- **Cost Efficiency**: Share base model weights across adapters
- **Easy A/B Testing**: Compare adapter versions by adjusting scores
- **Flexible Deployment**: Add/remove adapters without router restart
- **Configuration Validation**: Prevents typos and missing LoRA definitions

See [LoRA Routing Example](https://github.com/vllm-project/semantic-router/blob/main/config/intelligent-routing/in-tree/lora_routing.yaml) for complete configuration.

## Complete Configuration Examples

### Example 1: STEM Category (Reasoning Enabled)
@@ -261,7 +317,47 @@ categories:
score: 0.2
```

### Example 3: Security-Focused Configuration (Jailbreak Protection)
### Example 3: Intent-Aware LoRA Routing

```yaml
# Define LoRA adapters in model_config first
model_config:
  "llama2-7b":
    reasoning_family: "llama2"
    preferred_endpoints: ["vllm-primary"]
    loras:
      - name: "technical-lora"
        description: "Optimized for technical questions"
      - name: "medical-lora"
        description: "Specialized for medical domain"

# Then reference them in categories
categories:
  - name: "technical"
    description: "Programming and technical questions"
    model_scores:
      - model: "llama2-7b"
        lora_name: "technical-lora" # Routes to technical LoRA adapter
        score: 1.0
        use_reasoning: true

  - name: "medical"
    description: "Medical and healthcare questions"
    model_scores:
      - model: "llama2-7b"
        lora_name: "medical-lora" # Routes to medical LoRA adapter
        score: 1.0
        use_reasoning: true

  - name: "general"
    description: "General questions"
    model_scores:
      - model: "llama2-7b" # No lora_name - uses base model
        score: 0.8
        use_reasoning: false
```

### Example 4: Security-Focused Configuration (Jailbreak Protection)

```yaml
categories: