# LoRA Routing

This guide shows how to enable intent-aware LoRA (Low-Rank Adaptation) routing in the Semantic Router. It covers:

- Minimal configuration for LoRA routing
- vLLM server setup with LoRA adapters
- An example request/response showing automatic LoRA selection
- Verification steps

## Prerequisites

- A running vLLM server with LoRA support enabled
- LoRA adapter files (fine-tuned for specific domains)
- Envoy + the router (see the [Start the router](../../getting-started/quickstart.md) section)
 | 15 | + | 
 | 16 | +## 1. Start vLLM with LoRA Adapters  | 
 | 17 | + | 
 | 18 | +First, start your vLLM server with LoRA support enabled:  | 
 | 19 | + | 
 | 20 | +```bash  | 
 | 21 | +vllm serve meta-llama/Llama-2-7b-hf \  | 
 | 22 | +  --enable-lora \  | 
 | 23 | +  --lora-modules \  | 
 | 24 | +    technical-lora=/path/to/technical-adapter \  | 
 | 25 | +    medical-lora=/path/to/medical-adapter \  | 
 | 26 | +    legal-lora=/path/to/legal-adapter \  | 
 | 27 | +  --host 0.0.0.0 \  | 
 | 28 | +  --port 8000  | 
 | 29 | +```  | 

**Key flags**:

- `--enable-lora`: Enables LoRA adapter support
- `--lora-modules`: Registers LoRA adapters by name and path, in the format `adapter-name=/path/to/adapter`
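
With `--lora-modules`, vLLM's OpenAI-compatible server lists each adapter alongside the base model, so you can confirm registration before wiring up the router (`jq` is used here only for readability):

```bash
# List the models vLLM is serving; adapters registered via --lora-modules
# appear as separate entries next to the base model.
curl -s http://localhost:8000/v1/models | jq -r '.data[].id'
# Expected output:
#   meta-llama/Llama-2-7b-hf
#   technical-lora
#   medical-lora
#   legal-lora
```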

## 2. Minimal Configuration

Put this in `config/config.yaml` (or merge it into your existing config):

```yaml
# Category classifier (required for intent detection)
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
    use_cpu: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"

# vLLM endpoint hosting your base model + LoRA adapters
vllm_endpoints:
  - name: "vllm-primary"
    address: "127.0.0.1"
    port: 8000
    weight: 1

# Define base model and available LoRA adapters
model_config:
  "llama2-7b":
    reasoning_family: "llama2"
    preferred_endpoints: ["vllm-primary"]
    # IMPORTANT: Define all available LoRA adapters here
    loras:
      - name: "technical-lora"
        description: "Optimized for programming and technical questions"
      - name: "medical-lora"
        description: "Specialized for medical and healthcare domain"
      - name: "legal-lora"
        description: "Fine-tuned for legal questions"

# Default model for fallback
default_model: "llama2-7b"

# Categories with LoRA routing
categories:
  - name: "technical"
    description: "Programming, software engineering, and technical questions"
    system_prompt: "You are an expert software engineer."
    model_scores:
      - model: "llama2-7b"           # Base model name
        lora_name: "technical-lora"  # LoRA adapter to use
        score: 1.0
        use_reasoning: true
        reasoning_effort: "medium"

  - name: "medical"
    description: "Medical and healthcare questions"
    system_prompt: "You are a medical expert."
    model_scores:
      - model: "llama2-7b"
        lora_name: "medical-lora"    # Different LoRA for medical
        score: 1.0
        use_reasoning: true
        reasoning_effort: "high"

  - name: "legal"
    description: "Legal questions and law-related topics"
    system_prompt: "You are a legal expert."
    model_scores:
      - model: "llama2-7b"
        lora_name: "legal-lora"      # Different LoRA for legal
        score: 1.0
        use_reasoning: true
        reasoning_effort: "high"

  - name: "general"
    description: "General questions"
    system_prompt: "You are a helpful assistant."
    model_scores:
      - model: "llama2-7b"           # No lora_name = uses base model
        score: 0.8
        use_reasoning: false
```
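
Before routing through Envoy, you can sanity-check that each adapter listed under `loras` actually serves requests by calling vLLM directly with the adapter name as the model:

```bash
# Call vLLM directly (port 8000), addressing the adapter by its registered name.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "technical-lora",
    "messages": [{"role": "user", "content": "Write a hello world program in Python."}]
  }' | jq -r '.model'
# Expected: technical-lora
```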

## 3. How It Works

```mermaid
graph TB
    A[User Query] --> B[Semantic Router]
    B --> C[Category Classifier]

    C --> D{Classified Category}
    D -->|Technical| E[technical-lora]
    D -->|Medical| F[medical-lora]
    D -->|Legal| G[legal-lora]
    D -->|General| H[llama2-7b base]

    E --> I[vLLM Server]
    F --> I
    G --> I
    H --> I

    I --> J[Response]
```

**Flow**:

1. User sends a query to the router
2. Category classifier detects the intent (e.g., "technical")
3. Router looks up the best `ModelScore` for that category
4. If `lora_name` is specified, it becomes the final model name
5. Request is sent to vLLM with `model="technical-lora"`
6. vLLM routes to the appropriate LoRA adapter
7. Response is returned to the user (you can observe the substitution from the client side; see the example after this list)
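
Assuming the router forwards the vLLM response body unchanged (worth verifying in your deployment), the `model` field in the response echoes the adapter that actually served the request rather than the base model name you sent:

```bash
# Send a technical query through the router (port 8080) using the base model name.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2-7b",
    "messages": [{"role": "user", "content": "How do I reverse a linked list?"}]
  }' | jq -r '.model'
# Expected: technical-lora, not llama2-7b
```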

## 4. Example Request/Response

### Technical Query (Routes to technical-lora)

**Request**:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2-7b",
    "messages": [
      {
        "role": "user",
        "content": "How do I implement a binary search tree in Python?"
      }
    ]
  }'
```

**Router Logs** (showing LoRA selection):

```log
[INFO] Classified query into category: technical (confidence: 0.92)
[INFO] Selected model for category 'technical': technical-lora (score: 1.0)
[DEBUG] Using LoRA adapter 'technical-lora' for base model 'llama2-7b'
[INFO] Forwarding request to vLLM with model='technical-lora'
```

### Medical Query (Routes to medical-lora)

**Request**:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2-7b",
    "messages": [
      {
        "role": "user",
        "content": "What are the symptoms of type 2 diabetes?"
      }
    ]
  }'
```

**Router Logs**:

```log
[INFO] Classified query into category: medical (confidence: 0.89)
[INFO] Selected model for category 'medical': medical-lora (score: 1.0)
[DEBUG] Using LoRA adapter 'medical-lora' for base model 'llama2-7b'
[INFO] Forwarding request to vLLM with model='medical-lora'
```

## 5. Verification

### Check LoRA Adapter Selection

Enable debug logging to see which LoRA adapter is selected:

```bash
# In your router startup command, add:
--log-level debug
```

Look for log lines like:

```log
[DEBUG] Using LoRA adapter 'technical-lora' for base model 'llama2-7b'
```
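
If you capture the router's output to a file, you can filter for these lines directly (the `router.log` path below is a placeholder; substitute wherever your deployment writes logs):

```bash
# Show classification and adapter-selection decisions from captured router output.
grep -E "Classified query|Using LoRA adapter" router.log
```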

### Test Category Classification

Send test queries and verify they're classified correctly:

```bash
# Technical query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2-7b", "messages": [{"role": "user", "content": "Explain async/await in JavaScript"}]}'

# Medical query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2-7b", "messages": [{"role": "user", "content": "What causes high blood pressure?"}]}'
```

Check the router logs to confirm the correct LoRA adapter is selected for each query.

## Benefits

- **Domain Expertise**: Each LoRA adapter is fine-tuned for a specific domain
- **Cost Efficiency**: Adapters share the base model's weights, keeping memory usage low
- **Easy A/B Testing**: Compare adapter versions by adjusting scores (see the sketch after this list)
- **Flexible Deployment**: Add or remove adapters without restarting the router
- **Automatic Selection**: Users don't need to know which adapter to use
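
As a sketch of the A/B idea: give two versions of an adapter their own `model_scores` entries for the same category, and flip which one wins by adjusting the scores. `technical-lora-v2` is a hypothetical name here; it would also need to be registered with vLLM and listed under `loras` in `model_config`:

```yaml
categories:
  - name: "technical"
    description: "Programming, software engineering, and technical questions"
    system_prompt: "You are an expert software engineer."
    model_scores:
      - model: "llama2-7b"
        lora_name: "technical-lora-v2"  # candidate version (hypothetical adapter)
        score: 1.0                      # highest score wins the category
        use_reasoning: true
      - model: "llama2-7b"
        lora_name: "technical-lora"     # current version
        score: 0.9
        use_reasoning: true
```

Swapping the two scores rolls the category back to the original adapter without touching vLLM.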

## Next Steps

- See the [complete LoRA routing example](https://github.com/vllm-project/semantic-router/blob/main/config/intelligent-routing/in-tree/lora_routing_example.yaml)
- Learn about [category configuration](../../overview/categories/configuration.md#lora_name-optional)
- Explore [reasoning routing](./reasoning.md) to combine with LoRA adapters