```yaml
# Model configurations - assign reasoning families to specific models
model_config:
  # Example: DeepSeek model with custom name
  "ds-v31-custom":
    reasoning_family: "deepseek"        # This model uses DeepSeek reasoning syntax
    preferred_endpoints: ["endpoint1"]

  # Example: Qwen3 model with custom name
  "my-qwen3-model":
    reasoning_family: "qwen3"           # This model uses Qwen3 reasoning syntax
    preferred_endpoints: ["endpoint2"]

  # Example: Model without reasoning support
  "phi4":
    # No reasoning_family field - this model doesn't support reasoning mode
    preferred_endpoints: ["endpoint1"]
```

Assign reasoning families inside the same `model_config` block above: use `reasoning_family` per model (see `ds-v31-custom` and `my-qwen3-model` in the example). Models without reasoning syntax simply omit the field (e.g., `phi4`).

## Configuration Recipes (presets)

We provide curated, versioned presets you can use directly or as a starting point:

```yaml
vllm_endpoints:
  # ... endpoint definitions ...

model_config:
  "llama2-7b":                            # Model name - must match vLLM --served-model-name
    preferred_endpoints: ["my_endpoint"]
  "qwen3":                                # Another model served by the same endpoint
    preferred_endpoints: ["my_endpoint"]
```

### Example: Llama / Qwen Backend Configuration

```yaml
vllm_endpoints:
  - name: "local-vllm"
    address: "127.0.0.1"
    port: 8000

model_config:
  "llama2-7b":
    preferred_endpoints: ["local-vllm"]
  "qwen3":
    preferred_endpoints: ["local-vllm"]
```

#### Address Format Requirements

Keep the port in the separate `port` field rather than embedding it in the address: `address: "127.0.0.1:8080"` is rejected.
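
A minimal endpoint sketch illustrating the rule (the endpoint name is illustrative; the IP-only requirement is the one restated in the Address Format note later in this page):

```yaml
vllm_endpoints:
  - name: "my_endpoint"            # illustrative name
    address: "127.0.0.1"           # ✅ bare IPv4 (or IPv6) address
    port: 8080                     # ✅ port in its own field
    # address: "127.0.0.1:8080"    # ❌ Use separate 'port' field
    # address: "my-host.internal"  # ❌ domain names are not supported
```
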
#### Model Name Consistency

Model names in `model_config` must **exactly match** the `--served-model-name` parameter used when starting your vLLM server:
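
As a sketch, assuming vLLM was started with `--served-model-name qwen3` (the launch command in the comment is illustrative, not taken from this guide):

```yaml
# Backend started with (illustrative):
#   vllm serve Qwen/Qwen3-1.8B --served-model-name qwen3

model_config:
  "qwen3":                               # must equal --served-model-name exactly
    preferred_endpoints: ["local-vllm"]
```
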

---

From `website/docs/installation/installation.md`:

No GPU required - the router runs efficiently on CPU using optimized BERT models.

Semantic Router depends on the following software:

- **Go**: v1.24.1 or higher (matches the module requirements)
- **Rust**: v1.90.0 or higher (for Candle bindings)
- **Python**: v3.8 or higher (for model downloads)
- **HuggingFace CLI**: Required for fetching models
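
A quick way to install the HuggingFace CLI and confirm the other tools are on your `PATH` (standard version-check commands; the expected versions mirror the list above):

```bash
# Install the HuggingFace CLI used for model downloads
pip install huggingface_hub

# Verify toolchain versions
go version         # expect go1.24.1 or higher
rustc --version    # expect 1.90.0 or higher
python3 --version  # expect 3.8 or higher
```
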
## Local Installation
### 5. Configure Backend Endpoints
104
104
105
-
Edit `config/config.yaml` to point to your LLM endpoints:
105
+
Edit `config/config.yaml` to point to your vLLM or OpenAI-compatible backend:
106
106
107
107
```yaml
# Example: Configure your vLLM or Ollama endpoints
# ... (endpoint and model entries elided) ...

model_config:
  # ...
    allow_by_default: false  # Deny all PII by default
    pii_types_allowed: ["EMAIL_ADDRESS", "PERSON", "GPE", "PHONE_NUMBER"]  # Only allow these specific PII types
    preferred_endpoints: ["your-endpoint"]

default_model: "your-model-name"
```

:::note[**Important: Address Format Requirements**]
The `address` field **must** contain a valid IP address (IPv4 or IPv6). Domain names are not supported.
:::

:::note[**Important: Model Name Consistency**]
The model name in `model_config` must **exactly match** the `--served-model-name` used when starting vLLM. If they don't match, the router won't route requests to your model.

If `--served-model-name` is not set, you can also use the default `id` returned by `/v1/models` (e.g., `Qwen/Qwen3-1.8B`) as the key in `model_config` and for `default_model`.
:::
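
If you're unsure which ids the backend advertises, you can query it directly; a sketch assuming a vLLM server listening on `127.0.0.1:8000`:

```bash
# Lists the model ids served by the backend; use one of them as the
# model_config key (and default_model) if --served-model-name is unset.
curl http://127.0.0.1:8000/v1/models
```
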
For more configuration options, see the [Configuration Guide](configuration.md).

## Running the Router

Using `"model": "MoM"` (Mixture of Models) lets the router automatically select the best model based on the query category.
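
For example, a minimal request sketch assuming the router's OpenAI-compatible endpoint at `http://localhost:8801`; the prompt and exact fields are illustrative:

```bash
curl -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MoM",
    "messages": [
      {"role": "user", "content": "What is the derivative of x^2?"}
    ]
  }'
```
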
:::tip[VSR Decision Headers]
Use `curl -i` to see routing decision headers (`x-vsr-selected-category`, `x-vsr-selected-model`). See [VSR Headers](../troubleshooting/vsr-headers.md) for details.
:::

### 3. Monitoring (Optional)

By default, the router exposes Prometheus metrics at `:9190/metrics`. To disable monitoring:

**Option A: CLI flag**

```bash
./bin/router -metrics-port=0
```

**Option B: Configuration**

```yaml
observability:
  metrics:
    enabled: false
```

When disabled, the `/metrics` endpoint won't start, but all other functionality remains unaffected.
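
If you leave monitoring enabled, a quick sanity check of the metrics endpoint (assuming the default port and a locally running router):

```bash
# Fetch the first few Prometheus metrics exposed by the router
curl -s http://localhost:9190/metrics | head
```
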

## Next Steps

After successful installation:

## Getting Help

- **Issues**: Report bugs on [GitHub Issues](https://github.com/vllm-project/semantic-router/issues)
- **Documentation**: Full documentation at [Read the Docs](https://vllm-semantic-router.com/)

You now have a working Semantic Router that runs entirely on CPU and intelligently routes requests to specialized models!