
Commit b62ba97

docs(installation): update model_config examples and clarify vLLM backend setup

Signed-off-by: samzong <[email protected]>
1 parent 519d4d8

2 files changed: +111 -43 lines

website/docs/installation/configuration.md

Lines changed: 36 additions & 23 deletions
@@ -60,6 +60,17 @@ model_config:
       allow_by_default: true
       pii_types_allowed: ["EMAIL_ADDRESS", "PERSON"]
     preferred_endpoints: ["endpoint1"]
+  # Example: DeepSeek model with custom name
+  "ds-v31-custom":
+    reasoning_family: "deepseek" # Uses DeepSeek reasoning syntax
+    preferred_endpoints: ["endpoint1"]
+  # Example: Qwen3 model with custom name
+  "my-qwen3-model":
+    reasoning_family: "qwen3" # Uses Qwen3 reasoning syntax
+    preferred_endpoints: ["endpoint2"]
+  # Example: Model without reasoning support
+  "phi4":
+    preferred_endpoints: ["endpoint1"]
 
 # Classification models
 classifier:
@@ -154,24 +165,10 @@ reasoning_families:
 # Global default reasoning effort level
 default_reasoning_effort: "medium"
 
-# Model configurations - assign reasoning families to specific models
-model_config:
-  # Example: DeepSeek model with custom name
-  "ds-v31-custom":
-    reasoning_family: "deepseek" # This model uses DeepSeek reasoning syntax
-    preferred_endpoints: ["endpoint1"]
-
-  # Example: Qwen3 model with custom name
-  "my-qwen3-model":
-    reasoning_family: "qwen3" # This model uses Qwen3 reasoning syntax
-    preferred_endpoints: ["endpoint2"]
-
-  # Example: Model without reasoning support
-  "phi4":
-    # No reasoning_family field - this model doesn't support reasoning mode
-    preferred_endpoints: ["endpoint1"]
 ```
 
+Assign reasoning families inside the same `model_config` block above—use `reasoning_family` per model (see `ds-v31-custom` and `my-qwen3-model` in the example). Models without reasoning syntax simply omit the field (e.g., `phi4`).
+
 ## Configuration Recipes (presets)
 
 We provide curated, versioned presets you can use directly or as a starting point:
@@ -205,6 +202,23 @@ vllm_endpoints:
 model_config:
   "llama2-7b": # Model name - must match vLLM --served-model-name
     preferred_endpoints: ["my_endpoint"]
+  "qwen3": # Another model served by the same endpoint
+    preferred_endpoints: ["my_endpoint"]
+```
+
+### Example: Llama / Qwen Backend Configuration
+
+```yaml
+vllm_endpoints:
+  - name: "local-vllm"
+    address: "127.0.0.1"
+    port: 8000
+
+model_config:
+  "llama2-7b":
+    preferred_endpoints: ["local-vllm"]
+  "qwen3":
+    preferred_endpoints: ["local-vllm"]
 ```
 
 #### Address Format Requirements
@@ -240,20 +254,19 @@ address: "127.0.0.1:8080" # ❌ Use separate 'port' field
 
 #### Model Name Consistency
 
-The model names in the `models` array must **exactly match** the `--served-model-name` parameter used when starting your vLLM server:
+Model names in `model_config` must **exactly match** the `--served-model-name` parameter used when starting your vLLM server:
 
 ```bash
-# vLLM server command:
-vllm serve meta-llama/Llama-2-7b-hf --served-model-name llama2-7b
+# vLLM server command (examples):
+vllm serve meta-llama/Llama-2-7b-hf --served-model-name llama2-7b --port 8000
+vllm serve Qwen/Qwen3-1.8B --served-model-name qwen3 --port 8000
 
 # config.yaml must reference the model in model_config:
 model_config:
   "llama2-7b": # ✅ Matches --served-model-name
     preferred_endpoints: ["your-endpoint"]
-
-vllm_endpoints:
-  "llama2-7b": # ✅ Matches --served-model-name
-    # ... configuration
+  "qwen3": # ✅ Matches --served-model-name
+    preferred_endpoints: ["your-endpoint"]
 ```
 
 ### Model Settings
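A quick way to confirm which names to use as `model_config` keys is to ask the running vLLM server for its model list. This is a minimal sketch, assuming vLLM is reachable at 127.0.0.1:8000 and, for the second command, that `jq` is installed:

```bash
# List the model ids the vLLM server actually serves; each "id" should
# match a key under model_config (and the value of default_model, if set).
curl -s http://127.0.0.1:8000/v1/models

# Optionally extract just the ids (assumes jq is available).
curl -s http://127.0.0.1:8000/v1/models | jq -r '.data[].id'
```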

website/docs/installation/installation.md

Lines changed: 75 additions & 20 deletions
@@ -14,10 +14,10 @@ No GPU required - the router runs efficiently on CPU using optimized BERT models
 
 Semantic Router depends on the following software:
 
-- **Go**: V1.24.1 or higher (matches the module requirements)
-- **Rust**: V1.90.0 or higher (for Candle bindings)
-- **Python**: V3.8 or higher (for model downloads)
-- **HuggingFace CLI**: Required for fetching models (`pip install huggingface_hub`)
+- **Go**: v1.24.1 or higher (matches the module requirements)
+- **Rust**: v1.90.0 or higher (for Candle bindings)
+- **Python**: v3.8 or higher (for model downloads)
+- **HuggingFace CLI**: Required for fetching models
 
 ## Local Installation
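A minimal sanity-check sketch for the prerequisites above (exact version output varies by platform; `pip` may be `pip3` on your system):

```bash
# Verify the toolchain versions against the prerequisites above.
go version        # expect go1.24.1 or newer
rustc --version   # expect rustc 1.90.0 or newer
python3 --version # expect Python 3.8 or newer

# Install the HuggingFace Hub CLI used for model downloads,
# then confirm the huggingface-cli entry point is on PATH.
pip install huggingface_hub
huggingface-cli --help
```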

@@ -102,7 +102,7 @@ This downloads the CPU-optimized BERT models for:
 
 ### 5. Configure Backend Endpoints
 
-Edit `config/config.yaml` to point to your LLM endpoints:
+Edit `config/config.yaml` to point to your vLLM or OpenAI-compatible backend:
 
 ```yaml
 # Example: Configure your vLLM or Ollama endpoints
@@ -118,6 +118,8 @@ model_config:
       allow_by_default: false # Deny all PII by default
       pii_types_allowed: ["EMAIL_ADDRESS", "PERSON", "GPE", "PHONE_NUMBER"] # Only allow these specific PII types
     preferred_endpoints: ["your-endpoint"]
+
+default_model: "your-model-name"
 ```
 
 :::note[**Important: Address Format Requirements**]
@@ -138,26 +140,57 @@ The `address` field **must** contain a valid IP address (IPv4 or IPv6). Domain n
 :::
 
 :::note[**Important: Model Name Consistency**]
-The model name in your configuration **must exactly match** the `--served-model-name` parameter used when starting your vLLM server:
+The model name in `model_config` must **exactly match** the `--served-model-name` used when starting vLLM. If they don't match, the router won't route requests to your model.
+
+If `--served-model-name` is not set, you can also use the default `id` returned by `/v1/models` (e.g., `Qwen/Qwen3-1.8B`) as the key in `model_config` and for `default_model`.
+:::
+
+#### Example: Llama Model
 
 ```bash
-# When starting vLLM server:
-vllm serve microsoft/phi-4 --port 11434 --served-model-name your-model-name
+# Start vLLM with Llama
+vllm serve meta-llama/Llama-2-7b-hf --port 8000 --served-model-name llama2-7b
+```
+
+```yaml
+# config.yaml
+vllm_endpoints:
+  - name: "llama-endpoint"
+    address: "127.0.0.1"
+    port: 8000
+    weight: 1
 
-# The config.yaml must reference the model in model_config:
 model_config:
-  "your-model-name": # ✅ Must match --served-model-name
-    preferred_endpoints: ["your-endpoint"]
+  "llama2-7b": # Must match --served-model-name
+    preferred_endpoints: ["llama-endpoint"]
 
-vllm_endpoints:
-  "your-model-name": # ✅ Must match --served-model-name
-    # ... configuration
+default_model: "llama2-7b"
 ```
 
-If these names don't match, the router won't be able to route requests to your model.
+#### Example: Qwen Model
 
-The default configuration includes example endpoints that you should update for your setup.
-:::
+```bash
+# Start vLLM with Qwen
+vllm serve Qwen/Qwen3-1.8B --port 8000 --served-model-name qwen3
+```
+
+```yaml
+# config.yaml
+vllm_endpoints:
+  - name: "qwen-endpoint"
+    address: "127.0.0.1"
+    port: 8000
+    weight: 1
+
+model_config:
+  "qwen3": # Must match --served-model-name
+    reasoning_family: "qwen3" # Enable Qwen3 reasoning syntax
+    preferred_endpoints: ["qwen-endpoint"]
+
+default_model: "qwen3"
+```
+
+For more configuration options, see the [Configuration Guide](configuration.md).
 
 ## Running the Router
 
@@ -192,10 +225,32 @@ curl -X POST http://localhost:8801/v1/chat/completions \
   }'
 ```
 
-:::tip[VSR Decision Tracking]
-The router automatically adds response headers (`x-vsr-selected-category`, `x-vsr-selected-reasoning`, `x-vsr-selected-model`) to help you understand how requests are being processed. Use `curl -i` to see these headers in action. See [VSR Headers Documentation](../troubleshooting/vsr-headers.md) for details.
+Using `"model": "MoM"` (Mixture of Models) lets the router automatically select the best model based on the query category.
+
+:::tip[VSR Decision Headers]
+Use `curl -i` to see routing decision headers (`x-vsr-selected-category`, `x-vsr-selected-model`). See [VSR Headers](../troubleshooting/vsr-headers.md) for details.
 :::
 
+### 3. Monitoring (Optional)
+
+By default, the router exposes Prometheus metrics at `:9190/metrics`. To disable monitoring:
+
+**Option A: CLI flag**
+
+```bash
+./bin/router -metrics-port=0
+```
+
+**Option B: Configuration**
+
+```yaml
+observability:
+  metrics:
+    enabled: false
+```
+
+When disabled, the `/metrics` endpoint won't start, but all other functionality remains unaffected.
+
 ## Next Steps
 
 After successful installation:
@@ -206,7 +261,7 @@ After successful installation:
 
 ## Getting Help
 
-- **Issues**: Report bugs on [GitHub Issues](https://github.com/your-org/semantic-router/issues)
+- **Issues**: Report bugs on [GitHub Issues](https://github.com/vllm-project/semantic-router/issues)
 - **Documentation**: Full documentation at [Read the Docs](https://vllm-semantic-router.com/)
 
 You now have a working Semantic Router that runs entirely on CPU and intelligently routes requests to specialized models!
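As a final end-to-end check, the sketch below sends a routed request and then inspects the decision headers and metrics; it assumes the router is listening on localhost:8801, that metrics are left enabled on the default :9190 port, and the prompt itself is purely illustrative:

```bash
# Send a routed request; -i prints the x-vsr-* decision headers
# (x-vsr-selected-category, x-vsr-selected-model).
curl -i -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MoM",
    "messages": [{"role": "user", "content": "Hello, which model am I talking to?"}]
  }'

# Confirm Prometheus metrics are exposed (skip if monitoring is disabled).
curl -s http://localhost:9190/metrics | head -n 20
```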
