Commit a017191 (1 parent: f47db68)

fix: fix reasoning-routing-quickstart md styel

Signed-off-by: yuluo-yx <[email protected]>

File tree: 1 file changed (+14, −1)

website/docs/getting-started/reasoning-routing-quickstart.md (14 additions, 1 deletion)
@@ -1,16 +1,19 @@
# Reasoning Routing Quickstart

This short guide shows how to enable and verify “reasoning routing” in the Semantic Router:

- Minimal config.yaml fields you need
- Example request/response (OpenAI-compatible)
- A comprehensive evaluation command you can run

Prerequisites

- A running OpenAI-compatible backend for your models (e.g., vLLM or any OpenAI-compatible server). It must be reachable at the addresses you configure under vllm_endpoints (address:port).
- Envoy + the router (see the Start the router section)

1) Minimal configuration

Put this in config/config.yaml (or merge it into your existing config). It defines:

- Categories that require reasoning (e.g., math)
- Reasoning families for model syntax differences (DeepSeek/Qwen3 use chat_template_kwargs; GPT-OSS/GPT use reasoning_effort)
- Which concrete models use which reasoning family
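The full config block is elided in this diff; purely as an illustrative sketch of how those three pieces might fit together (field names inferred from the notes in this guide — consult the repository's config/config.yaml for the authoritative layout):

```yaml
# Illustrative sketch only; exact keys and nesting may differ from the real config.
categories:
  - name: math
    use_reasoning: true
    reasoning_effort: high

reasoning_families:
  qwen3:
    type: chat_template_kwargs
  gpt-oss:
    type: reasoning_effort

model_config:
  qwen3-30b:
    reasoning_family: qwen3

default_model: qwen3-30b
```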
@@ -84,6 +87,7 @@ default_model: qwen3-30b
```

Notes

- Reasoning is controlled by categories.use_reasoning and optionally categories.reasoning_effort.
- A model only gets reasoning fields if it has a model_config.<MODEL>.reasoning_family that maps to a reasoning_families entry.
- DeepSeek/Qwen3 (chat_template_kwargs): the router injects chat_template_kwargs only when reasoning is enabled. When disabled, no chat_template_kwargs are added.
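For illustration, a reasoning-enabled request to a Qwen3-family model might leave the router looking roughly like this (the exact kwarg key, enable_thinking here, depends on the model's chat template and is an assumption); a GPT-OSS/GPT-family model would instead carry a top-level reasoning_effort field such as "high":

```json
{
  "model": "qwen3-30b",
  "messages": [{"role": "user", "content": "What is the derivative of x^2?"}],
  "chat_template_kwargs": {"enable_thinking": true}
}
```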
@@ -93,6 +97,7 @@ Notes

2) Start the router

Option A: Local build + Envoy

- Download classifier models and mappings (required)
  - make download-models
- Build and run the router
@@ -102,13 +107,15 @@ Option A: Local build + Envoy
  - func-e run --config-path config/envoy.yaml --component-log-level "ext_proc:trace,router:trace,http:trace"

Option B: Docker Compose

- docker compose up -d
- Exposes Envoy at http://localhost:8801 (proxying /v1/* to backends via the router)

Note: Ensure your OpenAI-compatible backend is running and reachable (e.g., http://127.0.0.1:8000) so that vllm_endpoints address:port matches a live server. Without a running backend, routing will fail at the Envoy hop.
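One quick way to sanity-check that note before sending traffic is a plain TCP probe of the configured address:port. This is a hypothetical local-debugging helper, not part of the router:

```python
# Check that a configured vllm_endpoints address:port is accepting connections.
# Hypothetical helper for local debugging; not part of the Semantic Router.
import socket

def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Adjust host/port to match your vllm_endpoints configuration.
    print(tcp_reachable("127.0.0.1", 8000))
```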

3) Send example requests

Math (reasoning should be ON and effort high)

```bash
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
@@ -122,6 +129,7 @@ curl -sS http://localhost:8801/v1/chat/completions \
```

General (reasoning should be OFF)

```bash
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
@@ -136,10 +144,12 @@ curl -sS http://localhost:8801/v1/chat/completions \

Verify routing via response headers

The router does not inject routing metadata into the JSON body. Instead, inspect the response headers added by the router:

- X-Selected-Model
- X-Semantic-Destination-Endpoint
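In scripted checks it can be handy to pull those two headers out with a case-insensitive lookup (HTTP header names are case-insensitive). A small hypothetical helper, assuming only the two headers named above:

```python
# Hypothetical convenience for test scripts: extract the router's decision
# from a dict of HTTP response headers, ignoring header-name case.
def routing_decision(headers: dict) -> dict:
    """Return the model and endpoint the router selected, or None if absent."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        "model": lowered.get("x-selected-model"),
        "endpoint": lowered.get("x-semantic-destination-endpoint"),
    }
```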

Example:

```bash
curl -i http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
@@ -159,6 +169,7 @@ curl -i http://localhost:8801/v1/chat/completions \
You can benchmark the router vs a direct vLLM endpoint across categories using the included script. This runs a ReasoningBench based on MMLU-Pro and produces summaries and plots.

Quick start (router + vLLM):

```bash
SAMPLES_PER_CATEGORY=25 \
CONCURRENT_REQUESTS=4 \
@@ -168,6 +179,7 @@ VLLM_MODELS="openai/gpt-oss-20b" \
```

Router-only benchmark:

```bash
BENCHMARK_ROUTER_ONLY=true \
SAMPLES_PER_CATEGORY=25 \
@@ -177,6 +189,7 @@ ROUTER_MODELS="auto" \
```

Direct invocation (advanced):

```bash
python bench/router_reason_bench.py \
  --run-router \
@@ -191,8 +204,8 @@ python bench/router_reason_bench.py \
```

Tips

- If your math request doesn’t enable reasoning, confirm the classifier assigns the "math" category with sufficient confidence (see classifier.category_model.threshold) and that the target model has a reasoning_family.
- For models without a reasoning_family, the router will not inject reasoning fields even when the category requires reasoning (this is by design, to avoid invalid requests).
- You can override the effort per category via categories.reasoning_effort or set a global default via default_reasoning_effort.
- Ensure your OpenAI-compatible backend is reachable at the configured vllm_endpoints (address:port). If it’s not running, routing will fail even though the router and Envoy are up.
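The first two tips can be checked mechanically. A hypothetical lint sketch, assuming a config shape like the one in this guide (categories optionally naming a model, otherwise falling back to default_model):

```python
# Hypothetical lint for the tips above: flag categories that require reasoning
# but whose model has no reasoning_family, in which case the router would skip
# injecting reasoning fields. The config shape is assumed, not authoritative.
def reasoning_gaps(config: dict) -> list:
    """Return (category, model) pairs where reasoning is requested but unmapped."""
    model_cfg = config.get("model_config", {})
    default_model = config.get("default_model")
    gaps = []
    for cat in config.get("categories", []):
        if not cat.get("use_reasoning"):
            continue
        model = cat.get("model", default_model)
        if not model_cfg.get(model, {}).get("reasoning_family"):
            gaps.append((cat.get("name"), model))
    return gaps
```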