Commit 927f1bc

docs: add more details
Signed-off-by: Jintao Zhang <[email protected]>
1 parent 975797c commit 927f1bc


website/docs/getting-started/reasoning-routing-quickstart.md

Lines changed: 40 additions & 29 deletions
@@ -6,16 +6,26 @@ This short guide shows how to enable and verify “reasoning routing” in the S
 - A comprehensive evaluation command you can run
 
 Prerequisites
-- A running OpenAI-compatible backend for your models (e.g., vLLM, OpenAI-compatible server)
+- A running OpenAI-compatible backend for your models (e.g., vLLM or any OpenAI-compatible server). It must be reachable at the addresses you configure under vllm_endpoints (address:port).
 - Envoy + the router (see Start the router section)
 
 1) Minimal configuration
 Put this in config/config.yaml (or merge into your existing config). It defines:
 - Categories that require reasoning (e.g., math)
 - Reasoning families for model syntax differences (DeepSeek/Qwen3 use chat_template_kwargs; GPT-OSS/GPT use reasoning_effort)
 - Which concrete models use which reasoning family
+- The classifier (required for category detection; without it, reasoning will not be enabled)
 
 ```yaml
+# Category classifier (required for reasoning to trigger)
+classifier:
+  category_model:
+    model_id: "models/category_classifier_modernbert-base_model"
+    use_modernbert: true
+    threshold: 0.6
+    use_cpu: true
+    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"
+
 # vLLM endpoints that host your models
 vllm_endpoints:
   - name: "endpoint1"
@@ -83,12 +93,15 @@ default_model: qwen3-7b
 
 Notes
 - Reasoning is controlled by categories.use_reasoning and optionally categories.reasoning_effort.
-- A model only gets reasoning fields if it has a model_config.<MODEL>.reasoning_family that maps to a reasoning_families entry.
-- DeepSeek/Qwen3: router sets chat_template_kwargs: { parameter: true } when reasoning is enabled.
-- GPT/GPT-OSS: router sets reasoning_effort to the category/default effort when reasoning is enabled.
+- A model only gets reasoning fields if it has a model_config.<MODEL>.reasoning_family that maps to a reasoning_families entry.
+- DeepSeek/Qwen3 (chat_template_kwargs): the router injects chat_template_kwargs only when reasoning is enabled. When disabled, no chat_template_kwargs are added.
+- GPT/GPT-OSS (reasoning_effort): when reasoning is enabled, the router sets reasoning_effort from the category (falling back to default_reasoning_effort). When disabled, the router preserves an existing reasoning_effort value only if the model's family type is reasoning_effort; otherwise the field is absent. (See the request sketches after these notes.)
+- For more stable classification, you can add category descriptions in config and keep them semantically distinctive.
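To make the two injection styles concrete, here is roughly what the upstream request body looks like in each case. This is a sketch: the chat_template_kwargs key name ("thinking" below) stands in for whatever parameter your reasoning_families entry declares.

```json
{
  "model": "qwen3-7b",
  "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
  "chat_template_kwargs": { "thinking": true }
}
```

For a reasoning_effort-family model such as openai/gpt-oss-20b, the router would instead set the effort field:

```json
{
  "model": "openai/gpt-oss-20b",
  "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
  "reasoning_effort": "high"
}
```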
 
 2) Start the router
 Option A: Local build + Envoy
+- Download classifier models and mappings (required)
+  - make download-models
 - Build and run the router
   - make build
   - make run-router
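For convenience, the Option A steps collected into one runnable sequence (the Envoy setup itself follows the repo's usual instructions and is not repeated here):

```bash
# Fetch the classifier model + category mapping referenced in config, then build and start the router.
make download-models
make build
make run-router
```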
@@ -99,6 +112,8 @@ Option B: Docker Compose
 - docker compose up -d
 - Exposes Envoy at http://localhost:8801 (proxying /v1/* to backends via the router)
 
+Note: Ensure your OpenAI-compatible backend is running and reachable (e.g., http://127.0.0.1:8000) so that vllm_endpoints address:port matches a live server. Without a running backend, routing will fail at the Envoy hop.
+
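Before sending test requests, you can confirm the backend itself is alive; /v1/models is part of the standard OpenAI-compatible surface, so this works for vLLM and similar servers:

```bash
# Query the backend directly (not through Envoy); a JSON model list means it is reachable.
curl -sS http://127.0.0.1:8000/v1/models | jq
```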
 3) Send example requests
 Math (reasoning should be ON and effort high)
 ```bash
@@ -126,30 +141,25 @@ curl -sS http://localhost:8801/v1/chat/completions \
 }' | jq
 ```
 
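The diff shows only the tail of the math request. Based on the header-verification example added later in this commit, the full call plausibly looks like this (a sketch, not part of the diff):

```bash
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a math teacher."},
      {"role": "user", "content": "What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?"}
    ]
  }' | jq
```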
-Example response (shape)
-The exact fields depend on your backend. The router keeps the OpenAI-compatible shape and may add metadata.
-
-```json
-{
-  "id": "chatcmpl-...",
-  "object": "chat.completion",
-  "created": 1726000000,
-  "model": "openai/gpt-oss-20b",
-  "choices": [
-    {
-      "index": 0,
-      "message": { "role": "assistant", "content": "The derivative is 3x^2 + 4x - 5." },
-      "finish_reason": "stop"
-    }
-  ],
-  "usage": { "prompt_tokens": 85, "completion_tokens": 43, "total_tokens": 128 },
-  "routing_metadata": {
-    "category": "math",
-    "selected_model": "openai/gpt-oss-20b",
-    "reasoning_enabled": true,
-    "reasoning_effort": "high"
-  }
-}
+Verify routing via response headers
+The router does not inject routing metadata into the JSON body. Instead, inspect the response headers added by the router:
+- X-Selected-Model
+- X-Semantic-Destination-Endpoint
+
+Example:
+```bash
+curl -i http://localhost:8801/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "auto",
+    "messages": [
+      {"role": "system", "content": "You are a math teacher."},
+      {"role": "user", "content": "What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?"}
+    ]
+  }'
+# In the response headers, look for:
+# X-Selected-Model: <your-selected-model>
+# X-Semantic-Destination-Endpoint: <address:port>
 ```
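To surface just the routing headers, the -i output can be filtered (a sketch using a shortened request body):

```bash
# Print only the router-added headers; non-matching header and body lines are dropped by grep.
curl -si http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is 2 + 2?"}]}' \
  | grep -i '^x-se'
```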
 
 4) Run a comprehensive evaluation
@@ -188,7 +198,8 @@ python bench/router_reason_bench.py \
 ```
 
 Tips
-- If your math request doesn’t enable reasoning, confirm the classifier assigns the "math" category with sufficient confidence (see categories.threshold in your setup) and that the target model has a reasoning_family.
+- If your math request doesn’t enable reasoning, confirm the classifier assigns the "math" category with sufficient confidence (see classifier.category_model.threshold) and that the target model has a reasoning_family.
 - For models without a reasoning_family, the router will not inject reasoning fields even when the category requires reasoning (this is by design to avoid invalid requests).
 - You can override the effort per category via categories.reasoning_effort or set a global default via default_reasoning_effort.
+- Ensure your OpenAI-compatible backend is reachable at the configured vllm_endpoints (address:port). If it’s not running, routing will fail even though the router and Envoy are up.
 
