
Commit ea63386

docs: reasoning quickstart (#110)

* docs: reasoning quickstart
* docs: add more details
* docs: use deepseek-v31 and qwen3-30b in reasoning examples

Signed-off-by: Jintao Zhang <[email protected]>

1 parent 7128765 commit ea63386

File tree

2 files changed: +199 -0 lines changed

Lines changed: 198 additions & 0 deletions

@@ -0,0 +1,198 @@

# Reasoning Routing Quickstart

This short guide shows how to enable and verify “reasoning routing” in the Semantic Router:

- Minimal config.yaml fields you need
- Example request/response (OpenAI-compatible)
- A comprehensive evaluation command you can run

Prerequisites

- A running OpenAI-compatible backend for your models (e.g., vLLM or any OpenAI-compatible server). It must be reachable at the addresses you configure under vllm_endpoints (address:port).
- Envoy + the router (see the “Start the router” section below)

1) Minimal configuration

Put this in config/config.yaml (or merge it into your existing config). It defines:

- Categories that require reasoning (e.g., math)
- Reasoning families that capture model syntax differences (DeepSeek/Qwen3 use chat_template_kwargs; GPT-OSS/GPT use reasoning_effort)
- Which concrete models use which reasoning family
- The classifier (required for category detection; without it, reasoning will not be enabled)

```yaml
# Category classifier (required for reasoning to trigger)
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
    use_cpu: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"

# vLLM endpoints that host your models
vllm_endpoints:
  - name: "endpoint1"
    address: "127.0.0.1"
    port: 8000
    models: ["deepseek-v31", "qwen3-30b", "openai/gpt-oss-20b"]
    weight: 1

# Reasoning family configurations (how to express reasoning for a family)
reasoning_families:
  deepseek:
    type: "chat_template_kwargs"
    parameter: "thinking"
  qwen3:
    type: "chat_template_kwargs"
    parameter: "enable_thinking"
  gpt-oss:
    type: "reasoning_effort"
    parameter: "reasoning_effort"
  gpt:
    type: "reasoning_effort"
    parameter: "reasoning_effort"

# Default effort used when a category doesn't specify one
default_reasoning_effort: medium # low | medium | high

# Map concrete model names to a reasoning family
model_config:
  "deepseek-v31":
    reasoning_family: "deepseek"
    preferred_endpoints: ["endpoint1"]
  "qwen3-30b":
    reasoning_family: "qwen3"
    preferred_endpoints: ["endpoint1"]
  "openai/gpt-oss-20b":
    reasoning_family: "gpt-oss"
    preferred_endpoints: ["endpoint1"]

# Categories: which kinds of queries require reasoning and at what effort
categories:
  - name: math
    use_reasoning: true
    reasoning_effort: high # overrides default_reasoning_effort
    reasoning_description: "Mathematical problems require step-by-step reasoning"
    model_scores:
      - model: openai/gpt-oss-20b
        score: 1.0
      - model: deepseek-v31
        score: 0.8
      - model: qwen3-30b
        score: 0.8

# A safe default when no category is confidently selected
default_model: qwen3-30b
```

Notes

- Reasoning is controlled by categories.use_reasoning and, optionally, categories.reasoning_effort.
- A model only gets reasoning fields if it has a model_config.<MODEL>.reasoning_family that maps to a reasoning_families entry.
- DeepSeek/Qwen3 (chat_template_kwargs): the router injects chat_template_kwargs only when reasoning is enabled. When reasoning is disabled, no chat_template_kwargs are added.
- GPT/GPT-OSS (reasoning_effort): when reasoning is enabled, the router sets reasoning_effort based on the category (falling back to default_reasoning_effort). When reasoning is disabled, the router preserves an existing reasoning_effort value if the request already contains one and the model's family type is reasoning_effort; otherwise the field is absent.
- Category descriptions (for example, description and reasoning_description) are informational only today; they do not affect routing or classification.
- Categories must currently come from MMLU-Pro; avoid free-form categories like "general". If you want generic categories, consider opening an issue to map them to MMLU-Pro.
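
For concreteness, here is a sketch of what the injection looks like in the upstream request body for each family, using the families and parameters from the config above. These fragments are illustrative (messages trimmed, enabled-value assumed to be a boolean for the chat_template_kwargs family), not captured payloads:

```jsonc
// deepseek-v31 (family "deepseek", type chat_template_kwargs), reasoning enabled:
{
  "model": "deepseek-v31",
  "messages": [ /* ... */ ],
  "chat_template_kwargs": { "thinking": true }
}

// openai/gpt-oss-20b (family "gpt-oss", type reasoning_effort),
// reasoning enabled by the "math" category (reasoning_effort: high):
{
  "model": "openai/gpt-oss-20b",
  "messages": [ /* ... */ ],
  "reasoning_effort": "high"
}
```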

2) Start the router

Option A: Local build + Envoy (the combined commands are shown after this list)

- Download classifier models and mappings (required)
  - make download-models
- Build and run the router
  - make build
  - make run-router
- Start Envoy (install func-e once with make prepare-envoy if needed)
  - func-e run --config-path config/envoy.yaml --component-log-level "ext_proc:trace,router:trace,http:trace"
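
For convenience, the same steps as one shell session. The commands are taken verbatim from the list above; the Envoy step is assumed to run in a second terminal, since make run-router stays in the foreground:

```bash
# Download classifier models and mappings (required)
make download-models

# Build and run the router
make build
make run-router

# In a separate terminal: start Envoy
# (run `make prepare-envoy` once first if func-e is not installed)
func-e run --config-path config/envoy.yaml \
  --component-log-level "ext_proc:trace,router:trace,http:trace"
```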

Option B: Docker Compose

- docker compose up -d
- Exposes Envoy at http://localhost:8801 (proxying /v1/* to backends via the router)

Note: Ensure your OpenAI-compatible backend is running and reachable (e.g., http://127.0.0.1:8000) so that the configured vllm_endpoints address:port matches a live server. Without a running backend, routing will fail at the Envoy hop.
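
A quick way to verify the backend before sending any traffic is the standard OpenAI-compatible model listing (endpoint address taken from the config above; the response should list the models that server actually hosts):

```bash
# Expect a JSON model list, e.g., containing "openai/gpt-oss-20b"
curl -sS http://127.0.0.1:8000/v1/models | jq
```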

3) Send example requests

Math (reasoning should be ON and effort high)

```bash
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a math teacher."},
      {"role": "user", "content": "What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?"}
    ]
  }' | jq
```
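
To also confirm from the body which backend model served the request, OpenAI-compatible responses typically echo the resolved model name; a small convenience sketch (the authoritative routing signal is the response headers described below):

```bash
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the derivative of x^3?"}]}' \
  | jq -r '.model'
```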

General (reasoning should be OFF)

```bash
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who are you?"}
    ]
  }' | jq
```

Verify routing via response headers

The router does not inject routing metadata into the JSON body. Instead, inspect the response headers added by the router:

- X-Selected-Model
- X-Semantic-Destination-Endpoint

Example:

```bash
curl -i http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a math teacher."},
      {"role": "user", "content": "What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?"}
    ]
  }'
# In the response headers, look for:
# X-Selected-Model: <your-selected-model>
# X-Semantic-Destination-Endpoint: <address:port>
```
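
If you only care about the routing headers, you can discard the body and filter the header dump with standard curl flags (-o /dev/null drops the body, -D - writes the headers to stdout):

```bash
curl -sS -o /dev/null -D - http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is 2^10?"}]}' \
  | grep -i '^x-se'
```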

4) Run a comprehensive evaluation

You can benchmark the router against a direct vLLM endpoint across categories using the included script. It runs a ReasoningBench based on MMLU-Pro and produces summaries and plots.

Quick start (router + vLLM):

```bash
SAMPLES_PER_CATEGORY=25 \
CONCURRENT_REQUESTS=4 \
ROUTER_MODELS="auto" \
VLLM_MODELS="openai/gpt-oss-20b" \
./bench/run_bench.sh
```

Router-only benchmark:

```bash
BENCHMARK_ROUTER_ONLY=true \
SAMPLES_PER_CATEGORY=25 \
CONCURRENT_REQUESTS=4 \
ROUTER_MODELS="auto" \
./bench/run_bench.sh
```

Direct invocation (advanced):

```bash
python bench/router_reason_bench.py \
  --run-router \
  --router-endpoint http://localhost:8801/v1 \
  --router-models auto \
  --run-vllm \
  --vllm-endpoint http://localhost:8000/v1 \
  --vllm-models openai/gpt-oss-20b \
  --samples-per-category 25 \
  --concurrent-requests 4 \
  --output-dir results/reasonbench
```

Tips

- If your math request doesn’t enable reasoning, confirm that the classifier assigns the "math" category with sufficient confidence (see classifier.category_model.threshold) and that the target model has a reasoning_family. A quick diagnostic follows this list.
- For models without a reasoning_family, the router will not inject reasoning fields even when the category requires reasoning (this is by design, to avoid sending invalid requests).
- You can override the effort per category via categories.reasoning_effort, or set a global default via default_reasoning_effort.
- Ensure your OpenAI-compatible backend is reachable at the configured vllm_endpoints (address:port). If it isn’t running, routing will fail even though the router and Envoy are up.
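
A quick check for the first tip: send a clearly mathematical query and inspect X-Selected-Model. If it reports the default_model (qwen3-30b in the config above) rather than a math-scored model, the classifier likely did not reach its confidence threshold for any category:

```bash
curl -sS -o /dev/null -D - http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Integrate x^2 from 0 to 1."}]}' \
  | grep -i '^x-selected-model'
```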

website/sidebars.js

Lines changed: 1 addition & 0 deletions

@@ -46,6 +46,7 @@ const sidebars = {
       items: [
         'getting-started/installation',
         'getting-started/configuration',
+        'getting-started/reasoning-routing-quickstart',
       ],
     },
     {
