
Commit 975797c

docs: reasoning quickstart
Signed-off-by: Jintao Zhang <[email protected]>
1 parent ced6a8e commit 975797c

2 files changed: +195 −0

Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
# Reasoning Routing Quickstart

This short guide shows how to enable and verify “reasoning routing” in the Semantic Router:

- Minimal config.yaml fields you need
- Example request/response (OpenAI-compatible)
- A comprehensive evaluation command you can run

Prerequisites

- A running OpenAI-compatible backend for your models (e.g., vLLM or another OpenAI-compatible server)
- Envoy + the router (see the “Start the router” section below)

1) Minimal configuration

Put this in config/config.yaml (or merge it into your existing config). It defines:

- Categories that require reasoning (e.g., math)
- Reasoning families that capture model syntax differences (DeepSeek/Qwen3 use chat_template_kwargs; GPT-OSS/GPT use reasoning_effort)
- Which concrete models use which reasoning family

```yaml
# vLLM endpoints that host your models
vllm_endpoints:
  - name: "endpoint1"
    address: "127.0.0.1"
    port: 8000
    models: ["deepseek-v3", "qwen3-7b", "openai/gpt-oss-20b"]
    weight: 1

# Reasoning family configurations (how to express reasoning for a family)
reasoning_families:
  deepseek:
    type: "chat_template_kwargs"
    parameter: "thinking"
  qwen3:
    type: "chat_template_kwargs"
    parameter: "enable_thinking"
  gpt-oss:
    type: "reasoning_effort"
    parameter: "reasoning_effort"
  gpt:
    type: "reasoning_effort"
    parameter: "reasoning_effort"

# Default effort used when a category doesn’t specify one
default_reasoning_effort: medium # low | medium | high

# Map concrete model names to a reasoning family
model_config:
  "deepseek-v3":
    reasoning_family: "deepseek"
    preferred_endpoints: ["endpoint1"]
  "qwen3-7b":
    reasoning_family: "qwen3"
    preferred_endpoints: ["endpoint1"]
  "openai/gpt-oss-20b":
    reasoning_family: "gpt-oss"
    preferred_endpoints: ["endpoint1"]

# Categories: which kinds of queries require reasoning and at what effort
categories:
  - name: math
    use_reasoning: true
    reasoning_effort: high # overrides default_reasoning_effort
    reasoning_description: "Mathematical problems require step-by-step reasoning"
    model_scores:
      - model: openai/gpt-oss-20b
        score: 1.0
      - model: deepseek-v3
        score: 0.8
      - model: qwen3-7b
        score: 0.8

  - name: general
    use_reasoning: false
    reasoning_description: "General chit-chat doesn’t need reasoning"
    model_scores:
      - model: qwen3-7b
        score: 1.0
      - model: deepseek-v3
        score: 0.8

# A safe default when no category is confidently selected
default_model: qwen3-7b
```

Notes

- Reasoning is controlled by categories.use_reasoning and, optionally, categories.reasoning_effort.
- A model only gets reasoning fields if it has a model_config.<MODEL>.reasoning_family that maps to a reasoning_families entry.
- DeepSeek/Qwen3: the router sets chat_template_kwargs: { <parameter>: true } when reasoning is enabled (e.g., thinking for DeepSeek, enable_thinking for Qwen3).
- GPT/GPT-OSS: the router sets reasoning_effort to the category effort (or the default) when reasoning is enabled.

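As an illustration, here is roughly what the upstream request body would contain for each family once reasoning is enabled. This is a sketch inferred from the mapping above; the exact payload the router forwards may differ by version. For a DeepSeek-family model:

```json
{
  "model": "deepseek-v3",
  "messages": [{"role": "user", "content": "What is 2^10?"}],
  "chat_template_kwargs": { "thinking": true }
}
```

And for a GPT-OSS-family model, where the effort comes from the category (high for math):

```json
{
  "model": "openai/gpt-oss-20b",
  "messages": [{"role": "user", "content": "What is 2^10?"}],
  "reasoning_effort": "high"
}
```
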
2) Start the router

Option A: Local build + Envoy

- Build and run the router:
  - make build
  - make run-router
- Start Envoy (install func-e once with make prepare-envoy if needed):
  - func-e run --config-path config/envoy.yaml --component-log-level "ext_proc:trace,router:trace,http:trace"

Option B: Docker Compose

- docker compose up -d
- Envoy is exposed at http://localhost:8801 (proxying /v1/* to backends via the router)

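Before sending requests, you can sanity-check that the stack is up. A minimal probe, assuming your backend exposes the standard OpenAI /v1/models listing through the proxy:

```bash
# Should list the models served behind the router (e.g., deepseek-v3, qwen3-7b)
curl -sS http://localhost:8801/v1/models | jq
```
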
3) Send example requests

Math (reasoning should be ON and effort high)

```bash
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a math teacher."},
      {"role": "user", "content": "What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?"}
    ]
  }' | jq
```

General (reasoning should be OFF)

```bash
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who are you?"}
    ]
  }' | jq
```

Example response (shape)

The exact fields depend on your backend. The router keeps the OpenAI-compatible shape and may add metadata.

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1726000000,
  "model": "openai/gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "The derivative is 3x^2 + 4x - 5." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 85, "completion_tokens": 43, "total_tokens": 128 },
  "routing_metadata": {
    "category": "math",
    "selected_model": "openai/gpt-oss-20b",
    "reasoning_enabled": true,
    "reasoning_effort": "high"
  }
}
```

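To spot-check a routing decision without reading the whole response, you can filter for the metadata alone (assuming your build emits the routing_metadata block shown above):

```bash
# Prints only the routing decision: category, selected model, reasoning flag and effort
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Solve 2x + 6 = 20 for x."}]}' \
  | jq '.routing_metadata'
```
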
4) Run a comprehensive evaluation

You can benchmark the router against a direct vLLM endpoint across categories using the included script. This runs a ReasoningBench based on MMLU-Pro and produces summaries and plots.

Quick start (router + vLLM):

```bash
SAMPLES_PER_CATEGORY=25 \
CONCURRENT_REQUESTS=4 \
ROUTER_MODELS="auto" \
VLLM_MODELS="openai/gpt-oss-20b" \
./bench/run_bench.sh
```

Router-only benchmark:

```bash
BENCHMARK_ROUTER_ONLY=true \
SAMPLES_PER_CATEGORY=25 \
CONCURRENT_REQUESTS=4 \
ROUTER_MODELS="auto" \
./bench/run_bench.sh
```

Direct invocation (advanced):

```bash
python bench/router_reason_bench.py \
  --run-router \
  --router-endpoint http://localhost:8801/v1 \
  --router-models auto \
  --run-vllm \
  --vllm-endpoint http://localhost:8000/v1 \
  --vllm-models openai/gpt-oss-20b \
  --samples-per-category 25 \
  --concurrent-requests 4 \
  --output-dir results/reasonbench
```

Tips

- If your math request doesn’t enable reasoning, confirm the classifier assigns the "math" category with sufficient confidence (see categories.threshold in your setup; a sketch follows this list) and that the target model has a reasoning_family.
- For models without a reasoning_family, the router will not inject reasoning fields even when the category requires reasoning (by design, to avoid sending invalid requests upstream).
- You can override the effort per category via categories.reasoning_effort or set a global default via default_reasoning_effort.
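For the first tip, a per-category confidence threshold might look like the following. This is a hypothetical sketch: the key name is taken from the tip above and 0.6 is an arbitrary value, so check it against your actual config schema.

```yaml
categories:
  - name: math
    threshold: 0.6 # hypothetical: minimum classifier confidence before this category is selected
    use_reasoning: true
    reasoning_effort: high
```
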

website/sidebars.js

Lines changed: 1 addition & 0 deletions

@@ -46,6 +46,7 @@ const sidebars = {
       items: [
         'getting-started/installation',
         'getting-started/configuration',
+        'getting-started/reasoning-routing-quickstart',
       ],
     },
     {