# Reasoning Routing Quickstart
This short guide shows how to enable and verify “reasoning routing” in the Semantic Router:
4
+
4
5
- Minimal config.yaml fields you need
- Example request/response (OpenAI-compatible)
- A comprehensive evaluation command you can run
Prerequisites
- A running OpenAI-compatible backend for your models (e.g., vLLM). It must be reachable at the address:port pairs you configure under vllm_endpoints; a quick reachability check is shown after this list.
- Envoy + the router (see the Start the router section below)
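
Before starting the router, you can confirm the backend is live with a standard OpenAI-compatible call (adjust the address and port to match your vllm_endpoints entry):

```bash
# Any OpenAI-compatible server (vLLM included) exposes /v1/models;
# a JSON model list in the response confirms the backend is reachable.
curl -s http://127.0.0.1:8000/v1/models
```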
1) Minimal configuration
Put this in config/config.yaml (or merge into your existing config). It defines:
- Categories that require reasoning (e.g., math)
- Reasoning families for model syntax differences (DeepSeek/Qwen3 use chat_template_kwargs; GPT-OSS/GPT use reasoning_effort)
- Which concrete models use which reasoning family
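
The full example config is longer than what this guide needs; below is a minimal sketch of just these pieces. Key names and nesting are illustrative reconstructions from the fields referenced in this guide, so treat the project's sample config as authoritative:

```yaml
vllm_endpoints:                    # where your OpenAI-compatible backends live
  - name: local-vllm               # illustrative entry
    address: 127.0.0.1
    port: 8000

reasoning_families:                # per-family syntax for toggling reasoning
  qwen3:
    type: chat_template_kwargs     # DeepSeek/Qwen3-style syntax
  gpt-oss:
    type: reasoning_effort         # GPT-OSS/GPT-style syntax

model_config:
  qwen3-30b:
    reasoning_family: qwen3        # without this, no reasoning fields are injected

categories:
  - name: math
    use_reasoning: true            # math requests get reasoning enabled
    reasoning_effort: high         # optional per-category override

default_model: qwen3-30b
default_reasoning_effort: medium   # global fallback effort
```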
Notes
- Reasoning is controlled by categories.use_reasoning and optionally categories.reasoning_effort.
- A model only gets reasoning fields if it has a model_config.<MODEL>.reasoning_family that maps to a reasoning_families entry.
- DeepSeek/Qwen3 (chat_template_kwargs): the router injects chat_template_kwargs only when reasoning is enabled. When disabled, no chat_template_kwargs are added.
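
For example, a request routed to a Qwen3-family model with reasoning enabled might be rewritten upstream along these lines (the exact kwarg, enable_thinking here, depends on the model's chat template and is an assumption):

```json
{
  "model": "qwen3-30b",
  "messages": [{"role": "user", "content": "Prove that 2 + 2 = 4."}],
  "chat_template_kwargs": {"enable_thinking": true}
}
```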
2) Start the router
Option A: Local build + Envoy
- Download classifier models and mappings (required)
  - `make download-models`
- Build and run the router (a consolidated sketch follows this list)
- Run Envoy
  - `func-e run --config-path config/envoy.yaml --component-log-level "ext_proc:trace,router:trace,http:trace"`
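
The same steps as one runnable sequence. This is a sketch only: the build/run target names are hypothetical (check the repo's Makefile), while the other commands come from the list above:

```bash
# Fetch classifier models and mappings (required).
make download-models

# Hypothetical target names; verify against the repo's Makefile.
make build
make run-router

# In a second terminal: run Envoy with verbose ext_proc/router logging.
func-e run --config-path config/envoy.yaml \
  --component-log-level "ext_proc:trace,router:trace,http:trace"
```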
Option B: Docker Compose
- `docker compose up -d`
- Exposes Envoy at http://localhost:8801 (proxying /v1/* to backends via the router)
Note: Ensure your OpenAI-compatible backend is running and reachable (e.g., http://127.0.0.1:8000) so that vllm_endpoints address:port matches a live server. Without a running backend, routing will fail at the Envoy hop.
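
Once Envoy is up (either option), you can send a plain OpenAI-compatible request through it. A sketch, assuming the Docker Compose port above and the model name from the config in section 1; a math prompt like this should be classified into the math category and get reasoning enabled:

```bash
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-30b",
    "messages": [
      {"role": "user", "content": "What is the derivative of x^3 + 2x?"}
    ]
  }'
```

The response is standard OpenAI chat-completion JSON; if the classifier put the prompt in the math category, the upstream request also carried the reasoning fields described in the Notes above.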

3) Comprehensive evaluation

You can benchmark the router against a direct vLLM endpoint across categories using the included script, which runs a ReasoningBench based on MMLU-Pro and produces summaries and plots.

Troubleshooting

- If your math request doesn’t enable reasoning, confirm the classifier assigns the "math" category with sufficient confidence (see classifier.category_model.threshold) and that the target model has a reasoning_family.
- For models without a reasoning_family, the router will not inject reasoning fields even when the category requires reasoning (this is by design to avoid invalid requests).
- You can override the effort per category via categories.reasoning_effort or set a global default via default_reasoning_effort (see the snippet after this list).
- Ensure your OpenAI-compatible backend is reachable at the configured vllm_endpoints (address:port). If it’s not running, routing will fail even though the router and Envoy are up.
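
As a quick reference for the knobs mentioned in this list, a sketch with illustrative values (the nesting is assumed from the key paths named above):

```yaml
classifier:
  category_model:
    threshold: 0.6                  # minimum classifier confidence; illustrative value

categories:
  - name: math
    use_reasoning: true
    reasoning_effort: high          # per-category override

default_reasoning_effort: medium    # global default when no category override
```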