# Reasoning Routing Quickstart

This short guide shows how to enable and verify “reasoning routing” in the Semantic Router, including:

- A comprehensive evaluation command you can run

## Prerequisites

- A running OpenAI-compatible backend for your models (e.g., vLLM or any OpenAI-compatible server). It must be reachable at the addresses you configure under `vllm_endpoints` (`address:port`); a quick check is sketched below.
- Envoy + the router (see the “Start the router” section below)
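
Before wiring in the router, it is worth confirming the backend answers on its own. A quick sanity check (a sketch, assuming the backend listens on 127.0.0.1:8000 as in the examples below):

```bash
# Any OpenAI-compatible server exposes /v1/models; a JSON list of
# model IDs confirms the vllm_endpoints target is live.
curl -s http://127.0.0.1:8000/v1/models
```
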

## 1) Minimal configuration

Put this in `config/config.yaml` (or merge it into your existing config). It defines:

- Categories that require reasoning (e.g., math)
- Reasoning families for model syntax differences (DeepSeek/Qwen3 use `chat_template_kwargs`; GPT-OSS/GPT use `reasoning_effort`)
- Which concrete models use which reasoning family
- The classifier (required for category detection; without it, reasoning will not be enabled)

```yaml
# Category classifier (required for reasoning to trigger)
# ... (the full config continues with categories, reasoning_families,
# model_config, and vllm_endpoints; the notes and the sketch below cover
# the fields that matter for reasoning)
```

Notes:

- Reasoning is controlled by `categories.use_reasoning` and, optionally, `categories.reasoning_effort`.
- A model only gets reasoning fields if it has a `model_config.<MODEL>.reasoning_family` that maps to a `reasoning_families` entry (sketched below).
- DeepSeek/Qwen3 (`chat_template_kwargs`): the router injects `chat_template_kwargs` only when reasoning is enabled. When disabled, no `chat_template_kwargs` are added.
- GPT/GPT-OSS (`reasoning_effort`): when reasoning is enabled, the router sets `reasoning_effort` based on the category (falling back to `default_reasoning_effort`). When reasoning is disabled, if the request already contains `reasoning_effort` and the model’s family type is `reasoning_effort`, the router preserves the original value; otherwise the field is absent.
- For more stable classification, you can add category descriptions in config and keep them semantically distinctive.
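
To make these notes concrete, here is a minimal illustrative sketch of the reasoning-related fields. It is an assumption, not the authoritative schema: the model names, threshold, effort values, and the `parameter` values are placeholders to adapt to your deployment.

```yaml
# Illustrative sketch only; all names and values below are placeholders.
classifier:
  category_model:
    threshold: 0.6                # minimum confidence for a category to apply

categories:
  - name: math
    use_reasoning: true           # math prompts get reasoning fields injected
    reasoning_effort: high        # optional per-category override
  - name: other
    use_reasoning: false

default_reasoning_effort: medium  # fallback when a category sets no effort

reasoning_families:
  deepseek:
    type: chat_template_kwargs    # DeepSeek/Qwen3-style toggle
    parameter: thinking           # kwarg key injected when reasoning is on
  gpt-oss:
    type: reasoning_effort        # GPT/GPT-OSS-style effort field

model_config:
  my-deepseek-model:
    reasoning_family: deepseek    # maps this model to a family above
  my-gpt-oss-model:
    reasoning_family: gpt-oss
```

With a config along these lines, a math prompt routed to `my-deepseek-model` would get `chat_template_kwargs: { thinking: true }` injected, while `my-gpt-oss-model` would get `reasoning_effort: high`.
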

## 2) Start the router

Option A: Local build + Envoy

- Download classifier models and mappings (required)
  - `make download-models`
- Build and run the router
  - `make build`
  - `make run-router`

Option B: Docker Compose

- `docker compose up -d`
- Exposes Envoy at http://localhost:8801 (proxying `/v1/*` to backends via the router)

Note: Ensure your OpenAI-compatible backend is running and reachable (e.g., http://127.0.0.1:8000) so that the `vllm_endpoints` `address:port` matches a live server. Without a running backend, routing will fail at the Envoy hop.
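
Once Envoy and the router are up, you can send a request through the stack and verify reasoning routing end to end. A sketch, assuming the Compose port above; `my-deepseek-model` is a placeholder for a model that exists in your `model_config`:

```bash
# A math prompt should be classified into the "math" category; if the target
# model has a reasoning_family, the router injects the reasoning fields.
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-deepseek-model",
        "messages": [{"role": "user", "content": "Evaluate the integral of x^2 from 0 to 1."}]
      }'
```
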

## Troubleshooting

- If your math request doesn’t enable reasoning, confirm the classifier assigns the "math" category with sufficient confidence (see `classifier.category_model.threshold`) and that the target model has a `reasoning_family`.
- For models without a `reasoning_family`, the router will not inject reasoning fields even when the category requires reasoning (this is by design, to avoid invalid requests).
- You can override the effort per category via `categories.reasoning_effort` or set a global default via `default_reasoning_effort`.
- Ensure your OpenAI-compatible backend is reachable at the configured `vllm_endpoints` (`address:port`). If it’s not running, routing will fail even though the router and Envoy are up.
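
To observe the `reasoning_effort` preservation behavior described in the notes above, send a request that already carries the field for a prompt whose category has reasoning disabled (a sketch; `my-gpt-oss-model` is a placeholder and must map to a `reasoning_effort`-type family):

```bash
# For a non-reasoning category, an incoming reasoning_effort should pass
# through unchanged for models whose family type is reasoning_effort.
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-gpt-oss-model",
        "reasoning_effort": "low",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```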