# Reasoning Routing Quickstart

This short guide shows how to enable and verify “reasoning routing” in the Semantic Router:
- The minimal config.yaml fields you need
- Example requests/responses (OpenAI-compatible)
- A comprehensive evaluation command you can run

Prerequisites
- A running OpenAI-compatible backend for your models (e.g., vLLM or another OpenAI-compatible server)
- Envoy + the router (see the “Start the router” section)

1) Minimal configuration
Put this in config/config.yaml (or merge it into your existing config). It defines:
- Categories that require reasoning (e.g., math)
- Reasoning families that capture per-model syntax differences (DeepSeek/Qwen3 use chat_template_kwargs; GPT-OSS/GPT use reasoning_effort)
- Which concrete models use which reasoning family

```yaml
# vLLM endpoints that host your models
vllm_endpoints:
  - name: "endpoint1"
    address: "127.0.0.1"
    port: 8000
    models: ["deepseek-v3", "qwen3-7b", "openai/gpt-oss-20b"]
    weight: 1

# Reasoning family configurations (how to express reasoning for a family)
reasoning_families:
  deepseek:
    type: "chat_template_kwargs"
    parameter: "thinking"
  qwen3:
    type: "chat_template_kwargs"
    parameter: "enable_thinking"
  gpt-oss:
    type: "reasoning_effort"
    parameter: "reasoning_effort"
  gpt:
    type: "reasoning_effort"
    parameter: "reasoning_effort"

# Default effort used when a category doesn’t specify one
default_reasoning_effort: medium  # low | medium | high

# Map concrete model names to a reasoning family
model_config:
  "deepseek-v3":
    reasoning_family: "deepseek"
    preferred_endpoints: ["endpoint1"]
  "qwen3-7b":
    reasoning_family: "qwen3"
    preferred_endpoints: ["endpoint1"]
  "openai/gpt-oss-20b":
    reasoning_family: "gpt-oss"
    preferred_endpoints: ["endpoint1"]

# Categories: which kinds of queries require reasoning and at what effort
categories:
- name: math
  use_reasoning: true
  reasoning_effort: high  # overrides default_reasoning_effort
  reasoning_description: "Mathematical problems require step-by-step reasoning"
  model_scores:
  - model: openai/gpt-oss-20b
    score: 1.0
  - model: deepseek-v3
    score: 0.8
  - model: qwen3-7b
    score: 0.8

- name: general
  use_reasoning: false
  reasoning_description: "General chit-chat doesn’t need reasoning"
  model_scores:
  - model: qwen3-7b
    score: 1.0
  - model: deepseek-v3
    score: 0.8

# A safe default when no category is confidently selected
default_model: qwen3-7b
```

Notes
- Reasoning is controlled by categories.use_reasoning and, optionally, categories.reasoning_effort.
- A model only gets reasoning fields if its model_config.<MODEL>.reasoning_family maps to a reasoning_families entry.
- DeepSeek/Qwen3: the router sets chat_template_kwargs: { <parameter>: true } when reasoning is enabled, where <parameter> is the family’s configured parameter name.
- GPT/GPT-OSS: the router sets reasoning_effort to the category effort (or the default) when reasoning is enabled.
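
The injection behavior described in the notes can be sketched in Python. This is a hedged illustration only, not the router’s actual code: apply_reasoning is a hypothetical helper, and the parameter names are taken from the reasoning_families config above.

```python
# Sketch of per-family reasoning-field injection (hypothetical helper;
# field names mirror the reasoning_families config in this guide).
def apply_reasoning(payload: dict, family: str, effort: str = "medium") -> dict:
    """Return a copy of an OpenAI-style request with reasoning fields added."""
    out = dict(payload)
    if family in ("deepseek", "qwen3"):
        # chat_template_kwargs families: a boolean "thinking" switch
        param = {"deepseek": "thinking", "qwen3": "enable_thinking"}[family]
        out["chat_template_kwargs"] = {param: True}
    elif family in ("gpt", "gpt-oss"):
        # reasoning_effort families: low | medium | high
        out["reasoning_effort"] = effort
    return out

req = {"model": "deepseek-v3", "messages": [{"role": "user", "content": "2+2?"}]}
print(apply_reasoning(req, "deepseek"))
```

A model with no reasoning_family would simply fall through both branches, matching the router’s behavior of leaving such requests untouched.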

2) Start the router
Option A: Local build + Envoy
- Build and run the router
  - make build
  - make run-router
- Start Envoy (install func-e once with make prepare-envoy if needed)
  - func-e run --config-path config/envoy.yaml --component-log-level "ext_proc:trace,router:trace,http:trace"

Option B: Docker Compose
- docker compose up -d
  - Exposes Envoy at http://localhost:8801 (proxying /v1/* to backends via the router)

3) Send example requests
Math (reasoning should be ON, with high effort)
```bash
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a math teacher."},
      {"role": "user",   "content": "What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?"}
    ]
  }' | jq
```

General (reasoning should be OFF)
```bash
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user",   "content": "Who are you?"}
    ]
  }' | jq
```

Example response (shape)
The exact fields depend on your backend. The router keeps the OpenAI-compatible shape and may add routing metadata.

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1726000000,
  "model": "openai/gpt-oss-20b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "The derivative is 3x^2 + 4x - 5." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 85, "completion_tokens": 43, "total_tokens": 128 },
  "routing_metadata": {
    "category": "math",
    "selected_model": "openai/gpt-oss-20b",
    "reasoning_enabled": true,
    "reasoning_effort": "high"
  }
}
```
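
To check responses programmatically rather than eyeballing jq output, a small helper can summarize the routing metadata. This is a sketch: it assumes the routing_metadata shape shown above, and degrades gracefully when a plain backend omits that field.

```python
import json

def reasoning_summary(resp: dict) -> str:
    """Summarize the router's routing_metadata, if present."""
    meta = resp.get("routing_metadata")
    if meta is None:
        return "no routing metadata (response may have come straight from the backend)"
    state = "on" if meta.get("reasoning_enabled") else "off"
    return (f"category={meta.get('category')} "
            f"model={meta.get('selected_model')} reasoning={state}")

raw = ('{"routing_metadata": {"category": "math", '
       '"selected_model": "openai/gpt-oss-20b", '
       '"reasoning_enabled": true, "reasoning_effort": "high"}}')
print(reasoning_summary(json.loads(raw)))
```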

4) Run a comprehensive evaluation
You can benchmark the router against a direct vLLM endpoint across categories using the included script. This runs a ReasoningBench based on MMLU-Pro and produces summaries and plots.

Quick start (router + vLLM):
```bash
SAMPLES_PER_CATEGORY=25 \
CONCURRENT_REQUESTS=4 \
ROUTER_MODELS="auto" \
VLLM_MODELS="openai/gpt-oss-20b" \
./bench/run_bench.sh
```

Router-only benchmark:
```bash
BENCHMARK_ROUTER_ONLY=true \
SAMPLES_PER_CATEGORY=25 \
CONCURRENT_REQUESTS=4 \
ROUTER_MODELS="auto" \
./bench/run_bench.sh
```

Direct invocation (advanced):
```bash
python bench/router_reason_bench.py \
  --run-router \
  --router-endpoint http://localhost:8801/v1 \
  --router-models auto \
  --run-vllm \
  --vllm-endpoint http://localhost:8000/v1 \
  --vllm-models openai/gpt-oss-20b \
  --samples-per-category 25 \
  --concurrent-requests 4 \
  --output-dir results/reasonbench
```

Tips
- If a math request doesn’t enable reasoning, confirm that the classifier assigns the "math" category with sufficient confidence (see categories.threshold in your setup) and that the target model has a reasoning_family.
- For models without a reasoning_family, the router does not inject reasoning fields even when the category requires reasoning (by design, to avoid sending invalid requests to the backend).
- You can override the effort per category via categories.reasoning_effort, or set a global default via default_reasoning_effort.
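
The first two tips can be checked mechanically before deploying a config. A hedged sketch (the function name is made up; the dictionary keys mirror the YAML in this guide): it lists every model that a reasoning-enabled category scores but that has no reasoning_family, i.e., models that will silently run without reasoning fields.

```python
def missing_reasoning_families(config: dict) -> list:
    """Find (category, model) pairs where the category requires reasoning but
    the model has no reasoning_family, so no reasoning fields get injected."""
    models = config.get("model_config", {})
    problems = []
    for cat in config.get("categories", []):
        if not cat.get("use_reasoning"):
            continue  # non-reasoning categories never inject fields
        for ms in cat.get("model_scores", []):
            if not models.get(ms["model"], {}).get("reasoning_family"):
                problems.append((cat["name"], ms["model"]))
    return problems

# Example: "mistral-7b" is a hypothetical model with no reasoning family.
cfg = {
    "model_config": {"deepseek-v3": {"reasoning_family": "deepseek"},
                     "mistral-7b": {}},
    "categories": [{"name": "math", "use_reasoning": True,
                    "model_scores": [{"model": "deepseek-v3"},
                                     {"model": "mistral-7b"}]}],
}
print(missing_reasoning_families(cfg))  # flags the model lacking a family
```

Loading your real config with a YAML parser and running this check is a quick way to catch the “math request didn’t enable reasoning” case caused by a missing family mapping.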