docs: update

Xunzhuo · Xunzhuo · commit 1e18c1f33bc0 · 2025-09-01T11:48:39.000+08:00
Signed-off-by: bitliu <bitliu@tencent.com>
diff --git a/_posts/2025-09-01-semantic-router.md b/_posts/2025-09-01-semantic-router.md
@@ -9,7 +9,7 @@ image: /assets/logos/vllm-logo-text-light.png
 
 ## **Industry Status: Inference ≠ The More, The Better**
 
-Over the past year, **hybrid inference / automatic routing** has become one of the hottest topics in the large model industry.
+Over the past year, **Hybrid inference / automatic routing** has become one of the hottest topics in the large model industry.
 
 Take **GPT-5** as an example. Its real breakthrough isn't in the number of parameters, but in the **"automatic routing + thinking quota"**:
 
@@ -39,7 +39,7 @@ In summary: The industry is entering a new era where **"not a single token shoul
 
 ## **Recent Research: vLLM Semantic Router**
 
-Amid the industry's push for "hybrid inference," we focus on the **open-source inference engine vLLM**.
+Amid the industry's push for "Hybrid inference," we focus on the **open-source inference engine vLLM**.
 
 vLLM has become the de facto standard for deploying large models in the industry. However, it lacks "semantic-level fine control." Developers either enable full inference (wasting computation) or disable inference entirely (losing accuracy).
 
@@ -54,7 +54,7 @@ Thus, we propose the **vLLM Semantic Router**, bringing GPT-5's "smart routing"
 2. **Smart Routing**:
 
    * Simple queries → Directly call the non-inference mode for fast responses.
-   
+
    * Complex inference queries → Enable Chain-of-Thought to ensure accuracy.
 
 3. **Rust High-Performance Engine**: Using the HuggingFace Candle framework to achieve high concurrency and zero-copy efficient inference.
@@ -105,7 +105,7 @@ The future competitive focus will no longer be about "whose model is the largest
 
 Thus, the next frontier will be: **Intelligent self-adjusting inference mechanisms**. No need for explicit user switches or hardcoding; instead, the model/system can autonomously decide when to "think deeply" or provide a quick answer.
 
-# **Summary in One Sentence**
+## **Summary in One Sentence**
 
 * **GPT-5**: Uses routing for business, driving widespread intelligence.
 * **vLLM Semantic Router**: Uses semantic routing for efficiency, driving green AI.