_posts/2025-09-01-semantic-router.md

Take **GPT-5** as an example. Its real breakthrough isn't in the number of parameters.

* **Complex/High-value queries → Strong inference models**: Legal analysis, financial simulations, etc., are routed to models with Chain-of-Thought capabilities.

The logic behind this mechanism is called **"Per-token Unit Economics"**.

Every token generated is no longer a meaningless "consumption" but must bring value.

Free-tier users receive answers from lightweight models, keeping costs under control.

When a query shows commercial intent (e.g., booking flights or finding legal services), it is routed to high-computation models and agent services that plug directly into transaction flows.

For use cases like this, companies such as OpenAI can participate in the value chain by taking a commission on completed transactions, turning free traffic from a cost center into a monetizable entry point.

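The tiered routing described above can be sketched as a toy dispatcher. Everything here is illustrative: the model names and keyword sets are hypothetical stand-ins for a real semantic classifier, not any vendor's actual routing logic.

```python
# Toy sketch of tiered query routing. Model names and keyword sets are
# hypothetical stand-ins for a real semantic intent classifier.
LIGHT_MODEL = "light-chat-model"    # cheap default for free-tier traffic
STRONG_MODEL = "reasoning-model"    # Chain-of-Thought-capable model

COMMERCIAL = {"book", "flight", "lawyer", "buy"}   # transaction intent
HIGH_VALUE = {"legal", "financial", "simulation"}  # complex analysis

def route(query: str) -> str:
    """Pick a model tier for a query via crude keyword matching."""
    words = set(query.lower().split())
    if words & (COMMERCIAL | HIGH_VALUE):
        # Commercial or high-value queries justify expensive inference.
        return STRONG_MODEL
    # Everything else stays on the cheap path to control cost.
    return LIGHT_MODEL

print(route("book a flight to Paris"))  # reasoning-model
print(route("tell me a joke"))          # light-chat-model
```

A production router would replace the keyword sets with an embedding- or classifier-based intent model, but the dispatch shape stays the same.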
Meanwhile, other companies are rapidly following suit:

In summary: The industry is entering a new era where **"not a single token should be wasted"**.

Amid the industry's push for "Hybrid inference," we focus on the **open-source inference engine vLLM**.

vLLM has become the de facto standard for deploying large models in the industry. However, it lacks fine-grained, semantic-level control: the ability to make decisions based on meaning rather than query type alone. As a result, developers either enable full inference (wasting computation) or disable inference entirely (losing accuracy).

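To make the all-or-nothing trade-off concrete, here is a minimal sketch of per-request control against an OpenAI-compatible vLLM server, assuming a model whose chat template honors an `enable_thinking` flag passed through `chat_template_kwargs` (Qwen3-style templates do); the deployment name `qwen3-8b` is a hypothetical example.

```python
# Sketch: build a per-request chat payload that toggles reasoning.
# Assumes an OpenAI-compatible vLLM server and a model whose chat
# template honors an "enable_thinking" flag (e.g., Qwen3-style).
# "qwen3-8b" is a hypothetical deployment name.
def build_chat_payload(query: str, reason: bool) -> dict:
    return {
        "model": "qwen3-8b",
        "messages": [{"role": "user", "content": query}],
        # chat_template_kwargs is forwarded to the chat template; some
        # templates use it to switch thinking mode on or off.
        "chat_template_kwargs": {"enable_thinking": reason},
    }

# A semantic router would set `reason` per request instead of globally.
payload = build_chat_payload("Summarize this contract clause.", reason=True)
print(payload["chat_template_kwargs"])  # {'enable_thinking': True}
```

Without a router deciding `reason` per request, operators are stuck hard-coding one value for all traffic, which is exactly the coarse-grained trade-off described above.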
Thus, we propose the **vLLM Semantic Router**, bringing GPT-5's "smart routing" capabilities to the open-source ecosystem.

Experimental data shows:

* **Accuracy**: Improved by **+10.2%**
* **Latency**: Reduced by **47.1%**
* **Token Consumption**: Decreased by **48.5%**

Especially in knowledge-intensive areas like business and economics, accuracy improvements even exceed **20%**.

## **Background of the vLLM Semantic Router Project**