Commit 52f372e

docs: update
Signed-off-by: bitliu <[email protected]>
1 parent 1e18c1f commit 52f372e

1 file changed, +8 -7 lines changed

_posts/2025-09-01-semantic-router.md

Lines changed: 8 additions & 7 deletions
@@ -17,13 +17,14 @@ Take **GPT-5** as an example. Its real breakthrough isn't in the number of param
 
 * **Complex/High-value queries → Strong inference models**: Legal analysis, financial simulations, etc., are routed to models with Chain-of-Thought capabilities.
 
-The logic behind this mechanism is called **"Unit Token Economics"**. Every token generated is no longer a meaningless "consumption" but must bring value:
+The logic behind this mechanism is called **"Per-token Unit Economics"**.
 
-* Free-tier users can still get responses through light models, **controlling costs**.
+Every token generated is no longer a meaningless "consumption" but must bring value.
 
-* Once a query involves commercial intent (e.g., booking flights, finding lawyers), it will be routed to high-computation models + Agent services, **directly connecting to transaction loops**, where OpenAI can take a commission from the transaction.
+Free-tier users receive answers from lightweight models, keeping costs under control.
+When a query shows commercial intent (e.g., booking flights or finding legal services), it is routed to high-computation models and agent services that plug directly into transaction flows.
 
-This means **free traffic is finally monetized**.
+For use cases like this, companies such as OpenAI can participate in the value chain by taking a commission on completed transactions — turning free traffic from a cost center into a monetizable entry point.
 
 Meanwhile, other companies are rapidly following suit:
 

@@ -41,7 +42,7 @@ In summary: The industry is entering a new era where **"not a single token shoul
 
 Amid the industry's push for "Hybrid inference," we focus on the **open-source inference engine vLLM**.
 
-vLLM has become the de facto standard for deploying large models in the industry. However, it lacks "semantic-level fine control." Developers either enable full inference (wasting computation) or disable inference entirely (losing accuracy).
+vLLM has become the de facto standard for deploying large models in the industry. However, it lacks fine-grained semantic-level control - the ability to decide based on meaning rather than just query type. As a result, developers either enable full inference (wasting computation) or disable inference entirely (losing accuracy).
 
 Thus, we propose the **vLLM Semantic Router**, bringing GPT-5's "smart routing" capabilities to the open-source ecosystem.
 

@@ -63,11 +64,11 @@ Thus, we propose the **vLLM Semantic Router**, bringing GPT-5's "smart routing"
 
 Experimental data shows:
 
-* **Accuracy**: Improved by **+10.2 percentage points**
+* **Accuracy**: Improved by **+10.2%**
 * **Latency**: Reduced by **47.1%**
 * **Token Consumption**: Decreased by **48.5%**
 
-Especially in knowledge-intensive areas like business and economics, accuracy improvements even exceed **20 percentage points**.
+Especially in knowledge-intensive areas like business and economics, accuracy improvements even exceed **20%**.
 
 ## **Background of the vLLM Semantic Router Project**
 
