Commit 77d7553

resolve reviews
Signed-off-by: bitliu <[email protected]>
1 parent 5fd1503 commit 77d7553

1 file changed: +5 -5 lines changed

_posts/2025-09-01-semantic-router.md

Lines changed: 5 additions & 5 deletions
@@ -7,7 +7,7 @@ image: /assets/logos/vllm-logo-text-light.png
 
 ![](/assets/figures/semantic-router/request.png)
 
-## **Industry Status: Inference ≠ The More, The Better**
+## Industry Status: Inference ≠ The More, The Better
 
 Over the past year, **Hybrid inference / automatic routing** has become one of the hottest topics in the large model industry.
 
@@ -38,7 +38,7 @@ Meanwhile, other companies are rapidly following suit:
 
 In summary: The industry is entering a new era where **"not a single token should be wasted"**.
 
-## **Recent Research: vLLM Semantic Router**
+## Recent Research: vLLM Semantic Router
 
 Amid the industry's push for "Hybrid inference," we focus on the **open-source inference engine vLLM**.
 
@@ -70,7 +70,7 @@ Experimental data shows:
 
 Especially in knowledge-intensive areas like business and economics, accuracy improvements even exceed **20%**.
 
-## **Background of the vLLM Semantic Router Project**
+## Background of the vLLM Semantic Router Project
 
 The Semantic Router is not the isolated outcome of a single paper, but rather the result of collaboration and sustained efforts within the open-source community:
 
@@ -90,7 +90,7 @@ Thus, the vLLM Semantic Router is not just a research achievement but an **impor
 
 You can start exploring and experience it by visiting the GitHub repository: [https://github.com/vllm-project/semantic-router](https://github.com/vllm-project/semantic-router).
 
-## **Future Trends: Cost-Effective, Just-in-Time Inference**
+## Future Trends: Cost-Effective, Just-in-Time Inference
 
 The large model industry has shifted from "Can we perform inference?" to "**When to perform inference and how to perform it?**"
 
@@ -106,7 +106,7 @@ The future competitive focus will no longer be about "whose model is the largest
 
 Thus, the next frontier will be: **Intelligent self-adjusting inference mechanisms**. No need for explicit user switches or hardcoding; instead, the model/system can autonomously decide when to "think deeply" or provide a quick answer.
 
-## **Summary in One Sentence**
+## Summary in One Sentence
 
 * **GPT-5**: Uses routing for business, driving widespread intelligence.
 * **vLLM Semantic Router**: Uses semantic routing for efficiency, driving green AI.
