
Commit 4fbee46

paper: Category-Aware Semantic Caching for Heterogeneous LLM Workloads (#578)
* paper: Category-Aware Semantic Caching for Heterogeneous LLM Workloads

Signed-off-by: bitliu <[email protected]>

* more

Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: bitliu <[email protected]>
1 parent 2eeeda8 commit 4fbee46

File tree

2 files changed (+16 −2 lines)

README.md

Lines changed: 2 additions & 1 deletion
@@ -16,11 +16,12 @@
 
 *Latest News* 🔥
 
+- [2025/11/03] **Our paper** [Category-Aware Semantic Caching for Heterogeneous LLM Workloads](https://arxiv.org/abs/2510.26835) published 📝
 - [2025/10/26] We reached 2000 stars on GitHub! 🔥
 - [2025/10/21] We announced the [2025 Q4 Roadmap: Journey to Iris](https://vllm-semantic-router.com/blog/q4-roadmap-iris) 📅.
 - [2025/10/16] We established the [vLLM Semantic Router Youtube Channel](https://www.youtube.com/@vLLMSemanticRouter) ✨.
 - [2025/10/15] We announced the [vLLM Semantic Router Dashboard](https://www.youtube.com/watch?v=E2IirN8PsFw) 🚀.
-- [2025/10/12] Our paper [When to Reason: Semantic Router for vLLM](https://arxiv.org/abs/2510.08731) accepted by NeurIPS 2025 MLForSys 🧠.
+- [2025/10/12] **Our paper** [When to Reason: Semantic Router for vLLM](https://arxiv.org/abs/2510.08731) accepted by NeurIPS 2025 MLForSys 🧠.
 - [2025/10/08] We announced the integration with [vLLM Production Stack](https://github.com/vllm-project/production-stack) Team 👋.
 - [2025/10/01] We supported to deploy on [Kubernetes](https://vllm-semantic-router.com/docs/installation/kubernetes/) 🌊.
 - [2025/09/15] We reached 1000 stars on GitHub! 🔥

website/src/pages/publications.js

Lines changed: 14 additions & 1 deletion
@@ -19,6 +19,19 @@ const papers = [
   {
     id: 2,
     type: 'paper',
+    title: 'Category-Aware Semantic Caching for Heterogeneous LLM Workloads',
+    authors: 'Chen Wang, Xunzhuo Liu, Yue Zhu, Alaa Youssef, Priya Nagpurkar, Huamin Chen',
+    venue: '',
+    year: '2025',
+    abstract: 'We present a category-aware semantic caching where similarity thresholds, TTLs, and quotas vary by query category, with a hybrid architecture separating in-memory HNSW search from external document storage.',
+    links: [
+      { type: 'paper', url: 'https://arxiv.org/abs/2510.26835', label: '📄 Paper' },
+    ],
+    featured: true,
+  },
+  {
+    id: 3,
+    type: 'paper',
     title: 'Semantic Inference Routing Protocol (SIRP)',
     authors: 'Huamin Chen, Luay Jalil',
     venue: 'Internet Engineering Task Force (IETF)',
@@ -30,7 +43,7 @@ const papers = [
     featured: true,
   },
   {
-    id: 3,
+    id: 4,
     type: 'paper',
     title: 'Multi-Provider Extensions for Agentic AI Inference APIs',
     authors: 'H. Chen, L. Jalil, N. Cocker',
