
Commit 4fbee46

paper: Category-Aware Semantic Caching for Heterogeneous LLM Workloads (#578)
* paper: Category-Aware Semantic Caching for Heterogeneous LLM Workloads

Signed-off-by: bitliu <[email protected]>

* more

Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: bitliu <[email protected]>
1 parent 2eeeda8 commit 4fbee46

File tree

2 files changed (+16 −2 lines)

README.md

Lines changed: 2 additions & 1 deletion
@@ -16,11 +16,12 @@
 
 *Latest News* 🔥
 
+- [2025/11/03] **Our paper** [Category-Aware Semantic Caching for Heterogeneous LLM Workloads](https://arxiv.org/abs/2510.26835) published 📝
 - [2025/10/26] We reached 2000 stars on GitHub! 🔥
 - [2025/10/21] We announced the [2025 Q4 Roadmap: Journey to Iris](https://vllm-semantic-router.com/blog/q4-roadmap-iris) 📅.
 - [2025/10/16] We established the [vLLM Semantic Router Youtube Channel](https://www.youtube.com/@vLLMSemanticRouter) ✨.
 - [2025/10/15] We announced the [vLLM Semantic Router Dashboard](https://www.youtube.com/watch?v=E2IirN8PsFw) 🚀.
-- [2025/10/12] Our paper [When to Reason: Semantic Router for vLLM](https://arxiv.org/abs/2510.08731) accepted by NeurIPS 2025 MLForSys 🧠.
+- [2025/10/12] **Our paper** [When to Reason: Semantic Router for vLLM](https://arxiv.org/abs/2510.08731) accepted by NeurIPS 2025 MLForSys 🧠.
 - [2025/10/08] We announced the integration with [vLLM Production Stack](https://github.com/vllm-project/production-stack) Team 👋.
 - [2025/10/01] We supported to deploy on [Kubernetes](https://vllm-semantic-router.com/docs/installation/kubernetes/) 🌊.
 - [2025/09/15] We reached 1000 stars on GitHub! 🔥

website/src/pages/publications.js

Lines changed: 14 additions & 1 deletion
@@ -19,6 +19,19 @@ const papers = [
   {
     id: 2,
     type: 'paper',
+    title: 'Category-Aware Semantic Caching for Heterogeneous LLM Workloads',
+    authors: 'Chen Wang, Xunzhuo Liu, Yue Zhu, Alaa Youssef, Priya Nagpurkar, Huamin Chen',
+    venue: '',
+    year: '2025',
+    abstract: 'We present a category-aware semantic caching where similarity thresholds, TTLs, and quotas vary by query category, with a hybrid architecture separating in-memory HNSW search from external document storage.',
+    links: [
+      { type: 'paper', url: 'https://arxiv.org/abs/2510.26835', label: '📄 Paper' },
+    ],
+    featured: true,
+  },
+  {
+    id: 3,
+    type: 'paper',
     title: 'Semantic Inference Routing Protocol (SIRP)',
     authors: 'Huamin Chen, Luay Jalil',
     venue: 'Internet Engineering Task Force (IETF)',
@@ -30,7 +43,7 @@ const papers = [
     featured: true,
   },
   {
-    id: 3,
+    id: 4,
     type: 'paper',
     title: 'Multi-Provider Extensions for Agentic AI Inference APIs',
     authors: 'H. Chen, L. Jalil, N. Cocker',
