Enable prefix-cache awareness in active-active multi-replica scheduler deployments #578

vMaroon · 2026-01-21T14:58:54Z

Summary

active-active multi-replica scheduler support llm-d/llm-d-kv-cache#212

Summary

Implemented a pod reconciler controller that manages per-pod ZMQ subscribers for KVEvents processing, along with the supporting logic. Also moved kvevents to the same level as the kvcache library, where it should have been from the start.

Components:

PodReconciler: Kubernetes controller that watches pods and manages subscribers

This is one standalone approach; actual integration with the scheduler depends on [RFC] Pod lifecycle subscription through data layer kubernetes-sigs/gateway-api-inference-extension#2017

SubscriberManager: Thread-safe manager for multiple ZMQ subscribers

Updated kvevents.Pool: now works with SubscriberManager instead of a global socket
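To illustrate what a thread-safe manager of per-pod subscribers might look like, here is a minimal sketch in Go. It is not the code from this PR: the method names (`Ensure`, `Remove`, `Pods`, `Shutdown`), the `waitForCancel` stand-in, and the endpoints are assumptions made for illustration, and the ZMQ socket handling is replaced by a function that simply blocks until it is cancelled.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
)

// subscribeFunc stands in for a per-pod ZMQ subscriber loop (assumption).
// It must block until ctx is cancelled.
type subscribeFunc func(ctx context.Context, endpoint string)

// waitForCancel is a placeholder subscriber; a real one would read KVEvents.
func waitForCancel(ctx context.Context, _ string) { <-ctx.Done() }

type subscription struct {
	endpoint string
	cancel   context.CancelFunc
}

// SubscriberManager keeps at most one running subscriber per pod.
// Hypothetical shape, not the type defined in the PR.
type SubscriberManager struct {
	mu   sync.Mutex
	subs map[string]subscription
	run  subscribeFunc
	root context.Context
	stop context.CancelFunc
	wg   sync.WaitGroup
}

func NewSubscriberManager(run subscribeFunc) *SubscriberManager {
	root, stop := context.WithCancel(context.Background())
	return &SubscriberManager{subs: map[string]subscription{}, run: run, root: root, stop: stop}
}

// Ensure starts a subscriber for pod, or restarts it if the endpoint changed.
// It reports whether a subscriber was (re)started.
func (m *SubscriberManager) Ensure(pod, endpoint string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if cur, ok := m.subs[pod]; ok {
		if cur.endpoint == endpoint {
			return false // already subscribed to this endpoint
		}
		cur.cancel() // endpoint changed: stop the old subscriber first
	}
	ctx, cancel := context.WithCancel(m.root)
	m.subs[pod] = subscription{endpoint: endpoint, cancel: cancel}
	m.wg.Add(1)
	go func() {
		defer m.wg.Done()
		m.run(ctx, endpoint)
	}()
	return true
}

// Remove stops the pod's subscriber and reports whether one existed.
func (m *SubscriberManager) Remove(pod string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	cur, ok := m.subs[pod]
	if !ok {
		return false
	}
	cur.cancel()
	delete(m.subs, pod)
	return true
}

// Pods returns the names of pods with an active subscriber, sorted.
func (m *SubscriberManager) Pods() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	pods := make([]string, 0, len(m.subs))
	for pod := range m.subs {
		pods = append(pods, pod)
	}
	sort.Strings(pods)
	return pods
}

// Shutdown cancels every subscriber and waits for the goroutines to exit.
func (m *SubscriberManager) Shutdown() {
	m.mu.Lock()
	m.stop()
	m.subs = map[string]subscription{}
	m.mu.Unlock()
	m.wg.Wait()
}

func main() {
	m := NewSubscriberManager(waitForCancel)
	m.Ensure("pod-a", "tcp://10.0.0.1:5557") // endpoints are made up
	m.Ensure("pod-b", "tcp://10.0.0.2:5557")
	m.Remove("pod-a")
	fmt.Println(m.Pods()) // prints [pod-b]
	m.Shutdown()
}
```

In this sketch a reconciler would call `Ensure` when a pod becomes ready and `Remove` when it goes away. Keeping the map behind a single mutex means both calls are safe from concurrent reconcile loops.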

Added integration + unit tests for all new functionality, updated documentation + examples.

The current integration discovers pods by inspecting the available pods on every Score call. A pod that does not appear in any request for 10 minutes is assumed dead.
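The discovery rule described here (record every pod seen in a Score call, and treat a pod as dead after 10 minutes without a sighting) can be sketched as a small last-seen tracker. This is an illustrative sketch only: `PodTracker`, `Observe`, `Expire`, and `demoStart` are hypothetical names, and the actual plugin may structure this differently.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// staleAfter mirrors the 10-minute window mentioned in the description.
const staleAfter = 10 * time.Minute

// demoStart is an arbitrary fixed instant that keeps the example deterministic.
var demoStart = time.Date(2026, 1, 21, 15, 0, 0, 0, time.UTC)

// PodTracker remembers when each pod was last seen in a Score call.
// Hypothetical type, not taken from the PR.
type PodTracker struct {
	mu       sync.Mutex
	lastSeen map[string]time.Time
	ttl      time.Duration
}

func NewPodTracker(ttl time.Duration) *PodTracker {
	return &PodTracker{lastSeen: map[string]time.Time{}, ttl: ttl}
}

// Observe records the pods present in one Score call and returns the pods
// that were not tracked before (candidates for a new subscriber), sorted.
func (t *PodTracker) Observe(now time.Time, pods ...string) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	var added []string
	for _, p := range pods {
		if _, known := t.lastSeen[p]; !known {
			added = append(added, p)
		}
		t.lastSeen[p] = now
	}
	sort.Strings(added)
	return added
}

// Expire forgets pods that have not been seen for at least ttl and returns
// them (candidates for subscriber removal), sorted.
func (t *PodTracker) Expire(now time.Time) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	var dead []string
	for p, seen := range t.lastSeen {
		if now.Sub(seen) >= t.ttl {
			dead = append(dead, p)
			delete(t.lastSeen, p)
		}
	}
	sort.Strings(dead)
	return dead
}

func main() {
	t := NewPodTracker(staleAfter)
	fmt.Println(t.Observe(demoStart, "pod-a", "pod-b"))   // prints [pod-a pod-b]
	t.Observe(demoStart.Add(6*time.Minute), "pod-a")      // only pod-a is seen again
	fmt.Println(t.Expire(demoStart.Add(11 * time.Minute))) // prints [pod-b]
}
```

Passing the current time in explicitly, rather than calling `time.Now()` inside the tracker, is a choice made here so the expiry logic can be exercised without waiting 10 minutes.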

A proper integration through the data layer will be implemented once it is ready in IGW. Tracker: kubernetes-sigs/gateway-api-inference-extension#2017

docs/architecture.md

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

elevran · 2026-01-22T07:43:14Z

/lgtm
/approve

github-project-automation bot added this to llm-d-inference-scheduler Jan 21, 2026

github-actions bot requested review from elevran and nirrozenbaum January 21, 2026 14:59

vMaroon force-pushed the active-active-ha branch from 247add7 to 5fa5844 January 21, 2026 15:24

elevran requested changes Jan 21, 2026

docs/architecture.md (outdated, resolved)

github-project-automation bot moved this to In review in llm-d-inference-scheduler Jan 21, 2026

elevran self-requested a review January 21, 2026 15:24

vMaroon and others added 3 commits January 21, 2026 22:43

- active-active-ha support

193a660

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

Update docs/architecture.md

5d25278

Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maroon Ayoub <Maroonay@gmail.com>

lint

e8a081e

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

vMaroon force-pushed the active-active-ha branch from 58669eb to e8a081e January 21, 2026 20:44

github-actions bot added the lgtm label Jan 22, 2026

github-actions bot approved these changes Jan 22, 2026

elevran approved these changes Jan 22, 2026

github-actions bot merged commit c58cf2d into llm-d:main Jan 22, 2026
8 checks passed

github-project-automation bot moved this from In review to Done in llm-d-inference-scheduler Jan 22, 2026
