Topology-Aware Routing to Lower KV Cache Transfer Latency #545

### Problem
Prefill and decode pods are currently selected independently, without any awareness of network locality. When the chosen prefill and decode pods sit on different nodes, the KV cache must cross the inter-node network (InfiniBand, up to 400 Gb/s per link). That path is much slower than an intra-node transfer over NVLink (up to 900 GB/s), and the extra transfer time directly hurts TTFT (time to first token) SLO compliance.
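To put the bandwidth gap in concrete terms, the Go sketch below estimates an idealized transfer time (size divided by link bandwidth, ignoring latency, protocol overhead, and contention) for one prompt's KV cache. The model shape (32 layers, 8 KV heads, head dimension 128, 2-byte fp16/bf16 elements) and the 4096-token prompt are illustrative assumptions, not figures from this issue; the InfiniBand rate is taken as 400 Gb/s (50 GB/s) per link.

```go
package main

import "fmt"

// Hypothetical model shape (roughly an 8B-parameter GQA model).
// These values are assumptions for illustration, not from the issue.
const (
	numLayers    = 32
	numKVHeads   = 8
	headDim      = 128
	bytesPerElem = 2 // fp16/bf16
)

// kvCacheBytes returns the size of the K and V tensors for a prompt of n tokens.
func kvCacheBytes(tokens int) float64 {
	// Factor 2 accounts for storing both K and V per layer.
	perToken := 2 * numLayers * numKVHeads * headDim * bytesPerElem
	return float64(perToken * tokens)
}

// transferMillis is an idealized lower bound: size / bandwidth, in milliseconds.
// It ignores link latency, protocol overhead, and contention.
func transferMillis(bytes, bytesPerSec float64) float64 {
	return bytes / bytesPerSec * 1e3
}

func main() {
	const (
		infiniBand = 50e9  // 400 Gb/s per link = 50 GB/s
		nvlink     = 900e9 // 900 GB/s, as quoted in the issue
	)
	size := kvCacheBytes(4096)
	fmt.Printf("KV cache: %.0f MiB\n", size/(1<<20))
	fmt.Printf("inter-node (InfiniBand): %.2f ms\n", transferMillis(size, infiniBand))
	fmt.Printf("intra-node (NVLink):     %.2f ms\n", transferMillis(size, nvlink))
}
```

Under these assumptions the 512 MiB cache takes roughly 10.7 ms over a single InfiniBand link versus about 0.6 ms over NVLink. Real transfers are slower than these lower bounds, but the ratio shows why colocation matters for TTFT.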

### Proposal
Add topology-aware prefill pod selection:
- After selecting the decode pod, prefer prefill pods on the same node by giving them a higher score.
- Fall back to global selection when a non-colocated pod still scores higher than the colocated candidates.
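A minimal sketch of the proposed scoring, assuming an additive colocation bonus on top of whatever score the existing scorers already produce. The `Pod` type, `selectPrefill`, and the `colocationBonus` parameter are hypothetical names for illustration, not part of any existing scheduler API. Because the bonus is added rather than applied as a hard filter, the global fallback falls out naturally: a remote pod wins whenever its base score exceeds the best colocated pod's by more than the bonus.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal view of a candidate pod.
type Pod struct {
	Name     string
	NodeName string
	Score    float64 // base score from existing scorers (e.g., load, prefix-cache hits)
}

// selectPrefill returns the prefill pod with the highest score after adding
// colocationBonus to pods on the same node as the chosen decode pod.
// The boolean is false when there are no candidates.
func selectPrefill(decode Pod, candidates []Pod, colocationBonus float64) (Pod, bool) {
	var best Pod
	bestScore := 0.0
	found := false
	for _, p := range candidates {
		s := p.Score
		if p.NodeName == decode.NodeName {
			s += colocationBonus // favor intra-node (NVLink) KV transfer
		}
		if !found || s > bestScore {
			best, bestScore, found = p, s, true
		}
	}
	return best, found
}

func main() {
	decode := Pod{Name: "decode-0", NodeName: "node-a"}
	candidates := []Pod{
		{Name: "prefill-local", NodeName: "node-a", Score: 0.6},
		{Name: "prefill-remote", NodeName: "node-b", Score: 0.7},
	}
	// Bonus 0.2: the colocated pod scores 0.8 and beats the remote 0.7.
	p, _ := selectPrefill(decode, candidates, 0.2)
	fmt.Println(p.Name)
	// Bonus 0.05: the colocated pod scores 0.65, so the remote pod still wins.
	p, _ = selectPrefill(decode, candidates, 0.05)
	fmt.Println(p.Name)
}
```

Tuning `colocationBonus` controls the trade-off: a large bonus approaches strict node affinity, while a small one lets load or cache-hit signals dominate.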
