Open
Labels
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Description
Problem
Prefill and decode pods are currently selected independently, without network locality awareness. KV cache transfer between pods on different nodes goes over the inter-node network (InfiniBand, up to 400 Gb/s per port, roughly 50 GB/s), which is far slower than intra-node transfer (NVLink, up to 900 GB/s). The added transfer time directly impacts TTFT SLO compliance.
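To give a rough sense of the gap, here is a back-of-the-envelope sketch in Python. It assumes the InfiniBand figure is 400 Gb/s (about 50 GB/s) and ignores latency, protocol overhead, and contention. The 2 GB KV cache size is a hypothetical value chosen for illustration, not a measurement.

```python
def transfer_time_ms(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal transfer time in milliseconds (bandwidth only, no latency or overhead)."""
    return payload_gb / bandwidth_gb_per_s * 1000.0


# Hypothetical KV cache size for one request; real sizes depend on model and prompt length.
KV_CACHE_GB = 2.0

# Peak link rates from the issue text; 400 Gb/s is converted to GB/s (8 bits per byte).
INFINIBAND_GB_PER_S = 400 / 8
NVLINK_GB_PER_S = 900.0

inter_node_ms = transfer_time_ms(KV_CACHE_GB, INFINIBAND_GB_PER_S)
intra_node_ms = transfer_time_ms(KV_CACHE_GB, NVLINK_GB_PER_S)

print(f"inter-node: {inter_node_ms:.1f} ms, intra-node: {intra_node_ms:.1f} ms")
```

Under these assumptions the inter-node transfer takes about 18x longer than the intra-node one, and that difference lands directly on TTFT.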
Proposal
Add topology-aware prefill pod selection:
- After selecting the decode pod, prefer prefill pods on the same node by giving them a higher score.
- Fall back to global selection if a non-colocated pod still has a higher score.
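The two rules above could be sketched as a single scoring pass. This is a minimal illustration in Python, not the project's scheduler code: the `Pod` type, `base_score` field, `select_prefill_pod` function, and the `SAME_NODE_BONUS` weight are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Pod:
    name: str
    node: str          # node the pod is scheduled on
    base_score: float  # score from the existing, locality-unaware scorers


# Hypothetical bonus for prefill pods colocated with the decode pod.
SAME_NODE_BONUS = 0.2


def select_prefill_pod(
    decode_pod: Pod,
    prefill_pods: Sequence[Pod],
    bonus: float = SAME_NODE_BONUS,
) -> Optional[Pod]:
    """Pick the prefill pod with the highest locality-adjusted score."""
    if not prefill_pods:
        return None

    def score(pod: Pod) -> float:
        # Same-node pods get a bonus; all others keep their base score,
        # so a sufficiently better remote pod can still win (global fallback).
        return pod.base_score + (bonus if pod.node == decode_pod.node else 0.0)

    return max(prefill_pods, key=score)


decode = Pod("decode-0", "node-a", 0.0)
local = Pod("prefill-0", "node-a", 0.6)
remote = Pod("prefill-1", "node-b", 0.7)
strong_remote = Pod("prefill-2", "node-b", 0.95)

preferred = select_prefill_pod(decode, [local, remote])
fallback = select_prefill_pod(decode, [local, remote, strong_remote])
```

Here `preferred` is the colocated pod, because the bonus lifts it above the slightly better remote pod, while `fallback` is the strong remote pod, whose base score exceeds the colocated pod's adjusted score. Treating locality as an additive bonus rather than a hard filter is what keeps the global fallback implicit.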
Status
Backlog