llm-d experiments show that routing each request to the replica with the highest KV cache hit rate for that request substantially improves latency and throughput. We should measure the size of this effect in our own benchmarks and incorporate the results into our recommendations.
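To make the routing idea concrete, here is a minimal sketch of cache-aware replica selection. It is not llm-d's implementation: the `Replica` class, the `cached_prefixes` field, and the prefix-overlap estimate of hit rate are all hypothetical stand-ins for whatever cache state the real router can observe.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    # Hypothetical view of the replica's KV cache: token-ID prefixes it holds.
    cached_prefixes: list[tuple[int, ...]] = field(default_factory=list)


def shared_prefix_len(a, b):
    """Number of leading tokens that two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def estimated_hit_rate(replica, prompt):
    """Fraction of prompt tokens covered by the replica's best cached prefix."""
    if not prompt:
        return 0.0
    best = max(
        (shared_prefix_len(p, prompt) for p in replica.cached_prefixes),
        default=0,
    )
    return best / len(prompt)


def pick_replica(replicas, prompt):
    """Pick the replica with the highest estimated hit rate.

    Ties go to the earliest replica in the list; a real router would
    likely also weigh queue depth and load.
    """
    return max(replicas, key=lambda r: estimated_hit_rate(r, prompt))


replicas = [
    Replica("a", [(1, 2, 3)]),
    Replica("b", [(1, 2, 3, 4, 5)]),
]
chosen = pick_replica(replicas, (1, 2, 3, 4, 9))
print(chosen.name)  # prints "b" (4 of 5 prompt tokens cached vs 3 of 5)
```

Routing purely on hit rate can concentrate load on a few warm replicas, so the benchmark should probably record per-replica load alongside latency and throughput.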
Acceptance Criteria
Benchmark multi-replica deployments with and without KV cache-aware routing
Measure latency and throughput improvements
Update capacity planning logic to account for routing efficiency
Document routing strategies (round-robin vs cache-aware vs semantic routing)
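
For the capacity planning criterion, one possible starting point is to discount the prefill work that a cache hit avoids. The sketch below is an assumption-laden model, not the existing planning logic: the function names, the `prefill_fraction` parameter, and the linear discount are all hypothetical, and the inputs are meant to come from the benchmark results rather than from the illustrative numbers used here.

```python
import math


def effective_rps(base_rps, prefill_fraction, hit_rate):
    """Per-replica throughput after discounting cached prefill work.

    Assumed model: `prefill_fraction` of per-request compute is prefill,
    and a KV cache hit rate of `hit_rate` removes that share of it.
    """
    if not (0.0 <= prefill_fraction <= 1.0 and 0.0 <= hit_rate <= 1.0):
        raise ValueError("prefill_fraction and hit_rate must be in [0, 1]")
    remaining_work = 1.0 - prefill_fraction * hit_rate
    if remaining_work <= 0.0:
        raise ValueError("model predicts zero work per request")
    return base_rps / remaining_work


def replicas_needed(target_rps, base_rps, prefill_fraction, hit_rate):
    """Replica count required to serve `target_rps` under the model above."""
    per_replica = effective_rps(base_rps, prefill_fraction, hit_rate)
    return math.ceil(target_rps / per_replica)


# Illustrative inputs only; replace with measured values.
print(replicas_needed(100, 10, 0.5, 0.0))  # round-robin with cold caches
print(replicas_needed(100, 10, 0.5, 0.5))  # cache-aware routing, 50% hit rate
```

This model ignores load imbalance and decode-side effects, so it should be validated against the multi-replica benchmark before it informs any recommendation.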