- **Dynamic Rate Limiter v3** - Automatically maximizes throughput when capacity is available (< 80% saturation) by allowing lower-priority requests to use unused capacity, then switches to fair priority-based allocation under high load (≥ 80%) to prevent blocking
-
- **Major Performance Improvements** - Router optimization reducing P99 latency by 62.5%, cache improvements from O(n*log(n)) to O(log(n))
+
- **Major Performance Improvements** - 2.9x lower median latency at 1,000 concurrent users.
- **Claude Sonnet 4.5** - Support for Anthropic's new Claude Sonnet 4.5 model family with 200K+ context and tiered pricing
- **MCP Gateway Enhancements** - Fine-grained tool control, server permissions, and forwardable headers
- **AMD Lemonade & Nvidia NIM** - New provider support for AMD Lemonade and Nvidia NIM Rerank
This update removes inefficiencies in the LiteLLM router, reducing lookup complexity from O(M×N) to O(1). Previously, it built a new array and ran repeated membership checks such as `data["model"] in llm_router.get_model_ids()`. Now, a direct ID-to-deployment map eliminates the redundant allocations and scans.
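The contrast between the two lookup patterns can be sketched as follows. This is a minimal illustration, not LiteLLM's actual implementation: only `get_model_ids()` is named in the text above, while `Deployment`, `ListRouter`, `IndexedRouter`, `has_model`, and `get_deployment` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical stand-in for a router deployment record.
    model_id: str
    model_name: str


class ListRouter:
    """Older pattern: rebuild an ID list, then scan it on every check."""

    def __init__(self, deployments):
        self.deployments = list(deployments)

    def get_model_ids(self):
        # Allocates a fresh list on every call.
        return [d.model_id for d in self.deployments]

    def has_model(self, model_id):
        # Linear membership scan over the freshly built list.
        return model_id in self.get_model_ids()


class IndexedRouter:
    """Newer pattern: a direct ID-to-deployment map built once."""

    def __init__(self, deployments):
        self.by_id = {d.model_id: d for d in deployments}

    def has_model(self, model_id):
        # Hash lookup: constant time on average, no new allocation.
        return model_id in self.by_id

    def get_deployment(self, model_id):
        # Returns None when the ID is unknown.
        return self.by_id.get(model_id)
```

Both classes answer the same membership question, but `ListRouter.has_model` does work proportional to the number of deployments on every call, whereas `IndexedRouter.has_model` pays that cost once at construction. The trade-off is that the map must be kept in sync whenever deployments are added or removed.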
As a result, performance improved across all latency percentiles:
-
- **Median latency:** 600 ms → **280 ms** (−53%)
-
- **p95 latency:** 1,900 ms → **520 ms** (−72%)
-
- **p99 latency:** 3,000 ms → **1,000 ms** (−62.5%)
+
- **Median latency:** 320 ms → **110 ms** (−65.6%)
+
- **p95 latency:** 850 ms → **440 ms** (−48.2%)
+
- **p99 latency:** 1,400 ms → **810 ms** (−42.1%)
- **Average latency:** 864 ms → **310 ms** (−64%)
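As a quick sanity check, the percentage reductions in the updated figures can be recomputed from the before and after values. The sketch below uses only the numbers in the revised (`+`) lines and the average-latency line; the function name `percent_reduction` and the dictionary layout are assumptions made for the example.

```python
def percent_reduction(before_ms, after_ms):
    """Relative latency reduction, as a percentage of the original value."""
    return (before_ms - after_ms) / before_ms * 100


# (before, after) latencies in milliseconds, taken from the updated figures.
latencies = {
    "median": (320, 110),
    "p95": (850, 440),
    "p99": (1400, 810),
    "average": (864, 310),
}

reductions = {
    name: percent_reduction(before, after)
    for name, (before, after) in latencies.items()
}

# Ratio behind the "2.9x lower median latency" headline.
speedup = latencies["median"][0] / latencies["median"][1]

for name, value in reductions.items():
    print(f"{name}: -{value:.1f}%")
print(f"median speedup: {speedup:.1f}x")
```

Each computed reduction agrees with the stated percentage to within rounding, and the median ratio of 320 ms to 110 ms is consistent with the 2.9x figure in the highlights.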
Overall throughput increased to ~1,880 RPS (aggregated) per instance, while maintaining low overhead (~27 ms average).