Commit d12da2e

docs fix
1 parent 9d6c06d commit d12da2e

2 files changed, +7 -5 lines changed

docs/my-website/release_notes/v1.77.7-stable/index.md

Lines changed: 7 additions & 5 deletions
@@ -59,21 +59,23 @@ pip install litellm==1.77.7.rc.1
 ## Key Highlights

 - **Dynamic Rate Limiter v3** - Automatically maximizes throughput when capacity is available (< 80% saturation) by allowing lower-priority requests to use unused capacity, then switches to fair priority-based allocation under high load (≥ 80%) to prevent blocking
-- **Major Performance Improvements** - Router optimization reducing P99 latency by 62.5%, cache improvements from O(n*log(n)) to O(log(n))
+- **Major Performance Improvements** - 2.9x lower median latency at 1,000 concurrent users.
 - **Claude Sonnet 4.5** - Support for Anthropic's new Claude Sonnet 4.5 model family with 200K+ context and tiered pricing
 - **MCP Gateway Enhancements** - Fine-grained tool control, server permissions, and forwardable headers
 - **AMD Lemonade & Nvidia NIM** - New provider support for AMD Lemonade and Nvidia NIM Rerank
 - **GitLab Prompt Management** - GitLab-based prompt management integration

-### 62.5% Faster P99 Latency
+### 2.9x Lower Median Latency
+
+<Image img={require('../../img/perf_77_7.png')} style={{ width: '800px', height: 'auto' }} />

 This update removes LiteLLM router inefficiencies, reducing complexity from O(M×N) to O(1). Previously, it built a new array and ran repeated checks like data["model"] in llm_router.get_model_ids(). Now, a direct ID-to-deployment map eliminates redundant allocations and scans.

 As a result, performance improved across all latency percentiles:

-- **Median latency:** 600 ms → **280 ms** (−53%)
-- **p95 latency:** 1,900 ms → **520 ms** (−72%)
-- **p99 latency:** 3,000 ms → **1,000 ms** (−62.5%)
+- **Median latency:** 320 ms → **110 ms** (−65.6%)
+- **p95 latency:** 850 ms → **440 ms** (−48.2%)
+- **p99 latency:** 1,400 ms → **810 ms** (−42.1%)
 - **Average latency:** 864 ms → **310 ms** (−64%)

 Overall throughput increased to ~1,880 RPS (aggregated) per instance, while maintaining low overhead (~27 ms average).
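
The router change described in the release notes (replacing a rebuilt list and repeated membership scans with a direct ID-to-deployment map) can be illustrated with a minimal sketch. This is not LiteLLM's actual code: `Deployment`, `ListRouter`, and `MapRouter` are hypothetical names, and only the `get_model_ids()` method name is borrowed from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical stand-in for a router deployment entry."""
    model_id: str
    model_name: str


class ListRouter:
    """Before (assumed shape): each check rebuilds a list and scans it."""

    def __init__(self, deployments):
        self._deployments = list(deployments)

    def get_model_ids(self):
        # Allocates a fresh list on every call: O(N) time and memory.
        return [d.model_id for d in self._deployments]

    def has_model(self, model_id):
        # Linear scan over the freshly built list: another O(N).
        return model_id in self.get_model_ids()


class MapRouter:
    """After (assumed shape): a dict keyed by ID, built once."""

    def __init__(self, deployments):
        self._by_id = {d.model_id: d for d in deployments}

    def has_model(self, model_id):
        # Hash lookup: O(1) on average, no per-call allocation.
        return model_id in self._by_id

    def get_deployment(self, model_id):
        return self._by_id.get(model_id)


deployments = [Deployment(f"id-{i}", "example-model") for i in range(1000)]
slow, fast = ListRouter(deployments), MapRouter(deployments)
```

Running M such checks per request against N deployments costs O(M×N) with the list-based version, while each check against the map is a single average-case O(1) hash lookup.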

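
The percentages and the "2.9x" headline in the updated lines can be rechecked from the before/after values. The short sketch below does only that arithmetic; the helper `pct_change` is a hypothetical name, and the millisecond values are copied from the release notes.

```python
def pct_change(before_ms, after_ms):
    """Relative change in percent; negative means latency dropped."""
    return (after_ms - before_ms) / before_ms * 100


# (before, after) pairs in milliseconds, copied from the release notes.
latencies = {
    "median": (320, 110),
    "p95": (850, 440),
    "p99": (1400, 810),
    "average": (864, 310),
}

for name, (before, after) in latencies.items():
    change = pct_change(before, after)
    speedup = before / after
    print(f"{name}: {change:.1f}% ({speedup:.1f}x lower)")
```

The median row gives 320 / 110 ≈ 2.9, which is where the "2.9x lower median latency" headline comes from.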