Commit 9d6c06d

doc: perf update (#15211)
1 parent 02cb7a4 commit 9d6c06d

1 file changed: +35 -0 lines changed
  • docs/my-website/release_notes/v1.77.7-stable

docs/my-website/release_notes/v1.77.7-stable/index.md

@@ -65,6 +65,41 @@ pip install litellm==1.77.7.rc.1
- **AMD Lemonade & Nvidia NIM** - New provider support for AMD Lemonade and Nvidia NIM Rerank
- **GitLab Prompt Management** - GitLab-based prompt management integration

### 62.5% Faster P99 Latency

This update removes inefficiencies in the LiteLLM router, reducing lookup complexity from O(M×N) to O(1). Previously, the router built a new array and ran repeated membership checks such as `data["model"] in llm_router.get_model_ids()`. Now, a direct ID-to-deployment map eliminates the redundant allocations and scans.
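
The difference between the two lookup shapes can be sketched as follows. This is a minimal illustration, not LiteLLM's actual router code: `Deployment`, `ScanRouter`, and `MapRouter` are hypothetical names, and only the `get_model_ids()` membership-check pattern comes from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Deployment:
    """Hypothetical stand-in for a router deployment entry."""
    model_id: str
    model_name: str


class ScanRouter:
    """Illustrative 'before' shape: rebuilds an ID list on every check."""

    def __init__(self, deployments: List[Deployment]) -> None:
        self._deployments = list(deployments)

    def get_model_ids(self) -> List[str]:
        # Allocates a fresh list on each call: O(N) time and memory.
        return [d.model_id for d in self._deployments]

    def has_model(self, model_id: str) -> bool:
        # Membership test on a list is a second linear scan.
        return model_id in self.get_model_ids()


class MapRouter:
    """Illustrative 'after' shape: one ID-to-deployment dict built up front."""

    def __init__(self, deployments: List[Deployment]) -> None:
        self._by_id: Dict[str, Deployment] = {d.model_id: d for d in deployments}

    def has_model(self, model_id: str) -> bool:
        # Average-case O(1) hash lookup with no per-call allocation.
        return model_id in self._by_id

    def get_deployment(self, model_id: str) -> Optional[Deployment]:
        return self._by_id.get(model_id)


# Assumed sample data, purely for illustration.
deployments = [Deployment(f"id-{i}", f"model-{i % 3}") for i in range(1000)]
scan_router = ScanRouter(deployments)
map_router = MapRouter(deployments)
```

Both routers answer the same membership question, but the scan version repeats its O(N) work on every call, so running it M times per request gives the O(M×N) behavior described above. The map version pays the construction cost once.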

As a result, latency improved across every reported metric:

- **Median latency:** 600 ms → **280 ms** (−53%)
- **p95 latency:** 1,900 ms → **520 ms** (−72%)
- **p99 latency:** 3,000 ms → **1,000 ms** (−62.5%)
- **Average latency:** 864 ms → **310 ms** (−64%)
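
The percentage reductions can be recomputed from the before/after values listed above. The short check below uses only those listed numbers; the helper name `percent_reduction` is an illustrative assumption.

```python
def percent_reduction(before_ms: float, after_ms: float) -> float:
    """Relative latency reduction, as a percentage of the 'before' value."""
    return (before_ms - after_ms) / before_ms * 100.0


# (before, after) pairs in milliseconds, copied from the list above.
reported = {
    "median": (600, 280),
    "p95": (1900, 520),
    "p99": (3000, 1000),
    "average": (864, 310),
}

reductions = {
    name: round(percent_reduction(before, after), 1)
    for name, (before, after) in reported.items()
}

for name, pct in reductions.items():
    print(f"{name}: -{pct}%")
```

The median, p95, and average figures agree with the listed percentages after rounding. The p99 endpoints as listed (3,000 ms → 1,000 ms) work out to roughly 66.7% rather than the stated 62.5%, so either an endpoint or the percentage may be misreported.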

Overall throughput increased to ~1,880 RPS (aggregated) per instance, while maintaining low overhead (~27 ms average).

#### Test Setup

**Locust**

- **Concurrent users:** 1,000
- **Ramp-up:** 500

**System Specs**

- **CPU:** 8 vCPUs
- **Memory:** 32 GB RAM
- **LiteLLM Workers:** 8
- **Instances:** 1

**Configuration (config.yaml)**

View the complete configuration: [gist.github.com/AlexsanderHamir/config.yaml](https://gist.github.com/AlexsanderHamir/53f7d554a5d2afcf2c4edb5b6be68ff4)

**Load Script (no_cache_hits.py)**

View the complete load testing script: [gist.github.com/AlexsanderHamir/no_cache_hits.py](https://gist.github.com/AlexsanderHamir/42c33d7a4dc7a57f56a78b560dee3a42)

## New Models / Updated Models

#### New Model Support
