- **Oracle Cloud Infrastructure** - New LLM provider for calling models on Oracle Cloud Infrastructure.
- **DigitalOcean's Gradient AI** - New LLM provider for calling models on DigitalOcean's Gradient AI platform.

### 54% RPS Improvement
Throughput per instance increased by 54% (1,040 → 1,602 RPS, aggregated) while median overhead stayed at 40 ms. The gain comes from fixing major O(n²) inefficiencies in the router, caused primarily by repeated `in` membership checks inside loops over large lists. As a result, p95 latency improved by 30% (2,700 → 1,900 ms), which improves stability and scalability under heavy load. All tests were run with a database-only setup (no cache hits).

---
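To illustrate the kind of O(n²) pattern described in the RPS improvement notes, here is a minimal sketch. It is not the router's actual code: the function names, the `deployments` structure, and the `allowed_ids` list are hypothetical and chosen only for illustration. The sketch contrasts an `in` check against a list inside a loop with the same check against a set built once up front.

```python
import time

def filter_with_list(deployments, allowed_ids):
    # Hypothetical "before" shape: allowed_ids is a list, so every `in`
    # check scans it linearly and the whole loop costs O(n * m).
    return [d for d in deployments if d["id"] in allowed_ids]

def filter_with_set(deployments, allowed_ids):
    # Hypothetical "after" shape: build the set once (O(m)), then each
    # `in` check is O(1) on average, so the loop costs O(n + m).
    allowed = set(allowed_ids)
    return [d for d in deployments if d["id"] in allowed]

# Synthetic data, assumed for the demo only.
deployments = [{"id": f"model-{i}"} for i in range(2000)]
allowed_ids = [f"model-{i}" for i in range(0, 2000, 2)]

start = time.perf_counter()
slow_result = filter_with_list(deployments, allowed_ids)
list_seconds = time.perf_counter() - start

start = time.perf_counter()
fast_result = filter_with_set(deployments, allowed_ids)
set_seconds = time.perf_counter() - start

# Both versions return the same deployments; only the cost differs.
assert slow_result == fast_result
print(f"list lookup: {list_seconds:.4f}s, set lookup: {set_seconds:.4f}s")
```

The set-based version trades a small amount of extra memory for average constant-time lookups, which is usually worthwhile once the collection being searched grows beyond a handful of items.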
### Test Setup

All benchmarks were executed using Locust with 1,000 concurrent users and a ramp-up of 500. The environment was configured to stress the routing layer and eliminate caching as a variable.

**System Specs**

- **CPU:** 8 vCPUs
- **Memory:** 32 GB RAM

**Configuration (config.yaml)**

View the complete configuration: [gist.github.com/AlexsanderHamir/config.yaml](https://gist.github.com/AlexsanderHamir/53f7d554a5d2afcf2c4edb5b6be68ff4)

**Load Script (no_cache_hits.py)**

View the complete load testing script: [gist.github.com/AlexsanderHamir/no_cache_hits.py](https://gist.github.com/AlexsanderHamir/42c33d7a4dc7a57f56a78b560dee3a42)