**`docs/my-website/docs/proxy/cost_tracking.md`** (+4 lines)
Track spend for keys, users, and teams across 100+ LLMs.
LiteLLM automatically tracks spend for all known models. See our [model cost map](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)
:::tip Keep Pricing Data Updated
[Sync model pricing data from GitHub](../sync_models_github.md) to ensure accurate cost tracking.

:::
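As an illustration of how per-request spend is derived from the cost map, here is a minimal Python sketch. This is not LiteLLM's internal code; the field names follow the schema used in `model_prices_and_context_window.json`, and the prices shown are hypothetical:

```python
# Illustrative sketch (not LiteLLM internals): computing the spend for one
# request from per-token prices like those in model_prices_and_context_window.json.

def calculate_spend(entry: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of one request given a cost-map entry."""
    return (
        prompt_tokens * entry.get("input_cost_per_token", 0.0)
        + completion_tokens * entry.get("output_cost_per_token", 0.0)
    )

# Hypothetical cost-map entry:
entry = {"input_cost_per_token": 1e-06, "output_cost_per_token": 2e-06}
spend = calculate_spend(entry, prompt_tokens=1000, completion_tokens=500)
print(f"{spend:.6f}")  # 1000*1e-06 + 500*2e-06 = 0.002
```

This is why keeping the cost map current matters: a stale per-token price skews every spend number computed from it.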
**`docs/my-website/docs/proxy/model_management.md`** (+4 lines)
Retrieve detailed information about each model listed in the `/model/info` endpoint, including descriptions from the `config.yaml` file, and additional model info (e.g. max tokens, cost per input token, etc.) pulled from the model_info you set and the [litellm model cost map](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json). Sensitive details like API keys are excluded for security purposes.
:::tip Sync Model Data
Keep your model pricing data up to date by [syncing models from GitHub](../sync_models_github.md).

:::
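To illustrate the kind of redaction `/model/info` performs (sensitive details like API keys are excluded), here is a hypothetical sketch. It is not the proxy's actual implementation; the field names mirror a typical `config.yaml` model entry:

```python
# Hypothetical sketch of /model/info-style redaction: return model details
# while stripping sensitive fields such as API keys. Not actual proxy code.
SENSITIVE_KEYS = {"api_key", "aws_secret_access_key", "vertex_credentials"}

def redact_model_entry(entry: dict) -> dict:
    """Return a copy of a model entry with sensitive litellm_params removed."""
    params = {
        k: v
        for k, v in entry.get("litellm_params", {}).items()
        if k not in SENSITIVE_KEYS
    }
    return {**entry, "litellm_params": params}

entry = {
    "model_name": "gpt-4o",
    "litellm_params": {"model": "openai/gpt-4o", "api_key": "sk-secret"},
    "model_info": {"max_tokens": 128000},
}
print(redact_model_entry(entry))  # api_key is absent from the output
```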
---

**`sync_models_github.md`**

Sync model pricing data from GitHub's `model_prices_and_context_window.json` file outside of the LiteLLM UI.
> **📹 Video Tutorial**: [Watch how to sync models via the Admin UI](https://www.loom.com/share/ba41acc1882d41b284bbddbb0e9c27ce?sid=bdae351e-2026-4e39-932b-fcb185ff612c)
## Quick Start
**Manual sync:**
```bash
curl -X POST "https://your-proxy-url/reload/model_cost_map" \
  -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
  -H "Content-Type: application/json"
```
**Automatic sync every 6 hours:**
```bash
curl -X POST "https://your-proxy-url/schedule/model_cost_map_reload?hours=6" \
  -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
  -H "Content-Type: application/json"
```
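Conceptually, the scheduled endpoint above amounts to re-running the reload job on a fixed interval. A minimal Python sketch of that pattern follows; the reload callback and timing values are stand-ins, not the proxy's actual scheduler:

```python
# Sketch of a fixed-interval job runner, the pattern behind
# /schedule/model_cost_map_reload. The real proxy's job fetches
# model_prices_and_context_window.json from GitHub; ours just records runs.
import time

def run_periodic(job, interval_seconds: float, iterations: int) -> int:
    """Invoke `job` every `interval_seconds`, `iterations` times; return run count."""
    runs = 0
    for _ in range(iterations):
        job()
        runs += 1
        time.sleep(interval_seconds)
    return runs

reloads = []
run_periodic(lambda: reloads.append("reloaded cost map"), interval_seconds=0.01, iterations=3)
print(len(reloads))  # 3
```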
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reload/model_cost_map` | POST | Manual sync |
| `/schedule/model_cost_map_reload?hours={hours}` | POST | Schedule periodic sync |
---

- **Digital Ocean's Gradient AI** - New LLM provider for calling models on Digital Ocean's Gradient AI platform.
### 54% RPS Improvement
Throughput increased by 54% (1,040 → 1,602 RPS, aggregated) per instance while maintaining a 40 ms median overhead. The improvement comes from fixing major O(n²) inefficiencies in the router, primarily caused by repeated `in` membership checks inside loops over large lists. Tests were run with a database-only setup (no cache hits). As a result, p95 latency also improved by 30% (2,700 → 1,900 ms), improving stability and scalability under heavy load.
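The class of fix described above can be sketched as follows; the function and variable names are hypothetical, not the router's actual code. An `in` check against a list is O(n) per lookup, so doing it inside a loop is O(n·m) overall, while hashing the list into a set first makes each lookup O(1) on average:

```python
# Hypothetical illustration of the O(n^2) fix: replace repeated list
# membership tests inside a loop with a one-time set conversion.

def filter_deployments_slow(candidates, unhealthy):
    # `c not in unhealthy` scans the whole list each iteration: O(n*m).
    return [c for c in candidates if c not in unhealthy]

def filter_deployments_fast(candidates, unhealthy):
    unhealthy_set = set(unhealthy)  # built once: O(m)
    # each membership test is now an O(1) average-case hash lookup
    return [c for c in candidates if c not in unhealthy_set]

candidates = [f"deploy-{i}" for i in range(1000)]
unhealthy = [f"deploy-{i}" for i in range(0, 1000, 2)]  # every even deployment
healthy = filter_deployments_fast(candidates, unhealthy)
print(len(healthy))  # 500 healthy deployments remain
```

Both functions return the same result; only the asymptotic cost of the membership test changes.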
---
### Test Setup
All benchmarks were executed using Locust with 1,000 concurrent users and a ramp-up of 500. The environment was configured to stress the routing layer and eliminate caching as a variable.
**System Specs**
- **CPU:** 8 vCPUs
- **Memory:** 32 GB RAM
**Configuration (config.yaml)**
View the complete configuration: [gist.github.com/AlexsanderHamir/config.yaml](https://gist.github.com/AlexsanderHamir/53f7d554a5d2afcf2c4edb5b6be68ff4)
**Load Script (no_cache_hits.py)**
View the complete load testing script: [gist.github.com/AlexsanderHamir/no_cache_hits.py](https://gist.github.com/AlexsanderHamir/42c33d7a4dc7a57f56a78b560dee3a42)
---
### Risk of Upgrade
If you build the proxy from the pip package, you should hold off on upgrading. This version makes `prisma migrate deploy` our default for managing the DB. This is safer, as it doesn't reset the DB, but it requires a manual `prisma generate` step.