Prevent projects from gobbling too much tpm/rpm. Use this feature when you want to reserve tpm/rpm capacity for specific projects. For example, a realtime use case should get higher priority than other use cases.
Dynamically allocate TPM/RPM quota to api keys, based on active keys in that minute. [**See Code**](https://github.com/BerriAI/litellm/blob/9bffa9a48e610cc6886fc2dce5c1815aeae2ad46/litellm/proxy/hooks/dynamic_rate_limiter.py#L125)
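Conceptually, the model's remaining capacity for the current minute is split across the keys that were active in that minute, so one busy key cannot starve the others. A minimal sketch of that split (the function name and the even-division policy here are assumptions for illustration; the proxy's actual logic lives in the linked hook):

```python
# Hypothetical sketch of dynamic quota allocation - NOT litellm's exact implementation.
# Assumption: remaining model capacity is divided evenly across keys active this minute.

def available_tpm_for_key(model_tpm: int, tokens_used_this_minute: int, active_keys: int) -> int:
    """Evenly divide the model's remaining TPM among the keys active in this minute."""
    if active_keys == 0:
        return model_tpm
    remaining = max(model_tpm - tokens_used_this_minute, 0)
    return remaining // active_keys

# e.g. a model with tpm: 60 and 2 active keys -> each key may spend ~30 tokens this minute
print(available_tpm_for_key(model_tpm=60, tokens_used_this_minute=0, active_keys=2))  # 30
```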
1. Setup config.yaml
```yaml
model_list:
  - model_name: my-fake-model
    litellm_params:
      model: gpt-3.5-turbo
      api_key: my-fake-key
      mock_response: hello-world
      tpm: 60

litellm_settings:
  callbacks: ["dynamic_rate_limiter_v3"]

general_settings:
  master_key: sk-1234 # OR set `LITELLM_MASTER_KEY=".."` in your .env
  database_url: postgres://.. # OR set `DATABASE_URL=".."` in your .env
```
2. Start proxy
```bash
litellm --config /path/to/config.yaml
```
3. Test it!
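The test below assumes two virtual keys have already been generated against the proxy and wrapped in OpenAI clients (`openai_client_1`, `openai_client_2`). A minimal sketch of that setup, assuming the proxy from step 2 is running locally on port 4000 (the variable names and key-generation flow here are illustrative, not taken verbatim from the full test):

```python
# Illustrative setup (assumed): generate 2 virtual keys on the proxy and
# point one OpenAI client at the proxy per key.
import requests
from openai import OpenAI, RateLimitError

PROXY_BASE_URL = "http://0.0.0.0:4000"
MASTER_KEY = "sk-1234"  # from general_settings in config.yaml

def generate_key() -> str:
    """Create a virtual key via the proxy's /key/generate endpoint."""
    resp = requests.post(
        f"{PROXY_BASE_URL}/key/generate",
        headers={"Authorization": f"Bearer {MASTER_KEY}"},
        json={},
    )
    resp.raise_for_status()
    return resp.json()["key"]

openai_client_1 = OpenAI(api_key=generate_key(), base_url=PROXY_BASE_URL)
openai_client_2 = OpenAI(api_key=generate_key(), base_url=PROXY_BASE_URL)
```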
```python
"""
- Run 2 concurrent teams calling same model
- model has 60 TPM
- Mock response returns 30 total tokens / request
- Each team will only be able to make 1 request per minute
"""

print("Headers for call 2 - {}".format(response.headers))
_response = response.parse()
print("Total tokens for call - {}".format(_response.usage.total_tokens))

# call proxy with key 2 - fails
try:
    openai_client_2.chat.completions.with_raw_response.create(model="my-fake-model", messages=[{"role": "user", "content": "Hey, how's it going?"}])
    raise Exception("This should have failed!")
except RateLimitError as e:
    print("This was rate limited b/c - {}".format(str(e)))
```
**Expected Response**
```
This was rate limited b/c - Error code: 429 - {'error': {'message': {'error': 'Key=<hashed_token> over available TPM=0. Model TPM=0, Active keys=2'}, 'type': 'None', 'param': 'None', 'code': 429}}
```
#### ✨ [BETA] Set Priority / Reserve Quota
Reserve tpm/rpm capacity for projects in prod.
:::tip
Reserving tpm/rpm on keys based on priority is a premium feature. Please [get an enterprise license](./enterprise.md) for it.
:::
1. Setup config.yaml
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: "gpt-3.5-turbo"
      api_key: os.environ/OPENAI_API_KEY
      rpm: 100

litellm_settings:
  callbacks: ["dynamic_rate_limiter"]
  priority_reservation: {"dev": 0, "prod": 1}

general_settings:
  master_key: sk-1234 # OR set `LITELLM_MASTER_KEY=".."` in your .env
  database_url: postgres://.. # OR set `DATABASE_URL=".."` in your .env
```
priority_reservation:
- Dict[str, float]
- str: can be any string
- float: from 0 to 1. The fraction of the model's tpm/rpm to reserve for keys of this priority (see the sketch below).
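A small sketch of how the reservation maps to per-priority capacity, assuming the reserved share is simply the fraction multiplied by the model-level limit (illustrative arithmetic, not the hook's exact code):

```python
# Illustrative arithmetic for priority_reservation - not litellm's exact implementation.
priority_reservation = {"dev": 0, "prod": 1}
model_rpm = 100  # rpm from the config above

for priority, fraction in priority_reservation.items():
    reserved_rpm = int(fraction * model_rpm)
    print(f"priority={priority!r}: reserved RPM = {reserved_rpm}")

# priority='dev': reserved RPM = 0    -> dev keys get rate limited (matches the expected response below)
# priority='prod': reserved RPM = 100 -> prod keys can use the full model RPM
```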
**Start Proxy**
```bash
litellm --config /path/to/config.yaml
```
2. Create a key with that priority
```bash
curl -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer <your-master-key>' \
-H 'Content-Type: application/json' \
-d '{
    "metadata": {"priority": "dev"} # 👈 KEY CHANGE
}'
```
**Expected Response**
```
{
    ...
    "key": "sk-.."
}
```
3. Test it!
```bash
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-...' \ # 👈 key from step 2.
-d '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
**Expected Response**
```
Key=... over available RPM=0. Model RPM=100, Active keys=None
```