Commit 78506d1

committed
docs fix
1 parent 41ce31b commit 78506d1

1 file changed

+79
-34
lines changed

docs/my-website/docs/proxy/dynamic_rate_limit.md

Lines changed: 79 additions & 34 deletions
@@ -101,88 +101,133 @@ This was rate limited b/c - Error code: 429 - {'error': {'message': {'error': 'K
101101

102102
## [BETA] Set Priority / Reserve Quota
103103

104-
Reserve tpm/rpm capacity for projects in prod. You should use this feature when you want to reserve tpm/rpm capacity for specific projects. For example, a realtime use case should get higher priority than a different use case.
104+
Reserve TPM/RPM capacity for different environments or use cases. This ensures critical production workloads always have guaranteed capacity, while development or lower-priority tasks use remaining quota.
105105

106+
**Use Cases:**
107+
- Production vs Development environments
108+
- Real-time applications vs batch processing
109+
- Critical services vs experimental features
106110

107111
:::tip
108112

109-
Reserving tpm/rpm on keys based on priority is a premium feature. Please [get an enterprise license](./enterprise.md) for it.
113+
Reserving TPM/RPM on keys based on priority is a premium feature. Please [get an enterprise license](./enterprise.md) for it.
110114
:::
111115

112-
### Usage
116+
### How Priority Reservation Works
113117

114-
1. Setup config.yaml
118+
Priority reservation allocates a percentage of your model's total TPM/RPM to specific priority levels. Keys with higher priority get guaranteed access to their reserved quota first.
115119

116-
```yaml
120+
**Example Scenario:**
121+
- Model has 10 RPM total capacity
122+
- Priority reservation: `{"prod": 0.9, "dev": 0.1}`
123+
- Result: Production keys get 9 RPM guaranteed, Development keys get 1 RPM guaranteed
124+
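The arithmetic in the example scenario can be sketched in a few lines of Python. This is only an illustration: the function name `reserved_capacity` is hypothetical, and rounding down to a whole number is an assumption, not necessarily how the proxy computes it.

```python
import math


def reserved_capacity(total: int, priority_reservation: dict[str, float]) -> dict[str, int]:
    """Return the capacity (RPM or TPM) reserved for each priority level."""
    # round() removes float noise before flooring.
    # Assumption: fractional capacity is rounded down.
    return {
        priority: math.floor(round(total * fraction, 9))
        for priority, fraction in priority_reservation.items()
    }


print(reserved_capacity(10, {"prod": 0.9, "dev": 0.1}))
# {'prod': 9, 'dev': 1}
```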
125+
### Configuration
126+
127+
#### 1. Setup config.yaml
128+
129+
```yaml showLineNumbers title="config.yaml"
117130
model_list:
118131
  - model_name: gpt-3.5-turbo
119132
    litellm_params:
120133
      model: "gpt-3.5-turbo"
121134
      api_key: os.environ/OPENAI_API_KEY
122-
      rpm: 100
135+
      rpm: 10 # Total model capacity
123136

124137
litellm_settings:
125138
  callbacks: ["dynamic_rate_limiter_v3"]
126-
  priority_reservation: {"dev": 0, "prod": 1}
139+
  priority_reservation:
140+
    "prod": 0.9 # 90% reserved for production (9 RPM)
141+
    "dev": 0.1 # 10% reserved for development (1 RPM)
127142

128143
general_settings:
129144
  master_key: sk-1234 # OR set `LITELLM_MASTER_KEY=".."` in your .env
130-
  database_url: postgres://.. # OR set `DATABASE_URL=".."` in your .env
145+
  database_url: postgres://.. # OR set `DATABASE_URL=".."` in your .env
131146
```
132147
148+
**Configuration Details:**
133149
134-
priority_reservation:
135-
- Dict[str, float]
136-
- str: can be any string
137-
- float: from 0 to 1. Specify the % of tpm/rpm to reserve for keys of this priority.
150+
`priority_reservation`: Dict[str, float]
151+
- **Key (str)**: Priority level name (can be any string like "prod", "dev", "critical", etc.)
152+
- **Value (float)**: Fraction of total TPM/RPM to reserve, from 0.0 to 1.0 (for example, 0.9 reserves 90%)
153+
- **Note**: Values should sum to 1.0 or less
138154

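The constraints listed above (string keys, values between 0.0 and 1.0, sum of at most 1.0) can be checked before starting the proxy. The sketch below is a standalone pre-flight check; `validate_priority_reservation` is a hypothetical helper, not part of LiteLLM, and whether the proxy itself enforces these rules is not stated here.

```python
def validate_priority_reservation(reservation: dict) -> list[str]:
    """Return a list of problems found in a priority_reservation mapping."""
    errors = []
    for name, fraction in reservation.items():
        if not isinstance(name, str):
            errors.append(f"priority name {name!r} is not a string")
        if not isinstance(fraction, (int, float)) or not 0.0 <= fraction <= 1.0:
            errors.append(f"value for {name!r} must be between 0.0 and 1.0")
    numeric = [v for v in reservation.values() if isinstance(v, (int, float))]
    # Small tolerance so that sums like 0.7 + 0.2 + 0.1 are not rejected
    # because of float rounding.
    if sum(numeric) > 1.0 + 1e-9:
        errors.append("values sum to more than 1.0")
    return errors


print(validate_priority_reservation({"prod": 0.9, "dev": 0.1}))  # []
print(validate_priority_reservation({"prod": 0.9, "dev": 0.2}))
# ['values sum to more than 1.0']
```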
139155
**Start Proxy**
140156

141-
```
157+
```bash
142158
litellm --config /path/to/config.yaml
143159
```
144160

145-
2. Create a key with that priority
161+
#### 2. Create Keys with Priority Levels
146162

163+
**Production Key:**
147164
```bash
148165
curl -X POST 'http://0.0.0.0:4000/key/generate' \
149-
-H 'Authorization: Bearer <your-master-key>' \
166+
-H 'Authorization: Bearer sk-1234' \
150167
-H 'Content-Type: application/json' \
151-
-D '{
152-
"metadata": {"priority": "dev"} # 👈 KEY CHANGE
168+
-d '{
169+
"metadata": {"priority": "prod"}
153170
}'
154171
```
155172

156-
**Expected Response**
157-
173+
**Development Key:**
174+
```bash
175+
curl -X POST 'http://0.0.0.0:4000/key/generate' \
176+
-H 'Authorization: Bearer sk-1234' \
177+
-H 'Content-Type: application/json' \
178+
-d '{
179+
"metadata": {"priority": "dev"}
180+
}'
158181
```
182+
183+
**Expected Response for both:**
184+
```json
159185
{
186+
"key": "sk-...",
187+
"metadata": {"priority": "prod"}, // or "dev"
160188
...
161-
"key": "sk-.."
162189
}
163190
```
164191

192+
#### 3. Test Priority Allocation
165193

166-
3. Test it!
194+
**Test Production Key (should get 9 RPM):**
195+
```bash
196+
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
197+
-H 'Content-Type: application/json' \
198+
-H 'Authorization: Bearer sk-prod-key' \
199+
-d '{
200+
"model": "gpt-3.5-turbo",
201+
"messages": [{"role": "user", "content": "Hello from prod"}]
202+
}'
203+
```
167204

205+
**Test Development Key (should get 1 RPM):**
168206
```bash
169207
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
170208
-H 'Content-Type: application/json' \
171-
-H 'Authorization: sk-...' \ # 👈 key from step 2.
209+
-H 'Authorization: Bearer sk-dev-key' \
172210
-d '{
173-
"model": "gpt-3.5-turbo",
174-
"messages": [
175-
{
176-
"role": "user",
177-
"content": "what llm are you"
178-
}
179-
],
180-
}'
211+
"model": "gpt-3.5-turbo",
212+
"messages": [{"role": "user", "content": "Hello from dev"}]
213+
}'
181214
```
182215

183-
**Expected Response**
216+
### Expected Behavior
184217

185-
```
186-
Key=... over available RPM=0. Model RPM=100, Active keys=None
187-
```
218+
With the configuration above:
219+
220+
1. **Production keys** can make up to 9 requests per minute
221+
2. **Development keys** can make up to 1 request per minute
222+
3. Production requests are never blocked by development usage
188223

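To make the isolation between priority levels concrete, here is a toy model of a single one-minute window. It is a simplified illustration under the assumption that each priority draws only from its own reserved quota; the class `PriorityWindow` is hypothetical and does not reflect the actual `dynamic_rate_limiter_v3` implementation.

```python
from collections import Counter


class PriorityWindow:
    """Toy one-minute window that counts requests per priority level."""

    def __init__(self, limits: dict[str, int]) -> None:
        self.limits = limits   # reserved requests per priority
        self.used = Counter()  # requests seen so far in this window

    def allow(self, priority: str) -> bool:
        # A priority with no reservation gets no capacity in this toy model.
        if self.used[priority] >= self.limits.get(priority, 0):
            return False
        self.used[priority] += 1
        return True


window = PriorityWindow({"prod": 9, "dev": 1})
dev_results = [window.allow("dev") for _ in range(2)]
prod_results = [window.allow("prod") for _ in range(10)]
print(dev_results)               # [True, False]
print(prod_results.count(True))  # 9
```

Exhausting the `dev` quota first does not reduce what `prod` can use, which is the behavior described in point 3.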
224+
**Rate Limit Error Example:**
225+
```json
226+
{
227+
"error": {
228+
"message": "Key=sk-dev-... over available RPM=0. Model RPM=10, Reserved RPM for priority 'dev'=1, Active keys=1",
229+
"type": "rate_limit_exceeded",
230+
"code": 429
231+
}
232+
}
233+
```
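A client can detect this error by inspecting the response body. The sketch below assumes the body has exactly the shape shown in the example above; `is_rate_limited` is a hypothetical helper, and real responses may carry additional fields.

```python
import json


def is_rate_limited(body: str) -> bool:
    """Return True if a JSON response body reports a 429 rate limit error."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    # Assumption: the error object carries a numeric "code" field.
    return isinstance(error, dict) and error.get("code") == 429


sample = json.dumps({
    "error": {
        "message": "Key=sk-dev-... over available RPM=0.",
        "type": "rate_limit_exceeded",
        "code": 429,
    }
})
print(is_rate_limited(sample))  # True
```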
