docs/my-website/docs/proxy/dynamic_rate_limit.md
Lines changed: 79 additions & 34 deletions
@@ -101,88 +101,133 @@ This was rate limited b/c - Error code: 429 - {'error': {'message': {'error': 'K

 ## [BETA] Set Priority / Reserve Quota

-Reserve tpm/rpm capacity for projects in prod. You should use this feature when you want to reserve tpm/rpm capacity for specific projects. For example, a realtime use case should get higher priority than a different use case.
+Reserve TPM/RPM capacity for different environments or use cases. This ensures critical production workloads always have guaranteed capacity, while development or lower-priority tasks use remaining quota.

+**Use Cases:**
+- Production vs Development environments
+- Real-time applications vs batch processing
+- Critical services vs experimental features

 :::tip

-Reserving tpm/rpm on keys based on priority is a premium feature. Please [get an enterprise license](./enterprise.md) for it.
+Reserving TPM/RPM on keys based on priority is a premium feature. Please [get an enterprise license](./enterprise.md) for it.
 :::

-### Usage
+### How Priority Reservation Works

-1. Setup config.yaml
+Priority reservation allocates a percentage of your model's total TPM/RPM to specific priority levels. Keys with higher priority get guaranteed access to their reserved quota first.
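
The percentage allocation described in the added paragraph can be illustrated with a small arithmetic sketch. This is not the proxy's implementation: `reserved_quota`, the priority names `prod` and `dev`, and the 75/25 split are all hypothetical, chosen only to show how a per-minute budget might divide across priority levels.

```python
def reserved_quota(total_per_minute: int, reservation: dict[str, float]) -> dict[str, int]:
    """Split a per-minute budget (TPM or RPM) by priority share.

    Hypothetical helper for illustration only; it is not part of any library.
    `reservation` maps a priority name to a fraction between 0 and 1.
    """
    if any(share < 0 for share in reservation.values()):
        raise ValueError("shares must be non-negative")
    # Allow a tiny tolerance for floating-point rounding in the sum.
    if sum(reservation.values()) > 1.0 + 1e-9:
        raise ValueError("shares must not exceed 100% of the budget")
    # Round down so the reserved amounts never exceed the total budget.
    return {name: int(total_per_minute * share) for name, share in reservation.items()}


# Assumed example: a model with 1000 TPM, 75% reserved for prod, 25% for dev.
quota = reserved_quota(1000, {"prod": 0.75, "dev": 0.25})
print(quota)
```

Under these assumed numbers, keys tagged `prod` would have 750 TPM set aside and `dev` keys 250 TPM. How unused reserved capacity is shared is not stated in the diff and is left out of the sketch.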