Closed
Description
Now that most rate limits are handled by the Hub (see internal PRs #15081 and #15100), the server returns extra information when a request gets rate limited, following the IETF draft: https://www.ietf.org/archive/id/draft-ietf-httpapi-ratelimit-headers-09.html. In practice, here is what it looks like:
>>> print(response.headers)
Headers({..., ..., 'ratelimit': '"api";r=0;t=55', 'ratelimit-policy': '"fixed window";"api";q=500;w=300', ...})
with
- "fixed window" is the policy type (let's assume it'll always be a fixed window from the Hub)
- "api" => endpoint group that has triggered the rate limit
- q=500 => limit (max number of requests to that endpoint group in the same fixed window)
- w=300 => window in seconds
- r=0 => 0 remaining calls before getting rate limited
- t=55 => number of seconds before the end of the fixed window, i.e. before the counter is reset
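A minimal sketch of what a parser for these two headers could look like, based only on the example values above. The names `RateLimitInfo` and `parse_ratelimit_headers` are hypothetical (not existing `huggingface_hub` APIs), and the regular expressions assume the parameters always appear in the order shown (`r` then `t`, `q` then `w`), which the draft does not guarantee.

```python
import re
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    """Hypothetical container for the parsed values (not a real huggingface_hub class)."""

    endpoint_group: str                   # e.g. "api"
    remaining: int                        # r: calls left in the current window
    reset_in_seconds: int                 # t: seconds until the counter resets
    limit: Optional[int] = None           # q: max requests per window
    window_seconds: Optional[int] = None  # w: window length in seconds


# Assumption: parameter order matches the examples in this issue.
_RATELIMIT_RE = re.compile(r'"(?P<group>[^"]+)"\s*;\s*r=(?P<r>\d+)\s*;\s*t=(?P<t>\d+)')
_POLICY_RE = re.compile(r"q=(?P<q>\d+)\s*;\s*w=(?P<w>\d+)")


def parse_ratelimit_headers(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Return the parsed rate limit info, or None if the headers are absent or malformed.

    Header names are looked up in lowercase; a case-insensitive mapping
    (such as the headers object of an HTTP client) also works.
    """
    ratelimit = headers.get("ratelimit")
    if ratelimit is None:
        return None
    match = _RATELIMIT_RE.search(ratelimit)
    if match is None:
        return None

    limit = window = None
    policy = headers.get("ratelimit-policy")
    if policy is not None:
        policy_match = _POLICY_RE.search(policy)
        if policy_match is not None:
            limit = int(policy_match["q"])
            window = int(policy_match["w"])

    return RateLimitInfo(
        endpoint_group=match["group"],
        remaining=int(match["r"]),
        reset_in_seconds=int(match["t"]),
        limit=limit,
        window_seconds=window,
    )
```

Returning `None` instead of raising keeps the parser safe to call on any response, including ones from servers that do not send these headers.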
Another example:
>>> client.get("https://huggingface.co/api/models/moonshotai/Kimi-K2-Thinking").headers["ratelimit"]
'"api";r=489;t=189'
>>> client.get("https://huggingface.co/api/models/moonshotai/Kimi-K2-Thinking").headers["ratelimit-policy"]
'"fixed window";"api";q=500;w=300'
- request succeeded
- still the same policy '"fixed window";"api";q=500;w=300'
- "489 remaining calls for the next 189 seconds"
[Feature request]
Let's take advantage of these headers!
- in hf_raise_for_status => we can print a more informative error message in case of 429
- in http_backoff => if we retry on 429, let's wait for the window to reset (if within the timeout period); otherwise we retry knowing it'll fail
- once the second point is addressed, we can reassess our retry mechanism when downloading files (especially when fetching file metadata). We have been retrying on 429 for a few months, but it led to more issues because we couldn't do it properly. Now that we have the correct headers it should be more beneficial than harmful :) (see "Do not retry on 429 (only on 5xx)", #3377)
The 3 bullet points above can be tackled in separate PRs (let's at least start with a first one introducing the correct headers parser).
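To make the first two points concrete, here is a rough sketch of how the `t` value could drive both the error message and the retry decision. The helper names (`seconds_until_reset`, `rate_limit_message`, `plan_retry_wait`) and the `elapsed_seconds` / `timeout_seconds` parameters are assumptions for illustration; they are not the actual signatures of `hf_raise_for_status` or `http_backoff`.

```python
import re
from typing import Optional

# Matches the "t=<seconds>" parameter of the `ratelimit` header.
_RESET_RE = re.compile(r"\bt=(\d+)")


def seconds_until_reset(ratelimit_header: Optional[str]) -> Optional[int]:
    """Extract `t` (seconds before the window resets), or None if unavailable."""
    if not ratelimit_header:
        return None
    match = _RESET_RE.search(ratelimit_header)
    return int(match.group(1)) if match else None


def rate_limit_message(ratelimit_header: Optional[str]) -> str:
    """Build a more informative message for a 429 response."""
    reset = seconds_until_reset(ratelimit_header)
    if reset is None:
        return "429 Too Many Requests: rate limit reached."
    return f"429 Too Many Requests: rate limit reached, window resets in {reset} seconds."


def plan_retry_wait(
    ratelimit_header: Optional[str],
    elapsed_seconds: float,
    timeout_seconds: float,
) -> Optional[float]:
    """Return how long to sleep before retrying, or None if waiting is not an option.

    None means either the header is missing or the window resets after the
    overall timeout, in which case the caller has to fall back to another
    strategy (for example its usual backoff, or giving up).
    """
    reset = seconds_until_reset(ratelimit_header)
    if reset is None:
        return None
    if elapsed_seconds + reset > timeout_seconds:
        return None
    return float(reset)
```

For example, with `'"api";r=0;t=55'` and a 60-second budget, a retry loop would sleep 55 seconds if nothing has elapsed yet, but would get `None` once 10 seconds have already been spent.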
cc @coyotte508 who implemented the rate limits server-side
See also related issue: