Handle rate limit headers in case of HTTP 429 #3553

@Wauplin

Now that most rate limits are handled by the Hub (see internal PRs #15081 and #15100), the server returns extra information when a request gets rate limited, following the IETF proposal: https://www.ietf.org/archive/id/draft-ietf-httpapi-ratelimit-headers-09.html. In practice, here is what it looks like:

>>> print(response.headers)
Headers({..., ..., 'ratelimit': '"api";r=0;t=55', 'ratelimit-policy': '"fixed window";"api";q=500;w=300', ...})

with

  • "fixed window" is the policy type (let's assume it'll always be a fixed window from the Hub)
  • "api" => endpoint group that has triggered the rate limit
  • q=500 => limit (max number of requests to that endpoint group in the same fixed window)
  • w=300 => window in seconds
  • r=0 => 0 remaining calls before getting rate limited
  • t=55 => number of seconds before the end of the fixed window, i.e. before the counter is reset
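
To make the field breakdown above concrete, here is a minimal parsing sketch. It assumes the two header values always have the shape shown in the example (a quoted endpoint group followed by `r=`/`t=`, and a policy containing `q=`/`w=`), and that `headers` is either a case-insensitive mapping or uses lowercase keys. `RateLimitInfo` and `parse_ratelimit_headers` are hypothetical names chosen for illustration, not existing `huggingface_hub` APIs.

```python
import re
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    """Hypothetical container for the parsed values (not a library class)."""

    resource: str  # endpoint group, e.g. "api"
    remaining: int  # r: calls left in the current window
    reset_in_seconds: int  # t: seconds until the window resets
    limit: Optional[int] = None  # q: max requests per window
    window_seconds: Optional[int] = None  # w: window length in seconds


# Matches values such as '"api";r=0;t=55'
_RATELIMIT_RE = re.compile(r'"(?P<resource>[^"]*)"\s*;\s*r=(?P<r>\d+)\s*;\s*t=(?P<t>\d+)')
# Matches the 'q=500;w=300' part of '"fixed window";"api";q=500;w=300'
_POLICY_RE = re.compile(r"q=(?P<q>\d+)\s*;\s*w=(?P<w>\d+)")


def parse_ratelimit_headers(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Return the parsed rate limit info, or None if the header is absent or malformed."""
    raw = headers.get("ratelimit")
    if raw is None:
        return None
    match = _RATELIMIT_RE.search(raw)
    if match is None:
        return None

    # The policy header is treated as optional: limit and window stay None without it.
    limit = window = None
    policy = _POLICY_RE.search(headers.get("ratelimit-policy", ""))
    if policy is not None:
        limit, window = int(policy["q"]), int(policy["w"])

    return RateLimitInfo(
        resource=match["resource"],
        remaining=int(match["r"]),
        reset_in_seconds=int(match["t"]),
        limit=limit,
        window_seconds=window,
    )
```

Returning `None` on a missing or unexpected header keeps callers on their existing code path when a response carries no rate limit information.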

Another example:

>>> client.get("https://huggingface.co/api/models/moonshotai/Kimi-K2-Thinking").headers["ratelimit"]
'"api";r=489;t=189'
>>> client.get("https://huggingface.co/api/models/moonshotai/Kimi-K2-Thinking").headers["ratelimit-policy"]
'"fixed window";"api";q=500;w=300'
  • request succeeded
  • still the same policy '"fixed window";"api";q=500;w=300'
  • "489 remaining calls for the next 189 seconds"

[Feature request]

Let's take advantage of these headers!

  1. in hf_raise_for_status => we can print a more informative error message in case of 429
  2. in http_backoff => if we retry on 429, let's wait for the window to reset (if that fits within the timeout period); otherwise we retry knowing it'll fail
  3. once 2. is addressed, we can reassess our retry mechanism when downloading files (especially when fetching file metadata). We have been retrying on 429 for a few months but it led to increased issues because we couldn't do it properly. Now that we have the correct headers it should be more beneficial than harmful :) (see Do not retry on 429 (only on 5xx) #3377)
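
As a rough illustration of points 1 and 2, the sketch below builds an error message from already-parsed values and decides how long to sleep before a retry. Both function names and all parameters are hypothetical; this is not the current behavior of `hf_raise_for_status` or `http_backoff`. Giving up (returning `None`) when the reset falls outside the timeout budget is an assumption about the desired policy, not something the issue specifies.

```python
from typing import Optional


def format_rate_limit_message(
    resource: str,
    reset_in_seconds: int,
    limit: Optional[int] = None,
    window_seconds: Optional[int] = None,
) -> str:
    """Build a human-readable message for a 429 response (hypothetical helper)."""
    message = (
        f"429 Too Many Requests: rate limit reached for the '{resource}' endpoint group. "
        f"Retry in {reset_in_seconds} seconds."
    )
    # Only mention the policy when both values were present in the headers.
    if limit is not None and window_seconds is not None:
        message += f" (limit: {limit} requests per {window_seconds} seconds)"
    return message


def seconds_to_wait(
    reset_in_seconds: Optional[int],
    elapsed: float,
    timeout: float,
    fallback: float,
) -> Optional[float]:
    """Decide how long to sleep before retrying a 429 (hypothetical helper).

    - No rate limit info: use the caller's regular backoff delay (`fallback`).
    - Window resets within the remaining timeout budget: wait for the reset.
    - Otherwise: return None, meaning a retry is expected to fail (assumed policy).
    """
    if reset_in_seconds is None:
        return fallback
    wait = float(reset_in_seconds)
    if elapsed + wait > timeout:
        return None
    return wait
```

With the values from the first example (`t=55`), a caller that has already spent 10 seconds of a 120-second budget would sleep until the reset, while one that has spent 100 seconds would stop retrying.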

The three points above can be tackled in separate PRs (let's at least start with a first one introducing a parser for these headers).

cc @coyotte508 who implemented the rate limits server-side


See also related issue:
