
Conversation


@julien-c julien-c commented Feb 3, 2025

the important part is in rate-limits.md

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

You get charged for every inference request, based on the compute time × price of the underlying hardware.

For instance, a request to [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) that takes 10 seconds to complete on a GPU machine that costs $0.00012 per second to run will be billed $0.0012.

Serverless API is not meant to be used for heavy production applications. If you need higher rate limits, consider [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) to have dedicated resources.
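A minimal sketch of that arithmetic, assuming only the formula quoted above (the helper name is hypothetical, not HF's actual billing code):

```python
# Hypothetical sketch of the billing rule quoted above (illustrative only):
# cost = compute time (seconds) × hardware price (USD per second).
def billed_amount(compute_seconds: float, price_per_second: float) -> float:
    return compute_seconds * price_per_second

# The DeepSeek-R1 example: a 10-second request on a $0.00012/s GPU.
print(f"${billed_amount(10, 0.00012):.4f}")  # $0.0012
```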
Member
this is slightly confusing - isn't it more based on the inference provider and what they charge? and specifically for LLMs, more on the basis of the tokens processed and generated?

Maybe we can use a different example here, say Flux, where each generation is X dollars?

(so it's a bit easier to grok)

Member

essentially talking about the routed requests here (maybe it's supposed to be somewhere else and I'm confused)

Member Author

this doc is about HF's own Inference API, not Inference Providers, but I agree it's a tad confusing :)

Contributor

Let's take an example that actually runs on HF's Inference API then?

Member Author

yes, do you have one on hand?

Contributor

"deepseek-ai/DeepSeek-R1-Distill-Qwen-32B" if we still want to ride the deepseek wave

Member Author

oops, I just merged vb's suggestion (Flux), but we can add more examples in the future

Contributor

totally fine with Flux! Makes sense to use an example with a fixed cost

Member

It gets tons of abuse tho - and is down quite a lot - I'd recommend BFL Flux instead, which always works 😅

Contributor

@Wauplin Wauplin left a comment

Thanks

@julien-c julien-c merged commit 91ba7a3 into main Feb 4, 2025
2 checks passed
@julien-c julien-c deleted the api-quotas branch February 4, 2025 13:27
Contributor

@SBrandeis SBrandeis left a comment

great
