quickfix re. pricing system #1597
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
docs/api-inference/rate-limits.md (Outdated)
> You get charged for every inference request, based on the compute time x price of the underlying hardware.
>
> Serverless API is not meant to be used for heavy production applications. If you need higher rate limits, consider [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) to have dedicated resources.
>
> For instance, a request to [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) that takes 10 seconds to complete on a GPU machine that costs $0.00012 per second to run, will be billed $0.0012.
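For reference, the billing arithmetic in the quoted example is just compute time multiplied by the hardware's per-second price. A minimal sketch, using the illustrative figures from the example above rather than official rates:

```python
# Worked example of the compute-time pricing described in the quoted doc text.
# The 10-second duration and $0.00012/second rate come from the example above,
# not from an official price sheet.

def billed_amount(compute_seconds: float, price_per_second: float) -> float:
    """Cost of one request: compute time multiplied by the hardware's per-second price."""
    return compute_seconds * price_per_second

print(f"${billed_amount(10, 0.00012):.4f}")  # -> $0.0012
```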
This is slightly confusing - isn't it based more on the inference provider and what they charge? And, specifically for LLMs, more on the basis of the tokens processed and generated?
Maybe we can use a different example here, say Flux, where each generation is X dollars?
(so it's a bit easier to grok)
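For comparison, the two billing models brought up here (per-token pricing for LLMs versus a flat price per generation for an image model like Flux) look roughly like this. A minimal sketch with made-up placeholder prices, not actual provider rates:

```python
# Illustrative comparison of the two pricing models mentioned in this thread.
# All prices below are hypothetical placeholders, not real rates.

def llm_cost(input_tokens: int, output_tokens: int,
             price_per_input_token: float, price_per_output_token: float) -> float:
    """Token-based billing: input and output tokens are priced separately."""
    return input_tokens * price_per_input_token + output_tokens * price_per_output_token

def image_cost(num_generations: int, price_per_generation: float) -> float:
    """Flat per-generation billing, e.g. for an image model."""
    return num_generations * price_per_generation

# Hypothetical figures for illustration only:
print(f"${llm_cost(1_000, 500, 1e-6, 3e-6):.4f}")  # -> $0.0025
print(f"${image_cost(4, 0.05):.2f}")               # -> $0.20
```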
essentially talking about the routed requests here (maybe it's supposed to be somewhere else and I'm confused)
This doc is about HF's own Inference API, not Inference Providers, but I agree it's a tad confusing :)
Let's take an example that actually runs on HF's Inference API then?
yes, do you have one on hand?
"deepseek-ai/DeepSeek-R1-Distill-Qwen-32B" if we still want to ride the deepseek wave
Oops, I just merged vb's suggestion (Flux), but we can add more examples in the future.
Totally fine with Flux! Makes sense to use an example with a fixed cost.
It gets tons of abuse though, and is down quite a lot - I'd recommend BFL Flux instead, which always works 😅
Co-authored-by: Lucain <[email protected]>
Co-authored-by: vb <[email protected]>
Co-authored-by: vb <[email protected]>
Thanks
Great.
The important part is in rate-limits.md.