
Commit 4697036

Wauplin and julien-c authored
Some tweaks in wording between inference providers / hf inference API (#1689)
* Some tweaks in wording between inference providers / hf inference API
* add link
* Update docs/hub/models-inference.md

Co-authored-by: Julien Chaumond <[email protected]>
1 parent 01baa08 commit 4697036

5 files changed (+19, -26 lines)


docs/hub/academia-hub.md

Lines changed: 0 additions & 1 deletion
@@ -25,7 +25,6 @@ Key Features of Academia Hub:
 - **Spaces Hosting:** Create ZeroGPU Spaces with A100 hardware.
 - **Spaces Dev Mode:** Fast iterations via SSH/VS Code for Spaces.
 - **Inference Providers:** Get monthly included credits across all Inference Providers.
-- **HF Inference API:** Get x20 higher rate limits on HF Serverless API.
 - **Dataset Viewer:** Activate it on private datasets.
 - **Blog Articles:** Publish articles to the Hugging Face blog.
 - **Social Posts:** Share short updates with the community.

docs/hub/index.md

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ The Hub offers **versioning, commit history, diffs, branches, and over a dozen l
 
 ## Models
 
-You can discover and use dozens of thousands of open-source ML models shared by the community. To promote responsible model usage and development, model repos are equipped with [Model Cards](./model-cards) to inform users of each model's limitations and biases. Additional [metadata](./model-cards#model-card-metadata) about info such as their tasks, languages, and evaluation results can be included, with training metrics charts even added if the repository contains [TensorBoard traces](./tensorboard). It's also easy to add an [**inference widget**](./models-widgets) to your model, allowing anyone to play with the model directly in the browser! For programmatic access, a serverless API is provided to [**instantly serve your model**](./models-inference).
+You can discover and use dozens of thousands of open-source ML models shared by the community. To promote responsible model usage and development, model repos are equipped with [Model Cards](./model-cards) to inform users of each model's limitations and biases. Additional [metadata](./model-cards#model-card-metadata) about info such as their tasks, languages, and evaluation results can be included, with training metrics charts even added if the repository contains [TensorBoard traces](./tensorboard). It's also easy to add an [**inference widget**](./models-widgets) to your model, allowing anyone to play with the model directly in the browser! For programmatic access, a serverless API is provided by [**Inference Providers**](./models-inference).
 
 To upload models to the Hub, or download models and integrate them into your work, explore the [**Models documentation**](./models). You can also choose from [**over a dozen libraries**](./models-libraries) such as 🤗 Transformers, Asteroid, and ESPnet that support the Hub.

docs/hub/models-inference.md

Lines changed: 14 additions & 20 deletions
@@ -2,35 +2,29 @@
 
 Please refer to the [Inference Providers Documentation](https://huggingface.co/docs/inference-providers) for detailed information.
 
+## What is HF-Inference API?
 
-## What technology do you use to power the HF-Inference API?
-
-For 🤗 Transformers models, [Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) power the API.
+HF-Inference API is one of the many providers available on the Hugging Face Hub.
+It is deployed by Hugging Face ourselves, using text-generation-inference for LLMs for instance. This service used to be called “Inference API (serverless)” prior to Inference Providers.
 
-On top of `Pipelines` and depending on the model type, there are several production optimizations like:
-- compiling models to optimized intermediary representations (e.g. [ONNX](https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333)),
-- maintaining a Least Recently Used cache, ensuring that the most popular models are always loaded,
-- scaling the underlying compute infrastructure on the fly depending on the load constraints.
+For more details about the HF-Inference API, check out its [dedicated page](https://huggingface.co/docs/inference-providers/providers/hf-inference).
 
-For models from [other libraries](./models-libraries), the API uses [Starlette](https://www.starlette.io) and runs in [Docker containers](https://github.com/huggingface/api-inference-community/tree/main/docker_images). Each library defines the implementation of [different pipelines](https://github.com/huggingface/api-inference-community/tree/main/docker_images/sentence_transformers/app/pipelines).
-
-## How can I turn off the HF-Inference API for my model?
+## What technology do you use to power the HF-Inference API?
 
-Specify `inference: false` in your model card's metadata.
+The HF-Inference API is powered by [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) under the hood.
 
 ## Why don't I see an inference widget, or why can't I use the API?
 
-For some tasks, there might not be support in the HF-Inference API, and, hence, there is no widget.
-For all libraries (except 🤗 Transformers), there is a [library-to-tasks.ts file](https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/library-to-tasks.ts) of supported tasks in the API. When a model repository has a task that is not supported by the repository library, the repository has `inference: false` by default.
-
-## Can I send large volumes of requests? Can I get accelerated APIs?
-
-If you are interested in accelerated inference, higher volumes of requests, or an SLA, please contact us at `api-enterprise at huggingface.co`.
+For some tasks, there might not be support by any Inference Provider, and hence, there is no widget.
 
 ## How can I see my usage?
 
-You can check your usage in the [Inference Dashboard](https://ui.endpoints.huggingface.co/endpoints). The dashboard shows both your serverless and dedicated endpoints usage.
+To check usage across all providers, check out your [billing page](https://huggingface.co/settings/billing).
+
+To check your HF-Inference usage specifically, check out the [Inference Dashboard](https://ui.endpoints.huggingface.co/endpoints). The dashboard shows both your serverless and dedicated endpoints usage.
 
-## Is there programmatic access to the HF-Inference API?
+## Is there programmatic access to Inference Providers?
 
-Yes, the `huggingface_hub` library has a client wrapper documented [here](https://huggingface.co/docs/huggingface_hub/guides/inference).
+Yes! We provide client wrappers in both JS and Python:
+- [JS (`@huggingface/inference`)](https://huggingface.co/docs/huggingface.js/inference/classes/InferenceClient)
+- [Python (`huggingface_hub`)](https://huggingface.co/docs/huggingface_hub/guides/inference)
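
For reference, a minimal sketch of what the Python wrapper mentioned in the added lines can look like in practice. The model name and prompt are illustrative, not part of this commit; it assumes you are authenticated via `huggingface-cli login` or an `HF_TOKEN` environment variable:

```python
# Minimal sketch using huggingface_hub's InferenceClient (the Python wrapper
# linked above). Model name and prompt are illustrative examples.
from huggingface_hub import InferenceClient

client = InferenceClient()  # picks up your Hugging Face token automatically

response = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```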

docs/hub/spaces-sdks-docker-langfuse.md

Lines changed: 3 additions & 3 deletions
@@ -77,9 +77,9 @@ Langfuse is model agnostic and can be used to trace any application. Follow the
 
 Langfuse maintains native integrations with many popular LLM frameworks, including [Langchain](https://langfuse.com/docs/integrations/langchain/tracing), [LlamaIndex](https://langfuse.com/docs/integrations/llama-index/get-started) and [OpenAI](https://langfuse.com/docs/integrations/openai/python/get-started) and offers Python and JS/TS SDKs to instrument your code. Langfuse also offers various API endpoints to ingest data and has been integrated by other open source projects such as [Langflow](https://langfuse.com/docs/integrations/langflow), [Dify](https://langfuse.com/docs/integrations/dify) and [Haystack](https://langfuse.com/docs/integrations/haystack/get-started).
 
-### Example 1: Trace Calls to HF Serverless API
+### Example 1: Trace Calls to Inference Providers
 
-As a simple example, here's how to trace LLM calls to the [HF Serverless API](https://huggingface.co/docs/inference-providers/en/index) using the Langfuse Python SDK.
+As a simple example, here's how to trace LLM calls to [Inference Providers](https://huggingface.co/docs/inference-providers/en/index) using the Langfuse Python SDK.
 
 Be sure to first configure your `LANGFUSE_HOST`, `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` environment variables, and make sure you've [authenticated with your Hugging Face account](https://huggingface.co/docs/huggingface_hub/en/quick-start#authentication).

@@ -88,7 +88,7 @@ from langfuse.openai import openai
 from huggingface_hub import get_token
 
 client = openai.OpenAI(
-    base_url="https://api-inference.huggingface.co/v1/",
+    base_url="https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.3-70B-Instruct/v1",
     api_key=get_token(),
 )
 
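For context, a hedged end-to-end version of the updated snippet: the `base_url` mirrors the new line above, while the `chat.completions.create` call and prompt are illustrative additions, not part of the commit. It assumes the `LANGFUSE_*` environment variables and Hugging Face authentication described in the doc are already configured:

```python
# Sketch of the updated example end-to-end, under the assumptions stated above.
from langfuse.openai import openai  # Langfuse's drop-in OpenAI client wrapper
from huggingface_hub import get_token

client = openai.OpenAI(
    base_url="https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.3-70B-Instruct/v1",
    api_key=get_token(),  # your Hugging Face access token
)

# Calls through the wrapped client are traced to Langfuse automatically.
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "What is observability?"}],
)
print(completion.choices[0].message.content)
```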

docs/inference-providers/index.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/Inference-providers-banner-dark.png"/>
 </div>
 
-Hugging Face Inference Providers simplify and unify how developers access and run machine learning models by offering a unified, flexible interface to multiple serverless inference providers. This new approach extends our previous Serverless Inference API, providing more models, increased performances and better reliability thanks to our inference partners.
+Hugging Face’s Inference Providers give developers streamlined, unified access to hundreds of machine learning models, powered by our serverless inference partners. This new approach builds on our previous Serverless Inference API, offering more models, improved performance, and greater reliability thanks to world-class providers.
 
 To learn more about the launch of Inference Providers, check out our [announcement blog post](https://huggingface.co/blog/inference-providers).
 
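To make the "unified interface" idea in the rewritten paragraph concrete, a small sketch assuming the `provider` argument of `huggingface_hub`'s `InferenceClient`; the provider and model names are examples, not taken from this commit:

```python
# Illustrative sketch: the same call shape can target different serverless
# providers. Provider and model names here are assumed examples.
from huggingface_hub import InferenceClient

for provider in ("hf-inference", "together"):
    client = InferenceClient(provider=provider)
    out = client.chat_completion(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": "Why do unified APIs help developers?"}],
        max_tokens=60,
    )
    print(f"{provider}: {out.choices[0].message.content}")
```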
