
Commit a30c9c3

Add tutorials/guides for integrating libraries with the Hub (#358)

File tree: 5 files changed, +604 −0 lines changed

docs/assets/hub/hub_filters.png (528 KB)

docs/hub/how-to-downstream.md

Lines changed: 81 additions & 0 deletions
# How to integrate downstream utilities in your library

Utilities that allow your library to download files from the Hub are referred to as *downstream* utilities. This guide introduces additional downstream utilities you can integrate with your library, or use separately on their own. You will learn how to:

* Retrieve a URL to download.
* Download a file and cache it on your disk.
* Download all the files in a repository.
## `hf_hub_url`

Use `hf_hub_url` to retrieve the URL of a specific file to download by providing a `filename`.

![/docs/assets/hub/repo.png](/docs/assets/hub/repo.png)

```python
>>> from huggingface_hub import hf_hub_url
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json")
'https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json'
```

Specify a particular file version with the `revision` parameter. The revision can be a branch name, a tag, or a commit hash.

When using a commit hash, it must be the full-length hash instead of the 7-character short hash:

```python
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a")
'https://huggingface.co/lysandre/arxiv-nlp/resolve/877b84a8f93f2d619faa2a6e514a32beef88ab0a/config.json'
```

`hf_hub_url` can also use a branch name to specify a file revision:

```python
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
```

Or specify a file revision with a tag identifier. For example, if you want `v1.0` of the `config.json` file:

```python
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
```
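The `resolve` URLs above follow a simple, predictable pattern, so their construction can be sketched in plain Python. This is a simplified illustration of the pattern visible in the examples, not the library's actual implementation:

```python
def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    # Hub file URLs have the shape:
    #   https://huggingface.co/{repo_id}/resolve/{revision}/{filename}
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

print(resolve_url("lysandre/arxiv-nlp", "config.json"))
# https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json
```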
## `cached_download`

`cached_download` is useful for downloading and caching a file on your local disk. Once stored in your cache, you don't have to redownload the file the next time you use it. `cached_download` is a hands-free solution for staying up to date with new file versions: when a downloaded file is updated in the remote repository, `cached_download` will automatically download and store the new version for you.

Begin by retrieving your file URL with `hf_hub_url`, and then pass the URL to `cached_download` to download the file:

```python
>>> from huggingface_hub import hf_hub_url, cached_download
>>> config_file_url = hf_hub_url("lysandre/arxiv-nlp", filename="config.json")
>>> cached_download(config_file_url)
'/home/lysandre/.cache/huggingface/hub/bc0e8cc2f8271b322304e8bb84b3b7580701d53a335ab2d75da19c249e2eeebb.066dae6fdb1e2b8cce60c35cc0f78ed1451d9b341c78de19f3ad469d10a8cbb1'
```
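The caching idea itself is straightforward and can be sketched as below. This is a simplified stand-in, not `cached_download`'s real logic: the hypothetical cache directory and URL-only hashing are assumptions made for illustration (the real library also accounts for the file's remote version so that updated files are re-fetched):

```python
import hashlib
import os
import urllib.request

CACHE_DIR = os.path.expanduser("~/.cache/my_hub_cache")  # hypothetical cache location

def cache_path(url: str) -> str:
    # Derive a stable cache file name from the URL.
    return os.path.join(CACHE_DIR, hashlib.sha256(url.encode()).hexdigest())

def my_cached_download(url: str) -> str:
    # Download only if the file is not already cached, then return its local path.
    path = cache_path(url)
    if not os.path.exists(path):
        os.makedirs(CACHE_DIR, exist_ok=True)
        urllib.request.urlretrieve(url, path)
    return path
```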
`hf_hub_url` and `cached_download` work hand in hand to download a file. This is precisely how `hf_hub_download` from the tutorial works! `hf_hub_download` is simply a wrapper that calls both `hf_hub_url` and `cached_download`:

```python
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
```
## `snapshot_download`

`snapshot_download` downloads an entire repository at a given revision. Like `cached_download`, all downloaded files are cached on your local disk. However, even if only a single file is updated, the entire repository will be redownloaded.

Download a whole repository as shown in the following:

```python
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp")
'/home/lysandre/.cache/huggingface/hub/lysandre__arxiv-nlp.894a9adde21d9a3e3843e6d5aeaaf01875c7fade'
```
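The returned path above encodes both the repository id and the resolved commit hash. Its shape can be sketched as follows (a simplified illustration of the cache layout observed in the example output, not the library's exact logic):

```python
import os

def snapshot_cache_dir(repo_id: str, revision_sha: str,
                       cache_root: str = "~/.cache/huggingface/hub") -> str:
    # The snapshot folder name flattens the repo id ("/" becomes "__")
    # and appends the resolved commit hash.
    flat_repo_id = repo_id.replace("/", "__")
    return os.path.join(os.path.expanduser(cache_root), f"{flat_repo_id}.{revision_sha}")
```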
`snapshot_download` downloads the latest revision by default. If you want a specific repository revision, use the `revision` parameter as shown with `hf_hub_url`:

```python
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", revision="main")
```

If you already know which files you need, it is usually better to download them individually with `hf_hub_download` to avoid redownloading an entire repository. `snapshot_download` is helpful when your library's downloading utility is a helper that is unaware of which files need to be downloaded.

docs/hub/how-to-inference.md

Lines changed: 44 additions & 0 deletions
# How to integrate the Inference API in your library

The Inference API provides fast inference for your hosted models. It can be accessed via standard HTTP requests from your favorite programming language, but the `huggingface_hub` library has a client wrapper to access the Inference API programmatically. This guide will show you how to make calls to the Inference API from your library. For more detailed information, refer to the [Inference API documentation](https://api-inference.huggingface.co/docs/python/html/index.html).

Begin by creating an instance of `InferenceApi` with a specific model repository ID. You can find your `API_TOKEN` under Settings in your Hugging Face account. The `API_TOKEN` allows you to send requests to the Inference API.

```python
>>> from huggingface_hub.inference_api import InferenceApi
>>> inference = InferenceApi(repo_id="bert-base-uncased", token=API_TOKEN)
```
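Under the hood, the client wrapper boils down to a POST request with a bearer token and a JSON body. The request shape can be sketched as below. This builds, but does not send, the request; the exact headers and payload handling inside `huggingface_hub` may differ:

```python
import json

def build_inference_request(repo_id: str, inputs, params=None, api_token: str = "API_TOKEN"):
    # The Inference API expects a POST to the model's endpoint with an
    # Authorization header and a JSON payload of inputs (plus optional parameters).
    url = f"https://api-inference.huggingface.co/models/{repo_id}"
    headers = {"Authorization": f"Bearer {api_token}"}
    payload = {"inputs": inputs}
    if params is not None:
        payload["parameters"] = params
    return url, headers, json.dumps(payload)
```

Any HTTP client can then send this request, which is what makes the API usable from other programming languages as well.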
The pipeline task is determined from the metadata in the model card and configuration files (see [here](https://huggingface.co/docs/hub/main#how-is-a-models-type-of-inference-api-and-widget-determined) for more details). For example, when using the [bert-base-uncased](https://huggingface.co/bert-base-uncased) model, the Inference API can automatically infer that this model should be used for a `fill-mask` task.

```python
>>> from huggingface_hub.inference_api import InferenceApi
>>> inference = InferenceApi(repo_id="bert-base-uncased", token=API_TOKEN)
>>> inference(inputs="The goal of life is [MASK].")
[{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}]
```

Each task requires a different type of input. A `question-answering` task expects a dictionary with the `question` and `context` keys as the input:

```python
>>> inference = InferenceApi(repo_id="deepset/roberta-base-squad2", token=API_TOKEN)
>>> inputs = {"question": "Where is Hugging Face headquarters?", "context": "Hugging Face is based in Brooklyn, New York. There is also an office in Paris, France."}
>>> inference(inputs)
{'score': 0.94622403383255, 'start': 25, 'end': 43, 'answer': 'Brooklyn, New York'}
```
Some tasks may require additional parameters (see [here](https://api-inference.huggingface.co/docs/python/html/detailed_parameters.html) for a detailed list of all parameters for each task). For example, `zero-shot-classification` tasks need candidate labels, which can be supplied through `params`:

```python
>>> inference = InferenceApi(repo_id="typeform/distilbert-base-uncased-mnli", token=API_TOKEN)
>>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!"
>>> params = {"candidate_labels": ["refund", "legal", "faq"]}
>>> inference(inputs, params)
{'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]}
```

Some models support multiple tasks. For example, `sentence-transformers` models can perform both `sentence-similarity` and `feature-extraction` tasks. Specify which task you want to perform with the `task` parameter:

```python
>>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1", task="feature-extraction", token=API_TOKEN)
```
