Commit dabcc83

Update how to guides (#840)
* clarify how to download and upload files
* finish section on create/manage a repo
* add sign-in section to repo manage guide
* ✨ update inference api section
* apply omar review
* format links to functions
* more review
* fix toctree
1 parent cab4152 commit dabcc83

6 files changed: +279 -258 lines changed


docs/source/_toctree.yml

Lines changed: 18 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -5,25 +5,28 @@
55
title: Quick start
66
title: "Get started"
77
- sections:
8+
- local: how-to-manage
9+
title: Create and manage repositories
810
- local: how-to-downstream
9-
title: How to download files from the hub
11+
title: Download files from the Hub
1012
- local: how-to-upstream
11-
title: How to upload files to the hub
13+
title: Upload files to the Hub
1214
- local: searching-the-hub
1315
title: Searching the Hub
1416
- local: how-to-inference
15-
title: How to programmatically access the Inference API
17+
title: Access the Inference API
1618
title: "Guides"
1719
- sections:
18-
- local: package_reference/repository
19-
title: Managing local and online repositories
20-
- local: package_reference/hf_api
21-
title: Hugging Face Hub API
22-
- local: package_reference/file_download
23-
title: Downloading files
24-
- local: package_reference/mixins
25-
title: Mixins & serialization methods
26-
- local: package_reference/logging
27-
title: Logging
28-
title: "Reference"
29-
20+
- local: package_reference/repository
21+
title: Managing local and online repositories
22+
- local: package_reference/hf_api
23+
title: Hugging Face Hub API
24+
- local: package_reference/file_download
25+
title: Downloading files
26+
- local: package_reference/mixins
27+
title: Mixins & serialization methods
28+
- local: package_reference/inference_api
29+
title: Inference API
30+
- local: package_reference/logging
31+
title: Logging
32+
title: "Reference"
Lines changed: 59 additions & 39 deletions
@@ -1,53 +1,62 @@
1-
---
2-
title: How to download files from the Hub
3-
---
1+
# Download files from the Hub
42

5-
# How to integrate downstream utilities in your library
3+
The `huggingface_hub` library provides functions to download files from the repositories
4+
stored on the Hub. You can use these functions independently or integrate them into your
5+
own library, making it more convenient for your users to interact with the Hub. This
6+
guide will show you how to:
67

7-
Utilities that allow your library to download files from the Hub are referred to as *downstream* utilities. This guide introduces additional downstream utilities you can integrate with your library, or use separately on their own. You will learn how to:
8-
9-
* Retrieve a URL to download.
10-
* Download a file and cache it on your disk.
8+
* Specify a file to download from the Hub.
9+
* Download and cache a file on your disk.
1110
* Download all the files in a repository.
1211

13-
## hf_hub_url
14-
15-
Use [`hf_hub_url`] to retrieve the URL of a specific file to download by providing a `filename`.
12+
## Choose a file to download
1613

17-
![/docs/assets/hub/repo.png](/docs/assets/hub/repo.png)
14+
Use the `filename` parameter in the [`hf_hub_url`] function to retrieve the URL of a
15+
specific file to download:
1816

1917
```python
2018
>>> from huggingface_hub import hf_hub_url
2119
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json")
2220
'https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json'
2321
```
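
The URL returned above follows a visible pattern: host, repository ID, `resolve`, revision, and file name. As a rough illustration only, the sketch below rebuilds that pattern by hand. The helper name `build_resolve_url` is hypothetical and is not part of `huggingface_hub`; it assumes a model repository and does no escaping or validation, so prefer [`hf_hub_url`] in real code.

```python
def build_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Hypothetical helper that mirrors the URL pattern shown in this guide.

    Assumption: a model repository and plain, URL-safe arguments. This is
    not the library's implementation.
    """
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


# Default revision is assumed to be the "main" branch, as in the example output.
print(build_resolve_url("lysandre/arxiv-nlp", "config.json"))
# https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json
```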
2422

25-
Specify a particular file version by providing the file revision. The file revision can be a branch, a tag, or a commit hash.
23+
![/docs/assets/hub/repo.png](/docs/assets/hub/repo.png)
2624

27-
When using the commit hash, it must be the full-length hash instead of a 7-character commit hash:
25+
Specify a particular file version by providing the file revision, which can be the
26+
branch name, a tag, or a commit hash. When using the commit hash, it must be the
27+
full-length hash instead of a 7-character commit hash:
2828

2929
```python
30-
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a")
30+
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp",
31+
... filename="config.json",
32+
... revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a",
33+
... )
3134
'https://huggingface.co/lysandre/arxiv-nlp/resolve/877b84a8f93f2d619faa2a6e514a32beef88ab0a/config.json'
3235
```
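
Because an abbreviated hash is not accepted, it can help to check a revision string before building a URL. The following is a minimal sketch, assuming commit hashes are 40 lowercase hexadecimal characters; `is_full_commit_hash` is a hypothetical helper, not a library function, and it only checks the shape of the string, not whether the commit exists.

```python
import re

# Assumption: a full-length commit hash is 40 lowercase hex characters.
FULL_COMMIT_HASH = re.compile(r"[0-9a-f]{40}")


def is_full_commit_hash(revision: str) -> bool:
    """Return True if `revision` looks like a full-length commit hash."""
    return FULL_COMMIT_HASH.fullmatch(revision) is not None


print(is_full_commit_hash("877b84a8f93f2d619faa2a6e514a32beef88ab0a"))  # True
print(is_full_commit_hash("877b84a"))  # False: 7-character short hash
print(is_full_commit_hash("main"))  # False: branch name, still a valid revision
```

A `False` result does not mean the revision is invalid, since branch names and tags are also accepted.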
3336

34-
[`hf_hub_url`] can also use the branch name to specify a file revision:
37+
To specify a file revision with the branch name:
3538

3639
```python
37-
hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
40+
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
3841
```
3942

40-
Specify a file revision with a tag identifier. For example, if you want `v1.0` of the `config.json` file:
43+
To specify a file revision with a tag identifier, pass the tag name. For example, if you want `v1.0` of the
44+
`config.json` file:
4145

4246
```python
43-
hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
47+
>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
4448
```
4549

46-
## cached_download
50+
## Download and store a file
4751

48-
[`cached_download`] is useful for downloading and caching a file on your local disk. Once stored in your cache, you don't have to redownload the file the next time you use it. [`cached_download`] is a hands-free solution for staying up to date with new file versions. When a downloaded file is updated in the remote repository, [`cached_download`] will automatically download and store it for you.
52+
[`cached_download`] is used to download and cache a file on your local disk. Once a file
53+
is stored in your cache, you don't have to redownload it the next time you use it.
54+
[`cached_download`] is a hands-free solution for staying up to date with new file
55+
versions. When a downloaded file is updated in the remote repository,
56+
[`cached_download`] will automatically download and store it.
4957

50-
Begin by retrieving your file URL with [`hf_hub_url`], and then pass the specified URL to [`cached_download`] to download the file:
58+
Begin by retrieving the file URL with [`hf_hub_url`], and then pass the specified URL to
59+
[`cached_download`] to download the file:
5160

5261
```python
5362
>>> from huggingface_hub import hf_hub_url, cached_download
@@ -56,16 +65,20 @@ Begin by retrieving your file URL with [`hf_hub_url`], and then pass the specifi
5665
'/home/lysandre/.cache/huggingface/hub/bc0e8cc2f8271b322304e8bb84b3b7580701d53a335ab2d75da19c249e2eeebb.066dae6fdb1e2b8cce60c35cc0f78ed1451d9b341c78de19f3ad469d10a8cbb1'
5766
```
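
The cached path above is made of two 64-character hexadecimal strings joined by a dot. One scheme that produces names of that shape, assumed here for illustration and not confirmed by this guide, hashes the file URL and a version identifier such as the HTTP `ETag`. The sketch shows why a changed remote file would map to a new cache entry while an unchanged file is reused. `cache_filename` is a hypothetical name.

```python
from hashlib import sha256
from typing import Optional


def cache_filename(url: str, etag: Optional[str] = None) -> str:
    """Hypothetical cache key: sha256(url), then '.' and sha256(etag) if given."""
    name = sha256(url.encode("utf-8")).hexdigest()
    if etag:
        name += "." + sha256(etag.encode("utf-8")).hexdigest()
    return name


url = "https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json"
first = cache_filename(url, etag='"etag-v1"')   # made-up ETag values
second = cache_filename(url, etag='"etag-v2"')

# Same URL and ETag: same cache entry, so no new download is needed.
print(first == cache_filename(url, etag='"etag-v1"'))  # True
# The ETag changed upstream: a different cache entry, so the file is fetched again.
print(first == second)  # False
```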
5867

59-
[`hf_hub_url`] and [`cached_download`] work hand in hand to download a file. This is precisely how [`hf_hub_download`] from the tutorial works! [`hf_hub_download`] is simply a wrapper that calls both [`hf_hub_url`] and [`cached_download`].
68+
[`hf_hub_url`] and [`cached_download`] work hand-in-hand to download a file. This is
69+
such a standard workflow that [`hf_hub_download`] is a wrapper that calls both of these
70+
functions.
6071

6172
```python
6273
>>> from huggingface_hub import hf_hub_download
6374
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
6475
```
6576

66-
## snapshot_download
77+
## Download an entire repository
6778

68-
[`snapshot_download`] downloads an entire repository at a given revision. Like [`cached_download`], all downloaded files are cached on your local disk. However, even if only a single file is updated, the entire repository will be redownloaded.
79+
[`snapshot_download`] downloads an entire repository at a given revision. Like
80+
[`cached_download`], all downloaded files are cached on your local disk. However, even
81+
if only a single file is updated, the entire repository will be redownloaded.
6982

7083
Download a whole repository as shown in the following:
7184

@@ -75,20 +88,27 @@ Download a whole repository as shown in the following:
7588
'/home/lysandre/.cache/huggingface/hub/lysandre__arxiv-nlp.894a9adde21d9a3e3843e6d5aeaaf01875c7fade'
7689
```
7790

78-
[`snapshot_download`] downloads the latest revision by default. If you want a specific repository revision, use the `revision` parameter as shown with [`hf_hub_url`].
91+
[`snapshot_download`] downloads the latest revision by default. If you want a specific
92+
repository revision, use the `revision` parameter:
7993

8094
```python
8195
>>> from huggingface_hub import snapshot_download
8296
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", revision="main")
8397
```
8498

85-
In general, it is usually better to manually download files with [`hf_hub_download`] (if you already know the file name) to avoid re-downloading an entire repository. [`snapshot_download`] is helpful when your library's downloading utility is a helper, and unaware of which files need to be downloaded.
99+
It is usually better to download files with [`hf_hub_download`] (if you
100+
already know the file name) to avoid redownloading an entire repository.
101+
[`snapshot_download`] is helpful when you do not know which files to download.
102+
103+
However, you don't always want to download the contents of an entire repository with
104+
[`snapshot_download`]. Even if you don't know the file name, you can download specific
105+
files when you know the file type. Use the `allow_regex` and `ignore_regex` arguments
106+
to specify which files to download. These parameters accept either a single regex or
107+
a list of regexes.
86108

87-
However, you don't want to always download the contents of an entire repository with [`snapshot_download`]. Even if you don't know the file name and only know the file type, you can download specific files with `allow_regex` and `ignore_regex`.
88-
Use the `allow_regex` and `ignore_regex` arguments to specify
89-
which files to download.
90-
`allow_regex` and `ignore_regex` accept either a single regex or a list of regexes.
91-
The regex matching is based on [`fnmatch`](https://docs.python.org/3/library/fnmatch.html) which means it provides support for Unix shell-style wildcards.
109+
The matching is based on
110+
[`fnmatch`](https://docs.python.org/3/library/fnmatch.html), so the patterns are
111+
Unix shell-style wildcards (such as `*.json`) rather than full regular expressions.
92112

93113
For example, you can use `allow_regex` to only download JSON configuration files:
94114

@@ -97,17 +117,17 @@ For example, you can use `allow_regex` to only download JSON configuration files
97117
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", allow_regex="*.json")
98118
```
99119

100-
On the other hand, `ignore_regex` can be used to exclude certain files from being downloaded. The following example ignores the `.msgpack` and `.h5` file extensions:
101-
or `.h5` extensions, you could make use of `ignore_regex`:
120+
On the other hand, `ignore_regex` can exclude certain files from being downloaded. The
121+
following example ignores the `.msgpack` and `.h5` file extensions:
102122

103123
```python
104124
>>> from huggingface_hub import snapshot_download
105125
>>> snapshot_download(repo_id="lysandre/arxiv-nlp", ignore_regex=["*.msgpack", "*.h5"])
106126
```
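
To see how such patterns select files, the sketch below applies `fnmatch` from the standard library to a made-up file list. It is an illustration of the matching rules only: `filter_files` and the file names are hypothetical, and the real filtering inside [`snapshot_download`] may differ in detail.

```python
from fnmatch import fnmatch
from typing import List, Optional, Union

Patterns = Optional[Union[str, List[str]]]


def _as_list(patterns: Patterns) -> Optional[List[str]]:
    """Accept a single pattern or a list of patterns, as the guide describes."""
    if patterns is None:
        return None
    return [patterns] if isinstance(patterns, str) else list(patterns)


def filter_files(filenames: List[str], allow: Patterns = None, ignore: Patterns = None) -> List[str]:
    """Keep names matching any `allow` pattern and none of the `ignore` patterns."""
    allow_list, ignore_list = _as_list(allow), _as_list(ignore)
    kept = []
    for name in filenames:
        if allow_list is not None and not any(fnmatch(name, p) for p in allow_list):
            continue
        if ignore_list is not None and any(fnmatch(name, p) for p in ignore_list):
            continue
        kept.append(name)
    return kept


# Hypothetical repository contents.
files = ["config.json", "vocab.json", "pytorch_model.bin", "flax_model.msgpack", "tf_model.h5"]

print(filter_files(files, allow="*.json"))
# ['config.json', 'vocab.json']
print(filter_files(files, ignore=["*.msgpack", "*.h5"]))
# ['config.json', 'vocab.json', 'pytorch_model.bin']
```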
107127

108-
Passing a regex can be especially useful when repositories contain files that
109-
are never expected to be downloaded by [`snapshot_download`].
128+
Passing a regex can be especially useful when repositories contain files that are never
129+
expected to be downloaded by [`snapshot_download`].
110130

111-
Note that passing `allow_regex` or `ignore_regex` does **not** prevent
112-
[`snapshot_download`] from re-downloading the entire model repository if an ignored
113-
file is changed.
131+
Note that passing `allow_regex` or `ignore_regex` does **not** prevent
132+
[`snapshot_download`] from redownloading the entire model repository if an ignored file
133+
is changed.

docs/source/how-to-inference.mdx

Lines changed: 12 additions & 9 deletions
@@ -1,23 +1,23 @@
1-
---
2-
title: How to programmatically access the Inference API
3-
---
1+
# Access the Inference API
42

5-
# How to programmatically access the Inference API
3+
The Inference API provides fast inference for your hosted models. You can access it with regular HTTP requests from your favorite programming language, but the `huggingface_hub` library also provides a client wrapper to access it programmatically. This guide will show you how to make calls to the Inference API with the `huggingface_hub` library.
64

7-
The Inference API provides fast inference for your hosted models. The Inference API can be accessed via usual HTTP requests with your favorite programming languages, but the `huggingface_hub` library has a client wrapper to access the Inference API programmatically. This guide will show you how to make calls to the Inference API with the `huggingface_hub` library.
5+
<Tip>
86

9-
**If you want to make the HTTP calls directly, please refer to [Accelerated Inference API Documentation](https://api-inference.huggingface.co/docs/python/html/index.html) or to the sample snippets visible on every supported model page.**
7+
If you want to make the HTTP calls directly, refer to the [Accelerated Inference API Documentation](https://api-inference.huggingface.co/docs/python/html/index.html) or to the sample snippets visible on every supported model page.
8+
9+
</Tip>
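
For readers who prefer the direct HTTP route, the sketch below builds such a request with the standard library only and does not send it. The endpoint template, the `{"inputs": ...}` payload shape, and the bearer-token header are assumptions based on the Accelerated Inference API documentation linked above. `build_inference_request` is a hypothetical helper and the token is a placeholder.

```python
import json
import urllib.request

# Assumed endpoint pattern; check the Inference API documentation before relying on it.
API_URL_TEMPLATE = "https://api-inference.huggingface.co/models/{repo_id}"


def build_inference_request(repo_id: str, inputs: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a hosted model."""
    payload = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        API_URL_TEMPLATE.format(repo_id=repo_id),
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "API_TOKEN_PLACEHOLDER" stands in for the API_TOKEN from your account settings.
request = build_inference_request(
    "bert-base-uncased", "The goal of life is [MASK].", token="API_TOKEN_PLACEHOLDER"
)
print(request.get_method(), request.full_url)
# POST https://api-inference.huggingface.co/models/bert-base-uncased

# To actually send it (requires network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```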
1010

1111
![Snippet of code to make calls to the Inference API](/docs/assets/hub/inference_api_snippet.png)
1212

13-
Begin by creating an instance of the `InferenceApi` with a specific model repository ID. You can find your `API_TOKEN` under Settings from your Hugging Face account. The `API_TOKEN` will allow you to send requests to the Inference API.
13+
Begin by creating an instance of the [`InferenceApi`] with the repository ID of the model you want to use. You can find your `API_TOKEN` in the Settings of your Hugging Face account. The `API_TOKEN` will allow you to send requests to the Inference API.
1414

1515
```python
1616
>>> from huggingface_hub.inference_api import InferenceApi
1717
>>> inference = InferenceApi(repo_id="bert-base-uncased", token=API_TOKEN)
1818
```
1919

20-
The pipeline is determined from the metadata in the model card and configuration files (see [here](https://huggingface.co/docs/hub/main#how-is-a-models-type-of-inference-api-and-widget-determined) for more details). For example, when using the [bert-base-uncased](https://huggingface.co/bert-base-uncased) model, the Inference API can automatically infer that this model should be used for a `fill-mask` task.
20+
The metadata in the model card and configuration files (see [here](https://huggingface.co/docs/hub/main#how-is-a-models-type-of-inference-api-and-widget-determined) for more details) determines the pipeline type. For example, when using the [bert-base-uncased](https://huggingface.co/bert-base-uncased) model, the Inference API can automatically infer that this model should be used for a `fill-mask` task.
2121

2222
```python
2323
>>> from huggingface_hub.inference_api import InferenceApi
@@ -48,5 +48,8 @@ Some tasks may require additional parameters (see [here](https://api-inference.h
4848
Some models may support multiple tasks. The `sentence-transformers` models can complete both `sentence-similarity` and `feature-extraction` tasks. Specify which task you want to perform with the `task` parameter:
4949

5050
```python
51-
>>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1", task="feature-extraction", token=API_TOKEN)
51+
>>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1",
52+
... task="feature-extraction",
53+
... token=API_TOKEN,
54+
... )
5255
```
