| 1 | +# How to integrate downstream utilities in your library |
| 2 | + |
| 3 | +Utilities that allow your library to download files from the Hub are referred to as *downstream* utilities. This guide introduces additional downstream utilities that you can integrate into your library or use on their own. You will learn how to: |
| 4 | + |
| 5 | +* Retrieve a URL to download. |
| 6 | +* Download a file and cache it on your disk. |
| 7 | +* Download all the files in a repository. |
| 8 | + |
| 9 | +## `hf_hub_url` |
| 10 | + |
| 11 | +Use `hf_hub_url` to retrieve the URL of a specific file to download by providing a `filename`. |
| 12 | + |
| 13 | + |
| 14 | + |
| 15 | +```python |
| 16 | +>>> from huggingface_hub import hf_hub_url |
| 17 | +>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json") |
| 18 | +'https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json' |
| 19 | +``` |
| 20 | + |
| 21 | +Specify a particular file version by providing the file revision. The file revision can be a branch, a tag, or a commit hash. |
| 22 | + |
| 23 | +A commit hash must be given as the full-length hash, not the abbreviated 7-character form: |
| 24 | + |
| 25 | +```python |
| 26 | +>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a") |
| 27 | +'https://huggingface.co/lysandre/arxiv-nlp/resolve/877b84a8f93f2d619faa2a6e514a32beef88ab0a/config.json' |
| 28 | +``` |
| 29 | + |
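To make the full-length requirement concrete, here is a minimal sketch of a check you could run before building a URL. `looks_like_full_commit_hash` is a hypothetical helper, not part of `huggingface_hub`, and it assumes Git SHA-1 object names, which are 40 lowercase hexadecimal characters long.

```python
import re

# Assumption: commit hashes are Git SHA-1 object names (40 hex characters).
_FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def looks_like_full_commit_hash(revision: str) -> bool:
    """Return True if `revision` has the shape of a full-length commit hash."""
    return _FULL_SHA1.fullmatch(revision) is not None


full_hash = "877b84a8f93f2d619faa2a6e514a32beef88ab0a"
short_hash = full_hash[:7]  # the abbreviated form is not accepted as a revision

print(looks_like_full_commit_hash(full_hash))   # True
print(looks_like_full_commit_hash(short_hash))  # False
print(looks_like_full_commit_hash("main"))      # False: a branch name, not a hash
```

A `False` result does not mean the revision is invalid, since branches and tags are also accepted. It only tells you the string is not a full-length commit hash.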
| 30 | +`hf_hub_url` can also use the branch name to specify a file revision: |
| 31 | + |
| 32 | +```python |
| 33 | +>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main") |
| 34 | +``` |
| 35 | + |
| 36 | +Specify a file revision with a tag identifier. For example, if you want `v1.0` of the `config.json` file: |
| 37 | + |
| 38 | +```python |
| 39 | +>>> hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0") |
| 40 | +``` |
| 41 | + |
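The example outputs in this section all follow one pattern: repository, then `resolve`, then revision, then file name. The sketch below reproduces that pattern with plain string formatting so you can see how `revision` changes the result. `sketch_hub_url` and `URL_TEMPLATE` are hypothetical names inferred from those outputs; this is not the implementation of `hf_hub_url`, which may handle cases this sketch ignores.

```python
# Assumption: template inferred from the example outputs shown in this guide.
URL_TEMPLATE = "https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def sketch_hub_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a download URL following the pattern seen in the examples."""
    return URL_TEMPLATE.format(repo_id=repo_id, revision=revision, filename=filename)


# Default revision: falls back to the "main" branch in this sketch.
print(sketch_hub_url("lysandre/arxiv-nlp", "config.json"))

# Full-length commit hash as the revision.
print(
    sketch_hub_url(
        "lysandre/arxiv-nlp",
        "config.json",
        revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a",
    )
)
```

In real code, prefer `hf_hub_url` itself so that your library follows any change to the Hub's URL scheme.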
| 42 | +## `cached_download` |
| 43 | + |
| 44 | +`cached_download` downloads a file and caches it on your local disk. Once the file is stored in your cache, you don't have to redownload it the next time you use it. `cached_download` also keeps you up to date with new file versions without any extra work: when a downloaded file is updated in the remote repository, `cached_download` automatically downloads and stores the new version for you. |
| 45 | + |
| 46 | +Begin by retrieving your file URL with `hf_hub_url`, and then pass that URL to `cached_download` to download the file: |
| 47 | + |
| 48 | +```python |
| 49 | +>>> from huggingface_hub import hf_hub_url, cached_download |
| 50 | +>>> config_file_url = hf_hub_url("lysandre/arxiv-nlp", filename="config.json") |
| 51 | +>>> cached_download(config_file_url) |
| 52 | +'/home/lysandre/.cache/huggingface/hub/bc0e8cc2f8271b322304e8bb84b3b7580701d53a335ab2d75da19c249e2eeebb.066dae6fdb1e2b8cce60c35cc0f78ed1451d9b341c78de19f3ad469d10a8cbb1' |
| 53 | +``` |
| 54 | + |
| 55 | +`hf_hub_url` and `cached_download` work hand in hand to download a file. This is precisely how `hf_hub_download` from the tutorial works! `hf_hub_download` is simply a wrapper that calls both `hf_hub_url` and `cached_download`. |
| 56 | + |
| 57 | +```python |
| 58 | +>>> from huggingface_hub import hf_hub_download |
| 59 | +>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json") |
| 60 | +``` |
| 61 | + |
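The caching behaviour described for `cached_download` (download once, reuse the local copy, fetch again when the remote file changes) can be illustrated with a toy cache that never touches the network. Everything here is an assumption made for illustration: `toy_cached_download`, `cache_filename`, and the fake `fetch_etag` / `fetch_bytes` callables are hypothetical, and the file-naming scheme (hash of the URL, a dot, hash of a version tag) is only modelled on the shape of the cached path shown earlier, not taken from the library.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_filename(url: str, etag: str) -> str:
    """Toy cache key: hash of the URL, a dot, then hash of the version tag."""
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()
    etag_hash = hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return f"{url_hash}.{etag_hash}"


def toy_cached_download(url, fetch_etag, fetch_bytes, cache_dir):
    """Return a local path for `url`, fetching only if this version is not cached."""
    path = Path(cache_dir) / cache_filename(url, fetch_etag(url))
    if not path.exists():
        path.write_bytes(fetch_bytes(url))
    return str(path)


# Fake remote file, standing in for the Hub (no network access involved).
remote = {"etag": "v1", "content": b'{"a": 1}'}
downloads = []  # records every real fetch of the file body


def fetch_etag(url):
    return remote["etag"]


def fetch_bytes(url):
    downloads.append(url)
    return remote["content"]


cache_dir = tempfile.mkdtemp()
url = "https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json"

first = toy_cached_download(url, fetch_etag, fetch_bytes, cache_dir)
second = toy_cached_download(url, fetch_etag, fetch_bytes, cache_dir)  # cache hit

remote.update(etag="v2", content=b'{"a": 2}')  # the remote file is updated
third = toy_cached_download(url, fetch_etag, fetch_bytes, cache_dir)   # refetched

print(first == second, len(downloads), third != first)
```

In this sketch a changed version tag produces a new file name, so the updated file is stored next to the older copy rather than overwriting it.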
| 62 | +## `snapshot_download` |
| 63 | + |
| 64 | +`snapshot_download` downloads an entire repository at a given revision. Like `cached_download`, all downloaded files are cached on your local disk. However, even if only a single file is updated, the entire repository will be redownloaded. |
| 65 | + |
| 66 | +Download a whole repository as follows: |
| 67 | + |
| 68 | +```python |
| 69 | +>>> from huggingface_hub import snapshot_download |
| 70 | +>>> snapshot_download(repo_id="lysandre/arxiv-nlp") |
| 71 | +'/home/lysandre/.cache/huggingface/hub/lysandre__arxiv-nlp.894a9adde21d9a3e3843e6d5aeaaf01875c7fade' |
| 72 | +``` |
| 73 | + |
| 74 | +`snapshot_download` downloads the latest revision by default. If you want a specific repository revision, use the `revision` parameter as shown with `hf_hub_url`. |
| 75 | + |
| 76 | +```python |
| 77 | +>>> from huggingface_hub import snapshot_download |
| 78 | +>>> snapshot_download(repo_id="lysandre/arxiv-nlp", revision="main") |
| 79 | +``` |
| 80 | + |
| 81 | +If you already know which files you need, it is usually better to download them individually with `hf_hub_download` and avoid redownloading an entire repository. `snapshot_download` is helpful when your library's downloading utility is a generic helper that does not know which files need to be downloaded. |