<img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/huggingface_hub.svg">
</a>
</p>

> **Do you have an open source ML library?**
> We're looking to partner with a small number of other cool open source ML libraries to provide model hosting + versioning.
> https://twitter.com/julien_c/status/1336374565157679104 https://twitter.com/mnlpariente/status/1336277058062852096
> Let us know if you're interested 😎

<br>

### ♻️ Partial list of implementations in third-party libraries:

- http://github.com/asteroid-team/asteroid [[initial PR 👀](https://github.com/asteroid-team/asteroid/pull/377)]
- https://github.com/pyannote/pyannote-audio [[initial PR 👀](https://github.com/pyannote/pyannote-audio/pull/549)]
- https://github.com/flairNLP/flair [[work-in-progress, initial PR 👀](https://github.com/flairNLP/flair/pull/1974)]

<br>

## Download files from the huggingface.co hub

Integration inside a library is super simple. We expose two functions, `hf_hub_url()` and `cached_download()`.

### `hf_hub_url`

`hf_hub_url()` takes:
- a model id (like `julien-c/EsperBERTo-small`),
- a filename (like `pytorch_model.bin`),
- and an optional git revision (a branch name, a tag, or a commit hash),

and returns the URL we'll use to download the actual file: `https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch_model.bin`
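
The URL follows a predictable `{endpoint}/{model_id}/resolve/{revision}/{filename}` layout, as the example above shows. A minimal sketch of that layout (the helper below is hypothetical and for illustration only; in real code, call the library's `hf_hub_url()`):

```python
# Illustrative only: mirrors the URL layout shown above.
# In practice, use huggingface_hub's hf_hub_url() instead of rolling your own.
HUB_ENDPOINT = "https://huggingface.co"

def build_hub_url(model_id: str, filename: str, revision: str = "main") -> str:
    # ("julien-c/EsperBERTo-small", "pytorch_model.bin")
    #   -> "https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch_model.bin"
    return f"{HUB_ENDPOINT}/{model_id}/resolve/{revision}/{filename}"
```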

If you check out this URL's headers with a `HEAD` HTTP request (which you can do from the command line with `curl -I`) for a few different files, you'll see that:
- small files are returned directly;
- large files (i.e. the ones stored through [git-lfs](https://git-lfs.github.com/)) are returned via a redirect to a CloudFront URL. CloudFront is a Content Delivery Network (CDN) that ensures downloads are as fast as possible from anywhere on the globe.

### `cached_download`

`cached_download()` takes the following parameters, downloads the remote file, stores it to disk (in a versioning-aware way) and returns its local file path.

Parameters:
- a remote `url`,
- your library's name and version (`library_name` and `library_version`), which will be added to the HTTP requests' user-agent so that we can provide some usage stats,
- a `cache_dir` which you can specify if you want to control where on disk the files are cached.

Check out the source code for all possible params (we'll create a real doc page in the future).
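
The "versioning-aware" part means the cache entry depends on the file's remote version (e.g. its `ETag`), not just its URL, so an updated file doesn't clobber the previously cached one. A toy sketch of that idea (the function name and hashing scheme below are illustrative, not `huggingface_hub`'s actual cache layout):

```python
import hashlib
import os

def toy_cache_path(url: str, etag: str, cache_dir: str) -> str:
    # Key on both the URL and the server-reported ETag: when the file changes
    # upstream, the ETag changes too, so the new version lands in a fresh
    # cache entry instead of overwriting the old one.
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()
    etag_hash = hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, f"{url_hash}.{etag_hash}")
```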

<br>

## Publish models to the huggingface.co hub

Uploading a model to the hub is super simple too:
- create a model repo directly from the website, at huggingface.co/new (models can be public or private, and are namespaced under either a user or an organization),
- clone it with git,
- install [git-lfs](https://git-lfs.github.com/) with `git lfs install` if you haven't done that before,
- add, commit and push your files with git, as you usually do.
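
Put together, the steps above look roughly like this (`<user>/<model>` is a placeholder for the repo you created on the website):

```bash
# One-time git-lfs setup
git lfs install

# Clone the repo you created at huggingface.co/new
git clone https://huggingface.co/<user>/<model>
cd <model>

# Add, commit and push your files as usual
git add pytorch_model.bin config.json
git commit -m "Add model weights and config"
git push
```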

**We are intentionally not wrapping git too much, so that you can go on with the workflow you’re used to and the tools you already know.**

> 👀 To see an example of how we document the model sharing process in `transformers`, check out https://huggingface.co/transformers/model_sharing.html

### API utilities in `hf_api.py`

You don't need them for the standard publishing workflow. However, if you need a programmatic way of creating a repo, deleting one (⚠️ use with caution), or listing models from the hub, you'll find helpers in `hf_api.py`.

### `huggingface-cli`

Those API utilities are also exposed through a CLI:

```bash
huggingface-cli login
huggingface-cli logout
huggingface-cli whoami
huggingface-cli repo create
```

### Need to upload large (>5GB) files?

To upload large files (>5GB 🔥), you need to install the custom transfer agent for git-lfs that is bundled in this package. The spec for LFS custom transfer agents is here:
https://github.com/git-lfs/git-lfs/blob/master/docs/custom-transfers.md

To install it, just run:

```bash
huggingface-cli lfs-enable-largefiles
```

This should be executed once for each model repo that contains a model file larger than 5GB. It's documented in the error message you get if you try to `git push` such a file without enabling it first.

Finally, there's a `huggingface-cli lfs-multipart-upload` command, but that one is internal (it's called by git-lfs directly) and is not meant to be called by the user.

## Feedback (feature requests, bugs, etc.) is super welcome 💙💚💛💜♥️🧡