
Commit ace5bee

storage-limits

1 parent 1239d8e


8 files changed (+46, -13 lines)


docs/hub/_redirects.yml

Lines changed: 1 addition & 0 deletions
@@ -17,3 +17,4 @@ searching-the-hub: /docs/huggingface_hub/searching-the-hub
 api-webhook: webhooks
 adapter-transformers: adapters
 security-two-fa: security-2fa
+repositories-recommendations: storage-limits

docs/hub/_toctree.yml

Lines changed: 2 additions & 2 deletions
@@ -25,8 +25,8 @@
 title: "How-to: Create automatic metadata quality reports"
 - local: notebooks
 title: Notebooks
-- local: repositories-recommendations
-title: Repository size recommendations
+- local: storage-limits
+title: Storage Limits
 - local: repositories-next-steps
 title: Next Steps
 - local: repositories-licenses

docs/hub/billing.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Billing
 
-At Hugging Face, we build a collaboration platform for the ML community (i.e., the Hub) and monetize by providing simple access to compute for AI.
+At Hugging Face, we build a collaboration platform for the ML community (i.e., the Hub) and monetize by providing advanced features and simple access to compute for AI.
 
 Any feedback or support request related to billing is welcome at [email protected]
 
docs/hub/datasets-adding.md

Lines changed: 1 addition & 1 deletion
@@ -111,4 +111,4 @@ The Hugging Face Hub supports large scale datasets, usually uploaded in Parquet
 
 You can upload large scale datasets at high speed using the `huggingface_hub` library.
 
-See [how to upload a folder by chunks](/docs/huggingface_hub/guides/upload#upload-a-folder-by-chunks), the [tips and tricks for large uploads](/docs/huggingface_hub/guides/upload#tips-and-tricks-for-large-uploads) and the [repository limitations and recommendations](./repositories-recommendations).
+See [how to upload a folder by chunks](/docs/huggingface_hub/guides/upload#upload-a-folder-by-chunks), the [tips and tricks for large uploads](/docs/huggingface_hub/guides/upload#tips-and-tricks-for-large-uploads) and the [repository storage limits and recommendations](./storage-limits).

docs/hub/enterprise-hub.md

Lines changed: 2 additions & 0 deletions
@@ -23,3 +23,5 @@ In this section we will document the following Enterprise Hub features:
 - [Tokens Management](./enterprise-hub-tokens-management)
 - [Analytics](./enterprise-hub-analytics)
 - [Network Security](./enterprise-hub-network-security)
+
+Finally, Enterprise Hub includes 1TB of [private repository storage](./storage-limits) per seat in the subscription: for example, if your organization has 40 members, then you have 40TB of included storage for your private models and datasets.

docs/hub/other.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 - [Managing Organizations](./organizations-managing)
 - [Organization Cards](./organizations-cards)
 - [Access control in organizations](./organizations-security)
+- [Enterprise Hub](./enterprise-hub)
 - [Moderation](./moderation)
 - [Billing](./billing)
 - [Digital Object Identifier (DOI)](./doi)

docs/hub/repositories.md

Lines changed: 1 addition & 1 deletion
@@ -13,6 +13,6 @@ In these pages, you will go over the basics of getting started with Git and inte
 - [Webhooks](./webhooks)
 - [Notifications](./notifications)
 - [Collections](./collections)
-- [Repository size recommendations](./repositories-recommendations)
+- [Repository storage limits](./storage-limits)
 - [Next Steps](./repositories-next-steps)
 - [Licenses](./repositories-licenses)

docs/hub/repositories-recommendations.md renamed to docs/hub/storage-limits.md

Lines changed: 37 additions & 8 deletions
@@ -1,9 +1,32 @@
-# Repository limitations and recommendations
+# Storage limits
 
-There are some limitations to be aware of when dealing with a large amount of data in your repo. Given the time it takes to stream the data,
-getting an upload/push to fail at the end of the process or encountering a degraded experience, be it on hf.co or when working locally, can be very annoying.
+At Hugging Face our intent is to provide the AI community with **free storage space for public models and datasets**. We do bill for storage space for **private repositories**, above a free tier (see table below).
 
-## Recommendations
+We [optimize our infrastructure](https://huggingface.co/blog/xethub-joins-hf) continuously to [scale our storage](https://x.com/julien_c/status/1821540661973160339) for the coming years of growth in machine learning.
+
+We do have mitigations in place to prevent abuse of free public storage, and in general we ask users and organizations to make sure any uploaded large model or dataset is **as useful to the community as possible** (as represented by numbers of likes or downloads, for instance).
+
+## Storage plans
+
+| Type of account | Public storage | Private storage |
+| ---------------- | -------------- | ---------------------------- |
+| Free user or org | Unlimited ✅ | 100GB |
+| PRO | Unlimited ✅ | 1TB + pay-as-you-go |
+| Enterprise Hub | Unlimited ✅ | 1TB per seat + pay-as-you-go |
+
+
+💡 Enterprise Hub includes 1TB of private storage per seat in the subscription: for example, if your organization has 40 members, then you have 40TB included private storage.
+
+### Pay-as-you-go price
+
+Above the included 1TB (or 1TB per seat) of private storage in PRO and Enterprise Hub, private storage is invoiced at **$25/TB/month**. See our [billing doc](./billing) for more details.
+
+## Repository limitations and recommendations
+
+In addition to storage limits at the account (user or organization) level, there are some limitations to be aware of when dealing with a large amount of data in a specific repo. Given the time it takes to stream the data,
+getting an upload/push to fail at the end of the process or encountering a degraded experience, be it on hf.co or when working locally, can be very annoying. In the following section, we describe our recommendations on how to best structure your large repos.
+
+### Recommendations
 
 We gathered a list of tips and recommendations for structuring your repo. If you are looking for more practical tips, check out [this guide](https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#tips-and-tricks-for-large-uploads) on how to upload large amount of data using the Python library.
 
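The storage plans table and the pay-as-you-go rate above can be turned into a rough monthly estimate. The sketch below is hypothetical: the helper names and plan keys (`"free"`, `"pro"`, `"enterprise"`) are illustrative, and it assumes 1TB = 1000GB and that usage above the included quota is billed linearly per TB, which is an assumption rather than a documented billing rule.

```python
# Hypothetical estimator based on the storage plans table above.
# Assumptions: 1TB = 1000GB, and overage is billed linearly per TB.
PRICE_PER_TB_MONTH = 25.0  # pay-as-you-go rate in USD, from the section above


def included_private_tb(plan, seats=1):
    """Return the private storage (in TB) included in a plan before pay-as-you-go applies."""
    if plan == "free":
        return 0.1  # 100GB
    if plan == "pro":
        return 1.0
    if plan == "enterprise":
        return 1.0 * seats  # 1TB per seat
    raise ValueError(f"unknown plan: {plan!r}")


def monthly_overage_usd(plan, private_tb, seats=1):
    """Estimate the monthly pay-as-you-go charge for `private_tb` TB of private storage."""
    extra_tb = max(0.0, private_tb - included_private_tb(plan, seats))
    if extra_tb > 0 and plan == "free":
        # The table lists no pay-as-you-go option for free accounts.
        raise ValueError("above the free tier; PRO or Enterprise Hub is needed")
    return extra_tb * PRICE_PER_TB_MONTH
```

With the 40-member example, `included_private_tb("enterprise", seats=40)` returns `40.0`, and keeping 45TB of private data would add an estimated `monthly_overage_usd("enterprise", 45, seats=40)`, i.e. `125.0` USD per month (5TB at $25/TB).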
@@ -21,7 +44,7 @@ _* Not relevant when using `git` CLI directly_
 
 Please read the next section to understand better those limits and how to deal with them.
 
-## Explanations
+### Explanations
 
 What are we talking about when we say "large uploads", and what are their associated limitations? Large uploads can be
 very diverse, from repositories with a few huge files (e.g. model weights) to repositories with thousands of small files
@@ -31,9 +54,9 @@ Under the hood, the Hub uses Git to version the data, which has structural impli
 If your repo is crossing some of the numbers mentioned in the previous section, **we strongly encourage you to check out [`git-sizer`](https://github.com/github/git-sizer)**,
 which has very detailed documentation about the different factors that will impact your experience. Here is a TL;DR of factors to consider:
 
-- **Repository size**: The total size of the data you're planning to upload. We generally support repositories up to 300GB. If you would like to upload more than 300 GBs (or even TBs) of data, you will need to ask us to grant more storage. To do that, please send an email with details of your project to [email protected].
+- **Repository size**: The total size of the data you're planning to upload. We generally support repositories up to 300GB. If you would like to upload more than 300GB (or even TBs) of data, you will need to ask us to grant more storage. To do that, please send an email with details of your project to [email protected] (for datasets) or [email protected] (for models).
 - **Number of files**:
-- For optimal experience, we recommend keeping the total number of files under 100k. Try merging the data into fewer files if you have more.
+- For optimal experience, we recommend keeping the total number of files under 100k, and ideally much less. Try merging the data into fewer files if you have more.
 For example, json files can be merged into a single jsonl file, or large datasets can be exported as Parquet files or in [WebDataset](https://github.com/webdataset/webdataset) format.
 - The maximum number of files per folder cannot exceed 10k files per folder. A simple solution is to
 create a repository structure that uses subdirectories. For example, a repo with 1k folders from `000/` to `999/`, each containing at most 1000 files, is already enough.
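Before uploading, it can help to check a local folder against the size and file-count numbers above. The following is a minimal, hypothetical sketch using only the Python standard library: `check_local_folder` and `sharded_path` are illustrative names, not part of `huggingface_hub`, and the code assumes "300GB" means 300 × 10^9 bytes. `sharded_path` reproduces the `000/` to `999/` layout described above, with at most 1000 files per subfolder.

```python
import os
from collections import Counter

# Soft thresholds taken from the recommendations above (GB assumed to be 10**9 bytes).
MAX_REPO_BYTES = 300 * 10**9
MAX_TOTAL_FILES = 100_000
MAX_FILES_PER_FOLDER = 10_000


def check_local_folder(root):
    """Summarize a local folder before upload and list any recommendation it exceeds."""
    total_bytes = 0
    files_per_folder = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        folder = os.path.relpath(dirpath, root)  # "." for the top-level folder
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
            files_per_folder[folder] += 1

    total_files = sum(files_per_folder.values())
    busiest_folder, busiest_count = max(
        files_per_folder.items(), key=lambda item: item[1], default=(None, 0)
    )

    warnings = []
    if total_bytes > MAX_REPO_BYTES:
        warnings.append(f"total size {total_bytes} bytes exceeds ~300GB")
    if total_files > MAX_TOTAL_FILES:
        warnings.append(f"{total_files} files exceeds the recommended 100k")
    if busiest_count > MAX_FILES_PER_FOLDER:
        warnings.append(f"folder {busiest_folder!r} holds {busiest_count} files (limit 10k)")

    return {
        "total_bytes": total_bytes,
        "total_files": total_files,
        "busiest_folder": busiest_folder,
        "busiest_count": busiest_count,
        "warnings": warnings,
    }


def sharded_path(index, filename, files_per_folder=1000):
    """Place file number `index` in a zero-padded subfolder: 0-999 -> '000/', 1000-1999 -> '001/'."""
    return f"{index // files_per_folder:03d}/{filename}"
```

An empty `warnings` list means none of the thresholds is exceeded; `sharded_path(1234, "sample.json")` returns `"001/sample.json"`.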
@@ -57,7 +80,7 @@ happen (in rare cases) that even if the timeout is raised client-side, the proce
 completed server-side. This can be checked manually by browsing the repo on the Hub. To prevent this timeout, we recommend
 adding around 50-100 files per commit.
 
-## Sharing large datasets on the Hub
+### Sharing large datasets on the Hub
 
 One key way Hugging Face supports the machine learning ecosystem is by hosting datasets on the Hub, including very large ones. However, if your dataset is bigger than 300GB, you will need to ask us to grant more storage.
 
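The recommendation to add around 50-100 files per commit amounts to splitting the file list into fixed-size batches. A minimal standard-library sketch follows; `batches` and `plan_commits` are hypothetical helpers, and the default of 100 is simply the upper end of the recommended range.

```python
from itertools import islice


def batches(items, batch_size=100):
    """Yield lists of at most `batch_size` items; the last batch may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `batch_size` items; an empty list means we are done.
    while batch := list(islice(iterator, batch_size)):
        yield batch


def plan_commits(paths, batch_size=100):
    """Group file paths into numbered commits of at most `batch_size` files each."""
    return [
        (f"Upload batch {number}", batch)
        for number, batch in enumerate(batches(sorted(paths), batch_size), start=1)
    ]
```

For 250 files, `plan_commits` produces three commits of 100, 100 and 50 files. With `huggingface_hub`, each batch could then be turned into a list of `CommitOperationAdd` objects and pushed with `HfApi.create_commit`; the upload guide linked in the recommendations above describes the currently recommended approach.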

@@ -78,3 +101,9 @@ For hosting large datasets on the Hub, we require the following for your dataset
 - Avoid the use of custom loading scripts when using datasets. In our experience, datasets that require custom code to use often end up with limited reuse.
 
 Please get in touch with us if any of these requirements are difficult for you to meet because of the type of data or domain you are working in.
+
+### Sharing large volumes of models on the Hub
+
+As with datasets, if you host models larger than 300GB, or if you plan to upload a large number of smaller models (for instance, hundreds of automated quants) totalling more than 1TB, you will need to ask us to grant more storage.
+
+To do so, please send an email with details of your project to [email protected]; this helps us ensure we can effectively support the open-source ecosystem.
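The two model-hosting thresholds above (a single model over 300GB, or many smaller models totalling more than 1TB) can be expressed as a simple check. This is a hypothetical helper, not a Hub API; it takes planned model sizes in GB and assumes 1TB = 1000GB.

```python
def needs_storage_request(model_sizes_gb):
    """Return True if planned model uploads cross either threshold described above."""
    sizes = list(model_sizes_gb)
    has_large_model = any(size > 300 for size in sizes)  # a single model above 300GB
    total_above_1tb = sum(sizes) > 1000  # many smaller models totalling more than 1TB
    return has_large_model or total_above_1tb
```

For instance, 200 quantized variants of 7GB each total 1400GB, so `needs_storage_request([7] * 200)` returns `True` even though no single model is large.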
