storage-limits #1515
Merged
9 commits
ace5bee
storage-limits
julien-c d490c59
mini tweak
julien-c 7d61029
Additional links + details (#1516)
SBrandeis 726ff8a
up - grant private repo
Vaibhavs10 4891fc9
Update docs/hub/storage-limits.md
julien-c 133b789
Update docs/hub/storage-limits.md
julien-c 4b792b7
Update docs/hub/billing.md
julien-c aedcc00
Update docs/hub/storage-limits.md
julien-c 5d1ffb3
up - clarify best-effort.
Vaibhavs10
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| # Billing | ||
|
|
||
| At Hugging Face, we build a collaboration platform for the ML community (i.e., the Hub) and monetize by providing simple access to compute for AI. | ||
| At Hugging Face, we build a collaboration platform for the ML community (i.e., the Hub) and monetize by providing advanced features and simple access to compute for AI. | ||
|
|
||
| Any feedback or support request related to billing is welcome at [email protected] | ||
|
|
||
|
|
@@ -61,13 +61,14 @@ You can view invoices and receipts for the last 3 months in your billing dashboa | |
|
|
||
| ## Enterprise Hub subscriptions | ||
|
|
||
| We offer advanced security and compliance features for organizations through our Enterprise Hub subscription, including [Single Sign-On](./enterprise-sso.md), [Advanced Access Control](./enterprise-hub-resource-groups.md) for repositories, control over your data location, and more. | ||
| We offer advanced security and compliance features for organizations through our Enterprise Hub subscription, including [Single Sign-On](./enterprise-sso.md), [Advanced Access Control](./enterprise-hub-resource-groups.md) for repositories, control over your data location, higher [storage capacity](./storage-limits.md) for private repositories, and more. | ||
|
|
||
| The Enterprise Hub is billed like a typical subscription. It renews automatically, but you can choose to cancel it at any time in the organization's billing settings. | ||
|
|
||
| You can pay for the Enterprise Hub subscription with a credit card or your AWS account. | ||
|
|
||
| Upon renewal, the number of seats in your Enterprise Hub subscription will be updated to match the number of members of your organization. | ||
| Private repository storage above the [included storage](./storage-limits.md) will be billed along with your subscription renewal. | ||
|
|
||
|
|
||
| <div class="flex justify-center"> | ||
|
|
@@ -80,6 +81,7 @@ Upon renewal, the number of seats in your Enterprise Hub subscription will be up | |
| The PRO subscription unlocks additional features for users, including: | ||
|
|
||
| - Higher free tier for the Serverless Inference API and when consuming ZeroGPU Spaces | ||
| - Higher [storage capacity](./storage-limits.md) for private repositories | ||
| - Ability to create ZeroGPU Spaces and use Dev Mode | ||
| - Ability to write Social Posts and Community Blogs | ||
| - Leverage the Dataset Viewer on private datasets | ||
|
|
@@ -89,5 +91,6 @@ View the full list of benefits at https://huggingface.co/subscribe/pro | |
| Similarly to the Enterprise Hub subscription, PRO subscriptions are billed like a typical subscription. The subscription renews automatically for you. You can choose to cancel the subscription at any time in your billing settings: https://huggingface.co/settings/billing | ||
|
|
||
| You can only pay for the PRO subscription with a credit card. The subscription is billed separately from any pay-as-you-go compute usage. | ||
| Private repository storage above the [included storage](./storage-limits.md) will be billed along with your subscription renewal. | ||
|
|
||
| Note: PRO benefits are also included in the Enterprise Hub subscription. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,32 @@ | ||
| # Repository limitations and recommendations | ||
| # Storage limits | ||
|
|
||
| There are some limitations to be aware of when dealing with a large amount of data in your repo. Given the time it takes to stream the data, | ||
| getting an upload/push to fail at the end of the process or encountering a degraded experience, be it on hf.co or when working locally, can be very annoying. | ||
| At Hugging Face, our intent is to provide the AI community with **free storage space for public models and datasets**. We do bill for storage space for **private repositories** above a free tier (see table below). | ||
julien-c marked this conversation as resolved.
|
||
|
|
||
| ## Recommendations | ||
| We [optimize our infrastructure](https://huggingface.co/blog/xethub-joins-hf) continuously to [scale our storage](https://x.com/julien_c/status/1821540661973160339) for the coming years of growth in machine learning. | ||
|
|
||
| We do have mitigations in place to prevent abuse of free public storage, and in general we ask users and organizations to make sure any uploaded large model or dataset is **as useful to the community as possible** (as represented by numbers of likes or downloads, for instance). | ||
Pierrci marked this conversation as resolved.
|
||
|
|
||
| ## Storage plans | ||
|
|
||
| | Type of account | Public storage | Private storage | | ||
| | ---------------- | -------------- | ---------------------------- | | ||
| | Free user or org | Unlimited ✅ | 100GB | | ||
julien-c marked this conversation as resolved.
|
||
| | PRO | Unlimited ✅ | 1TB + pay-as-you-go | | ||
| | Enterprise Hub | Unlimited ✅ | 1TB per seat + pay-as-you-go | | ||
|
|
||
|
|
||
| 💡 Enterprise Hub includes 1TB of private storage per seat in the subscription: for example, if your organization has 40 members, then you have 40TB of included private storage. | ||
|
|
||
| ### Pay-as-you-go price | ||
|
|
||
| Above the included 1TB (or 1TB per seat) of private storage in PRO and Enterprise Hub, private storage is invoiced at **$25/TB/month**. See our [billing doc](./billing) for more details. | ||
SBrandeis marked this conversation as resolved.
julien-c marked this conversation as resolved.
|
||
|
|
||
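To make the storage plans table and the pay-as-you-go price concrete, here is a minimal Python sketch of the arithmetic. The function names and plan labels are hypothetical, and the sketch assumes 1TB = 1000GB and linear proration of partial terabytes; neither assumption is stated in the doc, and actual invoicing may differ.

```python
PRICE_PER_TB_MONTH_USD = 25.0  # pay-as-you-go price quoted in the doc


def included_private_tb(plan: str, seats: int = 1) -> float:
    """Included private storage in TB for a plan (hypothetical plan labels)."""
    if plan == "free":
        return 0.1  # 100GB, assuming 1TB = 1000GB
    if plan == "pro":
        return 1.0
    if plan == "enterprise":
        if seats < 1:
            raise ValueError("an Enterprise Hub subscription needs at least one seat")
        return 1.0 * seats  # 1TB per seat
    raise ValueError(f"unknown plan: {plan!r}")


def monthly_overage_usd(plan: str, used_tb: float, seats: int = 1) -> float:
    """Monthly cost of private storage above the included amount.

    Assumes partial terabytes are prorated linearly (an assumption).
    """
    if plan == "free":
        # The table lists no pay-as-you-go option for free accounts.
        raise ValueError("pay-as-you-go is only listed for PRO and Enterprise Hub")
    extra_tb = max(0.0, used_tb - included_private_tb(plan, seats))
    return extra_tb * PRICE_PER_TB_MONTH_USD
```

For instance, `included_private_tb("enterprise", seats=40)` gives the 40TB from the example above, and a PRO account using 3TB would see 2TB of overage under these assumptions.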
| ## Repository limitations and recommendations | ||
|
|
||
| In parallel to storage limits at the account (user or organization) level, there are some limitations to be aware of when dealing with a large amount of data in a specific repo. Given the time it takes to stream the data, | ||
| getting an upload/push to fail at the end of the process or encountering a degraded experience, be it on hf.co or when working locally, can be very annoying. In the following section, we describe our recommendations on how to best structure your large repos. | ||
|
|
||
| ### Recommendations | ||
|
|
||
| We gathered a list of tips and recommendations for structuring your repo. If you are looking for more practical tips, check out [this guide](https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#tips-and-tricks-for-large-uploads) on how to upload large amounts of data using the Python library. | ||
|
|
||
|
|
@@ -21,7 +44,7 @@ _* Not relevant when using `git` CLI directly_ | |
|
|
||
| Please read the next section to understand better those limits and how to deal with them. | ||
|
|
||
| ## Explanations | ||
| ### Explanations | ||
|
|
||
| What are we talking about when we say "large uploads", and what are their associated limitations? Large uploads can be | ||
| very diverse, from repositories with a few huge files (e.g. model weights) to repositories with thousands of small files | ||
|
|
@@ -31,9 +54,9 @@ Under the hood, the Hub uses Git to version the data, which has structural impli | |
| If your repo is crossing some of the numbers mentioned in the previous section, **we strongly encourage you to check out [`git-sizer`](https://github.com/github/git-sizer)**, | ||
| which has very detailed documentation about the different factors that will impact your experience. Here is a TL;DR of factors to consider: | ||
|
|
||
| - **Repository size**: The total size of the data you're planning to upload. We generally support repositories up to 300GB. If you would like to upload more than 300 GBs (or even TBs) of data, you will need to ask us to grant more storage. To do that, please send an email with details of your project to [email protected]. | ||
| - **Repository size**: The total size of the data you're planning to upload. We generally support repositories up to 300GB. If you would like to upload more than 300GB (or even several TB) of data, you will need to ask us to grant more storage. To do that, please send an email with details of your project to [email protected] (for datasets) or [email protected] (for models). | ||
| - **Number of files**: | ||
| - For optimal experience, we recommend keeping the total number of files under 100k. Try merging the data into fewer files if you have more. | ||
| - For optimal experience, we recommend keeping the total number of files under 100k, and ideally much less. Try merging the data into fewer files if you have more. | ||
| For example, json files can be merged into a single jsonl file, or large datasets can be exported as Parquet files or in [WebDataset](https://github.com/webdataset/webdataset) format. | ||
| - The maximum number of files per folder cannot exceed 10k. A simple solution is to | ||
| create a repository structure that uses subdirectories. For example, a repo with 1k folders from `000/` to `999/`, each containing at most 1000 files, is already enough. | ||
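The subdirectory layout described in the last bullet (folders `000/` to `999/`, each holding at most 1000 files) can be sketched as a simple index-to-path mapping. The helper name `shard_path` is hypothetical, and assigning files to folders by a running index is just one possible scheme.

```python
FILES_PER_FOLDER = 1000  # stays well under the 10k-per-folder limit
NUM_FOLDERS = 1000       # folders 000/ through 999/


def shard_path(index: int, filename: str) -> str:
    """Map the index-th file to a path like '042/filename'."""
    if not 0 <= index < FILES_PER_FOLDER * NUM_FOLDERS:
        raise ValueError("index does not fit in the 000/..999/ layout")
    folder = index // FILES_PER_FOLDER
    return f"{folder:03d}/{filename}"
```

Note that this layout can address up to one million files, which is above the recommended total of 100k files per repo, so merging data into fewer files remains the first thing to try.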
|
|
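As an illustration of the advice above to merge many JSON files into a single JSONL file, here is a minimal sketch using only the Python standard library. The function name is hypothetical, and it assumes each input file holds exactly one JSON document.

```python
import json
from pathlib import Path
from typing import Iterable


def merge_json_to_jsonl(json_paths: Iterable[Path], out_path: Path) -> int:
    """Write each input JSON document as one line of out_path.

    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in json_paths:
            with open(path, "r", encoding="utf-8") as f:
                record = json.load(f)
            # json.dumps without indent emits a single line; newlines inside
            # strings are escaped, so one record never spans multiple lines.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Typical usage would be something like `merge_json_to_jsonl(sorted(Path("data").glob("*.json")), Path("data.jsonl"))`, where the directory and output names are placeholders.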
@@ -57,7 +80,7 @@ happen (in rare cases) that even if the timeout is raised client-side, the proce | |
| completed server-side. This can be checked manually by browsing the repo on the Hub. To prevent this timeout, we recommend | ||
| adding around 50-100 files per commit. | ||
|
|
||
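The recommendation of around 50-100 files per commit amounts to splitting the list of files into fixed-size batches and creating one commit per batch. Below is a minimal, library-independent sketch of the batching step; the helper name `batch_files` and the default batch size are illustrative choices, and the upload call itself is left out.

```python
from typing import Iterator, List, Sequence


def batch_files(paths: Sequence[str], batch_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size paths, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])
```

Each yielded batch would then be uploaded as a single commit with whichever tool you use, for example one `create_commit` call per batch if you are working with the `huggingface_hub` Python library.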
| ## Sharing large datasets on the Hub | ||
| ### Sharing large datasets on the Hub | ||
|
|
||
| One key way Hugging Face supports the machine learning ecosystem is by hosting datasets on the Hub, including very large ones. However, if your dataset is bigger than 300GB, you will need to ask us to grant more storage. | ||
|
|
||
|
|
@@ -78,3 +101,13 @@ For hosting large datasets on the Hub, we require the following for your dataset | |
| - Avoid the use of custom loading scripts when using datasets. In our experience, datasets that require custom code to use often end up with limited reuse. | ||
|
|
||
| Please get in touch with us if any of these requirements are difficult for you to meet because of the type of data or domain you are working in. | ||
|
|
||
| ### Sharing large volumes of models on the Hub | ||
|
|
||
| Similarly to datasets, if you host models bigger than 300GB or if you plan on uploading a large number of smaller sized models (for instance, hundreds of automated quants) totalling more than 1TB, you will need to ask us to grant more storage. | ||
|
|
||
| So that we can effectively support the open-source ecosystem, please send an email with details of your project to [email protected]. | ||
|
|
||
| ### Grants for private repositories | ||
|
|
||
| If you need more model/dataset storage than your allocated private storage for academic/research purposes, please reach out to us at [email protected] or [email protected] along with a proposal of how you will use the storage grant. | ||
julien-c marked this conversation as resolved.
|
||