Commit 3d8026a

Update storage limits documentation to reflect new file size recommendations (#2037)

* Update storage limits documentation to reflect new file size recommendations
* Update dataset upload guide with revised file size limits
* Revise large file upload size recommendation

  Updated the recommended file size for large uploads from 50GB to <200GB.
1 parent 8c0755c commit 3d8026a

2 files changed: +5 -5 lines changed

docs/hub/datasets-upload-guide-llm.md

Lines changed: 3 additions & 3 deletions
@@ -67,8 +67,8 @@ find . -name "*.jpg" | wc -l
 ```yaml
 # Machine-readable Hub limits
 hub_limits:
-  max_file_size_gb: 50 # absolute hard stop enforced by LFS
-  recommended_file_size_gb: 20 # best-practice shard size
+  max_file_size_gb: 200 # absolute hard stop enforced by LFS
+  recommended_file_size_gb: 50 # best-practice shard size
   max_files_per_folder: 10000 # Git performance threshold
   max_files_per_repo: 100000 # Repository file count limit
   recommended_repo_size_gb: 300 # public-repo soft cap; contact HF if larger
@@ -80,7 +80,7 @@ hub_limits:
 - Free: 100GB private datasets
 - Pro (for individuals) | Team or Enterprise (for organizations): 1TB+ private storage per seat (see [pricing](https://huggingface.co/pricing))
 - Public: 1TB (contact [email protected] for larger)
-- Per file: 50GB max, 20GB recommended
+- Per file: 200GB max, <50GB recommended
 - Per folder: <10k files
 
 See https://huggingface.co/docs/hub/storage-limits#repository-limitations-and-recommendations for current limits for current recommendations for repository sizes and file counts.
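
The updated `hub_limits` values lend themselves to a pre-upload check. The following is a minimal sketch, not part of this commit or of any Hub tooling: the names `HUB_LIMITS`, `check_sizes`, and `scan` are hypothetical, the "error"/"warning" labels are an illustrative choice, and it assumes decimal gigabytes (10^9 bytes), which the docs do not specify.

```python
from collections import Counter
from pathlib import Path, PurePosixPath

GB = 1000**3  # assumption: decimal gigabytes; the docs do not say GB vs GiB

# Values copied from the updated hub_limits block in this diff.
HUB_LIMITS = {
    "max_file_size_gb": 200,
    "recommended_file_size_gb": 50,
    "max_files_per_folder": 10000,
    "max_files_per_repo": 100000,
}


def check_sizes(sizes_by_path, limits=HUB_LIMITS):
    """Return (severity, message) pairs for entries that exceed the limits.

    sizes_by_path maps repo-relative POSIX paths to file sizes in bytes.
    """
    issues = []
    per_folder = Counter()
    for path, size in sizes_by_path.items():
        per_folder[str(PurePosixPath(path).parent)] += 1
        size_gb = size / GB
        if size_gb > limits["max_file_size_gb"]:
            issues.append(("error", f"{path}: {size_gb:.1f}GB exceeds the per-file maximum"))
        elif size_gb >= limits["recommended_file_size_gb"]:
            issues.append(("warning", f"{path}: {size_gb:.1f}GB is above the recommended size"))
    for folder, count in per_folder.items():
        if count > limits["max_files_per_folder"]:
            issues.append(("error", f"{folder}: {count} files in one folder"))
    if len(sizes_by_path) > limits["max_files_per_repo"]:
        issues.append(("error", f"repo: {len(sizes_by_path)} files in total"))
    return issues


def scan(root):
    """Collect repo-relative paths and sizes for every file under root."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }
```

A typical call would be `check_sizes(scan("path/to/dataset"))`; an empty list means nothing crossed the thresholds encoded above.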

docs/hub/storage-limits.md

Lines changed: 2 additions & 2 deletions
@@ -43,7 +43,7 @@ We gathered a list of tips and recommendations for structuring your repo. If you
 | Repo size | - | contact us for large repos (TBs of data) |
 | Files per repo | <100k | merge data into fewer files |
 | Entries per folder | <10k | use subdirectories in repo |
-| File size | <20GB | split data into chunked files |
+| File size | <50GB | split data into chunked files |
 | Commit size | <100 files* | upload files in multiple commits |
 | Commits per repo | - | upload multiple files per commit and/or squash history |
 
@@ -67,7 +67,7 @@ which has very detailed documentation about the different factors that will impa
 For example, json files can be merged into a single jsonl file, or large datasets can be exported as Parquet files or in [WebDataset](https://github.com/webdataset/webdataset) format.
 - The maximum number of files per folder cannot exceed 10k files per folder. A simple solution is to
 create a repository structure that uses subdirectories. For example, a repo with 1k folders from `000/` to `999/`, each containing at most 1000 files, is already enough.
-- **File size**: In the case of uploading large files (e.g. model weights), we strongly recommend splitting them **into chunks of around 20GB each**.
+- **File size**: In the case of uploading large files (e.g. model weights), we strongly recommend splitting them **into chunks <200GB each.**.
 There are a few reasons for this:
 - Uploading and downloading smaller files is much easier both for you and the other users. Connection issues can always
 happen when streaming data and smaller files avoid resuming from the beginning in case of errors.
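
The two recommendations in this hunk (subdirectories named `000/` to `999/`, and splitting large files into chunks) can be sketched as follows. This is an illustrative sketch only: `shard_dir` and `split_file` are hypothetical helpers, not Hub or `huggingface_hub` APIs, and the `.partNNNNN` naming is an assumption. A raw byte split like this must be concatenated back in order before use; structured formats are usually sharded at the record or tensor level instead.

```python
from pathlib import Path


def shard_dir(index, files_per_folder=1000):
    """Map a running file index to a folder name such as '000' or '042'."""
    return f"{index // files_per_folder:03d}"


def split_file(src, out_dir, chunk_bytes, buffer_bytes=1024 * 1024):
    """Split src into consecutive parts of at most chunk_bytes each.

    Data is streamed in buffer_bytes pieces so memory use stays bounded.
    Returns the list of part paths in order.
    """
    if chunk_bytes <= 0 or buffer_bytes <= 0:
        raise ValueError("chunk_bytes and buffer_bytes must be positive")
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as f:
        index = 0
        while True:
            data = f.read(min(buffer_bytes, chunk_bytes))
            if not data:
                break  # source exhausted; do not create an empty part
            part_path = out_dir / f"{src.name}.part{index:05d}"
            written = 0
            with part_path.open("wb") as out:
                while data:
                    out.write(data)
                    written += len(data)
                    remaining = chunk_bytes - written
                    if remaining <= 0:
                        break  # this part is full
                    data = f.read(min(buffer_bytes, remaining))
            parts.append(part_path)
            index += 1
    return parts
```

For example, `split_file("weights.bin", "parts", chunk_bytes=40 * 1000**3)` would produce parts that stay under the `<50GB` figure in the table above, and `shard_dir(i)` gives the folder for the i-th file so that no folder holds more than 1000 entries.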
