Commit 8abce99

Bump file chunk recommendation to 20GB (#1376)
* Bump file chunk recommendation to 15GB
* bumped again
1 parent 2ed5cb9 commit 8abce99

1 file changed, +2 -2 lines changed

docs/hub/repositories-recommendations.md

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ We gathered a list of tips and recommendations for structuring your repo. If you
 | Repo size | - | contact us for large repos (TBs of data) |
 | Files per repo | <100k | merge data into fewer files |
 | Entries per folder | <10k | use subdirectories in repo |
-| File size | <5GB | split data into chunked files |
+| File size | <20GB | split data into chunked files |
 | Commit size | <100 files* | upload files in multiple commits |
 | Commits per repo | - | upload multiple files per commit and/or squash history |

@@ -37,7 +37,7 @@ which has very detailed documentation about the different factors that will impa
 For example, json files can be merged into a single jsonl file, or large datasets can be exported as Parquet files or in [WebDataset](https://github.com/webdataset/webdataset) format.
 - The maximum number of files per folder cannot exceed 10k files per folder. A simple solution is to
 create a repository structure that uses subdirectories. For example, a repo with 1k folders from `000/` to `999/`, each containing at most 1000 files, is already enough.
-- **File size**: In the case of uploading large files (e.g. model weights), we strongly recommend splitting them **into chunks of around 5GB each**.
+- **File size**: In the case of uploading large files (e.g. model weights), we strongly recommend splitting them **into chunks of around 20GB each**.
 There are a few reasons for this:
 - Uploading and downloading smaller files is much easier both for you and the other users. Connection issues can always
 happen when streaming data and smaller files avoid resuming from the beginning in case of errors.
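
Note: the updated recommendation (chunks of around 20GB) can be illustrated with a minimal sketch that splits one large file into fixed-size parts before upload. This is not taken from the commit or from any Hub tooling; the function name `split_file`, the `.partNNNNN` naming scheme, and the reading of 20GB as `20 * 1000**3` bytes are all assumptions made for illustration.

```python
from pathlib import Path


def split_file(src, chunk_bytes, buffer_bytes=1024 * 1024):
    """Split `src` into numbered parts of at most `chunk_bytes` bytes each.

    Hypothetical helper: parts are written next to the source file as
    `<name>.part00000`, `<name>.part00001`, and so on.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    src = Path(src)
    parts = []
    with src.open("rb") as reader:
        index = 0
        while True:
            # Read the first block of the next part; an empty read means end of file.
            block = reader.read(min(buffer_bytes, chunk_bytes))
            if not block:
                break
            part = src.with_name(f"{src.name}.part{index:05d}")
            written = 0
            with part.open("wb") as writer:
                while block:
                    writer.write(block)
                    written += len(block)
                    remaining = chunk_bytes - written
                    if remaining == 0:
                        break  # this part is full; start the next one
                    # Copy in small blocks so a 20GB part never sits in memory.
                    block = reader.read(min(buffer_bytes, remaining))
            parts.append(part)
            index += 1
    return parts
```

With a hypothetical file name, `split_file("model.bin", 20 * 1000**3)` would produce parts of at most 20GB each, and concatenating the parts in order restores the original file. For model weights, a format-aware sharding option in the saving library is usually preferable to raw byte splitting, since each shard then stays loadable on its own.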
