Commit 5a3cb26

julien-c and pcuenca authored
Start promoting Xet more accross the doc (not just in a single doc page) (#1939)
* Start promoting Xet more accross the doc (not just in a single doc page)
* here too?
* Update docs/hub/repositories.md
* Update docs/hub/storage-backends.md

  Co-authored-by: Pedro Cuenca <[email protected]>
* dark

---------

Co-authored-by: Pedro Cuenca <[email protected]>
1 parent 85662ba commit 5a3cb26

5 files changed: +26 -14 lines changed

docs/hub/_toctree.yml

Lines changed: 4 additions & 4 deletions
@@ -43,6 +43,10 @@
    title: Getting Started with Repositories
  - local: repositories-settings
    title: Repository Settings
+ - local: storage-limits
+   title: Storage Limits
+ - local: storage-backends
+   title: Storage Backends
  - local: repositories-pull-requests-discussions
    title: Pull Requests & Discussions
  - local: notifications
@@ -60,10 +64,6 @@
    title: "How-to: Create automatic metadata quality reports"
  - local: notebooks
    title: Notebooks
- - local: storage-limits
-   title: Storage Limits
- - local: storage-backends
-   title: Storage Backends
  - local: repositories-next-steps
    title: Next Steps
  - local: repositories-licenses

docs/hub/enterprise-hub-gating-group-collections.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ Gating Group Collections allow organizations to grant (or reject) access to all
  To enable Gating Group in a collection:

  - the collection owner must be an organization
- - the organization must be subscribed to the Enterprise Hub
+ - the organization must be subscribed to a Team or Enterprise plan
  - all models and datasets in the collection must be owned by the same organization as the collection
  - each model or dataset in the collection may only belong to one Gating Group Collection (but they can still be included in non-gating i.e. _regular_ collections).

docs/hub/index.md

Lines changed: 6 additions & 2 deletions
@@ -25,11 +25,12 @@ The Hugging Face Hub is a platform with over 1.7M models, 400k datasets, and 600
  <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories">Introduction</a>
  <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories-getting-started">Getting Started</a>
  <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories-settings">Repository Settings</a>
+ <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./storage-limits">Storage Limits</a>
+ <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./storage-backends">Storage Backends</a>
  <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories-pull-requests-discussions">Pull requests and Discussions</a>
  <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./notifications">Notifications</a>
  <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./collections">Collections</a>
  <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./webhooks">Webhooks</a>
- <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./storage-backends">Storage Backends</a>
  <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories-next-steps">Next Steps</a>
  <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories-licenses">Licenses</a>
  </div>
@@ -119,7 +120,10 @@ On it, you'll be able to upload and discover...
  - Datasets: _featuring a wide variety of data for different domains and modalities_
  - Spaces: _interactive apps for demonstrating ML models directly in your browser_

- The Hub offers **versioning, commit history, diffs, branches, and over a dozen library integrations**! You can learn more about the features that all repositories share in the [**Repositories documentation**](./repositories).
+ The Hub offers **versioning, commit history, diffs, branches, and over a dozen library integrations**!
+ All repositories build on [Xet](https://huggingface.co/join/xet), a new technology to efficiently store Large Files inside Git, intelligently splitting files into unique chunks and accelerating uploads and downloads.
+
+ You can learn more about the features that all repositories share in the [**Repositories documentation**](./repositories).

  ## Models

docs/hub/repositories.md

Lines changed: 13 additions & 5 deletions
@@ -2,18 +2,26 @@

  Models, Spaces, and Datasets are hosted on the Hugging Face Hub as [Git repositories](https://git-scm.com/about), which means that version control and collaboration are core elements of the Hub. In a nutshell, a repository (also known as a **repo**) is a place where code and assets can be stored to back up your work, share it with the community, and work in a team.

- In these pages, you will go over the basics of getting started with Git and interacting with repositories on the Hub. Once you get the hang of it, you can explore the best practices and next steps that we've compiled for effective repository usage.
+ Unlike other collaboration platforms, our Git repositories are optimized for Machine Learning and AI files – large binary files, usually in specific file formats like Parquet and Safetensors, and up to [Terabyte-scale sizes](https://huggingface.co/blog/from-files-to-chunks)!
+ To achieve this, we built [Xet](./storage-backends), a modern custom storage system built specifically for AI/ML development, enabling chunk-level deduplication, smaller uploads, and faster downloads.
+
+ <div class="flex justify-center">
+ <img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/xet-speed.gif"/>
+ <img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/xet-speed-dark.gif"/>
+ </div>
+
+ In these pages, you will go over the basics of getting started with Git and Xet and interacting with repositories on the Hub. Once you get the hang of it, you can explore the best practices and next steps that we've compiled for effective repository usage.

  ## Contents

  - [Getting Started with Repositories](./repositories-getting-started)
  - [Settings](./repositories-settings)
+ - [Storage Limits](./storage-limits)
+ - [Storage Backends](./storage-backends)
  - [Pull Requests & Discussions](./repositories-pull-requests-discussions)
  - [Pull Requests advanced usage](./repositories-pull-requests-discussions#pull-requests-advanced-usage)
- - [Webhooks](./webhooks)
- - [Notifications](./notifications)
  - [Collections](./collections)
- - [Storage Backends](./storage-backends)
- - [Storage Limits](./storage-limits)
+ - [Notifications](./notifications)
+ - [Webhooks](./webhooks)
  - [Next Steps](./repositories-next-steps)
  - [Licenses](./repositories-licenses)

docs/hub/storage-backends.md

Lines changed: 2 additions & 2 deletions
@@ -7,11 +7,11 @@ Repositories on the Hugging Face Hub are different from those on software develo

  While the Hub leverages modern version control with the support of Git, these differences make [Model](https://huggingface.co/docs/hub/models) and [Dataset](https://huggingface.co/docs/hub/datasets) repositories quite different from those that contain only source code.

- Storing these files directly in a Git repository is impractical. Not only are the typical storage systems behind Git repositories unsuited for such files, but when you clone a repository, Git retrieves the entire history, including all file revisions. This can be prohibitively large for massive binaries, forcing you to download gigabytes of historic data you may never need.
+ Storing these files directly in a pure Git repository is impractical. Not only are the typical storage systems behind Git repositories unsuited for such files, but when you clone a repository, Git retrieves the entire history, including all file revisions. This can be prohibitively large for massive binaries, forcing you to download gigabytes of historic data you may never need.

  Instead, on the Hub, these large files are tracked using "pointer files" and identified through a `.gitattributes` file (both discussed in more detail below), which remain in the Git repository while the actual data is stored in remote storage (like [Amazon S3](https://aws.amazon.com/s3/)). As a result, the repository stays small and typical Git workflows remain efficient.

- Historically, Hub repositories have relied on [Git LFS](https://git-lfs.com/) for this mechanism. While Git LFS remains supported and widely used (see the [Legacy section below](#legacy-storage-git-lfs)), the Hub is introducing a modern custom storage system built specifically for AI/ML development, enabling chunk-level deduplication, smaller uploads, and faster downloads than Git LFS.
+ Historically, Hub repositories have relied on [Git LFS](https://git-lfs.com/) for this mechanism. While Git LFS remains supported (see the [Legacy section below](#legacy-storage-git-lfs)), the Hub has adopted Xet, a modern custom storage system built specifically for AI/ML development. It enables chunk-level deduplication, smaller uploads, and faster downloads than Git LFS.

  ## Xet

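Reviewer note: the changed paragraphs mention "chunk-level deduplication" and "smaller uploads" without showing why one implies the other. The sketch below is only an illustration of the general idea, not Xet's implementation. It assumes fixed-size chunks of 4 bytes for readability (real systems typically use much larger, content-defined chunks), and `ChunkStore`, `split_chunks`, and `CHUNK_SIZE` are hypothetical names made up for this example.

```python
import hashlib

# Assumption: a tiny fixed chunk size, chosen only to keep the example readable.
CHUNK_SIZE = 4


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Hypothetical in-memory store that keeps each unique chunk once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def upload(self, data: bytes) -> tuple[list[str], int]:
        """Store data and return (list of chunk hashes, bytes actually sent)."""
        recipe: list[str] = []
        uploaded = 0
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only chunks the store has never seen need to be transferred.
                self.chunks[digest] = chunk
                uploaded += len(chunk)
            recipe.append(digest)
        return recipe, uploaded

    def download(self, recipe: list[str]) -> bytes:
        """Rebuild a file from its ordered list of chunk hashes."""
        return b"".join(self.chunks[digest] for digest in recipe)


store = ChunkStore()
v1 = b"AAAABBBBCCCCDDDD"
v2 = b"AAAABBBBXXXXDDDD"  # same file with one chunk modified

recipe_v1, sent_v1 = store.upload(v1)
recipe_v2, sent_v2 = store.upload(v2)
print(sent_v1, sent_v2)
```

In this toy setup the second revision only transfers the one chunk that changed, whereas a whole-file scheme would send all 16 bytes again. Fixed-size chunks stop deduplicating as soon as bytes are inserted or removed, since every later boundary shifts; that is the usual motivation for content-defined chunking, which this sketch does not attempt.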