Merged
Changes from 2 commits
8 changes: 4 additions & 4 deletions docs/hub/_toctree.yml
@@ -43,6 +43,10 @@
title: Getting Started with Repositories
- local: repositories-settings
title: Repository Settings
+ - local: storage-limits
+ title: Storage Limits
+ - local: storage-backends
+ title: Storage Backends
- local: repositories-pull-requests-discussions
title: Pull Requests & Discussions
- local: notifications
@@ -60,10 +64,6 @@
title: "How-to: Create automatic metadata quality reports"
- local: notebooks
title: Notebooks
- - local: storage-limits
- title: Storage Limits
- - local: storage-backends
- title: Storage Backends
- local: repositories-next-steps
title: Next Steps
- local: repositories-licenses
2 changes: 1 addition & 1 deletion docs/hub/enterprise-hub-gating-group-collections.md
@@ -9,7 +9,7 @@ Gating Group Collections allow organizations to grant (or reject) access to all
To enable Gating Group in a collection:

- the collection owner must be an organization
- - the organization must be subscribed to the Enterprise Hub
+ - the organization must be subscribed to a Team or Enterprise plan
- all models and datasets in the collection must be owned by the same organization as the collection
- each model or dataset in the collection may only belong to one Gating Group Collection (but they can still be included in non-gating i.e. _regular_ collections).

8 changes: 6 additions & 2 deletions docs/hub/index.md
@@ -25,11 +25,12 @@ The Hugging Face Hub is a platform with over 1.7M models, 400k datasets, and 600
<a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories">Introduction</a>
<a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories-getting-started">Getting Started</a>
<a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories-settings">Repository Settings</a>
+ <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./storage-limits">Storage Limits</a>
+ <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./storage-backends">Storage Backends</a>
<a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories-pull-requests-discussions">Pull requests and Discussions</a>
<a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./notifications">Notifications</a>
<a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./collections">Collections</a>
<a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./webhooks">Webhooks</a>
- <a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./storage-backends">Storage Backends</a>
<a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories-next-steps">Next Steps</a>
<a class="transform no-underline! transition-colors hover:translate-x-px hover:text-gray-700" href="./repositories-licenses">Licenses</a>
</div>
@@ -119,7 +120,10 @@ On it, you'll be able to upload and discover...
- Datasets: _featuring a wide variety of data for different domains and modalities_
- Spaces: _interactive apps for demonstrating ML models directly in your browser_

- The Hub offers **versioning, commit history, diffs, branches, and over a dozen library integrations**! You can learn more about the features that all repositories share in the [**Repositories documentation**](./repositories).
+ The Hub offers **versioning, commit history, diffs, branches, and over a dozen library integrations**!
+ All repositories build on [Xet](https://huggingface.co/join/xet), a new technology to efficiently store Large Files inside Git, intelligently splitting files into unique chunks and accelerating uploads and downloads.
+
+ You can learn more about the features that all repositories share in the [**Repositories documentation**](./repositories).

## Models

17 changes: 12 additions & 5 deletions docs/hub/repositories.md
@@ -2,18 +2,25 @@

Models, Spaces, and Datasets are hosted on the Hugging Face Hub as [Git repositories](https://git-scm.com/about), which means that version control and collaboration are core elements of the Hub. In a nutshell, a repository (also known as a **repo**) is a place where code and assets can be stored to back up your work, share it with the community, and work in a team.

- In these pages, you will go over the basics of getting started with Git and interacting with repositories on the Hub. Once you get the hang of it, you can explore the best practices and next steps that we've compiled for effective repository usage.
+ Unlike other collaboration platforms, our Git repositories are optimized for Machine Learning and AI files – large binary files, usually in specific file formats like Parquet and Safetensors, and up to Terabyte-scale sizes!
+ To achieve this, we built [Xet](./storage-backends), a modern custom storage system built specifically for AI/ML development, enabling chunk-level deduplication, smaller uploads, and faster downloads.
+
+ <div class="flex justify-center">
+ <img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/xet-speed.gif"/>
+ </div>
+
+ In these pages, you will go over the basics of getting started with Git and Xet and interacting with repositories on the Hub. Once you get the hang of it, you can explore the best practices and next steps that we've compiled for effective repository usage.

## Contents

- [Getting Started with Repositories](./repositories-getting-started)
- [Settings](./repositories-settings)
+ - [Storage Limits](./storage-limits)
+ - [Storage Backends](./storage-backends)
- [Pull Requests & Discussions](./repositories-pull-requests-discussions)
- [Pull Requests advanced usage](./repositories-pull-requests-discussions#pull-requests-advanced-usage)
- - [Webhooks](./webhooks)
- - [Notifications](./notifications)
- [Collections](./collections)
- - [Storage Backends](./storage-backends)
- - [Storage Limits](./storage-limits)
+ - [Notifications](./notifications)
+ - [Webhooks](./webhooks)
- [Next Steps](./repositories-next-steps)
- [Licenses](./repositories-licenses)
4 changes: 2 additions & 2 deletions docs/hub/storage-backends.md
@@ -7,11 +7,11 @@ Repositories on the Hugging Face Hub are different from those on software develo

While the Hub leverages modern version control with the support of Git, these differences make [Model](https://huggingface.co/docs/hub/models) and [Dataset](https://huggingface.co/docs/hub/datasets) repositories quite different from those that contain only source code.

- Storing these files directly in a Git repository is impractical. Not only are the typical storage systems behind Git repositories unsuited for such files, but when you clone a repository, Git retrieves the entire history, including all file revisions. This can be prohibitively large for massive binaries, forcing you to download gigabytes of historic data you may never need.
+ Storing these files directly in a pure Git repository is impractical. Not only are the typical storage systems behind Git repositories unsuited for such files, but when you clone a repository, Git retrieves the entire history, including all file revisions. This can be prohibitively large for massive binaries, forcing you to download gigabytes of historic data you may never need.

Instead, on the Hub, these large files are tracked using "pointer files" and identified through a `.gitattributes` file (both discussed in more detail below), which remain in the Git repository while the actual data is stored in remote storage (like [Amazon S3](https://aws.amazon.com/s3/)). As a result, the repository stays small and typical Git workflows remain efficient.
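
To make the idea of a pointer file concrete, here is a minimal sketch that parses one, assuming the three-field layout (`version`, `oid`, `size`) defined by the Git LFS specification. The helper name `parse_lfs_pointer` and the sample digest are illustrative assumptions, not part of any Hub tooling.

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Illustrative pointer contents; the digest and size are sample values only.
EXAMPLE_POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n"
    "size 12345\n"
)

info = parse_lfs_pointer(EXAMPLE_POINTER)
algorithm, _, digest = info["oid"].partition(":")
size_in_bytes = int(info["size"])
print(f"{algorithm} digest {digest[:8]}... points at {size_in_bytes} bytes")
```

The pointer itself is only a few lines of text, which is why the Git repository stays small while the referenced data lives in remote storage.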

- Historically, Hub repositories have relied on [Git LFS](https://git-lfs.com/) for this mechanism. While Git LFS remains supported and widely used (see the [Legacy section below](#legacy-storage-git-lfs)), the Hub is introducing a modern custom storage system built specifically for AI/ML development, enabling chunk-level deduplication, smaller uploads, and faster downloads than Git LFS.
+ Historically, Hub repositories have relied on [Git LFS](https://git-lfs.com/) for this mechanism. While Git LFS remains supported (see the [Legacy section below](#legacy-storage-git-lfs)), the Hub is introducing a modern custom storage system built specifically for AI/ML development, enabling chunk-level deduplication, smaller uploads, and faster downloads than Git LFS.
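
The general idea behind chunk-level deduplication can be sketched as follows. This is a toy illustration, not Xet's actual algorithm: the rolling hash, the chunk-size parameters, and the names `split_chunks` and `ChunkStore` are all assumptions made for the example. Files are cut at content-defined boundaries, each chunk is keyed by its hash, and only chunks the store has not seen before count as uploaded bytes.

```python
import hashlib
import random

# Hypothetical demo parameters, far smaller than a real system would use.
MIN_CHUNK = 16                 # never cut a chunk shorter than this many bytes
BOUNDARY_MASK = (1 << 6) - 1   # cut when the low 6 bits of the hash are zero

# Fixed pseudo-random table so chunk boundaries are reproducible.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def split_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a simple rolling hash."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + GEAR[byte]) & 0xFFFFFFFF
        if i + 1 - start >= MIN_CHUNK and (rolling & BOUNDARY_MASK) == 0:
            chunks.append(data[start : i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk without a boundary
    return chunks


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def upload(self, data: bytes) -> tuple[list[str], int]:
        """Store data; return its chunk recipe and the bytes newly stored."""
        recipe, new_bytes = [], 0
        for chunk in split_chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = chunk
                new_bytes += len(chunk)
            recipe.append(key)
        return recipe, new_bytes

    def download(self, recipe: list[str]) -> bytes:
        """Rebuild a file from its ordered list of chunk keys."""
        return b"".join(self.chunks[key] for key in recipe)


# Two versions of a file that differ by a small edit in the middle.
data_rng = random.Random(1)
v1 = bytes(data_rng.getrandbits(8) for _ in range(20_000))
v2 = v1[:10_000] + b"a small edit" + v1[10_000:]

store = ChunkStore()
recipe1, new1 = store.upload(v1)
recipe2, new2 = store.upload(v2)
print(f"v1: {len(v1)} bytes, {new1} newly stored")
print(f"v2: {len(v2)} bytes, {new2} newly stored")
```

Because boundaries depend on the bytes themselves rather than on fixed offsets, chunks before the edit are reused as-is, and chunks after it are typically reused too once the boundaries realign. Whole-file storage, by contrast, would store the second version in full.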

## Xet
