Commit b722616

Adding Git Xet docs (#2010)
* initial git-xet docs
* cleaning up markdown headers in xet folder
* addressing PR feedback
* typo
* updating usage notes with pointer to setup instructions
* missed return character
* getting annoyed by whitespace
1 parent fa79da0 commit b722616

7 files changed

+99
-13
lines changed

docs/hub/repositories-getting-started.md

Lines changed: 3 additions & 0 deletions
@@ -8,6 +8,9 @@ This document shows how to handle repositories through the web interface as well

 If you do not have `git` available as a CLI command yet, you will need to [install Git](https://git-scm.com/downloads) for your platform. You will also need to [install Git LFS](https://git-lfs.github.com/), which will be used to handle large files such as images and model weights.

+> [!TIP]
+> For improved upload and download speeds when working with large files and Git, install the [Git Xet](xet/using-xet-storage#git) extension.
+
 To be able to push your code to the Hub, you'll need to authenticate somehow. The easiest way to do this is by installing the [`huggingface_hub` CLI](https://huggingface.co/docs/huggingface_hub/index) and running the login command:

 ```bash

docs/hub/xet/deduplication.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-## Deduplication
+# Deduplication

 Xet-enabled repositories use [content-defined chunking (CDC)](https://huggingface.co/blog/from-files-to-chunks) to deduplicate at the level of byte ranges (~64 KB of data each, also referred to as "chunks"). Chunk boundaries are set by a rolling hash computed over the actual file contents, which makes chunking resilient to insertions or deletions anywhere in the file. When a file is uploaded to a Xet-backed repository using a Xet-aware client, its contents are broken down into these variable-sized chunks. Only new chunks not already present in Xet storage are kept after chunking; everything else is discarded.

docs/hub/xet/index.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ Instead, on the Hub, these large files are tracked using "pointer files" and ide

 Historically, Hub repositories have relied on [Git LFS](https://git-lfs.com/) for this mechanism. While Git LFS remains supported (see [Backwards Compatibility & Legacy](./legacy-git-lfs)), the Hub has adopted Xet, a modern custom storage system built specifically for AI/ML development. It enables chunk-level deduplication, smaller uploads, and faster downloads than Git LFS.

-### Open Source Xet Protocol
+## Open Source Xet Protocol

 If you are looking to understand the underlying Xet protocol or are looking to build a new client library to access Xet Storage, check out the [Xet Protocol Specification](https://huggingface.co/docs/xet/index).

docs/hub/xet/legacy-git-lfs.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-## Backward Compatibility with LFS
+# Backward Compatibility with LFS

 Uploads from legacy / non‑Xet‑aware clients still follow the standard Git LFS path, even if the repo is already Xet-backed. Once the file is uploaded to LFS, a background process automatically migrates the file to Xet storage. The Xet architecture provides backwards compatibility for legacy clients downloading files from Xet-backed repos by offering a Git LFS bridge. While a Xet-aware client will receive file reconstruction information from CAS to download the Xet-backed file, a legacy client will get a single URL from the bridge, which does the work of reconstructing the requested file and returning the URL to the resource. This allows downloading files through a URL so that you can continue to use the Hub's web interface or `curl`. By having LFS file uploads automatically migrate and having older clients continue to download files from Xet-backed repositories, maintainers and the rest of the Hub can update their pipelines at their own pace.

docs/hub/xet/overview.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-## Xet History & Overview
+# Xet History & Overview

 [In August 2024 Hugging Face acquired XetHub](https://huggingface.co/blog/xethub-joins-hf), a [seed-stage startup based in Seattle](https://www.geekwire.com/2023/ex-apple-engineers-raise-7-5m-for-new-seattle-data-storage-startup/), to replace Git LFS on the Hub.

docs/hub/xet/security.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-## Security Model
+# Security Model

 Xet storage provides data deduplication over all chunks stored on Hugging Face. This is done via cryptographic hashing in a privacy-sensitive way. The contents of chunks are protected and are associated with repository permissions, i.e., you can only read chunks which are required to reproduce files you have access to, and no more.

docs/hub/xet/using-xet-storage.md

Lines changed: 91 additions & 8 deletions
@@ -1,4 +1,6 @@
-## Using Xet Storage
+# Using Xet Storage
+
+## Python

 To access a Xet-aware version of the `huggingface_hub`, simply install the latest version:

@@ -24,19 +26,100 @@ To see more detailed usage docs, refer to the `huggingface_hub` docs for:
 - [Download](https://huggingface.co/docs/huggingface_hub/guides/download#hfxet)
 - [Managing the `hf_xet` cache](https://huggingface.co/docs/huggingface_hub/guides/manage-cache#chunk-based-caching-xet)

-### Recommendations
+## Git
+
+Git users can access the benefits of Xet by downloading and installing the Git Xet extension. Once installed, simply use the [standard workflows for managing Hub repositories with Git](../repositories-getting-started) - no additional changes necessary.
+
+### Prerequisites
+
+Install [Git](https://git-scm.com/) and [Git LFS](https://git-lfs.com/).
+
+### Install on macOS or Linux (amd64 or aarch64)
+
+Install using an installation script with the following command in your terminal (requires `curl` and `unzip`):
+```
+curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh
+```
+Or, install using [Homebrew](https://brew.sh/), with the following [tap](https://docs.brew.sh/Taps) (direct `brew install` coming soon):
+```
+brew tap huggingface/tap
+brew install git-xet
+```
+
+To verify the installation, run:
+```
+git-xet --version
+```
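For scripted setups, the verification step can be extended into a small pre-flight check. This is only a sketch and not part of the Git Xet docs: `require_tool` is a hypothetical helper name, and it relies on nothing more than the POSIX `command -v` builtin to see whether each tool named in the prerequisites and install steps is on `PATH`.

```shell
# Hypothetical helper (not part of git-xet): succeeds only if the named
# command can be found on PATH.
require_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report the status of each tool named in the install steps above.
for tool in git git-lfs git-xet; do
  if require_tool "$tool"; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```

A "missing" line for `git-xet` right after installation usually means the install location is not on `PATH` in the current shell; opening a new terminal is a reasonable first thing to try.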
+### Install on Windows (amd64)
+
+Using an installer:
+- Download `git-xet-windows-installer-x86_64.zip` ([available here](https://github.com/huggingface/xet-core/releases/download/git-xet-v0.1.0/git-xet-windows-installer-x86_64.zip)) and unzip.
+- Run the `msi` installer file and follow the prompts.
+
+Manual installation:
+- Download `git-xet-windows-x86_64.zip` ([available here](https://github.com/huggingface/xet-core/releases/download/git-xet-v0.1.0/git-xet-windows-x86_64.zip)) and unzip.
+- Place the extracted `git-xet.exe` under a `PATH` directory.
+- Run `git-xet install` in a terminal.
+
+To verify the installation, run:
+```
+git-xet --version
+```
+
+### Using Git Xet
+
+Once installed on your platform, using Git Xet is as simple as following the Hub's standard Git workflows.
+
+Make sure all [prerequisites are installed and configured](https://huggingface.co/docs/hub/repositories-getting-started#requirements), follow the [setup instructions for working with repositories on the Hub](https://huggingface.co/docs/hub/repositories-getting-started#set-up), then commit your changes, and `push` to the Hub:
+
+```
+# Create any files you like! Then...
+git add .
+git commit -m "Uploading new models" # You can choose any descriptive message
+git push
+```
+Under the hood, the [Xet protocol](https://huggingface.co/docs/xet/index) is invoked to upload large files directly to Xet storage, increasing upload speeds through the power of [chunk-level deduplication](./deduplication).
+
+### Uninstall on macOS or Linux
+
+Using Homebrew:
+```
+git-xet uninstall
+brew uninstall git-xet
+```
+If you used the installation script (for macOS or Linux), run the following in your terminal:
+```
+git-xet uninstall
+sudo rm "$(command -v git-xet)"
+```
+### Uninstall on Windows
+
+If you used the installer:
+- Navigate to Settings -> Apps -> Installed apps.
+- Find "Git-Xet".
+- Select the "Uninstall" option available in the context menu.
+
+If you manually installed:
+- Run `git-xet uninstall` in a terminal.
+- Delete the `git-xet.exe` file from the location where it was originally placed.
+
+## Recommendations
+
+Xet integrates seamlessly with all of the Hub's workflows. However, there are a few steps you may consider to get the most benefits from Xet storage.
+
+When uploading or downloading with Python:
+
+- **Make sure `hf_xet` is installed**: While Xet remains backward compatible with legacy clients optimized for Git LFS, the `hf_xet` integration with `huggingface_hub` delivers optimal chunk-based performance and faster iteration on large files.
+- **Utilize `hf_xet` environment variables**: The default installation of `hf_xet` is designed to support the broadest range of hardware. To take advantage of setups with more network bandwidth or processing power, read up on `hf_xet`'s [environment variables](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#xet) to optimize downloads and uploads.
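As an illustration of the environment-variable recommendation, the sketch below opts a shell session in to `HF_XET_HIGH_PERFORMANCE`, one of the variables listed in the linked `huggingface_hub` reference. It assumes a POSIX shell and a machine with bandwidth and CPU to spare; the variable name and its effect should be checked against that reference before relying on it.

```shell
# Opt in to hf_xet's high-performance mode for this shell session.
# Assumption: the variable name matches the linked environment-variable reference.
export HF_XET_HIGH_PERFORMANCE=1

# Any upload or download started from this shell now inherits the setting.
echo "HF_XET_HIGH_PERFORMANCE=${HF_XET_HIGH_PERFORMANCE}"
```

Because the variable is exported rather than written to a config file, it applies only to processes launched from this shell, which makes it easy to compare transfer times with and without it.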

-Xet integrates seamlessly with the Hub's current Python-based workflows. However, there are a few steps you may consider to get the most benefits from Xet storage:
+When uploading or downloading in Git or Python:

-- **Use `hf_xet`**: While Xet remains backward compatible with legacy clients optimized for Git LFS, the `hf_xet` integration with `huggingface_hub` delivers optimal chunk-based performance and faster iteration on large files.
-- **Utilize `hf_xet` environment variables**: The default installation of `hf_xet` is designed to support the broadest range of hardware. To take advantage of setups with more network bandwidth or processing power read up on `hf_xet`'s [environment variables](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#xet) to further speed up downloads and uploads.
 - **Leverage frequent, incremental commits**: Xet's chunk-level deduplication means you can safely make incremental updates to models or datasets. Only changed chunks are uploaded, so frequent commits are both fast and storage-efficient.
 - **Be Specific in .gitattributes**: When defining patterns for Xet or LFS, use precise file extensions (e.g., `*.safetensors`, `*.bin`) to avoid unnecessarily routing smaller files through large-file storage.
 - **Prioritize community access**: Xet substantially increases the efficiency and scale of large file transfers. Instead of structuring your repository to reduce its total size (or the size of individual files), organize it for collaborators and community users so they may easily navigate and retrieve the content they need.
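The `.gitattributes` recommendation can be sketched as follows. `lfs_patterns` is a hypothetical helper, and the attribute string `filter=lfs diff=lfs merge=lfs -text` is the one `git lfs track` writes. Treat it as an assumption that Xet-backed repositories keep using these same LFS attributes.

```shell
# Hypothetical helper: print one attribute line per file extension,
# in the format that `git lfs track` writes.
lfs_patterns() {
  for ext in "$@"; do
    printf '*.%s filter=lfs diff=lfs merge=lfs -text\n' "$ext"
  done
}

# Precise extensions only; a catch-all pattern would also route small
# text files through large-file storage.
lfs_patterns safetensors bin
```

Redirecting the output with `>> .gitattributes` appends the patterns to the repository's attributes file; running `git lfs track "*.safetensors"` achieves the same result for a single pattern.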

-### Current Limitations
+## Current Limitations

 While Xet brings fine-grained deduplication and enhanced performance to Git-based storage, some features and platform compatibilities are still in development. As a result, keep the following constraints in mind when working with a Xet-enabled repository:

-- **64-bit systems only**: The `hf_xet` client currently requires a 64-bit architecture; 32-bit systems are not supported.
-- **Git client integration (git-xet)**: Under active development - coming soon, stay tuned!
+- **64-bit systems only**: Both `hf_xet` and Git Xet currently require a 64-bit architecture; 32-bit systems are not supported.
