From 342b278f4d21d813260f584ceabda850ade1d6f5 Mon Sep 17 00:00:00 2001 From: Jared Sulzdorf Date: Wed, 22 Oct 2025 19:00:39 -0700 Subject: [PATCH 1/7] initial git-xet docs --- docs/hub/repositories-getting-started.md | 3 + docs/hub/xet/using-xet-storage.md | 84 +++++++++++++++++++++--- 2 files changed, 79 insertions(+), 8 deletions(-) diff --git a/docs/hub/repositories-getting-started.md b/docs/hub/repositories-getting-started.md index fab8fd8e9..735b7a67e 100644 --- a/docs/hub/repositories-getting-started.md +++ b/docs/hub/repositories-getting-started.md @@ -8,6 +8,9 @@ This document shows how to handle repositories through the web interface as well If you do not have `git` available as a CLI command yet, you will need to [install Git](https://git-scm.com/downloads) for your platform. You will also need to [install Git LFS](https://git-lfs.github.com/), which will be used to handle large files such as images and model weights. +> [!TIP] +> For improved upload and download speeds when working with large files and Git, install the [Git Xet](xet/using-xet-storage#git) extension. + To be able to push your code to the Hub, you'll need to authenticate somehow. 
The easiest way to do this is by installing the [`huggingface_hub` CLI](https://huggingface.co/docs/huggingface_hub/index) and running the login command: ```bash diff --git a/docs/hub/xet/using-xet-storage.md b/docs/hub/xet/using-xet-storage.md index e6828691c..cbb5f11fd 100644 --- a/docs/hub/xet/using-xet-storage.md +++ b/docs/hub/xet/using-xet-storage.md @@ -1,4 +1,6 @@ -## Using Xet Storage +# Using Xet Storage + +## Python To access a Xet-aware version of the `huggingface_hub`, simply install the latest version: @@ -24,19 +26,85 @@ To see more detailed usage docs, refer to the `huggingface_hub` docs for: - [Download](https://huggingface.co/docs/huggingface_hub/guides/download#hfxet) - [Managing the `hf_xet` cache](https://huggingface.co/docs/huggingface_hub/guides/manage-cache#chunk-based-caching-xet) -### Recommendations +## Git + +Git users can access the benefits of Xet by downloading and installing the Git Xet extension. Once installed, simply use the [standard workflows for managing Hub repositories with Git](../repositories-getting-started) - no additional changes necessary. + +### Prerequisites + +Install [Git](https://git-scm.com/) and [Git LFS](https://git-lfs.com/). + +### Install on macOS or Linux (amd64 or aarch64) + + To install using [Homebrew](https://brew.sh/): + ``` + brew tap huggingface/tap + brew install git-xet + ``` + Or, using an installation script, run the following in your terminal (requires `curl` and `unzip`): + ``` + curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh + ``` + To verify the installation, run: + ``` + git-xet --version + ``` +### Windows (amd64) + + Using an installer: + - Download `git-xet-windows-installer-x86_64.zip` ([available here](https://github.com/huggingface/xet-core/releases/download/git-xet-v0.1.0/git-xet-windows-installer-x86_64.zip)) and unzip. + - Run the `msi` installer file and follow the prompts. 
+ + Manual installation: + - Download `git-xet-windows-x86_64.zip` ([available here](https://github.com/huggingface/xet-core/releases/download/git-xet-v0.1.0/git-xet-windows-x86_64.zip)) and unzip. + - Place the extracted `git-xet.exe` under a `PATH` directory. + - Run `git-xet install` in a terminal. + +To verity the installation, run: + ``` + git-xet --version + ``` + +### Uninstall on macOS or Linux + +Using Homebrew: + ``` + git-xet uninstall + brew uninstall git-xet + ``` +If you used the installation script (for MacOS or Linux), run the following in your terminal: + ``` + git-xet uninstall + sudo rm $(which git-xet) + ``` +### Uninstall on Windows + +If you used the installer: +- Navigate to Settings -> Apps -> Installed apps +- Find "Git-Xet". +- Select the "Uninstall" option available in the context menu. + +If you manually installed: +- Run `git-xet uninstall` in a terminal. +- Delete the `git-xet.exe` file from the location where it was originally placed. + +## Recommendations + +Xet integrates seamlessly with all of the Hub's workflows. However, there are a few steps you may consider to get the most benefits from Xet storage. + +When uploading or downloading with Python: + +- **Make sure `hf_xet` is installed**: While Xet remains backward compatible with legacy clients optimized for Git LFS, the `hf_xet` integration with `huggingface_hub` delivers optimal chunk-based performance and faster iteration on large files. +- **Utilize `hf_xet` environment variables**: The default installation of `hf_xet` is designed to support the broadest range of hardware. To take advantage of setups with more network bandwidth or processing power read up on `hf_xet`'s [environment variables](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#xet) to optimize downloads and uploads. -Xet integrates seamlessly with the Hub's current Python-based workflows. 
However, there are a few steps you may consider to get the most benefits from Xet storage: +When uploading or downloading in Git or Python: -- **Use `hf_xet`**: While Xet remains backward compatible with legacy clients optimized for Git LFS, the `hf_xet` integration with `huggingface_hub` delivers optimal chunk-based performance and faster iteration on large files. -- **Utilize `hf_xet` environment variables**: The default installation of `hf_xet` is designed to support the broadest range of hardware. To take advantage of setups with more network bandwidth or processing power read up on `hf_xet`'s [environment variables](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#xet) to further speed up downloads and uploads. - **Leverage frequent, incremental commits**: Xet's chunk-level deduplication means you can safely make incremental updates to models or datasets. Only changed chunks are uploaded, so frequent commits are both fast and storage-efficient. - **Be Specific in .gitattributes**: When defining patterns for Xet or LFS, use precise file extensions (e.g., `*.safetensors`, `*.bin`) to avoid unnecessarily routing smaller files through large-file storage. - **Prioritize community access**: Xet substantially increases the efficiency and scale of large file transfers. Instead of structuring your repository to reduce its total size (or the size of individual files), organize it for collaborators and community users so they may easily navigate and retrieve the content they need. -### Current Limitations +## Current Limitations While Xet brings fine-grained deduplication and enhanced performance to Git-based storage, some features and platform compatibilities are still in development. As a result, keep the following constraints in mind when working with a Xet-enabled repository: -- **64-bit systems only**: The `hf_xet` client currently requires a 64-bit architecture; 32-bit systems are not supported. 
-- **Git client integration (git-xet)**: Under active development - coming soon, stay tuned! +- **64-bit systems only**: The `hf_xet` client currently requires a 64-bit architecture; 32-bit systems are not supported. \ No newline at end of file From 551e10bb7368001d263f3d2e8bf7bffadd257342 Mon Sep 17 00:00:00 2001 From: Jared Sulzdorf Date: Wed, 22 Oct 2025 19:17:36 -0700 Subject: [PATCH 2/7] cleaning up markdown headers in xet folder --- docs/hub/xet/deduplication.md | 2 +- docs/hub/xet/index.md | 2 +- docs/hub/xet/legacy-git-lfs.md | 2 +- docs/hub/xet/overview.md | 2 +- docs/hub/xet/security.md | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/hub/xet/deduplication.md b/docs/hub/xet/deduplication.md index 0d2855e5e..9c6a9286f 100644 --- a/docs/hub/xet/deduplication.md +++ b/docs/hub/xet/deduplication.md @@ -1,4 +1,4 @@ -## Deduplication +# Deduplication Xet-enabled repositories utilize [content-defined chunking (CDC)](https://huggingface.co/blog/from-files-to-chunks) to deduplicate on the level of bytes (~64KB of data, also referred to as a "chunk"). Each chunk is identified by a rolling hash that determines chunk boundaries based on the actual file contents, making it resilient to insertions or deletions anywhere in the file. When a file is uploaded to a Xet-backed repository using a Xet-aware client, its contents are broken down into these variable-sized chunks. Only new chunks not already present in Xet storage are kept after chunking, everything else is discarded. diff --git a/docs/hub/xet/index.md b/docs/hub/xet/index.md index 3f53caff3..5dc4aac02 100644 --- a/docs/hub/xet/index.md +++ b/docs/hub/xet/index.md @@ -13,7 +13,7 @@ Instead, on the Hub, these large files are tracked using "pointer files" and ide Historically, Hub repositories have relied on [Git LFS](https://git-lfs.com/) for this mechanism. 
While Git LFS remains supported (see [Backwards Compatibility & Legacy](./legacy-git-lfs)), the Hub has adopted Xet, a modern custom storage system built specifically for AI/ML development. It enables chunk-level deduplication, smaller uploads, and faster downloads than Git LFS. -### Open Source Xet Protocol +## Open Source Xet Protocol If you are looking to understand the underlying Xet protocol or are looking to build a new client library to access Xet Storage, check out the [Xet Protocol Specification](https://huggingface.co/docs/xet/index). diff --git a/docs/hub/xet/legacy-git-lfs.md b/docs/hub/xet/legacy-git-lfs.md index 10c2b5e3f..4ae1df021 100644 --- a/docs/hub/xet/legacy-git-lfs.md +++ b/docs/hub/xet/legacy-git-lfs.md @@ -1,4 +1,4 @@ -## Backward Compatibility with LFS +# Backward Compatibility with LFS Uploads from legacy / non‑Xet‑aware clients still follow the standard Git LFS path, even if the repo is already Xet-backed. Once the file is uploaded to LFS, a background process automatically migrates the file to using Xet storage. The Xet architecture provides backwards compatibility for legacy clients downloading files from Xet-backed repos by offering a Git LFS bridge. While a Xet-aware client will receive file reconstruction information from CAS to download the Xet-backed file, a legacy client will get a single URL from the bridge which does the work of reconstructing the request file and returning the URL to the resource. This allows downloading files through a URL so that you can continue to use the Hub's web interface or `curl`. By having LFS file uploads automatically migrate and having older clients continue to download files from Xet-backed repositories, maintainers and the rest of the Hub can update their pipelines at their own pace. 
diff --git a/docs/hub/xet/overview.md b/docs/hub/xet/overview.md index e00b759f7..191a49560 100644 --- a/docs/hub/xet/overview.md +++ b/docs/hub/xet/overview.md @@ -1,4 +1,4 @@ -## Xet History & Overview +# Xet History & Overview [In August 2024 Hugging Face acquired XetHub](https://huggingface.co/blog/xethub-joins-hf), a [seed-stage startup based in Seattle](https://www.geekwire.com/2023/ex-apple-engineers-raise-7-5m-for-new-seattle-data-storage-startup/), to replace Git LFS on the Hub. diff --git a/docs/hub/xet/security.md b/docs/hub/xet/security.md index 3c623ab41..e239ab34c 100644 --- a/docs/hub/xet/security.md +++ b/docs/hub/xet/security.md @@ -1,4 +1,4 @@ -## Security Model +# Security Model Xet storage provides data deduplication over all chunks stored in Hugging Face. This is done via cryptographic hashing in a privacy sensitive way. The contents of chunks are protected and are associated with repository permissions, i.e. you can only read chunks which are required to reproduce files you have access to, and no more. From cabe97990c82218c89ee6f032e4ce6bd8a4c6976 Mon Sep 17 00:00:00 2001 From: Jared Sulzdorf Date: Thu, 23 Oct 2025 14:31:51 -0700 Subject: [PATCH 3/7] addressing PR feedback --- docs/hub/xet/using-xet-storage.md | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/docs/hub/xet/using-xet-storage.md b/docs/hub/xet/using-xet-storage.md index cbb5f11fd..efb0e575d 100644 --- a/docs/hub/xet/using-xet-storage.md +++ b/docs/hub/xet/using-xet-storage.md @@ -36,15 +36,16 @@ Install [Git](https://git-scm.com/) and [Git LFS](https://git-lfs.com/). 
### Install on macOS or Linux (amd64 or aarch64) - To install using [Homebrew](https://brew.sh/): + Install using an installation script with the following command in your terminal (requires `curl` and `unzip`): ``` - brew tap huggingface/tap - brew install git-xet + curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh ``` - Or, using an installation script, run the following in your terminal (requires `curl` and `unzip`): + Or, install using [Homebrew](https://brew.sh/), with the following [tap](https://docs.brew.sh/Taps) (direct `brew install` coming soon): ``` - curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/huggingface/xet-core/refs/heads/main/git_xet/install.sh | sh + brew tap huggingface/tap + brew install git-xet ``` + To verify the installation, run: ``` git-xet --version @@ -65,6 +66,23 @@ To verity the installation, run: git-xet --version ``` +### Using Git Xet + +Once installed on your platform, using Git Xet is as simple as following the Hub's standard Git workflows. First, make sure Git LFS is initialized in your local version of the repository: + + ``` + git lfs install + ``` +Then make any additions you might want, commit your changes, and `push` your commit to the Hub: + + ``` + # Create any files you like! Then... + git add . + git commit -m "Uploading new models" # You can choose any descriptive message + git push + ``` +Under the hood, the [Xet protocol](https://huggingface.co/docs/xet/index) is invoked to upload large files directly to Xet storage, increasing upload speeds through the power of [chunk-level deduplication](./deduplication). 
+ ### Uninstall on macOS or Linux Using Homebrew: From 6c04a5adf43ca90b8e9d5b7ca932e24a10c4ae0f Mon Sep 17 00:00:00 2001 From: Jared Sulzdorf Date: Thu, 23 Oct 2025 14:48:50 -0700 Subject: [PATCH 4/7] typo --- docs/hub/xet/using-xet-storage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/hub/xet/using-xet-storage.md b/docs/hub/xet/using-xet-storage.md index efb0e575d..2f4b7b326 100644 --- a/docs/hub/xet/using-xet-storage.md +++ b/docs/hub/xet/using-xet-storage.md @@ -61,7 +61,7 @@ Install [Git](https://git-scm.com/) and [Git LFS](https://git-lfs.com/). - Place the extracted `git-xet.exe` under a `PATH` directory. - Run `git-xet install` in a terminal. -To verity the installation, run: +To verify the installation, run: ``` git-xet --version ``` From 0d0641be9d3cc354df0bb67c3b9afa2190ab0e86 Mon Sep 17 00:00:00 2001 From: Jared Sulzdorf Date: Thu, 23 Oct 2025 18:31:51 -0700 Subject: [PATCH 5/7] updating usage notes with pointer to setup instructions --- docs/hub/xet/using-xet-storage.md | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/docs/hub/xet/using-xet-storage.md b/docs/hub/xet/using-xet-storage.md index 2f4b7b326..3b4f4ac6e 100644 --- a/docs/hub/xet/using-xet-storage.md +++ b/docs/hub/xet/using-xet-storage.md @@ -68,12 +68,9 @@ To verify the installation, run: ### Using Git Xet -Once installed on your platform, using Git Xet is as simple as following the Hub's standard Git workflows. First, make sure Git LFS is initialized in your local version of the repository: +Once installed on your platform, using Git Xet is as simple as following the Hub's standard Git workflows. 
- ``` - git lfs install - ``` -Then make any additions you might want, commit your changes, and `push` your commit to the Hub: +Make sure all [prerequisites are installed and configured](https://huggingface.co/docs/hub/repositories-getting-started#requirements), follow the [setup instructions for working with repositories on the Hub](https://huggingface.co/docs/hub/repositories-getting-started#set-up), then commit your changes, and `push` to the Hub: ``` # Create any files you like! Then... @@ -125,4 +122,4 @@ When uploading or downloading in Git or Python: While Xet brings fine-grained deduplication and enhanced performance to Git-based storage, some features and platform compatibilities are still in development. As a result, keep the following constraints in mind when working with a Xet-enabled repository: -- **64-bit systems only**: The `hf_xet` client currently requires a 64-bit architecture; 32-bit systems are not supported. \ No newline at end of file +- **64-bit systems only**: Both `hf_xet` and Git Xet currently require a 64-bit architecture; 32-bit systems are not supported. \ No newline at end of file From 4f88bc09e094e08d851c69d7ca2e2fdffe5e8e11 Mon Sep 17 00:00:00 2001 From: Jared Sulzdorf Date: Thu, 23 Oct 2025 18:46:20 -0700 Subject: [PATCH 6/7] missed return character --- docs/hub/xet/using-xet-storage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/hub/xet/using-xet-storage.md b/docs/hub/xet/using-xet-storage.md index 3b4f4ac6e..7c4eac551 100644 --- a/docs/hub/xet/using-xet-storage.md +++ b/docs/hub/xet/using-xet-storage.md @@ -55,7 +55,7 @@ Install [Git](https://git-scm.com/) and [Git LFS](https://git-lfs.com/). Using an installer: - Download `git-xet-windows-installer-x86_64.zip` ([available here](https://github.com/huggingface/xet-core/releases/download/git-xet-v0.1.0/git-xet-windows-installer-x86_64.zip)) and unzip. - Run the `msi` installer file and follow the prompts. 
- + Manual installation: - Download `git-xet-windows-x86_64.zip` ([available here](https://github.com/huggingface/xet-core/releases/download/git-xet-v0.1.0/git-xet-windows-x86_64.zip)) and unzip. - Place the extracted `git-xet.exe` under a `PATH` directory. From 821d1c0d88fb6efe72c63ff02b46860e41ec4daf Mon Sep 17 00:00:00 2001 From: Jared Sulzdorf Date: Thu, 23 Oct 2025 18:51:55 -0700 Subject: [PATCH 7/7] getting annoyed by whitespace --- docs/hub/xet/using-xet-storage.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/hub/xet/using-xet-storage.md b/docs/hub/xet/using-xet-storage.md index 7c4eac551..8620d4e84 100644 --- a/docs/hub/xet/using-xet-storage.md +++ b/docs/hub/xet/using-xet-storage.md @@ -52,11 +52,11 @@ Install [Git](https://git-scm.com/) and [Git LFS](https://git-lfs.com/). ``` ### Windows (amd64) - Using an installer: +Using an installer: - Download `git-xet-windows-installer-x86_64.zip` ([available here](https://github.com/huggingface/xet-core/releases/download/git-xet-v0.1.0/git-xet-windows-installer-x86_64.zip)) and unzip. - Run the `msi` installer file and follow the prompts. - - Manual installation: + +Manual installation: - Download `git-xet-windows-x86_64.zip` ([available here](https://github.com/huggingface/xet-core/releases/download/git-xet-v0.1.0/git-xet-windows-x86_64.zip)) and unzip. - Place the extracted `git-xet.exe` under a `PATH` directory. - Run `git-xet install` in a terminal.
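
Reviewer note on the install and verify steps documented in this series: the docs verify only `git-xet` itself, while the Prerequisites section also requires Git and Git LFS. A minimal sketch of a combined check, in POSIX shell, is below. The helper names `check_tool` and `check_prerequisites` are hypothetical and not part of Git Xet. The sketch also assumes Git LFS is exposed as a `git-lfs` executable on `PATH`, which is typical but not guaranteed by these docs.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Git Xet): report whether the tools
# named in the Prerequisites and Install sections are available on PATH.

# Print "found: NAME" and return 0 if NAME resolves, else "missing: NAME" and return 1.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# Check every tool passed as an argument; return non-zero if any is missing.
check_prerequisites() {
    status=0
    for tool in "$@"; do
        check_tool "$tool" || status=1
    done
    return "$status"
}

# Assumption: Git LFS is installed as the `git-lfs` executable.
check_prerequisites git git-lfs git-xet || echo "Install the missing tools before pushing."
```

If all three tools are found, `git-xet --version` (as documented above) can then confirm the installed version.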