Updating Xet usage docs (#1767)

jsulz · web-flow · commit 8856a6272cd0 · 2025-06-05T13:07:32.000-07:00
* updating usage notes and adding recommendations for xet envvars

* words
diff --git a/docs/hub/storage-backends.md b/docs/hub/storage-backends.md
@@ -41,30 +41,33 @@ Unlike Git LFS, which deduplicates at the file level, Xet-enabled repositories d
 
 ### Using Xet Storage
 
-To start using Xet Storage, you need a Xet-enabled repository and a Xet-aware version of the [huggingface_hub](https://huggingface.co/docs/huggingface_hub) Python library.
+To start using Xet Storage, you need a Xet-enabled repository and a Xet-aware version of the [huggingface_hub](https://huggingface.co/docs/huggingface_hub) Python library. As of May 23rd, 2025, Xet-enabled repositories are the default [for all new users and organizations on the Hub](https://huggingface.co/changelog/xet-default-for-new-users). 
 
 <Tip>
 
-To make Xet the default for all your repositories, [join the waitlist](https://huggingface.co/join/xet)! You can apply for yourself or your entire organization (requires [admin permissions](https://huggingface.co/docs/hub/organizations-security)). Once approved, all current repositories will be automatically migrated to Xet and future repositories will be Xet-enabled by default.
-Note that our intent is to fast-track PRO users and Enterprise Hub organizations.
+For user and organization profiles created before May 23rd, 2025, you can make Xet the default for all your repositories by [signing up here](https://huggingface.co/join/xet). You can apply for yourself or your entire organization (requires [admin permissions](https://huggingface.co/docs/hub/organizations-security)). Once approved, all existing repositories will be automatically migrated to Xet and future repositories will be Xet-enabled by default.
+
+PRO users and Enterprise Hub organizations will be fast-tracked for access.
 
 </Tip>
 
-To access a Xet-aware client, add the `hf_xet` Python package when installing `huggingface_hub` (should be >= 0.30.0):
+To access a Xet-aware version of the `huggingface_hub`, simply install the latest version:
 
 ```bash
-pip install -U huggingface_hub[hf_xet]
+pip install -U huggingface_hub
 ```
 
-If you use the `transformers` or `datasets` libraries, it's already using `huggingface_hub` (`huggingface_hub` should be >= 0.30.0) so you can simply install `hf_xet` in the same env:
+As of `huggingface_hub` 0.32.0, this will also install `hf_xet`. The `hf_xet` package integrates `huggingface_hub` with [`xet-core`](https://github.com/huggingface/xet-core), the Rust client for the Xet backend. 
+
+If you use the `transformers` or `datasets` libraries, it's already using `huggingface_hub`. So long as the version of `huggingface_hub` >= 0.32.0, no further action needs to be taken.
+
+Where versions of `huggingface_hub` >= 0.30.0 and < 0.32.0 are installed, `hf_xet` must be installed explicitly:
 
 ```bash
-pip install hf-xet
+pip install -U hf-xet
 ```
 
-If your Python environment has a `hf_xet`-aware version of `huggingface_hub` then your uploads and downloads will automatically use Xet.
-
-That's it! You now get the benefits of Xet deduplication for both uploads and downloads. Team members using older `huggingface_hub` versions will still be able to upload and download repositories through the [backwards compatibility provided by the LFS bridge](#backward-compatibility-with-lfs).
+And that's it! You now get the benefits of Xet deduplication for both uploads and downloads. Team members using a version of `huggingface_hub` < 0.30.0 will still be able to upload and download repositories through the [backwards compatibility provided by the LFS bridge](#backward-compatibility-with-lfs).
 
 To see more detailed usage docs, refer to the `huggingface_hub` docs for:
 
@@ -77,6 +80,7 @@ To see more detailed usage docs, refer to the `huggingface_hub` docs for:
 Xet integrates seamlessly with the Hub's current Python-based workflows. However, there are a few steps you may consider to get the most benefits from Xet storage:
 
 - **Use `hf_xet`**: While Xet remains backward compatible with legacy clients optimized for Git LFS, the `hf_xet` integration with `huggingface_hub` delivers optimal chunk-based performance and faster iteration on large files.
+- **Utilize `hf_xet` environment variables**: The default installation of `hf_xet` is designed to support the broadest range of hardware. To take advantage of setups with more network bandwidth or processing power read up on `hf_xet`'s [environment variables](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#xet) to further speed up downloads and uploads. 
 - **Leverage frequent, incremental commits**: Xet's chunk-level deduplication means you can safely make incremental updates to models or datasets. Only changed chunks are uploaded, so frequent commits are both fast and storage-efficient.
 - **Be Specific in .gitattributes**: When defining patterns for Xet or LFS, use precise file extensions (e.g., `*.safetensors`, `*.bin`) to avoid unnecessarily routing smaller files through large-file storage.
 - **Prioritize community access**: Xet substantially increases the efficiency and scale of large file transfers. Instead of structuring your repository to reduce its total size (or the size of individual files), organize it for collaborators and community users so they may easily navigate and retrieve the content they need.
@@ -95,7 +99,7 @@ While Xet brings fine-grained deduplication and enhanced performance to Git-base
 
 Xet-enabled repositories utilize [content-defined chunking (CDC)](https://huggingface.co/blog/from-files-to-chunks) to deduplicate on the level of bytes (~64KB of data, also referred to as a "chunk"). Each chunk is identified by a rolling hash that determines chunk boundaries based on the actual file contents, making it resilient to insertions or deletions anywhere in the file. When a file is uploaded to a Xet-backed repository using a Xet-aware client, its contents are broken down into these variable-sized chunks. Only new chunks not already present in Xet storage are kept after chunking, everything else is discarded.
 
-To avoid the overhead of communicating and managing at the level of chunks, new chunks are grouped together in [64MB blocks](https://huggingface.co/blog/from-chunks-to-blocks#scaling-deduplication-with-aggregation) and uploaded. Each block is stored once in a [content-addressed store (CAS)](#content-addressed-store-cas), keyed by its hash.
+To avoid the overhead of communicating and managing at the level of chunks, new chunks are grouped together in [64MB blocks](https://huggingface.co/blog/from-chunks-to-blocks#scaling-deduplication-with-aggregation) and uploaded. Each block is stored once in a content-addressed store (CAS), keyed by its hash.
 
 The Hub's [current recommendation](https://huggingface.co/docs/hub/storage-limits#recommendations) is to limit files to 20GB. At a 64KB chunk size, a 20GB file has 312,500 chunks, many of which go unchanged from version to version. Git LFS is designed to notice only that a file has changed and store the entirety of that revision. By deduplicating at the level of chunks, the Xet backend enables storing only the modified content in a file (which might only be a few KB or MB) and securely deduplicates shared blocks across repositories. For the large binary files found in Model and Dataset repositories, this provides significant improvements to file transfer times.