Skip to content

Commit 7e3b36b

Browse files
jsulzpcuenca
andauthored
Update backwards compatibility section of Xet docs (#1740)
* initial update to backward compatability story * wordsmithing * Grammar catch! Co-authored-by: Pedro Cuenca <[email protected]> * Better flow Co-authored-by: Pedro Cuenca <[email protected]> --------- Co-authored-by: Pedro Cuenca <[email protected]>
1 parent 94f255b commit 7e3b36b

File tree

1 file changed

+9
-6
lines changed

1 file changed

+9
-6
lines changed

docs/hub/storage-backends.md

Lines changed: 9 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,6 @@ Note that our intent is to fast-track PRO users and Enterprise Hub organizations
5050

5151
</Tip>
5252

53-
5453
To access a Xet-aware client, add the `hf_xet` Python package when installing `huggingface_hub` (should be >= 0.30.0):
5554

5655
```bash
@@ -65,9 +64,10 @@ pip install hf-xet
6564

6665
If your Python environment has a `hf_xet`-aware version of `huggingface_hub` then your uploads and downloads will automatically use Xet.
6766

68-
That's it! You now get the benefits of Xet deduplication for both uploads and downloads. Team members using older `huggingface_hub` versions will still be able to upload and download repositories through the backwards compatibility provided by the LFS bridge.
67+
That's it! You now get the benefits of Xet deduplication for both uploads and downloads. Team members using older `huggingface_hub` versions will still be able to upload and download repositories through the [backwards compatibility provided by the LFS bridge](#backward-compatibility-with-lfs).
6968

7069
To see more detailed usage docs, refer to the `huggingface_hub` docs for:
70+
7171
- [Upload](https://huggingface.co/docs/huggingface_hub/guides/upload#faster-uploads)
7272
- [Download](https://huggingface.co/docs/huggingface_hub/guides/download#hfxet)
7373
- [Managing the `hf_xet` cache](https://huggingface.co/docs/huggingface_hub/guides/manage-cache#chunk-based-caching-xet)
@@ -88,7 +88,7 @@ While Xet brings fine-grained deduplication and enhanced performance to Git-base
8888
- **64-bit systems only**: The `hf_xet` client currently requires a 64-bit architecture; 32-bit systems are not supported.
8989
- **Partial JavaScript library support**: The [huggingface.js](https://huggingface.co/docs/huggingface.js/index) library has limited functionality with Xet-backed repositories; additional coverage is planned in future releases.
9090
- **Full web support currently unavailable**: Full support for chunked uploads via the Hub web interface remains under development.
91-
- **No EU region support**: Support for EU [storage regions](https://huggingface.co/docs/hub/storage-regions) with Xet-backed repositories is planned but remains under development.
91+
- **No EU region support**: Support for EU [storage regions](https://huggingface.co/docs/hub/storage-regions) with Xet-backed repositories is planned but remains under development.
9292
- **Git client integration (git-xet)**: Planned but remains under development.
9393

9494
### Deduplication
@@ -103,13 +103,16 @@ For more details, refer to the [From Files to Chunks](https://huggingface.co/blo
103103

104104
### Backward Compatibility with LFS
105105

106-
Xet storage provides a seamless transition for existing Hub repositories. It isn't necessary to know if the Xet backend is involved at all. Xet-backed repositories continue to use the Git LFS pointer file format, with only the addition of the `Xet backed hash` field. Meaning, existing repos and newly created repos will not look any different if you do a `bare clone` of them. Each of the large files (or binary files) will continue to have a pointer file and matches the Git LFS pointer file specification.
106+
Xet storage provides a seamless transition for existing Hub repositories. It isn't necessary to know if the Xet backend is involved at all. Xet-backed repositories continue to use the Git LFS pointer file format; the addition of the `Xet backed hash` is only added to the web interface as a convenience. Practically, this means existing repos and newly created repos will not look any different if you do a `bare clone` of them. Each of the large files (or binary files) will continue to have a pointer file that matches the Git LFS pointer file specification.
107107

108-
This symmetry allows non-Xet-aware clients (e.g., older versions of the `huggingface_hub` that are not Xet-aware) to interact with Xet-backed repositories without concern. In fact, within a repository a mixture of Git LFS and Xet backed files are supported. As noted in the section describing the CAS APIs, the Xet backend indicates whether a file is in Git LFS or Xet storage, allowing downstream services (Git LFS or the Git LFS bridge) to provide the proper URL to S3, regardless of which storage system holds the content.
108+
This symmetry allows non-Xet-aware clients (e.g., older versions of the `huggingface_hub`) to interact with Xet-backed repositories without concern. In fact, within a repository a mixture of Git LFS and Xet backed files are supported. The Xet backend indicates whether a file is in Git LFS or Xet storage, allowing downstream services to request the proper URL(s) from S3, regardless of which storage system holds the content.
109109

110-
While a Xet-aware client will receive file reconstruction information from CAS to download the Xet-backed locally, a legacy client will get a S3 URL from the Git LFS bridge. Meanwhile, while uploading an update to a Xet-backed file, a Xet-aware client will run CDC deduplication and upload through CAS while a non-Xet-aware client will upload through Git LFS and a background process will convert the file revision to a Xet-backed version.
110+
Within the Xet architecture, backward compatibility for downloads is achieved by the Git LFS bridge. While a Xet-aware client will receive file reconstruction information from CAS to download the Xet-backed file, a legacy client will get a single URL from the bridge which does the work of reconstructing the request file and returning the URL to the resource. This allows downloading files through a URL so that you can continue to use the Hub's web interface or `curl`.
111+
112+
Meanwhile, uploads from non‑Xet‑aware clients still follow the standard Git LFS path, even if the file is already Xet-backed. Once the file is uploaded to LFS, a background process automatically migrates the content, turning it into a Xet-backed revision. Coupled with the Git LFS bridge, this lets repository maintainers and the rest of the Hub adopt Xet at their own pace without disruption.
111113

112114
### Security Model
115+
113116
Xet storage provides data deduplication over all chunks stored in Hugging Face. This is done via cryptographic hashing in a privacy sensitive way. The contents of chunks are protected and are associated with repository permissions. i.e. you can only read chunks which are required to reproduce files you have access to, and no more. See [xet-core](https://github.com/huggingface/xet-core) for details.
114117

115118
## Legacy Storage: Git LFS

0 commit comments

Comments
 (0)