Skip to content

Commit 551e10b

Browse files
committed
cleaning up markdown headers in xet folder
1 parent 342b278 commit 551e10b

File tree

5 files changed

+5
-5
lines changed

5 files changed

+5
-5
lines changed

docs/hub/xet/deduplication.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
## Deduplication
1+
# Deduplication
22

33
Xet-enabled repositories utilize [content-defined chunking (CDC)](https://huggingface.co/blog/from-files-to-chunks) to deduplicate on the level of bytes (~64KB of data, also referred to as a "chunk"). Each chunk is identified by a rolling hash that determines chunk boundaries based on the actual file contents, making it resilient to insertions or deletions anywhere in the file. When a file is uploaded to a Xet-backed repository using a Xet-aware client, its contents are broken down into these variable-sized chunks. Only new chunks not already present in Xet storage are kept after chunking, everything else is discarded.
44

docs/hub/xet/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ Instead, on the Hub, these large files are tracked using "pointer files" and ide
1313

1414
Historically, Hub repositories have relied on [Git LFS](https://git-lfs.com/) for this mechanism. While Git LFS remains supported (see [Backwards Compatibility & Legacy](./legacy-git-lfs)), the Hub has adopted Xet, a modern custom storage system built specifically for AI/ML development. It enables chunk-level deduplication, smaller uploads, and faster downloads than Git LFS.
1515

16-
### Open Source Xet Protocol
16+
## Open Source Xet Protocol
1717

1818
If you are looking to understand the underlying Xet protocol or are looking to build a new client library to access Xet Storage, check out the [Xet Protocol Specification](https://huggingface.co/docs/xet/index).
1919

docs/hub/xet/legacy-git-lfs.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
## Backward Compatibility with LFS
1+
# Backward Compatibility with LFS
22

33
Uploads from legacy / non‑Xet‑aware clients still follow the standard Git LFS path, even if the repo is already Xet-backed. Once the file is uploaded to LFS, a background process automatically migrates the file to using Xet storage. The Xet architecture provides backwards compatibility for legacy clients downloading files from Xet-backed repos by offering a Git LFS bridge. While a Xet-aware client will receive file reconstruction information from CAS to download the Xet-backed file, a legacy client will get a single URL from the bridge which does the work of reconstructing the request file and returning the URL to the resource. This allows downloading files through a URL so that you can continue to use the Hub's web interface or `curl`. By having LFS file uploads automatically migrate and having older clients continue to download files from Xet-backed repositories, maintainers and the rest of the Hub can update their pipelines at their own pace.
44

docs/hub/xet/overview.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
## Xet History & Overview
1+
# Xet History & Overview
22

33
[In August 2024 Hugging Face acquired XetHub](https://huggingface.co/blog/xethub-joins-hf), a [seed-stage startup based in Seattle](https://www.geekwire.com/2023/ex-apple-engineers-raise-7-5m-for-new-seattle-data-storage-startup/), to replace Git LFS on the Hub.
44

docs/hub/xet/security.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
## Security Model
1+
# Security Model
22

33
Xet storage provides data deduplication over all chunks stored in Hugging Face. This is done via cryptographic hashing in a privacy sensitive way. The contents of chunks are protected and are associated with repository permissions, i.e. you can only read chunks which are required to reproduce files you have access to, and no more.
44

0 commit comments

Comments
 (0)