@rajatarya
Contributor

This is the PR for Xet Storage documentation. It currently has partial content, so it is not ready for review yet.

@rajatarya rajatarya added the documentation Improvements or additions to documentation label Mar 4, 2025
@rajatarya rajatarya requested review from jsulz and julien-c March 4, 2025 01:09
@rajatarya rajatarya self-assigned this Mar 4, 2025
@rajatarya rajatarya removed the request for review from julien-c March 4, 2025 01:10
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jsulz
Contributor

jsulz commented Mar 10, 2025

@rajatarya @ylow just did an update of these docs. Few things to note:

  1. I did a structural update of headings so that the in-page navigation on the left sidebar would be easier. I did not love how that looked on https://moon-ci-docs.huggingface.co/docs/hub/pr_1622/en/repositories-storage
  2. Moved up Deduplication section to be directly under Xet and futzed around with language to make that flow better
  3. I updated the Using Xet Storage and all subsections (Recommendations and Current Limitations) with a first pass on the content there. I'm not particularly attached to any of it, but wanted to take a stab based on talking with Rajat to know what we wanted there.

@rajatarya I'm going to work in the screenshots of .gitattributes and pointer files next, along with the sequence diagrams.

@ylow would you mind taking a look so far? I think we could use some help on the Security Model section. Also, any other feedback that you want to provide would be great ❤️

@jsulz jsulz requested a review from ylow March 10, 2025 20:52
@jsulz
Contributor

jsulz commented Mar 11, 2025

@julien-c we (@rajatarya, @ylow, and myself) have been looking at these docs and are thinking about a larger rewrite where we have a single entry point for all content related to storage, with an emphasis on Xet as the default and LFS as legacy.

The thinking being that from a reader's perspective it's somewhat odd to have a Storage Limits section and then a Xet-centric section about the Hub's new storage system.

This would mean the current Storage Limits documentation would move and merge with these docs.

Storage would reside under Repositories (just like Storage Limits does today), but would have subsections like the following:

  • Storage
    • Usage
    • Recommendations and Limits
    • Plans
    • FAQ

(Xet architecture content would be moved to https://github.com/huggingface/xet-core for open source contributors)

The order and exact content on each page would be determined through this PR and with review from everyone that has an interest in the current Storage Limits documentation.

Does this sound reasonable to you, or should we continue to keep Xet documentation separate as we are in this PR?

@julien-c
Member

(@jsulz i'm stuck on something else today but will look tomorrow)

@julien-c
Member

julien-c commented Mar 13, 2025

@jsulz sorry about long response time.

In my opinion the most natural progression would be to keep this doc page separate but put it right before the storage limits one, like so:

image

Storage limits are really more about pricing, and we will update that one as well to add mentions of Xet and not just LFS (in a subsequent PR)

Keeping this Storage Backends page separate will also make reviews etc more tractable.

EDIT: Alternatively, it could also be named just "Xet"...

Member

@julien-c julien-c left a comment

looks like a good start 🔥 let's iterate and ship a first version of this soon!

@jsulz
Contributor

jsulz commented Mar 13, 2025

@rajatarya just did a pass based on all of the feedback so far.

Only thing remaining is the Security Model section (which we could maybe merge without if we wanted and then iterate on later).

@ylow
Contributor

ylow commented Mar 13, 2025

I will write the security model section tomorrow morning.

@julien-c julien-c marked this pull request as ready for review March 14, 2025 12:16
Member

@julien-c julien-c left a comment

lgtm (after the filename change)

jsulz and others added 3 commits March 14, 2025 07:00
Co-authored-by: Julien Chaumond <[email protected]>
Co-authored-by: Célina <[email protected]>
Co-authored-by: Julien Chaumond <[email protected]>
Co-authored-by: Célina <[email protected]>
Contributor

@hanouticelina hanouticelina left a comment

LGTM!

Contributor

@jsulz jsulz left a comment

:shipit:

Contributor

@Wauplin Wauplin left a comment

🔥

@rajatarya rajatarya merged commit f623b87 into main Mar 14, 2025
2 checks passed
@rajatarya rajatarya deleted the xet-docs branch March 14, 2025 18:36
8 participants