Xet Docs for huggingface_hub #2899
jsulz
left a comment
Overall looks good to me. Added some comments, but none are blockers IMO.
Pierrci
left a comment
nice
Nvm, it's here now: https://moon-ci-docs.huggingface.co/docs/huggingface_hub/pr_2899/en/index
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: Pierric Cistac <[email protected]>
thanks @rajatarya for taking care of this! I left a couple of comments
IMO it would be better to put the xet documentation into dedicated sections rather than subsections in the upload and download docs. But this is just a personal preference; I guess we can always revamp the documentation once hf_xet becomes a required dependency.
EDIT: let's keep this structure for now :)
Wauplin
left a comment
🔥
Co-authored-by: Célina <[email protected]>
Co-authored-by: Lucain <[email protected]>
I like that the docs today are user-task oriented: how do I upload files, how do I download files. Xet falls into the background as it becomes required - it is just what is used to implement uploading & downloading files. I would be very open to revamping docs once hf_xet is required and Xet is enabled by default for new repos. Then LFS truly falls into deprecated/legacy support and lots of the docs should reflect that.
Thank you everyone for your feedback on these docs - I believe they are ready for a final review.
Wauplin
left a comment
Thanks for reorganizing things @rajatarya! Approving the PR, provided the comments below are addressed (only minor ones).
Co-authored-by: Lucain <[email protected]>
* add hf_xet as an optional dependency
* update installed packages at runtime
* split xet testing in CI
* fix workflow
* fix windows
* Xet download workflow (#2875)
* first draft
* remove comment
* hf_xet instead of xet
* update docstring
* fix
* update docstring
* simplify typing
* quality
* add logging
* fix tests
* add unit tests for xet utilities
* first draft of download testing
* more tests
* address some comments
* fix tests
* check if hf_xet is available or not
* remove unnecessary dest dir creation
* keep comment
Co-authored-by: Lucain <[email protected]>
* post-review improvements
* Update tests/test_xet_download.py
---------
Co-authored-by: Lucain <[email protected]>
* Add ability to enable/disable xet storage on a repo (#2893)
* add ability to enable/disable xet storage
* add test
* better way to check if all settings are none
* don't strip authorization header with downloading with xet
* update comment
* Xet upload workflow (#2887)
* add upload workflow
* fixes and tests
* use helper for prgress bar
* use tmp repo in tests
* some fixes
* update tests
* mock HF_XET_CACHE
* fix tests
* fix utils tests
* debug CI
* fix
* check if xet is enabled
* debug CI
* debug CI again
* revert
* debugging
* don't rerun xet tests
* revert
* remove pytest timeout
* don't run tests in parallel
* add comment
* revert and rename variable
* don't skip tests
* remove warning
* fix tests
* Apply suggestions from code review
* fixes
* fix syntax error with python 3.8
* catch Invalid credentials
* fix
* record Space API VCR test
* use raise instead of raise e
Co-authored-by: Lucain <[email protected]>
* disable xet storage for the other tests
* reverting
* isolate xet tests for windows
* fix windows
* install hf_xet for xet testing
---------
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Lucain Pouget <[email protected]>
* Xet Docs for huggingface_hub (#2899)
* Xet docs
* PR feedback, added waitlist links
* Added HF_XET_CACHE env variable docs
* PR feedback
* Doc feedback
* Added two lines about flow of upload/download
* Updating links to Hub doc location
* Reformat headings, less levels in TOC
---------
Co-authored-by: Julien Chaumond <[email protected]>
Co-authored-by: Pierric Cistac <[email protected]>
Co-authored-by: Célina <[email protected]>
Co-authored-by: Lucain <[email protected]>
* Adding Token Refresh Xet Test (#2932)
Directly calling hfxet.download_files() with token_refresher callback to ensure that hfxet calls the token refresher as expected.
---------
Co-authored-by: Celina Hanouti <[email protected]>
* Using a two stage download path for xet files. (#2920)
* Adding request header on resolve endpoint indicating that we can receive xet info.
* Adding test to ensure that the header is always sent on metdata request
* Using a two stage download path for xet files.
* Using the GET call's JSON
* Using xet_backed for the whether the file is a xet file or not to disambiguate from whether xet is enabled
* Adding and fixing tests
* Testing fix WIP
* Rewriting xet download to use the refresh route to resolve the xetmetadata
* Parameter type check
* Docs
* Removing extraneous constant
* Fixing file_download tests
* Readding the refresh route into the file metadata
* Refactoring the XetMetadata object into two objects to reflect the Hub changes.
* Fixing broken tests
* Code cleanup from self review
* Fixing types
* Quality & Lint
* Handling when hub returns the entire refresh route in its headers.
* Update tests/test_xet_utils.py
* Fixing merge conflicts in the new tests
* Extracting the refresh route from the link header (#2953)
* Getting the refresh route from the links header
* refactor xet_file_data func signature & tests
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Rajat Arya <[email protected]>
* Update src/huggingface_hub/constants.py
Co-authored-by: Célina <[email protected]>
---------
Co-authored-by: Celina Hanouti <[email protected]>
Co-authored-by: Rajat Arya <[email protected]>
Co-authored-by: Julien Chaumond <[email protected]>
Co-authored-by: Pierric Cistac <[email protected]>
Co-authored-by: Brian Ronan <[email protected]>
Co-authored-by: Rajat Arya <[email protected]>
Docs describing enabling and using Xet Storage with huggingface_hub.
These docs reference docs in HF Hub, which will be anchored here: https://huggingface.co/docs/hub/repositories (as a new Storage section).
The PR for those docs is here: huggingface/hub-docs#1622 (draft PR for now).
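For readers skimming this thread, here is a minimal sketch of the two user-visible knobs the commit log mentions: whether the optional `hf_xet` package is installed, and the `HF_XET_CACHE` environment variable. It uses only the standard library. The helper names `is_hf_xet_installed` and `xet_cache_override` are hypothetical and are not part of `huggingface_hub`; treating `HF_XET_CACHE` as a directory path is an assumption based on its name.

```python
import importlib.util
import os
from pathlib import Path
from typing import Optional


def is_hf_xet_installed() -> bool:
    """Return True if the optional hf_xet package can be found on the import path.

    Hypothetical helper for illustration; not the check huggingface_hub itself uses.
    """
    return importlib.util.find_spec("hf_xet") is not None


def xet_cache_override() -> Optional[Path]:
    """Return the HF_XET_CACHE value as a Path, or None if it is unset or empty.

    Assumption: HF_XET_CACHE holds a directory path.
    """
    value = os.environ.get("HF_XET_CACHE")
    return Path(value).expanduser() if value else None


if __name__ == "__main__":
    print("hf_xet installed:", is_hf_xet_installed())
    print("HF_XET_CACHE override:", xet_cache_override())
```

This only inspects the local environment and does not contact the Hub or touch any repo.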