docs/hub/xet/auth.md: 3 additions, 3 deletions
@@ -1,6 +1,6 @@
 # Authentication and Authorization

-To invoke any API's mentioned in this specification a client MUST first acquire a token (and the url) to authenticate against the server which serves these API's.
+To invoke any API's mentioned in this specification a client MUST first acquire a token (and the URL) to authenticate against the server which serves these API's.

 The Xet protocol server uses bearer authentication via a token generated by the Hugging Face Hub (<https://huggingface.co>).
 - `repo_type`: Type of repository - `model`, `dataset`, or `space`
 - `repo_id`: Repository identifier in format `namespace/repo-name`
@@ -110,7 +110,7 @@ Xet tokens can have either a `read` or a `write` scope.
 `write` scope supersedes `read` scope and all `read` scope API's can be invoked when using a `write` scope token.
 The type of token issued is determined on the `token_type` URI path component when requesting the token from the Hugging Face Hub (see above).

-Revise API specification for what scope level is necessary to invoke each API (briefly, only `POST /shard` and `POST /xorb/*` API's require `write` scope).
+Check API specification for what scope level is necessary to invoke each API (briefly, only `POST /shard` and `POST /xorb/*` API's require `write` scope).

 The scope of the Xet tokens is limited to the repository and ref for which they were issued. To upload or download from different repositories or refs (different branches) clients MUST be issued different tokens.
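The scope rules in this hunk reduce to a small decision table: only `POST /shard` and `POST /xorb/*` need a `write` token, and a `write` token can also call every `read` API. A minimal sketch of that check, assuming scopes are plain strings and paths are matched as shell-style patterns; `required_scope`, `token_allows` and `WRITE_ROUTES` are hypothetical helpers, not part of the Xet specification or xet-core, and the per-repository/per-ref restriction is not modeled.

```python
from fnmatch import fnmatchcase

# Hypothetical encoding of the auth.md summary: only these (method, path pattern)
# pairs need a `write` token; every other API is treated as needing `read`.
WRITE_ROUTES = [("POST", "/shard"), ("POST", "/xorb/*")]


def required_scope(method: str, path: str) -> str:
    """Return the minimum token scope ("read" or "write") for an API call."""
    for route_method, pattern in WRITE_ROUTES:
        if method.upper() == route_method and fnmatchcase(path, pattern):
            return "write"
    return "read"


def token_allows(token_scope: str, method: str, path: str) -> bool:
    """A `write` token supersedes `read`; a `read` token only covers `read` APIs."""
    needed = required_scope(method, path)
    return token_scope == "write" or needed == "read"
```

For example, `token_allows("read", "POST", "/shard")` is `False`, while a `write` token passes every check here.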

docs/hub/xet/chunking.md: 1 addition, 1 deletion
@@ -81,7 +81,7 @@ if start_offset < len(data):

 ### Boundary probability and mask selection

-Given that MASK has 16 one-bits, for a random 64-bit hash h, the chance that all those 16 bits are zero is 1 / 2^16. On average, that means you’ll see a match about once every 64 KiB.
+Given that MASK has 16 one-bits, for a random 64-bit hash `h`, the chance that all those 16 bits are zero is 1 / 2^16. On average, that means you’ll see a match about once every 64 KiB.
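The arithmetic in this paragraph can be checked directly: a hash `h` marks a boundary when `h & MASK == 0`, each masked bit of a uniformly random hash is zero with probability 1/2, so 16 one-bits give 1 / 2^16, and testing once per byte yields a mean spacing of 65,536 bytes (64 KiB). `EXAMPLE_MASK` below is a hypothetical mask with 16 one-bits, not the constant used by the Xet chunker.

```python
from fractions import Fraction

# Hypothetical 64-bit mask with exactly 16 one-bits (the real Xet MASK may differ).
EXAMPLE_MASK = 0xFFFF_0000_0000_0000


def is_boundary(h: int, mask: int = EXAMPLE_MASK) -> bool:
    """A chunk boundary occurs when every bit selected by the mask is zero."""
    return (h & mask) == 0


def boundary_probability(mask: int) -> Fraction:
    """Each masked bit of a uniformly random hash is zero with probability 1/2."""
    one_bits = bin(mask).count("1")
    return Fraction(1, 2 ** one_bits)


def expected_spacing_bytes(mask: int) -> int:
    """Mean distance between boundaries, assuming the hash is tested once per byte."""
    return int(1 / boundary_probability(mask))
```

With 16 one-bits, `expected_spacing_bytes(EXAMPLE_MASK)` is 65536, i.e. 64 KiB.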

docs/hub/xet/download-protocol.md: 7 additions, 7 deletions
@@ -1,6 +1,6 @@
 # Download Protocol

-This document describes the complete process of downloading a single file from the Xet protocol using the CAS (Content Addressable Storage) reconstruction API.
+This document describes the complete process of downloading a single file from the Xet protocol using the Content Addressable Storage (CAS) reconstruction API.

 ## Overview

@@ -84,7 +84,7 @@ The reconstruction API returns a `QueryReconstructionResponse` object with three
 - Maps xorb hashes to required information to download some of their chunks.
 - The mapping is to an array of 1 or more `CASReconstructionFetchInfo`
 - Each `CASReconstructionFetchInfo` contains:
-- `url`: HTTP URL for downloading the xorb data, presigned url containing authorization information
+- `url`: HTTP URL for downloading the xorb data, presigned URL containing authorization information
 - `url_range` (bytes_start, bytes_end): Byte range `{ start: number, end: number }` for the Range header; end-inclusive `[start, end]`
 - The `Range` header MUST be set as `Range: bytes=<start>-<end>` when downloading this chunk range
 - `range` (index_start, index_end): Chunk index range `{ start: number, end: number }` that this URL provides; end-exclusive `[start, end)`
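Per these bullets, `url_range` is end-inclusive and maps directly onto an HTTP `Range` header, while `range` is an end-exclusive chunk index range. A minimal standard-library sketch, assuming a `fetch_info` entry shaped like the JSON described here; `range_header`, `chunk_count` and `build_xorb_request` are hypothetical helpers, and no bearer token is added on the assumption that the presigned URL carries the authorization.

```python
import urllib.request


def range_header(url_range: dict) -> str:
    """url_range is end-inclusive [start, end], which matches HTTP byte-range syntax."""
    return f"bytes={url_range['start']}-{url_range['end']}"


def chunk_count(chunk_range: dict) -> int:
    """range is end-exclusive [start, end), so the number of chunks is end - start."""
    return chunk_range["end"] - chunk_range["start"]


def build_xorb_request(fetch_info: dict) -> urllib.request.Request:
    """Build (but do not send) a GET for the presigned URL with the required Range header."""
    return urllib.request.Request(
        fetch_info["url"],
        headers={"Range": range_header(fetch_info["url_range"])},
        method="GET",
    )
```

Sending it would be `urllib.request.urlopen(build_xorb_request(info))`; omitting the `Range` header is what the spec says leads to an authorization failure.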
@@ -233,7 +233,7 @@ For partial file downloads, the reconstruction API supports range queries:

 When downloading individual term data:

-A client MUST include the `Range` header formed with the values from the url_range field to specify the exact range of data of a xorb that they are accessing. Not specifying this header will cause result in an authorization failure.
+A client MUST include the `Range` header formed with the values from the `url_range` field to specify the exact range of data of a xorb that they are accessing. Not specifying this header will cause result in an authorization failure.

 Xet global deduplication requires that access to xorbs is only granted to authorized ranges.
 Not specifying this header will result in an authorization failure.
@@ -250,8 +250,8 @@ Consider downloading such content only once and reusing the data.
 ### Caching recommendations

 1. It can be ineffective to cache the reconstruction object
-1. The fetch_info section provides short-expiration pre-signed url's hence Clients SHOULD NOT cache the urls beyond their short expiration
-2. To get those url's to access the data you will need to call the reconstruction API again anyway
+1. The fetch_info section provides short-expiration pre-signed URL's hence Clients SHOULD NOT cache the urls beyond their short expiration
+2. To get those URL's to access the data you will need to call the reconstruction API again anyway
 2. Cache chunks by range not just individually
 1. If you need a chunk from a xorb it is very likely that you will need another, so cache them close
 3. Caching helps when downloading similar contents. May not be worth to cache data if you are always downloading different things
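The recommendations above (cache chunk data rather than short-lived presigned URLs, and cache by range rather than chunk by chunk) can be sketched as a cache keyed by xorb hash that stores whole end-exclusive chunk ranges. `XorbRangeCache` is a hypothetical illustration with no eviction policy, not the cache used by xet-core.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class XorbRangeCache:
    """Hypothetical cache keyed by xorb hash that stores whole chunk ranges.

    Only chunk data is cached; presigned URLs are deliberately not stored,
    since they expire quickly.
    """

    def __init__(self) -> None:
        # xorb hash -> list of (start, end, chunks), chunk indices end-exclusive
        self._ranges: Dict[str, List[Tuple[int, int, List[bytes]]]] = defaultdict(list)

    def put(self, xorb_hash: str, start: int, end: int, chunks: List[bytes]) -> None:
        """Store the chunks at indices [start, end) of one xorb together."""
        if len(chunks) != end - start:
            raise ValueError("chunk count must equal end - start")
        self._ranges[xorb_hash].append((start, end, chunks))

    def get(self, xorb_hash: str, start: int, end: int) -> Optional[List[bytes]]:
        """Return chunks [start, end) if a single cached range fully covers them."""
        for r_start, r_end, chunks in self._ranges.get(xorb_hash, []):
            if r_start <= start and end <= r_end:
                return chunks[start - r_start:end - r_start]
        return None
```

Storing the range as a unit means a later request for neighbouring chunks of the same xorb is served without another reconstruction call.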
@@ -326,8 +326,8 @@ This example shows reconstruction of a file that requires:
 - Chunks `[0, 2)` from the second xorb (~144KB of unpacked data)
 - Chunks `[3, 43)` from the same xorb from the first term (~3MB of unpacked data)

-The `fetch_info` provides the HTTP URLs and byte ranges needed to download the required chunk data from each xorb. The ranges provided within fetch_info and term sections are always end-exclusive i.e. `{ "start": 0, "end": 3 }` is a range of 3 chunks at indices 0, 1 and 2.
-The ranges provided under a fetch_info items' url_range key are to be used to form the `Range` header when downloading the chunk range.
+The `fetch_info` provides the HTTP URLs and byte ranges needed to download the required chunk data from each xorb. The ranges provided within `fetch_info` and term sections are always end-exclusive i.e. `{ "start": 0, "end": 3 }` is a range of 3 chunks at indices 0, 1 and 2.
+The ranges provided under a `fetch_info` items' `url_range` key are to be used to form the `Range` header when downloading the chunk range.
 A `"url_range"` value of `{ "start": X, "end": Y }` creates a `Range` header value of `bytes=X-Y`.

 When downloading and deserializing the chunks from xorb `a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456` we will have the chunks at indices `[1, 43)`.
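To make the end-exclusive convention concrete: once the chunks at indices `[1, 43)` have been downloaded and deserialized, the term that needs `[3, 43)` is a slice offset by the start of the downloaded range. `term_chunks` is a hypothetical helper and the chunk values in the usage line are placeholders, not data from the example response.

```python
def term_chunks(fetched_start: int, fetched_chunks: list, term_start: int, term_end: int) -> list:
    """Pick the chunks a term needs out of a deserialized fetch_info download.

    Both ranges are end-exclusive chunk indices: {"start": 0, "end": 3} means 0, 1, 2.
    """
    fetched_end = fetched_start + len(fetched_chunks)
    if not (fetched_start <= term_start <= term_end <= fetched_end):
        raise ValueError("term range is not covered by the downloaded range")
    # Translate absolute chunk indices into positions within the downloaded list.
    return fetched_chunks[term_start - fetched_start:term_end - fetched_start]
```

For example, with `chunks = [f"chunk{i}" for i in range(1, 43)]` standing in for indices `[1, 43)`, `term_chunks(1, chunks, 3, 43)` returns the 40 chunks from `chunk3` through `chunk42`.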

docs/hub/xet/file-id.md: 3 additions, 3 deletions
@@ -1,6 +1,6 @@
 # Getting a Xet File ID from the Hugging Face Hub

-This section explains the Xet file ID used in the reconstruction API to download a file from the HuggingFace hub using the xet protocol.
+This section explains the Xet file ID used in the reconstruction API to download a file from the Hugging Face Hub using the xet protocol.

 Given a particular namespace, repository and branch or commit hash and file path from the root of the repository, build the "resolve" URL for the file following this format:

@@ -11,7 +11,7 @@ repository: the repository name e.g. Qwen-Image-Edit
 branch: any git branch or commit hash e.g. main
 filepath: filepath in repository e.g. transformer/diffusion_pytorch_model-00001-of-00009.safetensors
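The format line itself is not part of this diff, so the sketch below assumes the usual Hub layout `https://huggingface.co/<namespace>/<repository>/resolve/<branch>/<filepath>`. `resolve_url` is a hypothetical helper, and the namespace in the usage line (`example-org`) is a placeholder because the namespace example is not shown in these hunks.

```python
from urllib.parse import quote


def resolve_url(namespace: str, repository: str, branch: str, filepath: str) -> str:
    """Build a Hub "resolve" URL, assuming the layout described in the lead-in."""
    # The branch is escaped as a single path segment; "/" is kept in the file path
    # so nested directories remain separate path segments.
    return (
        f"https://huggingface.co/{namespace}/{repository}"
        f"/resolve/{quote(branch, safe='')}/{quote(filepath, safe='/')}"
    )
```

With the example values above, `resolve_url("example-org", "Qwen-Image-Edit", "main", "transformer/diffusion_pytorch_model-00001-of-00009.safetensors")` yields `https://huggingface.co/example-org/Qwen-Image-Edit/resolve/main/transformer/diffusion_pytorch_model-00001-of-00009.safetensors`.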

docs/hub/xet/index.md: 1 addition, 1 deletion
@@ -30,7 +30,7 @@ Implementors can create their own clients, SDKs, and tools that speak the Xet pr

 ### xet-core: hf-xet + git-xet

-The primary reference implementation of the protocol written in rust 🦀 lives in the [xet-core](https://github.com/huggingface/xet-core) repository under multiple crates:
+The primary reference implementation of the protocol written in Rust 🦀 lives in the [xet-core](https://github.com/huggingface/xet-core) repository under multiple crates:

 - [cas_types](https://github.com/huggingface/xet-core/tree/main/cas_types) - Common re-usable types for interacting with CAS API's
 - [cas_client](https://github.com/huggingface/xet-core/tree/main/cas_client) - Client interface that calls CAS API's, including comprehensive implementation of download protocol.