Commit df6b724

Jared suggestions
1 parent e8c859c commit df6b724

7 files changed: 17 additions and 17 deletions

docs/hub/xet/api.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # CAS API Documentation

-This document describes the HTTP API endpoints used by the CAS (Content Addressable Storage) client to interact with the remote CAS server.
+This document describes the HTTP API endpoints used by the Content Addressable Storage (CAS) client to interact with the remote CAS server.

 ## Authentication

docs/hub/xet/auth.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 # Authentication and Authorization

-To invoke any API's mentioned in this specification a client MUST first acquire a token (and the url) to authenticate against the server which serves these API's.
+To invoke any API's mentioned in this specification a client MUST first acquire a token (and the URL) to authenticate against the server which serves these API's.

 The Xet protocol server uses bearer authentication via a token generated by the Hugging Face Hub (<https://huggingface.co>).

@@ -16,7 +16,7 @@ https://huggingface.co/api/{repo_type}s/{repo_id}/xet-{token_type}-token/{revisi

 **Parameters:**

-All parameters are required to form the url.
+All parameters are required to form the URL.

 - `repo_type`: Type of repository - `model`, `dataset`, or `space`
 - `repo_id`: Repository identifier in format `namespace/repo-name`
@@ -110,7 +110,7 @@ Xet tokens can have either a `read` or a `write` scope.
 `write` scope supersedes `read` scope and all `read` scope API's can be invoked when using a `write` scope token.
 The type of token issued is determined on the `token_type` URI path component when requesting the token from the Hugging Face Hub (see above).

-Revise API specification for what scope level is necessary to invoke each API (briefly, only `POST /shard` and `POST /xorb/*` API's require `write` scope).
+Check API specification for what scope level is necessary to invoke each API (briefly, only `POST /shard` and `POST /xorb/*` API's require `write` scope).

 The scope of the Xet tokens is limited to the repository and ref for which they were issued. To upload or download from different repositories or refs (different branches) clients MUST be issued different tokens.
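The scope rule in this hunk can be sketched as a small client-side check. This is only an illustration, not part of the specification: the function names are hypothetical, the route strings are the abbreviated ones quoted in the text (`POST /shard` and `POST /xorb/*`), and the real server routes may carry additional prefixes.

```python
def required_scope(method: str, path: str) -> str:
    """Return the minimum token scope ("read" or "write") for a request.

    Assumption: routes are matched exactly as abbreviated in the text,
    i.e. "/shard" and anything under "/xorb/".
    """
    is_write_route = path == "/shard" or path.startswith("/xorb/")
    if method.upper() == "POST" and is_write_route:
        return "write"
    return "read"


def token_allows(token_scope: str, method: str, path: str) -> bool:
    """A `write` token supersedes `read`, so it can invoke every API."""
    if token_scope == "write":
        return True
    return token_scope == "read" and required_scope(method, path) == "read"


# Hypothetical request paths, used only to exercise the check.
print(token_allows("read", "POST", "/xorb/default/abc"))
print(token_allows("write", "GET", "/reconstruction/abc"))
```

A client could run such a check before sending a request, so that a `read` token is never used for an upload that the server would reject anyway.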

docs/hub/xet/chunking.md

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ if start_offset < len(data):

 ### Boundary probability and mask selection

-Given that MASK has 16 one-bits, for a random 64-bit hash h, the chance that all those 16 bits are zero is 1 / 2^16. On average, that means you’ll see a match about once every 64 KiB.
+Given that MASK has 16 one-bits, for a random 64-bit hash `h`, the chance that all those 16 bits are zero is 1 / 2^16. On average, that means you’ll see a match about once every 64 KiB.

 ### Properties
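The arithmetic behind the boundary probability can be checked with a short sketch. The mask value below is a hypothetical stand-in chosen only because it has 16 one-bits; it is not claimed to be the mask the chunker uses. The sketch also assumes one hash evaluation per input byte, which is what turns a probability of 1 / 2^16 into an average gap of 64 KiB.

```python
# Hypothetical mask: any 64-bit value with exactly 16 one-bits gives the same numbers.
MASK = 0xFFFF_0000_0000_0000


def boundary_probability(mask: int) -> float:
    """Chance that a uniformly random hash has zeros in every masked bit."""
    one_bits = bin(mask).count("1")
    return 1.0 / (1 << one_bits)


def expected_gap_bytes(mask: int) -> int:
    """Mean distance between matches, assuming one hash test per byte."""
    return 1 << bin(mask).count("1")


def is_boundary(h: int, mask: int = MASK) -> bool:
    """A position is a boundary candidate when all masked bits of h are zero."""
    return (h & mask) == 0


print(boundary_probability(MASK))        # 1 / 2^16
print(expected_gap_bytes(MASK) // 1024)  # gap expressed in KiB
```

Choosing a mask with more one-bits lowers the match probability and therefore raises the average chunk size.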

docs/hub/xet/deduplication.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ Xet employs a three-tiered deduplication strategy to maximize efficiency while m

 #### Level 3: Global Deduplication API

-**Scope**: Entire Xet ecosystem
+**Scope**: Entire Xet system
 **Mechanism**: Global deduplication service with HMAC protection
 **Purpose**: Discover deduplication opportunities across all users and repositories

docs/hub/xet/download-protocol.md

Lines changed: 7 additions & 7 deletions
@@ -1,6 +1,6 @@
 # Download Protocol

-This document describes the complete process of downloading a single file from the Xet protocol using the CAS (Content Addressable Storage) reconstruction API.
+This document describes the complete process of downloading a single file from the Xet protocol using the Content Addressable Storage (CAS) reconstruction API.

 ## Overview

@@ -84,7 +84,7 @@ The reconstruction API returns a `QueryReconstructionResponse` object with three
 - Maps xorb hashes to required information to download some of their chunks.
 - The mapping is to an array of 1 or more `CASReconstructionFetchInfo`
 - Each `CASReconstructionFetchInfo` contains:
-- `url`: HTTP URL for downloading the xorb data, presigned url containing authorization information
+- `url`: HTTP URL for downloading the xorb data, presigned URL containing authorization information
 - `url_range` (bytes_start, bytes_end): Byte range `{ start: number, end: number }` for the Range header; end-inclusive `[start, end]`
 - The `Range` header MUST be set as `Range: bytes=<start>-<end>` when downloading this chunk range
 - `range` (index_start, index_end): Chunk index range `{ start: number, end: number }` that this URL provides; end-exclusive `[start, end)`
@@ -233,7 +233,7 @@ For partial file downloads, the reconstruction API supports range queries:

 When downloading individual term data:

-A client MUST include the `Range` header formed with the values from the url_range field to specify the exact range of data of a xorb that they are accessing. Not specifying this header will cause result in an authorization failure.
+A client MUST include the `Range` header formed with the values from the `url_range` field to specify the exact range of data of a xorb that they are accessing. Not specifying this header will cause result in an authorization failure.

 Xet global deduplication requires that access to xorbs is only granted to authorized ranges.
 Not specifying this header will result in an authorization failure.
@@ -250,8 +250,8 @@ Consider downloading such content only once and reusing the data.
 ### Caching recommendations

 1. It can be ineffective to cache the reconstruction object
-1. The fetch_info section provides short-expiration pre-signed url's hence Clients SHOULD NOT cache the urls beyond their short expiration
-2. To get those url's to access the data you will need to call the reconstruction API again anyway
+1. The fetch_info section provides short-expiration pre-signed URL's hence Clients SHOULD NOT cache the urls beyond their short expiration
+2. To get those URL's to access the data you will need to call the reconstruction API again anyway
 2. Cache chunks by range not just individually
 1. If you need a chunk from a xorb it is very likely that you will need another, so cache them close
 3. Caching helps when downloading similar contents. May not be worth to cache data if you are always downloading different things
@@ -326,8 +326,8 @@ This example shows reconstruction of a file that requires:
 - Chunks `[0, 2)` from the second xorb (~144KB of unpacked data)
 - Chunks `[3, 43)` from the same xorb from the first term (~3MB of unpacked data)

-The `fetch_info` provides the HTTP URLs and byte ranges needed to download the required chunk data from each xorb. The ranges provided within fetch_info and term sections are always end-exclusive i.e. `{ "start": 0, "end": 3 }` is a range of 3 chunks at indices 0, 1 and 2.
-The ranges provided under a fetch_info items' url_range key are to be used to form the `Range` header when downloading the chunk range.
+The `fetch_info` provides the HTTP URLs and byte ranges needed to download the required chunk data from each xorb. The ranges provided within `fetch_info` and term sections are always end-exclusive i.e. `{ "start": 0, "end": 3 }` is a range of 3 chunks at indices 0, 1 and 2.
+The ranges provided under a `fetch_info` items' `url_range` key are to be used to form the `Range` header when downloading the chunk range.
 A `"url_range"` value of `{ "start": X, "end": Y }` creates a `Range` header value of `bytes=X-Y`.

 When downloading and deserializing the chunks from xorb `a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456` we will have the chunks at indices `[1, 43)`.
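The two range conventions described in this file are easy to mix up, so here is a minimal sketch of both. The helper names are hypothetical; the only facts taken from the text are that `url_range` is end-inclusive and maps to `bytes=X-Y`, while chunk index ranges are end-exclusive.

```python
def range_header(url_range: dict) -> dict:
    """Build the HTTP Range header from an end-inclusive `url_range`."""
    return {"Range": f"bytes={url_range['start']}-{url_range['end']}"}


def byte_count(url_range: dict) -> int:
    """Number of bytes requested; both ends are included."""
    return url_range["end"] - url_range["start"] + 1


def chunk_indices(chunk_range: dict) -> list:
    """Chunk indices covered by an end-exclusive `range` value."""
    return list(range(chunk_range["start"], chunk_range["end"]))


print(range_header({"start": 0, "end": 99}))
print(chunk_indices({"start": 0, "end": 3}))
```

Keeping the two conversions in separate helpers makes it harder to apply the end-exclusive rule to byte offsets by accident, which would request one byte too few.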

docs/hub/xet/file-id.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 # Getting a Xet File ID from the Hugging Face Hub

-This section explains the Xet file ID used in the reconstruction API to download a file from the HuggingFace hub using the xet protocol.
+This section explains the Xet file ID used in the reconstruction API to download a file from the Hugging Face Hub using the xet protocol.

 Given a particular namespace, repository and branch or commit hash and file path from the root of the repository, build the "resolve" URL for the file following this format:

@@ -11,7 +11,7 @@ repository: the repository name e.g. Qwen-Image-Edit
 branch: any git branch or commit hash e.g. main
 filepath: filepath in repository e.g. transformer/diffusion_pytorch_model-00001-of-00009.safetensors

-resolve url:
+resolve URL:

 https://huggingface.co/{namespace}/{repository}/resolve/{branch}/{filepath}
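The resolve URL template can be filled in with plain string formatting. This is a minimal sketch: the function name is hypothetical, no percent-encoding is applied (an assumption that only holds for paths without special characters), and the sample values are the ones quoted in this file.

```python
def build_resolve_url(namespace: str, repository: str, branch: str, filepath: str) -> str:
    """Fill in the resolve URL template; assumes the inputs need no escaping."""
    return f"https://huggingface.co/{namespace}/{repository}/resolve/{branch}/{filepath}"


url = build_resolve_url(
    "Qwen",
    "Qwen-Image-Edit",
    "main",
    "transformer/diffusion_pytorch_model-00001-of-00009.safetensors",
)
print(url)
```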

@@ -21,7 +21,7 @@ Example:
 https://huggingface.co/Qwen/Qwen-Image-Edit/resolve/main/transformer/diffusion_pytorch_model-00001-of-00009.safetensors
 ```

-Then make a `GET` request to the resolve url using your standard Hugging Face Hub credentials/token.
+Then make a `GET` request to the resolve URL using your standard Hugging Face Hub credentials/token.

 If the file is stored on the xet system then a successful response will have a `X-Xet-Hash` header.

docs/hub/xet/index.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ Implementors can create their own clients, SDKs, and tools that speak the Xet pr

 ### xet-core: hf-xet + git-xet

-The primary reference implementation of the protocol written in rust 🦀 lives in the [xet-core](https://github.com/huggingface/xet-core) repository under multiple crates:
+The primary reference implementation of the protocol written in Rust 🦀 lives in the [xet-core](https://github.com/huggingface/xet-core) repository under multiple crates:

 - [cas_types](https://github.com/huggingface/xet-core/tree/main/cas_types) - Common re-usable types for interacting with CAS API's
 - [cas_client](https://github.com/huggingface/xet-core/tree/main/cas_client) - Client interface that calls CAS API's, including comprehensive implementation of download protocol.
