Jared suggestions

assafvayner · assafvayner · commit df6b724b9066 · 2025-09-26T09:46:00.000-07:00
diff --git a/docs/hub/xet/api.md b/docs/hub/xet/api.md
@@ -1,6 +1,6 @@
 # CAS API Documentation
 
-This document describes the HTTP API endpoints used by the CAS (Content Addressable Storage) client to interact with the remote CAS server.
+This document describes the HTTP API endpoints used by the Content Addressable Storage (CAS) client to interact with the remote CAS server.
 
 ## Authentication
 
diff --git a/docs/hub/xet/auth.md b/docs/hub/xet/auth.md
@@ -1,6 +1,6 @@
 # Authentication and Authorization
 
-To invoke any API's mentioned in this specification a client MUST first acquire a token (and the url) to authenticate against the server which serves these API's.
+To invoke any of the APIs mentioned in this specification, a client MUST first acquire a token (and the URL) to authenticate against the server that serves these APIs.
 
 The Xet protocol server uses bearer authentication via a token generated by the Hugging Face Hub (<https://huggingface.co>).
 
@@ -16,7 +16,7 @@ https://huggingface.co/api/{repo_type}s/{repo_id}/xet-{token_type}-token/{revisi
 
 **Parameters:**
 
-All parameters are required to form the url.
+All parameters are required to form the URL.
 
 - `repo_type`: Type of repository - `model`, `dataset`, or `space`
 - `repo_id`: Repository identifier in format `namespace/repo-name`
@@ -110,7 +110,7 @@ Xet tokens can have either a `read` or a `write` scope.
 `write` scope supersedes `read` scope and all `read` scope API's can be invoked when using a `write` scope token.
 The type of token issued is determined on the `token_type` URI path component when requesting the token from the Hugging Face Hub (see above).
 
-Revise API specification for what scope level is necessary to invoke each API (briefly, only `POST /shard` and `POST /xorb/*` API's require `write` scope).
+Check the API specification for the scope level necessary to invoke each API (briefly, only the `POST /shard` and `POST /xorb/*` APIs require `write` scope).
 
 The scope of the Xet tokens is limited to the repository and ref for which they were issued. To upload or download from different repositories or refs (different branches) clients MUST be issued different tokens.
 
diff --git a/docs/hub/xet/chunking.md b/docs/hub/xet/chunking.md
@@ -81,7 +81,7 @@ if start_offset < len(data):
 
 ### Boundary probability and mask selection
 
-Given that MASK has 16 one-bits, for a random 64-bit hash h, the chance that all those 16 bits are zero is 1 / 2^16. On average, that means you’ll see a match about once every 64 KiB.
+Given that MASK has 16 one-bits, for a random 64-bit hash `h`, the chance that all those 16 bits are zero is 1 / 2^16. On average, that means you’ll see a match about once every 64 KiB.
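
The arithmetic behind this estimate can be checked with a short sketch. The `MASK` value below is a hypothetical constant chosen only because it has 16 one-bits, and the sketch assumes one hash test per byte position; neither detail is confirmed by this hunk.

```python
# Hypothetical mask with 16 one-bits (assumption for illustration only;
# the real MASK constant is defined elsewhere in chunking.md).
MASK = 0xFFFF_0000_0000_0000


def one_bits(mask: int) -> int:
    """Count the one-bits in the mask."""
    return bin(mask).count("1")


def boundary_probability(mask: int) -> float:
    """Chance that a uniformly random hash h satisfies h & mask == 0."""
    return 1.0 / (1 << one_bits(mask))


def expected_gap_bytes(mask: int) -> int:
    """Mean distance between matches, assuming one hash test per byte."""
    return 1 << one_bits(mask)


p = boundary_probability(MASK)
gap = expected_gap_bytes(MASK)
print(f"p = 1/{1 << one_bits(MASK)}, expected gap = {gap // 1024} KiB")
```

The expected gap follows from the mean of a geometric distribution, which is `1 / p` trials between successes.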
 
 ### Properties
 
diff --git a/docs/hub/xet/deduplication.md b/docs/hub/xet/deduplication.md
@@ -96,7 +96,7 @@ Xet employs a three-tiered deduplication strategy to maximize efficiency while m
 
 #### Level 3: Global Deduplication API
 
-**Scope**: Entire Xet ecosystem
+**Scope**: Entire Xet system
 **Mechanism**: Global deduplication service with HMAC protection
 **Purpose**: Discover deduplication opportunities across all users and repositories
 
diff --git a/docs/hub/xet/download-protocol.md b/docs/hub/xet/download-protocol.md
@@ -1,6 +1,6 @@
 # Download Protocol
 
-This document describes the complete process of downloading a single file from the Xet protocol using the CAS (Content Addressable Storage) reconstruction API.
+This document describes the complete process of downloading a single file from the Xet protocol using the Content Addressable Storage (CAS) reconstruction API.
 
 ## Overview
 
@@ -84,7 +84,7 @@ The reconstruction API returns a `QueryReconstructionResponse` object with three
 - Maps xorb hashes to required information to download some of their chunks.
   - The mapping is to an array of 1 or more `CASReconstructionFetchInfo`
 - Each `CASReconstructionFetchInfo` contains:
-  - `url`: HTTP URL for downloading the xorb data, presigned url containing authorization information
+  - `url`: HTTP URL for downloading the xorb data, presigned URL containing authorization information
   - `url_range` (bytes_start, bytes_end): Byte range `{ start: number, end: number }` for the Range header; end-inclusive `[start, end]`
     - The `Range` header MUST be set as `Range: bytes=<start>-<end>` when downloading this chunk range
   - `range` (index_start, index_end): Chunk index range `{ start: number, end: number }` that this URL provides; end-exclusive `[start, end)`
@@ -233,7 +233,7 @@ For partial file downloads, the reconstruction API supports range queries:
 
 When downloading individual term data:
 
-A client MUST include the `Range` header formed with the values from the url_range field to specify the exact range of data of a xorb that they are accessing. Not specifying this header will cause result in an authorization failure.
+A client MUST include the `Range` header formed with the values from the `url_range` field to specify the exact range of data of a xorb that they are accessing. Not specifying this header will result in an authorization failure.
 
 Xet global deduplication requires that access to xorbs is only granted to authorized ranges.
 Not specifying this header will result in an authorization failure.
@@ -250,8 +250,8 @@ Consider downloading such content only once and reusing the data.
 ### Caching recommendations
 
 1. It can be ineffective to cache the reconstruction object
-    1. The fetch_info section provides short-expiration pre-signed url's hence Clients SHOULD NOT cache the urls beyond their short expiration
-    2. To get those url's to access the data you will need to call the reconstruction API again anyway
+    1. The `fetch_info` section provides short-expiration pre-signed URLs, hence clients SHOULD NOT cache the URLs beyond their short expiration
+    2. To get those URLs to access the data you will need to call the reconstruction API again anyway
 2. Cache chunks by range not just individually
     1. If you need a chunk from a xorb it is very likely that you will need another, so cache them close
 3. Caching helps when downloading similar contents. May not be worth to cache data if you are always downloading different things
@@ -326,8 +326,8 @@ This example shows reconstruction of a file that requires:
 - Chunks `[0, 2)` from the second xorb (~144KB of unpacked data)
 - Chunks `[3, 43)` from the same xorb from the first term (~3MB of unpacked data)
 
-The `fetch_info` provides the HTTP URLs and byte ranges needed to download the required chunk data from each xorb. The ranges provided within fetch_info and term sections are always end-exclusive i.e. `{ "start": 0, "end": 3 }` is a range of 3 chunks at indices 0, 1 and 2.
-The ranges provided under a fetch_info items' url_range key are to be used to form the `Range` header when downloading the chunk range.
+The `fetch_info` provides the HTTP URLs and byte ranges needed to download the required chunk data from each xorb. The ranges provided within `fetch_info` and term sections are always end-exclusive i.e. `{ "start": 0, "end": 3 }` is a range of 3 chunks at indices 0, 1 and 2.
+The ranges provided under a `fetch_info` item's `url_range` key are to be used to form the `Range` header when downloading the chunk range.
 A `"url_range"` value of `{ "start": X, "end": Y }` creates a `Range` header value of `bytes=X-Y`.
 
 When downloading and deserializing the chunks from xorb `a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456` we will have the chunks at indices `[1, 43)`.
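
The two range conventions described above (end-inclusive byte ranges under `url_range`, end-exclusive chunk index ranges under `range`) are easy to mix up. The following is a minimal sketch of both, not the reference client's code; the helper names `range_header`, `byte_length`, and `chunk_count` are hypothetical.

```python
def range_header(url_range: dict) -> dict:
    """Build the Range header from a url_range value (end-inclusive bytes)."""
    start, end = url_range["start"], url_range["end"]
    if start < 0 or end < start:
        raise ValueError(f"invalid url_range: {url_range!r}")
    return {"Range": f"bytes={start}-{end}"}


def byte_length(url_range: dict) -> int:
    """Number of bytes covered by an end-inclusive [start, end] byte range."""
    return url_range["end"] - url_range["start"] + 1


def chunk_count(chunk_range: dict) -> int:
    """Number of chunks in an end-exclusive [start, end) chunk index range."""
    return chunk_range["end"] - chunk_range["start"]


headers = range_header({"start": 0, "end": 99})
print(headers["Range"])
print(chunk_count({"start": 0, "end": 3}))
```

Keeping the two computations in separate helpers makes the off-by-one difference between the conventions explicit at each call site.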
diff --git a/docs/hub/xet/file-id.md b/docs/hub/xet/file-id.md
@@ -1,6 +1,6 @@
 # Getting a Xet File ID from the Hugging Face Hub
 
-This section explains the Xet file ID used in the reconstruction API to download a file from the HuggingFace hub using the xet protocol.
+This section explains the Xet file ID used in the reconstruction API to download a file from the Hugging Face Hub using the Xet protocol.
 
 Given a particular namespace, repository and branch or commit hash and file path from the root of the repository, build the "resolve" URL for the file following this format:
 
@@ -11,7 +11,7 @@ repository: the repository name e.g. Qwen-Image-Edit
 branch: any git branch or commit hash e.g. main
 filepath: filepath in repository e.g. transformer/diffusion_pytorch_model-00001-of-00009.safetensors 
 
-resolve url:
+resolve URL:
 
 https://huggingface.co/{namespace}/{repository}/resolve/{branch}/{filepath}
 
@@ -21,7 +21,7 @@ Example:
 https://huggingface.co/Qwen/Qwen-Image-Edit/resolve/main/transformer/diffusion_pytorch_model-00001-of-00009.safetensors
 ```
 
-Then make a `GET` request to the resolve url using your standard Hugging Face Hub credentials/token.
+Then make a `GET` request to the resolve URL using your standard Hugging Face Hub credentials/token.
 
 If the file is stored on the xet system then a successful response will have a `X-Xet-Hash` header.
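
As a rough illustration of the steps above, the sketch below builds the resolve URL from its four parts and reads the `X-Xet-Hash` header from a response's headers. The function names are hypothetical, and percent-encoding the file path is an assumption of this sketch rather than something the document specifies. No network request is made here.

```python
from typing import Mapping, Optional
from urllib.parse import quote


def resolve_url(namespace: str, repository: str, branch: str, filepath: str) -> str:
    """Build the resolve URL in the format given above."""
    # Assumption: percent-encode the file path but keep "/" separators intact.
    encoded_path = quote(filepath, safe="/")
    return f"https://huggingface.co/{namespace}/{repository}/resolve/{branch}/{encoded_path}"


def xet_hash_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-Xet-Hash value, or None if the header is absent."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "x-xet-hash":
            return value
    return None


url = resolve_url(
    "Qwen",
    "Qwen-Image-Edit",
    "main",
    "transformer/diffusion_pytorch_model-00001-of-00009.safetensors",
)
print(url)
```

A client would then send a `GET` request to `url` with its Hub credentials and pass the response headers to `xet_hash_from_headers`; a `None` result means the header was not present.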
 
diff --git a/docs/hub/xet/index.md b/docs/hub/xet/index.md
@@ -30,7 +30,7 @@ Implementors can create their own clients, SDKs, and tools that speak the Xet pr
 
 ### xet-core: hf-xet + git-xet
 
-The primary reference implementation of the protocol written in rust 🦀 lives in the [xet-core](https://github.com/huggingface/xet-core) repository under multiple crates:
+The primary reference implementation of the protocol written in Rust 🦀 lives in the [xet-core](https://github.com/huggingface/xet-core) repository under multiple crates:
 
 - [cas_types](https://github.com/huggingface/xet-core/tree/main/cas_types) - Common re-usable types for interacting with CAS API's
 - [cas_client](https://github.com/huggingface/xet-core/tree/main/cas_client) - Client interface that calls CAS API's, including comprehensive implementation of download protocol.