huggingface
diff --git a/docs/hub/_redirects.yml b/docs/hub/_redirects.yml
Lines changed: 1 addition & 0 deletions
diff --git a/docs/xet/_toctree.yml b/docs/hub/xet/_toctree.yml (renamed)
diff --git a/docs/xet/api.md b/docs/hub/xet/api.md (renamed)
Lines changed: 12 additions & 12 deletions
diff --git a/docs/xet/auth.md b/docs/hub/xet/auth.md (renamed)
Lines changed: 1 addition & 1 deletion
diff --git a/docs/xet/chunking.md b/docs/hub/xet/chunking.md (renamed)
Lines changed: 1 addition & 1 deletion
diff --git a/docs/xet/deduplication.md b/docs/hub/xet/deduplication.md (renamed)
Lines changed: 4 additions & 4 deletions
diff --git a/docs/xet/download-protocol.md b/docs/hub/xet/download-protocol.md (renamed)
Lines changed: 16 additions & 16 deletions
diff --git a/docs/xet/file-id.md b/docs/hub/xet/file-id.md (renamed)
diff --git a/docs/xet/file-reconstruction.md b/docs/hub/xet/file-reconstruction.md (renamed)
Lines changed: 4 additions & 4 deletions
diff --git a/docs/xet/hashing.md b/docs/hub/xet/hashing.md (renamed)
Lines changed: 1 addition & 1 deletion
@@ -19,3 +19,4 @@ adapter-transformers: adapters
 security-two-fa: security-2fa
 repositories-recommendations: storage-limits
 xet: storage-backends#xet
+xet-spec: xet/index
@@ -4,7 +4,7 @@ This document describes the HTTP API endpoints used by the CAS (Content Addressa
 
 ## Authentication
 
-To authenticate, authorize, and obtain the API base URL, follow the instructions in [Authentication](./auth.md).
+To authenticate, authorize, and obtain the API base URL, follow the instructions in [Authentication](./auth).
 
 ## Converting Hashes to Strings
 
@@ -38,7 +38,7 @@ It is: `07060504030201000f0e0d0c0b0a0908171615141312111f1e1d1c1b1a1918`.
 - **Method**: `GET`
 - **Parameters**:
   - `file_id`: File hash in hex format (64 lowercase hexadecimal characters).
-See [file hashes](./hashing.md#file-hashes) for computing the file hash and [converting hashes to strings](./api.md#converting-hashes-to-strings).
+See [file hashes](./hashing#file-hashes) for computing the file hash and [converting hashes to strings](./api#converting-hashes-to-strings).
 - **Headers**:
   - `Range`: OPTIONAL. Format: `bytes={start}-{end}` (end is inclusive).
 - **Minimum Token Scope**: `read`
@@ -53,7 +53,7 @@ See [file hashes](./hashing.md#file-hashes) for computing the file hash and [con
   }
   ```
 
-- **Error Responses**: See [Error Cases](./api.md#error-cases)
+- **Error Responses**: See [Error Cases](./api#error-cases)
   - `400 Bad Request`: Malformed `file_id` in the path. Fix the path before retrying.
   - `401 Unauthorized`: Refresh the token to continue making requests, or provide a token in the `Authorization` header.
   - `404 Not Found`: The file does not exist. Not retryable.
@@ -67,7 +67,7 @@ OPTIONAL: -H Range: "bytes=0-100000"
 
 ### Example File Reconstruction Response Body
 
-See [QueryReconstructionResponse](./download-protocol.md#queryreconstructionresponse-structure) for more details in the download protocol specification.
+See [QueryReconstructionResponse](./download-protocol#queryreconstructionresponse-structure) for more details in the download protocol specification.
 
 ### 2. Query Chunk Deduplication (Global Deduplication)
 
@@ -77,11 +77,11 @@ See [QueryReconstructionResponse](./download-protocol.md#queryreconstructionresp
 - **Parameters**:
   - `prefix`: The only acceptable prefix for the Global Deduplication API is `default-merkledb`.
   - `hash`: Chunk hash in hex format (64 lowercase hexadecimal characters).
-See [Chunk Hashes](./hashing.md#chunk-hashes) to compute the chunk hash and [converting hashes to strings](./api.md#converting-hashes-to-strings).
+See [Chunk Hashes](./hashing#chunk-hashes) to compute the chunk hash and [converting hashes to strings](./api#converting-hashes-to-strings).
 - **Minimum Token Scope**: `read`
 - **Body**: None.
-- **Response**: Shard format bytes (`application/octet-stream`), deserialize as a [shard](./shard.md#global-deduplication).
-- **Error Responses**: See [Error Cases](./api.md#error-cases)
+- **Response**: Shard format bytes (`application/octet-stream`), deserialize as a [shard](./shard#global-deduplication).
+- **Error Responses**: See [Error Cases](./api#error-cases)
   - `400 Bad Request`: Malformed hash in the path. Fix the path before retrying.
   - `401 Unauthorized`: Refresh the token to continue making requests, or provide a token in the `Authorization` header.
   - `404 Not Found`: Chunk not already tracked by global deduplication. Not retryable.
@@ -103,10 +103,10 @@ An example shard response body can be found in [Xet reference files](https://hug
 - **Parameters**:
   - `prefix`: The only acceptable prefix for the Xorb upload API is `default`.
   - `hash`: Xorb hash in hex format (64 lowercase hexadecimal characters).
-See [Xorb Hashes](./hashing.md#xorb-hashes) to compute the hash, and [converting hashes to strings](./api.md#converting-hashes-to-strings).
+See [Xorb Hashes](./hashing#xorb-hashes) to compute the hash, and [converting hashes to strings](./api#converting-hashes-to-strings).
 - **Minimum Token Scope**: `write`
 - **Body**: Serialized Xorb bytes (`application/octet-stream`).
-See [xorb format serialization](./xorb.md).
+See [xorb format serialization](./xorb).
 - **Response**: JSON (`UploadXorbResponse`)
 
 ```json
@@ -117,7 +117,7 @@ See [xorb format serialization](./xorb.md).
 
 - Note: `was_inserted` is `false` if the Xorb already exists; this is not an error.
 
-- **Error Responses**: See [Error Cases](./api.md#error-cases)
+- **Error Responses**: See [Error Cases](./api#error-cases)
   - `400 Bad Request`: Malformed hash in the path, Xorb hash does not match the body, or body is incorrectly serialized.
   - `401 Unauthorized`: Refresh the token to continue making requests, or provide a token in the `Authorization` header.
   - `403 Forbidden`: Token provided but does not have a wide enough scope (for example, a `read` token was provided). Clients MUST retry with a `write` scope token.
@@ -139,7 +139,7 @@ Uploads file reconstructions and new xorb listing, serialized into the shard for
 - **Method**: `POST`
 - **Minimum Token Scope**: `write`
 - **Body**: Serialized Shard data as bytes (`application/octet-stream`).
-See [Shard format guide](./shard.md#shard-upload).
+See [Shard format guide](./shard#shard-upload).
 - **Response**: JSON (`UploadShardResponse`)
 
 ```json
@@ -154,7 +154,7 @@ See [Shard format guide](./shard.md#shard-upload).
 
 The value of `result` does not carry any meaning. If the upload shard API returns a `200 OK` status code, the upload was successful and the files listed are considered uploaded.
 
-- **Error Responses**: See [Error Cases](./api.md#error-cases)
+- **Error Responses**: See [Error Cases](./api#error-cases)
   - `400 Bad Request`: Shard is incorrectly serialized or Shard contents failed verification.
     - Can mean that a referenced Xorb doesn't exist or the shard is too large
   - `401 Unauthorized`: Refresh the token to continue making requests, or provide a token in the `Authorization` header.
 
@@ -23,7 +23,7 @@ All parameters are required to form the url.
 - `token_type`: Either `read` or `write`.
 - `revision`: Git revision (branch, tag, or commit hash; default to using `main` if no specific ref is required)
 
-To understand the distinction for between `token_type` values read onwards in this document to [Token Scope](./auth.md#token-scope).
+To understand the distinction between `token_type` values, see [Token Scope](./auth#token-scope) later in this document.
 
 **Example URLs:**
 
 
@@ -141,7 +141,7 @@ The [xet-team/xet-spec-reference-files](https://huggingface.co/datasets/xet-team
 
 In the same repository in file [Electric_Vehicle_Population_Data_20250917.csv.chunks](https://huggingface.co/datasets/xet-team/xet-spec-reference-files/blob/main/Electric_Vehicle_Population_Data_20250917.csv.chunks)
 the chunks produced out of [Electric_Vehicle_Population_Data_20250917.csv](https://huggingface.co/datasets/xet-team/xet-spec-reference-files/blob/main/Electric_Vehicle_Population_Data_20250917.csv) are listed.
-Each line in the file is a 64 hexadecimal hash of the chunk, followed by a space and then the number of bytes in that chunk.
+Each line in the file is the hash of a chunk as a 64-character hexadecimal string, followed by a space and then the number of bytes in that chunk.
 
 Implementors should use the chunk lengths to determine that they are producing the right chunk boundaries for this file with their chunking implementation.
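
As a rough illustration of that check, the sketch below parses a `.chunks` listing in the format described above and reports the first position where a chunker's output lengths diverge. The names `ChunkRecord`, `parse_chunks_listing`, and `first_length_mismatch` are hypothetical helpers and not part of the specification.

```python
import string
from typing import NamedTuple, Optional


class ChunkRecord(NamedTuple):
    hash_hex: str  # 64-character hexadecimal hash string
    length: int    # number of bytes in the chunk


def parse_chunks_listing(text: str) -> list[ChunkRecord]:
    """Parse lines of the form '<64 hex chars> <byte length>'."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        hash_hex, _, length = line.partition(" ")
        if len(hash_hex) != 64 or any(c not in string.hexdigits for c in hash_hex):
            raise ValueError(f"line {line_number}: expected a 64-character hex hash")
        records.append(ChunkRecord(hash_hex, int(length)))
    return records


def first_length_mismatch(expected: list[ChunkRecord],
                          produced_lengths: list[int]) -> Optional[int]:
    """Return the index of the first differing chunk length, or None if all match."""
    for index, (record, produced) in enumerate(zip(expected, produced_lengths)):
        if record.length != produced:
            return index
    if len(expected) != len(produced_lengths):
        # One side ran out of chunks first; report where the shorter one ends.
        return min(len(expected), len(produced_lengths))
    return None
```

Comparing lengths before hashes helps separate boundary bugs in the chunker from bugs in the hash computation.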
 
 
@@ -23,7 +23,7 @@ A **chunk** is a variable-sized content block derived from files using Content-D
 - **Size range**: 8KB to 128KB (minimum and maximum constraints)
 - **Identification**: Each chunk is uniquely identified by its cryptographic hash (MerkleHash)
 
-[Detailed chunking description](./chunking.md)
+[Detailed chunking description](./chunking)
 
 ### Xorbs (Extended Object Blocks)
 
@@ -143,11 +143,11 @@ They MAY know this chunk hash because they own this data, the match has made the
 ### Chunk Hash Computation
 
 Each chunk has its content hashed using a cryptographic hash function (Blake3-based MerkleHash) to create a unique identifier for content addressing.
-[See section about hashing](./hashing.md#chunk-hashes).
+[See section about hashing](./hashing#chunk-hashes).
 
 ### Xorb Formation
 
-When new chunks need to be stored, they are aggregated into xorbs based on size and count limits. If adding a new chunk would exceed the maximum xorb size or chunk count, the current xorb is finalized and uploaded. [See section about xorb formation](../xorb.md)
+When new chunks need to be stored, they are aggregated into xorbs based on size and count limits. If adding a new chunk would exceed the maximum xorb size or chunk count, the current xorb is finalized and uploaded. [See section about xorb formation](../xorb)
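
A minimal sketch of that grouping rule, assuming the limits are passed in as parameters (`max_xorb_bytes` and `max_xorb_chunks` are placeholder names; the actual limit values are defined in the xorb document, not here). It only tracks chunk indices and does not serialize or upload anything.

```python
def group_chunks_into_xorbs(chunk_lengths: list[int],
                            max_xorb_bytes: int,
                            max_xorb_chunks: int) -> list[list[int]]:
    """Group chunk indices into xorbs, finalizing one before a limit is exceeded."""
    xorbs: list[list[int]] = []
    current: list[int] = []
    current_bytes = 0
    for index, length in enumerate(chunk_lengths):
        would_exceed = (current_bytes + length > max_xorb_bytes
                        or len(current) + 1 > max_xorb_chunks)
        if current and would_exceed:
            # Adding this chunk would break a limit: finalize the current xorb.
            xorbs.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += length
    if current:
        xorbs.append(current)  # finalize the trailing, partially filled xorb
    return xorbs
```

For example, chunk lengths `[40, 40, 40, 10]` with a 100-byte limit and a 3-chunk limit yield the groups `[[0, 1], [2, 3]]`.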
 
 ### File Reconstruction Information
 
@@ -164,7 +164,7 @@ This information allows the system to reconstruct files by:
 2. Extracting the specific chunk ranges from each xorb
 3. Concatenating chunks in the correct order
 
-[See section about file reconstruction](./file-reconstruction.md).
+[See section about file reconstruction](./file-reconstruction).
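
The reconstruction steps above can be sketched over in-memory data. This assumes each term is a `(xorb_hash, start, end)` tuple with an end-exclusive chunk index range and that xorbs are already available as lists of decompressed chunks; both are illustrative simplifications, not the wire format.

```python
def reconstruct_file(terms: list[tuple[str, int, int]],
                     xorbs: dict[str, list[bytes]]) -> bytes:
    """Concatenate the chunk ranges named by each term, in term order."""
    output = bytearray()
    for xorb_hash, start, end in terms:
        chunks = xorbs[xorb_hash]       # locate the referenced xorb
        for chunk in chunks[start:end]:  # extract the chunk range [start, end)
            output += chunk              # append in order
    return bytes(output)


# A chunk may be referenced more than once; it is stored only once.
example_xorbs = {"x1": [b"ab", b"cd", b"ef"], "x2": [b"gh"]}
example_terms = [("x1", 0, 2), ("x2", 0, 1), ("x1", 1, 2)]
example_file = reconstruct_file(example_terms, example_xorbs)
```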
 
 ## Fragmentation Prevention
 
 
@@ -13,9 +13,9 @@ File download in the Xet protocol is a two-stage process:
 
 ### Single File Reconstruction
 
-To download a file given a file hash, first call the reconstruction API to get the file reconstruction. Follow the steps in [api.md](./api.md#1-get-file-reconstruction).
+To download a file given a file hash, first call the reconstruction API to get the file reconstruction. Follow the steps in [api](./api#1-get-file-reconstruction).
 
-Note that you will need at least a `read` scope auth token, [auth reference](./auth.md).
+Note that you will need at least a `read` scope auth token; see the [auth reference](./auth).
 
 > For large files it is RECOMMENDED to request the reconstruction in batches i.e. the first 10GB, download all the data, then the next 10GB and so on. Clients can use the `Range` header to specify a range of file data.
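
One way to produce the `Range` header values for such batches is sketched below. It assumes the client already knows the total file size, and `batch_range_headers` is a hypothetical helper name; the end offset is inclusive, matching the `bytes={start}-{end}` format of the reconstruction API.

```python
def batch_range_headers(file_size: int, batch_size: int) -> list[str]:
    """Return inclusive-end Range header values covering the file in batches."""
    if file_size < 0 or batch_size <= 0:
        raise ValueError("file_size must be >= 0 and batch_size must be > 0")
    headers = []
    for start in range(0, file_size, batch_size):
        end = min(start + batch_size, file_size) - 1  # inclusive end offset
        headers.append(f"bytes={start}-{end}")
    return headers
```

A 25-byte file requested in 10-byte batches gives `bytes=0-9`, `bytes=10-19`, and `bytes=20-24`.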
 
@@ -116,7 +116,7 @@ Scroll
 
 ```python
 file_id = "0123...abcdef"
-api_endpoint, token = get_token() # follow auth.md instructions
+api_endpoint, token = get_token() # follow auth instructions
 url = api_endpoint + "/reconstructions/" + file_id
 reconstruction = get(url, headers={"Authorization": "Bearer " + token})
 
@@ -172,7 +172,7 @@ The downloaded data is in xorb format and MUST be deserialized:
 3. **Extract byte indices**: Track byte boundaries between chunks for range extraction
 4. **Validate length**: Decompressed length MUST match `unpacked_length` from the term
 
-**Note**: The specific deserialization process depends on the [Xorb format](../xorb.md).
+**Note**: The specific deserialization process depends on the [Xorb format](../xorb).
 
 ```python
 for term in terms:
@@ -340,23 +340,23 @@ Note that in this example the chunk at index 3 is used twice! This is the benefi
 ```mermaid
 sequenceDiagram
   autonumber
-  actor Client as "Client"
-  participant CAS as "CAS API"
-  participant Transfer as "Transfer Service (Xet storage)"
+  actor client as Client
+  participant S as CAS API
+  participant Transfer as Transfer Service (Xet storage)
 
-  Client->>CAS: GET /reconstructions/{file_id}<br/>Authorization: Bearer <token><br/>Range: bytes=start-end (optional)
-  CAS-->>Client: 200 OK<br/>QueryReconstructionResponse {offset_into_first_range, terms[], fetch_info{}}
+  client->>S: GET /reconstructions/{file_id}<br/>Authorization: Bearer <token><br/>Range: bytes=start-end (optional)
+  S-->>client: 200 OK<br/>QueryReconstructionResponse {offset_into_first_range, terms[], fetch_info{}}
 
   loop For each term in terms (ordered)
-    Client->>Client: Find fetch_info by xorb hash, entry whose range contains term.range
-    Client->>Transfer: GET {url}<br/>Range: bytes=url_range.start-url_range.end
-    Transfer-->>Client: 206 Partial Content<br/>xorb byte range
-    Client->>Client: Deserialize xorb → chunks for fetch_info.range
-    Client->>Client: Trim to term.range, apply offset for first term
-    Client->>Client: Append chunks to output
+    client->>client: Find fetch_info by xorb hash, entry whose range contains term.range
+    client->>Transfer: GET {url}<br/>Range: bytes=url_range.start-url_range.end
+    Transfer-->>client: 206 Partial Content<br/>xorb byte range
+    client->>client: Deserialize xorb → chunks for fetch_info.range
+    client->>client: Trim to term.range, apply offset for first term
+    client->>client: Append chunks to output
   end
 
   alt Range requested
-    Client->>Client: Truncate output to requested length
+    client->>client: Truncate output to requested length
   end
 ```
@@ -12,8 +12,8 @@ This document describes how a file can be represented and reconstructed from a c
 
 ## Core Idea
 
-After following the [chunking procedure](./chunking.md) a file can be represented as an ordering of chunks.
-Those chunks are then packed into [xorbs](./xorb.md) and given the set of xorbs we convert the file representation to "reconstruction" made up of "terms".
+After following the [chunking procedure](./chunking) a file can be represented as an ordering of chunks.
+Those chunks are then packed into [xorbs](./xorb), and given the set of xorbs we convert the file representation to a "reconstruction" made up of "terms".
 When forming xorbs the ordering and grouping of chunks prioritizes contiguous runs of chunks that appear in a file such that when referencing a xorb we maximize the term range length.
 
 Any file’s raw bytes can be described as the concatenation of data produced by a sequence of terms.
@@ -105,7 +105,7 @@ A file’s reconstruction can be serialized into a shard as part of its file inf
 Conceptually, this section encodes the complete set of terms that describe the file.
 When stored this way, the representation is canonical and sufficient to reconstruct the full file solely from its referenced xorb ranges.
 
-Reference: [shard format file info](./shard.md#2-file-info-section)
+Reference: [shard format file info](./shard#2-file-info-section)
 
 ### Deserialization from the reconstruction API (JSON)
 
@@ -114,7 +114,7 @@ This response is represented by a structure named “QueryReconstructionResponse
 The `terms` list contains, for each term, the xorb identifier and the contiguous chunk index range to retrieve.
 Other fields may provide auxiliary details (such as offsets or fetch hints) that optimize retrieval without altering the meaning of the `terms` sequence.
 
-Reference: [api.md](./api.md), [download protocol](./download-protocol.md)
+Reference: [api](./api), [download protocol](./download-protocol)
 
 ## Fragmentation and Why Longer Ranges Matter
 
 
@@ -137,7 +137,7 @@ Reference files are provided in Hugging Face Dataset repository [xet-team/xet-sp
 In this repository there are a number of different samples implementors can use to verify hash computations.
 
 > Note that all hashes are represented as strings.
-To get the raw value of these hashes you must invert the endianness of each byte octet in the hash string, reversing the procedure described in [api.md](./api.md#converting-hashes-to-strings).
+To get the raw value of these hashes you must invert the endianness of each byte octet in the hash string, reversing the procedure described in [api](./api#converting-hashes-to-strings).
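
A sketch of both directions of the conversion, under the assumption that "each byte octet" means each 8-byte group of the 32-byte hash, which matches the example string shown in the api document. The helper names are hypothetical.

```python
def hash_to_string(raw: bytes) -> str:
    """Reverse the bytes within each 8-byte group, then hex-encode."""
    if len(raw) != 32:
        raise ValueError("expected a 32-byte hash")
    return "".join(raw[i:i + 8][::-1].hex() for i in range(0, 32, 8))


def string_to_hash(text: str) -> bytes:
    """Invert hash_to_string: hex-decode, then reverse each 8-byte group."""
    decoded = bytes.fromhex(text)
    if len(decoded) != 32:
        raise ValueError("expected 64 hexadecimal characters")
    return b"".join(decoded[i:i + 8][::-1] for i in range(0, 32, 8))
```

Because reversing each group twice restores the original order, the two functions are inverses of each other.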
 
 ### Chunk Hashes Sample