
Commit c602a2a

mcv: update binary cache docs
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
1 parent 5c88000 commit c602a2a

1 file changed (+36, -28 lines)

mcv/docs/vllm-binary-cache.md

Lines changed: 36 additions & 28 deletions
@@ -3,10 +3,17 @@
 ## Overview

 MCV supports two vLLM cache formats:
-1. **Triton Cache Format** (legacy/unpacked) - Original format with `triton_cache/` directory
-2. **Binary Cache Format** (new) - New format with `rank_X_Y/` directory structure

-This document describes the **Binary Cache Format** introduced in recent versions of vLLM.
+1. **vLLM Triton Cache Format** (legacy) - Stores `triton_cache/` and `inductor_cache/` inside rank directories
+2. **vLLM Binary Cache Format** (new) - Stores prefix directories (e.g., `backbone/`) inside rank directories
+
+Both formats share the same top-level structure: `torch_compile_cache/{hash}/rank_{rank}_{dp_rank}/`
+
+The key differences are **inside the rank directory**:
+- **Triton format**: Contains `triton_cache/` and `inductor_cache/` subdirectories with unpacked artifacts
+- **Binary format**: Contains prefix directories (e.g., `backbone/`, `eagle_head/`) with `cache_key_factors.json` and artifacts that can be either binary files or unpacked directories
+
+This document describes the **vLLM Binary Cache Format** introduced in recent versions of vLLM.

 ## Binary Cache Format

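Both formats share the `torch_compile_cache/{hash}/rank_{rank}_{dp_rank}/` layout described in this hunk. The short Python sketch below builds and parses rank directory names under that layout. The helper names `rank_dir` and `parse_rank_dir`, and the example hash `abc123`, are hypothetical and are not taken from MCV or vLLM.

```python
import re
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Rank directory names follow rank_{rank}_{dp_rank}, e.g. "rank_1_0".
RANK_DIR_RE = re.compile(r"rank_(\d+)_(\d+)")


def rank_dir(cache_hash: str, rank: int, dp_rank: int) -> PurePosixPath:
    """Build the top-level rank directory path shared by both vLLM cache formats."""
    return PurePosixPath("torch_compile_cache", cache_hash, f"rank_{rank}_{dp_rank}")


def parse_rank_dir(name: str) -> Optional[Tuple[int, int]]:
    """Return (rank, dp_rank) for a rank directory name, or None if it is not one."""
    match = RANK_DIR_RE.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(rank_dir("abc123", 1, 0))        # torch_compile_cache/abc123/rank_1_0
print(parse_rank_dir("rank_1_0"))      # (1, 0)
print(parse_rank_dir("triton_cache"))  # None
```

Only the rank directory itself is common to both formats; what sits inside it (`triton_cache/` or prefix directories) is what distinguishes them.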
@@ -167,7 +174,7 @@ The `manifest.json` file contains comprehensive metadata:
 ```

 **Manifest Fields:**
-- `cacheFormat`: Cache structure type (`"binary"` for new format, `"triton"` for legacy/unpacked caches)
+- `cacheFormat`: vLLM cache structure type (`"binary"` for new binary cache format, `"triton"` for legacy triton cache format)
 - `binary[]`: Array of binary cache entries (one per rank/prefix combination)
 - `cache_save_format`: Actual artifact storage format (`"binary"` or `"unpacked"`)
 - `target_device`: Target hardware (`"cuda"`, `"rocm"`, `"tpu"`, `"cpu"`)
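A minimal Python sketch of checking the manifest fields listed in this hunk against their documented values. It assumes, hypothetically, that `binary[]` entries sit under each `vllm[]` entry and carry `cache_save_format` and `target_device`; the real MCV manifest may nest these fields differently, and `check_manifest` is an illustrative name only.

```python
from typing import Any, Dict, List

# Allowed values as listed under "Manifest Fields".
CACHE_FORMATS = {"binary", "triton"}
SAVE_FORMATS = {"binary", "unpacked"}
TARGET_DEVICES = {"cuda", "rocm", "tpu", "cpu"}


def check_manifest(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the fields look valid.

    ASSUMPTION: binary[] entries live under each vllm[] entry and carry
    cache_save_format and target_device. The real MCV schema may differ.
    """
    problems: List[str] = []
    for i, entry in enumerate(manifest.get("vllm", [])):
        fmt = entry.get("cacheFormat")
        if fmt not in CACHE_FORMATS:
            problems.append(f"vllm[{i}].cacheFormat: unexpected {fmt!r}")
        # Entries without a binary[] array (assumed for legacy triton caches) are skipped.
        for j, item in enumerate(entry.get("binary", [])):
            save = item.get("cache_save_format")
            if save not in SAVE_FORMATS:
                problems.append(f"vllm[{i}].binary[{j}].cache_save_format: unexpected {save!r}")
            device = item.get("target_device")
            if device not in TARGET_DEVICES:
                problems.append(f"vllm[{i}].binary[{j}].target_device: unexpected {device!r}")
    return problems


example = {"vllm": [{"cacheFormat": "binary",
                     "binary": [{"cache_save_format": "binary", "target_device": "cuda"}]}]}
print(check_manifest(example))  # []
```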
@@ -203,18 +210,18 @@ MCV automatically extracts hardware information from the cache metadata:

 ## Format Detection

-MCV automatically detects the cache format by inspecting the filesystem:
+MCV automatically detects the vLLM cache format by inspecting the filesystem:

-1. **Binary Format Detection**:
+1. **vLLM Binary Cache Detection**:
    - Looks for `rank_X_Y/` directories
    - Checks for `cache_key_factors.json`
    - Inspects `artifact_compile_range_*` entries
-   - If entries are **files** → Binary format
-   - If entries are **directories** → Unpacked format
+   - If entries are **files** → Binary artifact storage
+   - If entries are **directories** → Unpacked artifact storage

-2. **Triton Format Detection** (fallback):
+2. **vLLM Triton Cache Detection** (fallback):
    - Looks for `triton_cache/` directory
-   - Uses legacy/unpacked cache extraction logic
+   - Uses legacy vLLM triton cache extraction logic

 This filesystem-based detection is more reliable than environment variables, especially when caches are copied between systems.

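The detection steps in this hunk can be sketched in Python as follows. This is an illustration, not MCV's implementation: it assumes `cache_key_factors.json` and the `artifact_compile_range_*` entries live inside each prefix directory, treats a mix of files and directories as unrecognized, and uses the hypothetical function name `detect_vllm_cache`. The artifact name `artifact_compile_range_1_8` in the usage part is also made up.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def detect_vllm_cache(cache_root: Path) -> Optional[Dict[str, Optional[str]]]:
    """Classify a tree rooted at the torch_compile_cache directory.

    Returns {"cacheFormat": ..., "cache_save_format": ...} for the first match, or None.
    """
    rank_dirs = sorted(p for p in cache_root.glob("*/rank_*_*") if p.is_dir())

    # 1. vLLM binary cache: prefix directories holding cache_key_factors.json
    for rank in rank_dirs:
        for prefix in sorted(p for p in rank.iterdir() if p.is_dir()):
            if not (prefix / "cache_key_factors.json").is_file():
                continue
            artifacts = list(prefix.glob("artifact_compile_range_*"))
            if artifacts and all(a.is_file() for a in artifacts):
                return {"cacheFormat": "binary", "cache_save_format": "binary"}
            if artifacts and all(a.is_dir() for a in artifacts):
                return {"cacheFormat": "binary", "cache_save_format": "unpacked"}
            # No artifacts, or a mix of files and directories: not covered by the rules.

    # 2. Fallback: vLLM triton cache with triton_cache/ inside a rank directory
    if any((rank / "triton_cache").is_dir() for rank in rank_dirs):
        return {"cacheFormat": "triton", "cache_save_format": None}
    return None


# Usage: build a tiny binary-format tree in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp, "abc123", "rank_0_0", "backbone")
    prefix.mkdir(parents=True)
    (prefix / "cache_key_factors.json").write_text("{}")
    (prefix / "artifact_compile_range_1_8").write_bytes(b"\x00")
    print(detect_vllm_cache(Path(tmp)))
    # {'cacheFormat': 'binary', 'cache_save_format': 'binary'}
```

Returning the first match keeps the sketch short; a fuller tool would more likely record one result per rank/prefix combination, in line with the `binary[]` manifest array.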
@@ -226,10 +233,10 @@ MCV uses **three distinct format indicators** to describe vLLM caches. Each serv

 **Location**: `manifest.json``vllm[].cacheFormat`
 **Values**: `"binary"` or `"triton"`
-**Purpose**: Tells MCV extraction logic which directory structure to expect
+**Purpose**: Tells MCV extraction logic which vLLM cache structure to expect inside rank directories

-- `"binary"`: New format with `rank_{rank}_{dp_rank}/{prefix}/` structure
-- `"triton"`: Legacy format with `triton_cache/` directory
+- `"binary"`: vLLM binary cache format - rank directories contain prefix subdirectories (e.g., `backbone/`)
+- `"triton"`: vLLM triton cache format - rank directories contain `triton_cache/` subdirectory

 **Example**:
 ```json
@@ -276,8 +283,8 @@ This field is informational and helps users understand the internal artifact for
 **Values**: `"binary"` or `"unpacked"`
 **Purpose**: Quick user-visible indicator of artifact storage format

-- `"binary"`: For binary cache format with binary artifacts
-- `"unpacked"`: For triton cache format OR binary cache format with unpacked artifacts
+- `"binary"`: For vLLM binary cache format with binary artifacts
+- `"unpacked"`: For vLLM triton cache format OR vLLM binary cache format with unpacked artifacts

 **Example**:
 ```json
@@ -292,27 +299,28 @@ This label allows users to quickly inspect cache format using `docker inspect` o

 ### Format Mapping Table

-| Cache Type | Artifact Type | Manifest `cacheFormat` | Manifest `cache_save_format` | Image Label `format` |
+| vLLM Cache Format | Artifact Type | Manifest `cacheFormat` | Manifest `cache_save_format` | Image Label `format` |
 |------------|---------------|------------------------|------------------------------|----------------------|
-| New binary cache with binary artifacts | Files | `"binary"` | `"binary"` | `"binary"` |
-| New binary cache with unpacked artifacts | Directories | `"binary"` | `"unpacked"` | `"unpacked"` |
-| Legacy triton cache | Directories | `"triton"` | N/A (not present) | `"unpacked"` |
+| vLLM binary cache with binary artifacts | Files | `"binary"` | `"binary"` | `"binary"` |
+| vLLM binary cache with unpacked artifacts | Directories | `"binary"` | `"unpacked"` | `"unpacked"` |
+| vLLM triton cache (legacy) | Directories | `"triton"` | N/A (not present) | `"unpacked"` |

 **Why Three Indicators?**

-- **Manifest `cacheFormat`**: Extraction logic must know the directory structure (`rank_X_Y/` vs `triton_cache/`)
+- **Manifest `cacheFormat`**: Extraction logic must know what's inside rank directories (`triton_cache/` subdirs vs `{prefix}/` subdirs)
 - **Manifest `cache_save_format`**: Detailed metadata for debugging and compatibility checking
 - **Image Label `format`**: Fast user-facing indicator without parsing full manifest

-## Comparison: Binary vs Triton Cache
+## Comparison: vLLM Binary Cache vs vLLM Triton Cache

-| Aspect | Triton Cache (Legacy) | Binary Cache (New) |
+| Aspect | vLLM Triton Cache (Legacy) | vLLM Binary Cache (New) |
 |--------|----------------------|-------------------|
-| **Structure** | `triton_cache/` + `inductor_cache/` | `rank_X_Y/{prefix}/` |
+| **Top-level Structure** | `torch_compile_cache/{hash}/rank_X_Y/` | `torch_compile_cache/{hash}/rank_X_Y/` |
+| **Inside Rank Directory** | `triton_cache/` + `inductor_cache/` | `{prefix}/` (e.g., `backbone/`) |
 | **Metadata** | Triton kernel JSON files | `cache_key_factors.json` |
 | **Storage** | Always unpacked | Binary or unpacked |
 | **Multiprocess** | Not guaranteed | Safe in binary mode |
-| **Distributed** | Limited support | Full rank/DP support |
+| **Distributed** | Full rank/DP support | Full rank/DP support |
 | **Manifest Key** | `"triton"` | `"binary"` |
 | **Image Label** | `"unpacked"` | `"binary"` or `"unpacked"` |

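The Format Mapping Table in this hunk reduces to a small function from the two manifest fields to the image label `format`. The Python sketch below encodes only the three rows of that table; `image_label_format` is a hypothetical name, and MCV may derive the label differently.

```python
from typing import Optional


def image_label_format(cache_format: str, cache_save_format: Optional[str]) -> str:
    """Map manifest cacheFormat/cache_save_format to the image label `format`."""
    if cache_format == "triton":
        # Legacy triton cache: cache_save_format is not present, artifacts are unpacked.
        return "unpacked"
    if cache_format == "binary" and cache_save_format in ("binary", "unpacked"):
        # Binary cache: the label mirrors the artifact storage format.
        return cache_save_format
    raise ValueError(f"combination not in mapping table: {cache_format!r}, {cache_save_format!r}")


print(image_label_format("binary", "binary"))    # binary
print(image_label_format("binary", "unpacked"))  # unpacked
print(image_label_format("triton", None))        # unpacked
```

Raising on any other combination, rather than guessing a label, keeps the function aligned with the table's three documented rows.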
@@ -371,15 +379,15 @@ Key files in vLLM that implement binary cache:
 4. **Verify hardware match** using image labels before deployment
 5. **Check cache_save_format** in manifest when extracting caches

-## Migration from Triton Cache
+## Migration from vLLM Triton Cache to vLLM Binary Cache

-To migrate from triton cache to binary cache:
+To migrate from vLLM triton cache format to vLLM binary cache format:

-1. Update vLLM to a version that supports binary cache
+1. Update vLLM to a version that supports binary cache format
 2. Set `VLLM_COMPILE_CACHE_SAVE_FORMAT=binary`
 3. Run model warmup to generate new binary cache
 4. Package new cache with MCV (automatically detected)
-5. Both formats are supported, no breaking changes
+5. Both vLLM cache formats are supported, no breaking changes

 ## See Also

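Steps 2 and 3 of the migration can be sketched in Python using vLLM's offline `LLM` entry point. The environment variable comes from the steps above; the helper names `enable_binary_cache_format` and `warm_up`, and the assumption that engine start-up plus one short `generate` call is enough to populate the compile cache, are illustrative rather than documented behavior.

```python
import os


def enable_binary_cache_format() -> None:
    """Step 2: ask vLLM to save its compile cache in the binary format."""
    os.environ["VLLM_COMPILE_CACHE_SAVE_FORMAT"] = "binary"


def warm_up(model: str) -> None:
    """Step 3 (sketch): start a vLLM engine so torch.compile populates the cache."""
    enable_binary_cache_format()
    # Imported after setting the variable so vLLM sees it when reading its environment.
    from vllm import LLM

    llm = LLM(model=model)    # assumed: compilation runs while the engine starts up
    llm.generate(["warmup"])  # one short request to exercise the compiled model
```

Calling `warm_up("<your-model-id>")` on a machine with vLLM installed would then leave a binary-format cache under `torch_compile_cache/` for step 4, where MCV is expected to detect it automatically.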