 ## Overview
 
 MCV supports two vLLM cache formats:
-1. **Triton Cache Format** (legacy/unpacked) - Original format with `triton_cache/` directory
-2. **Binary Cache Format** (new) - New format with `rank_X_Y/` directory structure
 
-This document describes the **Binary Cache Format** introduced in recent versions of vLLM.
+1. **vLLM Triton Cache Format** (legacy) - Stores `triton_cache/` and `inductor_cache/` inside rank directories
+2. **vLLM Binary Cache Format** (new) - Stores prefix directories (e.g., `backbone/`) inside rank directories
+
+Both formats share the same top-level structure: `torch_compile_cache/{hash}/rank_{rank}_{dp_rank}/`
+
+The key differences are **inside the rank directory**:
+- **Triton format**: Contains `triton_cache/` and `inductor_cache/` subdirectories with unpacked artifacts
+- **Binary format**: Contains prefix directories (e.g., `backbone/`, `eagle_head/`) with `cache_key_factors.json` and artifacts that can be either binary files or unpacked directories
+
+This document describes the **vLLM Binary Cache Format** introduced in recent versions of vLLM.
 
 ## Binary Cache Format
 
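The added overview lines state that both formats share `torch_compile_cache/{hash}/rank_{rank}_{dp_rank}/`. Below is a minimal Python sketch for building and parsing that rank directory name. `rank_dir`, `parse_rank_dir`, the `/cache` root, and the `abc123` hash are hypothetical illustrations, not MCV or vLLM APIs.

```python
from __future__ import annotations

import re
from pathlib import Path

# Rank directory names look like rank_{rank}_{dp_rank}, e.g. rank_0_0.
RANK_DIR_RE = re.compile(r"rank_(\d+)_(\d+)")


def rank_dir(cache_root: Path, cache_hash: str, rank: int, dp_rank: int) -> Path:
    """Build {cache_root}/torch_compile_cache/{hash}/rank_{rank}_{dp_rank} (hypothetical helper)."""
    return cache_root / "torch_compile_cache" / cache_hash / f"rank_{rank}_{dp_rank}"


def parse_rank_dir(name: str) -> tuple[int, int] | None:
    """Return (rank, dp_rank) for a rank directory name, or None if it is not one."""
    match = RANK_DIR_RE.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    path = rank_dir(Path("/cache"), "abc123", rank=1, dp_rank=0)  # hypothetical root and hash
    print(path.as_posix())             # /cache/torch_compile_cache/abc123/rank_1_0
    print(parse_rank_dir(path.name))   # (1, 0)
    print(parse_rank_dir("backbone"))  # None
```

Because the top-level path is identical for both formats, a parser like this can locate rank directories but cannot tell the formats apart. That distinction only appears inside each rank directory.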
@@ -167,7 +174,7 @@ The `manifest.json` file contains comprehensive metadata:
 ```
 
 **Manifest Fields:**
-- `cacheFormat`: Cache structure type (`"binary"` for new format, `"triton"` for legacy/unpacked caches)
+- `cacheFormat`: vLLM cache structure type (`"binary"` for new binary cache format, `"triton"` for legacy triton cache format)
 - `binary[]`: Array of binary cache entries (one per rank/prefix combination)
 - `cache_save_format`: Actual artifact storage format (`"binary"` or `"unpacked"`)
 - `target_device`: Target hardware (`"cuda"`, `"rocm"`, `"tpu"`, `"cpu"`)
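A short sketch like the one below can read the manifest fields listed in this hunk. It assumes that each `vllm[]` entry carries `cacheFormat` and a `binary[]` list whose entries hold `cache_save_format` and `target_device`. This diff does not confirm that nesting, so check it against a real `manifest.json`. `summarize_manifest` and `EXAMPLE_MANIFEST` are hypothetical.

```python
from __future__ import annotations

import json

# Assumed manifest shape (hypothetical; verify against a real manifest.json).
EXAMPLE_MANIFEST = """
{"vllm": [{"cacheFormat": "binary",
           "binary": [{"cache_save_format": "binary", "target_device": "cuda"},
                      {"cache_save_format": "binary", "target_device": "cuda"}]}]}
"""


def summarize_manifest(manifest: dict) -> list[dict]:
    """Collect cacheFormat plus the distinct artifact formats and devices per vllm[] entry."""
    summary = []
    for entry in manifest.get("vllm", []):
        binaries = entry.get("binary", [])  # assumed absent for legacy "triton" entries
        summary.append({
            "cacheFormat": entry.get("cacheFormat"),
            "cache_save_formats": sorted(
                {b["cache_save_format"] for b in binaries if "cache_save_format" in b}
            ),
            "target_devices": sorted(
                {b["target_device"] for b in binaries if "target_device" in b}
            ),
        })
    return summary


if __name__ == "__main__":
    print(summarize_manifest(json.loads(EXAMPLE_MANIFEST)))
```

Collapsing the per-rank entries into sets makes inconsistencies stand out. For example, two different `cache_save_format` values under one `vllm[]` entry would show up as a two-element list.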
@@ -203,18 +210,18 @@ MCV automatically extracts hardware information from the cache metadata:
 
 ## Format Detection
 
-MCV automatically detects the cache format by inspecting the filesystem:
+MCV automatically detects the vLLM cache format by inspecting the filesystem:
 
-1. **Binary Format Detection**:
+1. **vLLM Binary Cache Detection**:
    - Looks for `rank_X_Y/` directories
    - Checks for `cache_key_factors.json`
    - Inspects `artifact_compile_range_*` entries
-     - If entries are **files** → Binary format
-     - If entries are **directories** → Unpacked format
+     - If entries are **files** → Binary artifact storage
+     - If entries are **directories** → Unpacked artifact storage
 
-2. **Triton Format Detection** (fallback):
+2. **vLLM Triton Cache Detection** (fallback):
    - Looks for `triton_cache/` directory
-   - Uses legacy/unpacked cache extraction logic
+   - Uses legacy vLLM triton cache extraction logic
 
 This filesystem-based detection is more reliable than environment variables, especially when caches are copied between systems.
 
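The detection steps in this hunk can be sketched in Python as follows. This illustrates the stated rules and is not MCV's detector. A rank directory counts as vLLM binary cache format when some subdirectory contains `cache_key_factors.json`. The storage type comes from whether the `artifact_compile_range_*` entries are files or directories. The sketch falls back to the triton format when `triton_cache/` is present. The function names, the demo hash, and the demo artifact name are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def classify_rank_dir(rank_dir: Path) -> tuple[str, str | None]:
    """Return (cacheFormat, cache_save_format) for one rank_X_Y/ directory."""
    # Binary cache format: prefix subdirectories (e.g. backbone/) holding cache_key_factors.json.
    prefix_dirs = [
        p for p in sorted(rank_dir.iterdir())
        if p.is_dir() and (p / "cache_key_factors.json").is_file()
    ]
    if prefix_dirs:
        artifacts = [a for p in prefix_dirs for a in p.glob("artifact_compile_range_*")]
        if artifacts and all(a.is_file() for a in artifacts):
            return "binary", "binary"    # artifacts stored as files
        if artifacts and all(a.is_dir() for a in artifacts):
            return "binary", "unpacked"  # artifacts stored as directories
        return "binary", None            # no artifacts or a mix: storage undetermined
    # Fallback: legacy triton cache format with triton_cache/ in the rank directory.
    if (rank_dir / "triton_cache").is_dir():
        return "triton", None            # legacy caches carry no cache_save_format
    raise ValueError(f"unrecognized vLLM cache layout in {rank_dir}")


def detect_cache_format(hash_dir: Path) -> dict[str, tuple[str, str | None]]:
    """Classify every rank_X_Y/ directory under torch_compile_cache/{hash}/."""
    return {
        d.name: classify_rank_dir(d)
        for d in sorted(hash_dir.glob("rank_*_*"))
        if d.is_dir()
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "torch_compile_cache" / "abc123"  # hypothetical hash
        for rank, as_dir in (("rank_0_0", False), ("rank_1_0", True)):
            prefix = root / rank / "backbone"
            prefix.mkdir(parents=True)
            (prefix / "cache_key_factors.json").write_text("{}")
            artifact = prefix / "artifact_compile_range_demo"  # hypothetical name
            if as_dir:
                artifact.mkdir()
            else:
                artifact.write_bytes(b"\0")
        (root / "rank_2_0" / "triton_cache").mkdir(parents=True)
        print(detect_cache_format(root))
        # {'rank_0_0': ('binary', 'binary'), 'rank_1_0': ('binary', 'unpacked'), 'rank_2_0': ('triton', None)}
```

For mixed or missing artifacts the sketch returns `None` as the storage type instead of guessing, so an unusual layout stays visible rather than being silently mislabeled.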
@@ -226,10 +233,10 @@ MCV uses **three distinct format indicators** to describe vLLM caches. Each serv
 
 **Location**: `manifest.json` → `vllm[].cacheFormat`
 **Values**: `"binary"` or `"triton"`
-**Purpose**: Tells MCV extraction logic which directory structure to expect
+**Purpose**: Tells MCV extraction logic which vLLM cache structure to expect inside rank directories
 
-- `"binary"`: New format with `rank_{rank}_{dp_rank}/{prefix}/` structure
-- `"triton"`: Legacy format with `triton_cache/` directory
+- `"binary"`: vLLM binary cache format - rank directories contain prefix subdirectories (e.g., `backbone/`)
+- `"triton"`: vLLM triton cache format - rank directories contain `triton_cache/` subdirectory
 
 **Example**:
 ```json
@@ -276,8 +283,8 @@ This field is informational and helps users understand the internal artifact for
 **Values**: `"binary"` or `"unpacked"`
 **Purpose**: Quick user-visible indicator of artifact storage format
 
-- `"binary"`: For binary cache format with binary artifacts
-- `"unpacked"`: For triton cache format OR binary cache format with unpacked artifacts
+- `"binary"`: For vLLM binary cache format with binary artifacts
+- `"unpacked"`: For vLLM triton cache format OR vLLM binary cache format with unpacked artifacts
 
 **Example**:
 ```json
@@ -292,27 +299,28 @@ This label allows users to quickly inspect cache format using `docker inspect` o
 
 ### Format Mapping Table
 
-| Cache Type | Artifact Type | Manifest `cacheFormat` | Manifest `cache_save_format` | Image Label `format` |
+| vLLM Cache Format | Artifact Type | Manifest `cacheFormat` | Manifest `cache_save_format` | Image Label `format` |
 |------------|---------------|------------------------|------------------------------|----------------------|
-| New binary cache with binary artifacts | Files | `"binary"` | `"binary"` | `"binary"` |
-| New binary cache with unpacked artifacts | Directories | `"binary"` | `"unpacked"` | `"unpacked"` |
-| Legacy triton cache | Directories | `"triton"` | N/A (not present) | `"unpacked"` |
+| vLLM binary cache with binary artifacts | Files | `"binary"` | `"binary"` | `"binary"` |
+| vLLM binary cache with unpacked artifacts | Directories | `"binary"` | `"unpacked"` | `"unpacked"` |
+| vLLM triton cache (legacy) | Directories | `"triton"` | N/A (not present) | `"unpacked"` |
 
 **Why Three Indicators?**
 
-- **Manifest `cacheFormat`**: Extraction logic must know the directory structure (`rank_X_Y/` vs `triton_cache/`)
+- **Manifest `cacheFormat`**: Extraction logic must know what's inside rank directories (`triton_cache/` subdirs vs `{prefix}/` subdirs)
 - **Manifest `cache_save_format`**: Detailed metadata for debugging and compatibility checking
 - **Image Label `format`**: Fast user-facing indicator without parsing full manifest
 
-## Comparison: Binary vs Triton Cache
+## Comparison: vLLM Binary Cache vs vLLM Triton Cache
 
-| Aspect | Triton Cache (Legacy) | Binary Cache (New) |
+| Aspect | vLLM Triton Cache (Legacy) | vLLM Binary Cache (New) |
 |--------|----------------------|-------------------|
-| **Structure** | `triton_cache/` + `inductor_cache/` | `rank_X_Y/{prefix}/` |
+| **Top-level Structure** | `torch_compile_cache/{hash}/rank_X_Y/` | `torch_compile_cache/{hash}/rank_X_Y/` |
+| **Inside Rank Directory** | `triton_cache/` + `inductor_cache/` | `{prefix}/` (e.g., `backbone/`) |
 | **Metadata** | Triton kernel JSON files | `cache_key_factors.json` |
 | **Storage** | Always unpacked | Binary or unpacked |
 | **Multiprocess** | Not guaranteed | Safe in binary mode |
-| **Distributed** | Limited support | Full rank/DP support |
+| **Distributed** | Full rank/DP support | Full rank/DP support |
 | **Manifest Key** | `"triton"` | `"binary"` |
 | **Image Label** | `"unpacked"` | `"binary"` or `"unpacked"` |
 
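The Format Mapping Table in this hunk reduces to a lookup from the two manifest fields to the image label. The sketch below transcribes the table rows into a dictionary. `FORMAT_LABELS` and `image_label_format` are hypothetical names, and MCV may derive the label differently.

```python
from __future__ import annotations

# (manifest cacheFormat, manifest cache_save_format) -> image label `format`,
# transcribed from the Format Mapping Table; None means the field is not present.
FORMAT_LABELS = {
    ("binary", "binary"): "binary",
    ("binary", "unpacked"): "unpacked",
    ("triton", None): "unpacked",
}


def image_label_format(cache_format: str, cache_save_format: str | None = None) -> str:
    """Return the image label `format` for a manifest entry, rejecting unknown combinations."""
    try:
        return FORMAT_LABELS[(cache_format, cache_save_format)]
    except KeyError:
        raise ValueError(
            f"no label for cacheFormat={cache_format!r}, cache_save_format={cache_save_format!r}"
        ) from None


if __name__ == "__main__":
    print(image_label_format("binary", "unpacked"))  # unpacked
    print(image_label_format("triton"))              # unpacked
```

Keeping the rows in one table means the three indicators cannot drift apart silently: any combination the table does not list raises instead of producing a label.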
@@ -371,15 +379,15 @@ Key files in vLLM that implement binary cache:
 4. **Verify hardware match** using image labels before deployment
 5. **Check cache_save_format** in manifest when extracting caches
 
-## Migration from Triton Cache
+## Migration from vLLM Triton Cache to vLLM Binary Cache
 
-To migrate from triton cache to binary cache:
+To migrate from vLLM triton cache format to vLLM binary cache format:
 
-1. Update vLLM to a version that supports binary cache
+1. Update vLLM to a version that supports binary cache format
 2. Set `VLLM_COMPILE_CACHE_SAVE_FORMAT=binary`
 3. Run model warmup to generate new binary cache
 4. Package new cache with MCV (automatically detected)
-5. Both formats are supported, no breaking changes
+5. Both vLLM cache formats are supported, no breaking changes
 
 ## See Also
 
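Migration steps 2 and 3 amount to running the warmup with `VLLM_COMPILE_CACHE_SAVE_FORMAT=binary` in its environment. The sketch below only builds the command and environment; it does not launch anything. It assumes that starting `vllm serve <model>` performs the compile and warmup that write the cache. `binary_cache_warmup` and `MODEL_NAME` are placeholders.

```python
from __future__ import annotations

import os
import shlex


def binary_cache_warmup(model: str) -> tuple[list[str], dict[str, str]]:
    """Return a `vllm serve` command and an environment requesting binary cache output."""
    # Copy the current environment and add the setting from migration step 2.
    env = dict(os.environ, VLLM_COMPILE_CACHE_SAVE_FORMAT="binary")
    return ["vllm", "serve", model], env


if __name__ == "__main__":
    cmd, env = binary_cache_warmup("MODEL_NAME")  # placeholder model id
    # To run the warmup (step 3), pass both to subprocess.Popen(cmd, env=env) and stop
    # the server once it is ready, then package the cache with MCV (step 4).
    print(f"VLLM_COMPILE_CACHE_SAVE_FORMAT={env['VLLM_COMPILE_CACHE_SAVE_FORMAT']} {shlex.join(cmd)}")
```

Setting the variable on a copied environment, rather than on `os.environ`, limits the change to the warmup process and leaves the caller's environment untouched.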