Add TurboQuant KV cache backend by timonharz · Pull Request #160 · ml-explore/mlx-swift-lm

timonharz · 2026-03-25T19:37:51Z

Summary

This PR ports the TurboQuant KV-cache backend from mlx-vlm PR #858 into mlx-swift-lm, with the implementation centered in MLXLMCommon so it works for both MLXLLM and the text-decoder side of MLXVLM.

The behavior matches the upstream intent:

- fractional kvBits automatically use TurboQuant
- integer kvBits continue to use uniform quantization by default
- kvQuantizationScheme = .turboQuant can explicitly force TurboQuant for integer bit widths
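The selection rule can be sketched in Swift as follows. This is a minimal illustration under stated assumptions, not the PR's code: `KVCacheBackend`, `selectKVCacheBackend(kvBits:scheme:)`, and the `.uniform` case are hypothetical names. Only `kvBits`, `KVQuantizationScheme`, and `.turboQuant` come from the description above.

```swift
// Illustrative stand-in for the PR's scheme type. Only `.turboQuant` is
// confirmed by the PR description; `.uniform` is a hypothetical name.
enum KVQuantizationScheme {
    case uniform
    case turboQuant
}

// Hypothetical description of which cache backend would be built.
enum KVCacheBackend: Equatable {
    case unquantized
    case uniform(bits: Int)
    case turboQuant(bits: Float)
}

/// Sketch of the selection rule:
/// - no kvBits: keep the regular, unquantized cache (assumption of this sketch)
/// - fractional kvBits such as 3.5: always TurboQuant
/// - integer kvBits: uniform quantization unless `.turboQuant` is requested
func selectKVCacheBackend(kvBits: Float?, scheme: KVQuantizationScheme?) -> KVCacheBackend {
    guard let bits = kvBits else { return .unquantized }
    // Any value with a non-zero fractional part counts as fractional.
    let isFractional = bits != bits.rounded(.towardZero)
    if isFractional || scheme == .turboQuant {
        return .turboQuant(bits: bits)
    }
    return .uniform(bits: Int(bits))
}
```

Comparing `bits` with `bits.rounded(.towardZero)` treats values like 2.5 and 3.5 as fractional while 4.0 stays on the uniform path unless the scheme overrides it.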

What changed

- Added a new shared TurboQuantKVCache backend in MLXLMCommon
- Ported the active TurboQuant codec/runtime stack, including split-codec support for .5 bit widths
- Extended GenerateParameters:
  - kvBits is now Float?
  - added the KVQuantizationScheme type
  - added a kvQuantizationScheme setting
- Updated dynamic KV-cache quantization to recurse into nested CacheList contents
- Preserved existing skips for unsupported cache types such as RotatingKVCache, MambaCache, and plain ArraysCache
- Integrated TurboQuant into the shared attention/cache-update path
- Patched model-specific attention implementations that bypass the shared helper, including GPTOSS and MiMoV2Flash
- Kept the existing “no attention sinks on quantized caches” behavior for TurboQuant as well
- Extended prompt-cache save/load to serialize and restore TurboQuantKVCache
- Updated docs/examples to cover fractional kvBits and explicit quantization scheme selection
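The nested-cache change amounts to a recursive walk: convert eligible caches, descend into lists, and leave unsupported kinds untouched. The sketch below models this with a hypothetical `CacheNode` enum whose cases merely stand in for `CacheList`, `RotatingKVCache`, `MambaCache`, and `ArraysCache`. `maybeQuantize(_:startTokens:)` and its token threshold are illustrative assumptions, not the MLXLMCommon API.

```swift
// Hypothetical, simplified model of a cache tree. The real MLXLMCommon
// cache types are not reproduced here; each case is only a stand-in.
enum CacheNode: Equatable {
    case kv(tokens: Int)         // plain KV cache, eligible for quantization
    case quantized(tokens: Int)  // already converted
    case rotating(tokens: Int)   // stand-in for RotatingKVCache: skipped
    case mamba                   // stand-in for MambaCache: skipped
    case arrays                  // stand-in for plain ArraysCache: skipped
    case list([CacheNode])       // stand-in for CacheList: recurse into it
}

/// Converts eligible caches once they hold at least `startTokens` tokens
/// (a hypothetical threshold), descending into nested lists and returning
/// unsupported or not-yet-eligible caches unchanged.
func maybeQuantize(_ node: CacheNode, startTokens: Int) -> CacheNode {
    switch node {
    case .kv(let tokens) where tokens >= startTokens:
        return .quantized(tokens: tokens)
    case .list(let children):
        return .list(children.map { maybeQuantize($0, startTokens: startTokens) })
    case .kv, .quantized, .rotating, .mamba, .arrays:
        return node
    }
}
```

The sketch returns a new tree for simplicity; an implementation working on class-based caches would more likely replace entries in place.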

Tests

Added coverage for:

- TurboQuant codec behavior
- cache selection for fractional bits and explicit .turboQuant
- nested cache quantization behavior
- prompt-cache round-tripping for TurboQuantKVCache
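A prompt-cache round-trip test boils down to capturing the cache state, persisting it, restoring it, and comparing the result. The sketch below does this for a hypothetical `TurboQuantCacheSnapshot`. Its fields, including per-vector norms stored next to the packed codes, are assumptions, and JSON is used only to keep the example self-contained, not to reflect the on-disk format the PR uses.

```swift
import Foundation

// Hypothetical snapshot of a TurboQuant cache's persistent state.
// Field names are illustrative, not the PR's serialization format.
struct TurboQuantCacheSnapshot: Codable, Equatable {
    var bits: Float          // kept as Float so widths like 3.5 survive
    var offset: Int          // number of cached tokens
    var packedKeys: [UInt8]  // quantized key codes (hypothetical layout)
    var keyNorms: [Float]    // per-vector norms (hypothetical)
}

/// Encodes and decodes a snapshot; a round-trip test asserts that the
/// restored value equals the original.
func roundTrip(_ snapshot: TurboQuantCacheSnapshot) throws -> TurboQuantCacheSnapshot {
    let data = try JSONEncoder().encode(snapshot)
    return try JSONDecoder().decode(TurboQuantCacheSnapshot.self, from: data)
}
```

Keeping `bits` as a Float in the persisted metadata matters here: restoring a 3.5-bit cache as an integer width would silently select a different codec.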

N1k1tung · 2026-03-25T23:51:18Z

> Runtime test execution is still blocked in this environment because MLX fails to load default.metallib, so I could not complete the full TurboQuant test run here.

Kinda explains it was in a Codex sandbox =)

Add TurboQuant KV cache backend · 1f160f2

Run swift-format · 64a3e72

davidkoski added the swift-format label (Swift format failure in CI) on Mar 27, 2026

timonharz wants to merge 2 commits into ml-explore:main from timonharz:codex/turboquant-mlx-vlm-858

3 participants
