[CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size #13529

Closed
jukofyork wants to merge 1 commit into ggml-org:master from jukofyork:mla-fa-disable-v-cache

Commits

Commits on Jun 12, 2025