[CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size #13529
Closed
jukofyork wants to merge 1 commit into ggml-org:master from jukofyork:mla-fa-disable-v-cache
Commits
Commits on Jun 12, 2025
- committed