[CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size #12317

Triggered via pull request June 12, 2025 11:55
@jukofyork
synchronize #13529
Status Success
Total duration 10s

labeler.yml

on: pull_request_target
labeler
6s