CUDA: Prefer vector flash decoding kernel for Gemma models #12738

Merged
JohannesGaessler merged 2 commits into ggml-org:master from gaugarg-nv:gemma_flash_attention on Apr 3, 2025
Commits

Commits on Apr 3, 2025