CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n #15132

Merged
JohannesGaessler merged 15 commits into ggml-org:master from ORippler:osimons/optimize_reduce_rows_f32
Aug 13, 2025

Commits

Commits on Aug 7, 2025

Commits on Aug 11, 2025

Commits on Aug 12, 2025