CUDA: Optimize reduce_rows_f32 kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n #15132
Merged
JohannesGaessler merged 15 commits into ggml-org:master on Aug 13, 2025
Commits
Commits on Aug 7, 2025