CUDA: Optimize rms_norm_f32 kernel and its fused variants, giving 1-6% perf E2E #15715
Merged
JohannesGaessler merged 11 commits into ggml-org:master on Sep 3, 2025
Commits
Commits on Sep 1, 2025
Commits on Sep 2, 2025
Commits on Sep 3, 2025