CUDA: Optimize `rms_norm_f32` kernel and its fused variants, giving 1-6% perf E2E #15715

Merged
JohannesGaessler merged 11 commits into ggml-org:master from ORippler:osimons/optimize_fused_rms_norm_f32
Sep 3, 2025

Commits

Commits on Sep 1, 2025

Commits on Sep 2, 2025

Commits on Sep 3, 2025