CUDA: Optimize reduce_rows_f32 kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n #14625

Triggered via pull request August 11, 2025 13:29
@ORippler
synchronize #15132
Status Success
Total duration 12m 9s
Artifacts

labeler.yml

on: pull_request_target
labeler
7s