
CUDA: Optimize reduce_rows_f32 kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n #14513


Triggered via pull request August 7, 2025 12:31
@ORippler
synchronize #15132
Status Success
Total duration 14s

labeler.yml

on: pull_request_target
labeler
9s