CUDA: Optimize reduce_rows_f32 kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n #29034

Triggered via pull request August 7, 2025 12:31
Status Success
Total duration 18s

editorconfig.yml

on: pull_request
editorconfig
14s