CUDA: Optimize reduce_rows_f32 kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n #25856

Triggered via pull request August 7, 2025 12:31
Status Success
Total duration 1h 7m 43s
Artifacts

build.yml

on: pull_request
macOS-latest-cmake-arm64: 13m 5s
macOS-latest-cmake-x64: 4m 46s
macOS-latest-cmake-arm64-webgpu: 11m 57s
ubuntu-latest-llguidance: 4m 5s
ubuntu-latest-cmake-rpc: 2m 10s
ubuntu-22-cmake-vulkan: 1h 0m
ubuntu-22-cmake-webgpu: 2m 23s
ubuntu-22-cmake-hip: 16m 55s
ubuntu-22-cmake-musa: 15m 43s
ubuntu-22-cmake-sycl: 2m 31s
ubuntu-22-cmake-sycl-fp16: 2m 13s
build-linux-cross / ubuntu-24-riscv64-cpu-cross: 4m 36s
build-linux-cross / ubuntu-24-ppc64el-cpu-cross: 4m 46s
build-linux-cross / debian-13-loongarch64-cpu-cross: 3m 11s
build-linux-cross / debian-13-loongarch64-vulkan-cross: 8m 25s
build-cmake-pkg / linux: 3m 19s
macOS-latest-cmake-ios: 1m 16s
macOS-latest-cmake-tvos: 1m 19s
macOS-latest-cmake-visionos: 1m 42s
ubuntu-latest-cmake-cuda: 10m 14s
windows-latest-cmake-sycl: 7m 51s
windows-latest-cmake-hip: 26m 12s
ios-xcode-build: 18m 14s
android-build: 10m 13s
Matrix: macOS-latest-swift
Matrix: openEuler-latest-cmake-cann
Matrix: ubuntu-cpu-cmake
Matrix: ubuntu-latest-cmake-sanitizer
Matrix: windows-2022-cmake-cuda
Matrix: windows-latest-cmake
Matrix: windows-msys2

Annotations

11 warnings and 14 notices
macOS-latest-cmake-ios
Cache not found for keys: ccache-macOS-latest-cmake-ios-
macOS-latest-cmake-x64
curl 8.15.0 is already installed and up-to-date. To reinstall 8.15.0, run: brew reinstall curl
windows-msys2 (UCRT64, ucrt-x86_64, Release)
Cache not found for keys: ccache-windows-msys2-
macOS-latest-cmake-arm64-webgpu
curl 8.15.0 is already installed and up-to-date. To reinstall 8.15.0, run: brew reinstall curl
macOS-latest-cmake-arm64
curl 8.15.0 is already installed and up-to-date. To reinstall 8.15.0, run: brew reinstall curl
macOS-latest-cmake-tvos
Cache not found for keys: ccache-macOS-latest-cmake-tvos-
macOS-latest-swift (generic/platform=tvOS)
Cache not found for keys: ccache-macOS-latest-swift-
windows-msys2 (CLANG64, clang-x86_64, Release)
Cache not found for keys: ccache-windows-msys2-
macOS-latest-swift (generic/platform=macOS)
Cache not found for keys: ccache-macOS-latest-swift-
windows-latest-cmake (vulkan-x64, x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD...
Cache not found for keys: ccache-windows-latest-cmake-vulkan-x64-
macOS-latest-swift (generic/platform=iOS)
Cache not found for keys: ccache-macOS-latest-swift-
The macos-latest label will migrate to macOS 15 beginning August 4, 2025. For more information see https://github.com/actions/runner-images/issues/12520
Reported twice for each of the following jobs:
macOS-latest-cmake-ios
macOS-latest-cmake-visionos
macOS-latest-cmake-tvos
macOS-latest-swift (generic/platform=tvOS)
macOS-latest-swift (generic/platform=macOS)
macOS-latest-swift (generic/platform=iOS)
ios-xcode-build