fix: disable bf16 WMMA kernels on pre-Ampere GPUs #3367
Open
asglover wants to merge 1 commit into huggingface:main from
Conversation
bf16 WMMA fragment types (nv_bfloat16 with nvcuda::wmma) are only supported on sm_80+ (Ampere and later). On older architectures like sm_75 (Turing/T4), compiling these fragments produces "incomplete type" errors. moe_wmma_gguf.cu already had #ifndef NO_BF16_KERNEL guards, but moe_wmma.cu was missing them, and the build script never passed the -DNO_BF16_KERNEL define.

This commit:

- Adds matching #ifndef NO_BF16_KERNEL guards to moe_wmma.cu
- Updates build.rs to detect the compute capability via cudaforge and pass -DNO_BF16_KERNEL when building for GPUs with compute capability < 80
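For illustration, a minimal sketch of the guard pattern in question; the kernel name, signature, and body below are hypothetical placeholders, not the actual moe_wmma.cu code:

```cuda
// Hedged sketch of the guard pattern described above (illustrative only;
// names and kernel body are placeholders, not the real moe_wmma.cu kernel).
#include <mma.h>
#include <cuda_bf16.h>

#ifndef NO_BF16_KERNEL
// bf16 WMMA fragments only exist on sm_80+ (Ampere and later), so the whole
// kernel is compiled out when the build passes -DNO_BF16_KERNEL.
__global__ void example_moe_gemm_bf16(const __nv_bfloat16 *a,
                                      const __nv_bfloat16 *b, float *c) {
    using namespace nvcuda;
    wmma::fragment<wmma::matrix_a, 16, 16, 16, __nv_bfloat16, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, __nv_bfloat16, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a, 16);
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
#endif // NO_BF16_KERNEL
```

When build.rs detects a compute capability below 80 and adds -DNO_BF16_KERNEL to the nvcc flags, the guarded block is dropped from the pre-Ampere build, matching what moe_wmma_gguf.cu already does.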
Author
This is a Claude-generated PR, but it does fix the lack of guards on the new MoE kernels. I'm happy to pull it or modify it to make it mergeable and up to your standards. I have tested that it allows candle-kernels to be built on GitHub runners.
Summary
Fixes #3366
- Add #ifndef NO_BF16_KERNEL guards to moe_wmma.cu: moe_wmma_gguf.cu already had these guards, but moe_wmma.cu was missing them, causing compilation failures on pre-Ampere GPUs
- Pass -DNO_BF16_KERNEL in build.rs: detect the compute capability via cudaforge::detect_compute_cap() and pass the define when the compute cap is < 80

Problem
bf16 WMMA fragment types (nv_bfloat16 with nvcuda::wmma) require compute capability >= 8.0 (Ampere). On sm_75 (Turing/T4) and older GPUs, compiling these fragments produces "incomplete type" errors.
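As a rough illustration (the actual compiler output is not reproduced here), merely instantiating the bf16 fragment template is enough to hit the error on a pre-Ampere target:

```cuda
// Hypothetical minimal repro, not taken from the PR: compiling this for
// sm_75 fails because the bf16 fragment specialization is only defined for
// sm_80 and later, leaving the template an incomplete type.
#include <mma.h>
#include <cuda_bf16.h>

__global__ void bf16_fragment_repro() {
    nvcuda::wmma::fragment<nvcuda::wmma::matrix_a, 16, 16, 16, __nv_bfloat16,
                           nvcuda::wmma::row_major> a_frag;
    (void)a_frag;  // the declaration alone triggers the error pre-Ampere
}
```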
The NO_BF16_KERNEL preprocessor guard was already present in moe_wmma_gguf.cu, but:

- moe_wmma.cu was missing the guard entirely
- build.rs never passed the -DNO_BF16_KERNEL define, so the guard was never active

Test plan