Commit 3845706

[fix](ggml-cuda): ensure min 1 block per SM
Some kernel configurations can report zero occupancy on certain GPUs (for example, the RX 6700 XT). This adds a safeguard that clamps the reported value to a minimum of 1 so that at least one block is launched, preventing a floating point exception. Co-authored-by: Attila Dusnoki <[email protected]>.
1 parent ababae7 commit 3845706

1 file changed: +1 -0 lines changed

ggml/src/ggml-cuda/fattn-common.cuh

Lines changed: 1 addition & 0 deletions
@@ -895,6 +895,7 @@ void launch_fattn(
     const dim3 block_dim(warp_size, nwarps, 1);
     int max_blocks_per_sm = 1; // Max. number of active blocks limited by occupancy.
     CUDA_CHECK(cudaOccupancyMaxActiveBlocksPerMultiprocessor(&max_blocks_per_sm, fattn_kernel, block_dim.x * block_dim.y * block_dim.z, nbytes_shared));
+    max_blocks_per_sm = std::max(max_blocks_per_sm, 1); // Safeguard, ensures at least one block can be launched.
     int parallel_blocks = max_blocks_per_sm;
 
     dim3 blocks_num;
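
The effect of the added line can be sketched on the host side without a GPU. This is a minimal illustration, not the code in `launch_fattn`: the helper names `clamp_blocks_per_sm` and `waves_needed` are hypothetical, and the assumption that the exception comes from a later integer division by a value derived from `max_blocks_per_sm` is ours; the commit message does not say where the fault occurs. On x86 Linux, an integer division by zero is typically reported as a "floating point exception" (SIGFPE).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper mirroring the one-line fix: never let the
// occupancy value reported by the runtime fall below 1.
inline int clamp_blocks_per_sm(int reported) {
    return std::max(reported, 1);
}

// Hypothetical consumer of the occupancy value (an assumption, not taken
// from the source): how many "waves" are needed to run total_blocks when
// each of num_sms multiprocessors can hold blocks_per_sm active blocks.
// With blocks_per_sm == 0 the division below would divide by zero.
inline int waves_needed(int total_blocks, int blocks_per_sm, int num_sms) {
    assert(blocks_per_sm > 0 && num_sms > 0);
    const int blocks_per_wave = blocks_per_sm * num_sms;
    // Integer ceiling division.
    return (total_blocks + blocks_per_wave - 1) / blocks_per_wave;
}
```

For instance, `waves_needed(100, clamp_blocks_per_sm(0), 40)` evaluates safely because the clamp turns the reported 0 into 1 before the division; the count of 40 multiprocessors is an arbitrary illustrative value.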
