
Compile bug: CUDA MUL_MAT crash on Blackwell (sm_121) - nvcc O3 optimization bug #18331

@AmesianX

Description


Git commit

$ git log -1 --oneline
5ee4e43 (HEAD -> master, tag: b7524, origin/master, origin/HEAD) server: return_progress to also report 0% processing state (#18305)

$ git rev-parse HEAD
5ee4e43

Operating systems

Linux

GGML backends

CUDA

Problem description & steps to reproduce

Environment

  • GPU: NVIDIA GB10 (Blackwell)
  • Machine: NVIDIA DGX Spark
  • CUDA Architecture: sm_121
  • CUDA Toolkit: 13.0
  • llama.cpp: master branch (2025-12-24)
  • OS: Ubuntu Linux 24.04 (aarch64)
  • Quantization: MXFP4

Server Options

--embedding --pooling last
--parallel 30
-b 32768        # batch size
-ub 16384       # micro batch size (critical!)
-c 131072       # context size
--cont-batching
--defrag-thold 0.1
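For reference, the options above correspond to a llama-server invocation along the lines of the following sketch. The model path is a hypothetical placeholder, since the report does not name the MXFP4 model. llama-server is assumed to be on PATH; without it, the script only prints the command. The options are collected in a bash array so each one can carry a comment without breaking a backslash line continuation.

```shell
#!/usr/bin/env bash
# Reproduction sketch for the crash described in this issue.
# MODEL is a hypothetical placeholder: point it at the MXFP4 GGUF actually used.
MODEL="${MODEL:-./model-mxfp4.gguf}"

# A bash array lets every option carry its own comment without
# breaking a backslash line continuation.
SERVER_ARGS=(
  -m "$MODEL"
  --embedding --pooling last
  --parallel 30
  -b 32768          # logical batch size
  -ub 16384         # physical (micro) batch size; the crash hits at this ubatch boundary
  -c 131072         # context size
  --cont-batching
  --defrag-thold 0.1
)

if command -v llama-server >/dev/null 2>&1; then
  llama-server "${SERVER_ARGS[@]}"
else
  # No llama-server on PATH: print the command that would be run instead.
  echo "llama-server ${SERVER_ARGS[*]}"
fi
```

The crash is then triggered by sending embedding requests large enough to fill a 16384-token ubatch.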

Symptoms

- CUDA error: an illegal memory access was encountered
- Crash in MUL_MAT (MMQ kernel) at ggml_cuda_compute_forward
- Crash occurs at the ubatch boundary (batch.n_tokens = 16384)

Diagnosis

| Build Config                         | Result           |
|--------------------------------------|------------------|
| sm_121 + O3                          | Immediate crash  |
| sm_121 + O2                          | Occasional crash |
| sm_121 + O2 + CUDA_LAUNCH_BLOCKING=1 | Still crashes    |
| sm_89 + O2                           | Stable ✓         |
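Here is a minimal sketch of how the rows of this table could be rebuilt side by side. It assumes the script runs from the llama.cpp source root, and the build directory names are hypothetical. It pins the device-side level with -Xptxas -O<n> because nvcc documents its own -O flag as the host-code optimization level, while device code is optimized by ptxas (default -O3). Also note that in a Release build CMake appends CMAKE_CUDA_FLAGS_RELEASE (by default -O3 -DNDEBUG for nvcc) after CMAKE_CUDA_FLAGS, so an -O2 passed through CMAKE_CUDA_FLAGS is followed by -O3 on the nvcc command line.

```shell
#!/usr/bin/env bash
# Sketch: configure and build one directory per row of the diagnosis table.
# Assumes it is run from the llama.cpp source root; directory names are hypothetical.
# Device-code optimization is passed to ptxas via -Xptxas, since nvcc's -O is host-only.

build_row() {
  local arch="$1" dev_opt="$2"
  local dir="build-sm${arch}-O${dev_opt}"
  local args=(
    -B "$dir"
    -DCMAKE_BUILD_TYPE=Release
    -DBUILD_SHARED_LIBS=ON
    -DGGML_CUDA=ON
    -DCMAKE_CUDA_ARCHITECTURES="$arch"
    -DCMAKE_CUDA_FLAGS="-Xptxas -O${dev_opt}"
  )
  if command -v cmake >/dev/null 2>&1 && [ -f CMakeLists.txt ]; then
    cmake "${args[@]}" && cmake --build "$dir" -j 12 --target llama-server
  else
    # Not in a source tree (or no cmake): print the configure command only.
    echo "cmake ${args[*]}"
  fi
}

build_row 121 3   # sm_121 + O3
build_row 121 2   # sm_121 + O2
build_row 89 2    # sm_89 + O2 (stable in the table above)
```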

Key observations:
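- CUDA_LAUNCH_BLOCKING=1 still crashes → not a host/stream-level race (this setting only makes kernel launches synchronous, so it does not rule out a data race inside a kernel)
- Adding assert() or printf() to the MMQ kernel → crash disappears (suggests the optimizer's output is what changes)
- Building for sm_89 (Ada PTX, JIT-compiled for Blackwell) → stable on Blackwell hardware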
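Because CUDA_LAUNCH_BLOCKING=1 only serializes launches with respect to the host, an intra-kernel race is still possible. NVIDIA's compute-sanitizer can check both the illegal access itself (memcheck) and shared-memory hazards (racecheck). The sketch below assumes compute-sanitizer and llama-server are on PATH and uses a hypothetical MODEL path. The server still has to receive requests large enough to reach the 16384-token ubatch before anything is reported, and everything runs much slower under the sanitizer.

```shell
#!/usr/bin/env bash
# Sketch: rerun the reproduction under NVIDIA compute-sanitizer.
#   memcheck  - reports out-of-bounds / misaligned accesses and the faulting kernel
#   racecheck - reports shared-memory data hazards inside a kernel
# MODEL is a hypothetical placeholder path.
MODEL="${MODEL:-./model-mxfp4.gguf}"

run_under() {
  local tool="$1"
  local cmd=(compute-sanitizer --tool "$tool"
             llama-server -m "$MODEL" --embedding --pooling last --parallel 30
             -b 32768 -ub 16384 -c 131072 --cont-batching)
  if command -v compute-sanitizer >/dev/null 2>&1 && command -v llama-server >/dev/null 2>&1; then
    "${cmd[@]}"
  else
    # Tools not available: print the command that would be run.
    echo "${cmd[*]}"
  fi
}

run_under memcheck
run_under racecheck
```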

Root Cause

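nvcc appears to generate incorrect code for the Blackwell architecture (sm_121), specifically in the MMQ kernels used with MXFP4 quantization. This looks like a compiler optimization bug in CUDA Toolkit 13.0 (V13.0.88). Note that building for sm_89 also compiles any architecture-specific code paths for Ada rather than Blackwell, so the workaround alone does not distinguish a compiler bug from a bug in Blackwell-only kernel code.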
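One way to probe this hypothesis further is sketched below, under stated assumptions. CUDA_FORCE_PTX_JIT=1 is a documented driver environment variable that makes the driver ignore embedded SASS and JIT-compile the embedded PTX instead. Running the crashing sm_121 build this way replaces the SASS that the toolkit's ptxas produced at build time with code the driver compiles at load time. A different outcome would point at the build-time ptxas step. The same outcome is inconclusive, since the driver's JIT may share the same backend. This assumes the build used a plain 121 entry in CMAKE_CUDA_ARCHITECTURES, which embeds compute_121 PTX. BIN and MODEL are hypothetical paths.

```shell
#!/usr/bin/env bash
# Sketch: run the sm_121 build, but make the driver ignore the embedded SASS
# (produced by the toolkit's ptxas) and JIT-compile the embedded PTX instead.
# Assumes the build embeds compute_121 PTX (plain "121" architecture entry).
# BIN and MODEL are hypothetical placeholder paths.
BIN="${BIN:-./build/bin/llama-server}"
MODEL="${MODEL:-./model-mxfp4.gguf}"

jit_cmd=(env CUDA_FORCE_PTX_JIT=1
         "$BIN" -m "$MODEL" --embedding --pooling last --parallel 30
         -b 32768 -ub 16384 -c 131072 --cont-batching)

if [ -x "$BIN" ]; then
  "${jit_cmd[@]}"
else
  # Binary not found: print the command that would be run.
  echo "${jit_cmd[*]}"
fi
```

The first start can take a long time because every kernel is compiled at load time.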

Workaround

Build for the Ada architecture; the driver then JIT-compiles the embedded compute_89 PTX for Blackwell:
-DCMAKE_CUDA_ARCHITECTURES=89
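The workaround relies on the sm_89 build embedding PTX that the driver can JIT-compile for sm_121. In CMAKE_CUDA_ARCHITECTURES, a plain 89 produces both compute_89 PTX and sm_89 SASS, 89-virtual produces PTX only, and 89-real produces SASS only, which cannot run on Blackwell. The sketch below encodes that rule in a small helper (embeds_ptx is a hypothetical name). If cuobjdump and the library are available, it also lists what was actually embedded. LIB is a hypothetical path.

```shell
#!/usr/bin/env bash
# Sketch: the workaround only works if the sm_89 build embeds PTX.
# "89" = compute_89 PTX + sm_89 SASS, "89-virtual" = PTX only,
# "89-real" = SASS only (Ada SASS cannot be loaded on an sm_121 GPU).

# Succeeds if a numeric CMAKE_CUDA_ARCHITECTURES entry embeds PTX.
embeds_ptx() {
  case "$1" in
    *-real) return 1 ;;   # SASS only
    *)      return 0 ;;   # "89" (PTX + SASS) or "89-virtual" (PTX only)
  esac
}

# LIB is a hypothetical path: adjust it to where libggml-cuda.so was built.
LIB="${LIB:-./build/bin/libggml-cuda.so}"
if command -v cuobjdump >/dev/null 2>&1 && [ -f "$LIB" ]; then
  cuobjdump --list-ptx "$LIB"   # embedded PTX images (JIT input on Blackwell)
  cuobjdump --list-elf "$LIB"   # embedded SASS images
fi
```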

Related Issues

- #18310
- #18313

### First Bad Commit

_No response_

### Compile command

```shell
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Aug_20_01:57:39_PM_PDT_2025
Cuda compilation tools, release 13.0, V13.0.88
Build cuda_13.0.r13.0/compiler.36424714_0

# CUDA (CMAKE_CUDA_ARCHITECTURES)
# -----------------------------------------------
# 50  - Maxwell (GTX 750; GTX 900 series is 52)
# 60  - Pascal (Tesla P100; GTX 1000 series is 61)
# 70  - Volta (V100)
# 75  - Turing (RTX 2000 series)
# 80  - Ampere (A100)
# 86  - Ampere (RTX 3000 series)
# 89  - Ada Lovelace (RTX 4000 series) - CUDA 11.8+
# 90a - Hopper (H100) - CUDA 12.0+
# 120 - Blackwell (RTX 5000 series) - CUDA 12.8+
# 121 - Blackwell (GB10 / DGX Spark) - CUDA 13.0+
# native - GPUs detected at build time (CMake 3.24+)

# Crash-Prone Build (Avoid), with CMAKE_CUDA_ARCHITECTURES=121 or 120
cmake -DCMAKE_BUILD_TYPE=Release \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_CUDA_ARCHITECTURES=121 \
    -DGGML_VULKAN=OFF \
    -DGGML_DML=OFF \
    -DGGML_CUDA=ON \
    -DGGML_HIP=OFF \
    -DGGML_METAL=OFF \
    -DGGML_BLAS=ON \
    -DGGML_CCACHE=OFF \
    -DGGML_F16C=ON \
    -DGGML_FMA=ON \
    -DCMAKE_C_FLAGS="-O2" \
    -DCMAKE_EXE_LINKER_FLAGS="-lpthread -lm" \
    -DLLAMA_CURL=ON \
    -DLLAMA_BUILD_TESTS=OFF \
    -DLLAMA_BUILD_EXAMPLES=OFF \
    -DLLAMA_BUILD_SERVER=ON \
    -DCMAKE_VERBOSE_MAKEFILE=ON \
    ..

make -j12

# Safe Build (Recommended)
# CMAKE_CUDA_FLAGS="-O2": the default -O3 crashes on Blackwell
cmake -DCMAKE_BUILD_TYPE=Release \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_CUDA_ARCHITECTURES=89 \
    -DGGML_VULKAN=OFF \
    -DGGML_DML=OFF \
    -DGGML_CUDA=ON \
    -DCMAKE_CUDA_FLAGS="-O2" \
    -DGGML_HIP=OFF \
    -DGGML_METAL=OFF \
    -DGGML_BLAS=ON \
    -DGGML_CCACHE=OFF \
    -DGGML_F16C=ON \
    -DGGML_FMA=ON \
    -DCMAKE_C_FLAGS="-O2" \
    -DCMAKE_EXE_LINKER_FLAGS="-lpthread -lm" \
    -DLLAMA_CURL=ON \
    -DLLAMA_BUILD_TESTS=OFF \
    -DLLAMA_BUILD_EXAMPLES=OFF \
    -DLLAMA_BUILD_SERVER=ON \
    -DCMAKE_VERBOSE_MAKEFILE=ON \
    ..

make -j12
```

### Relevant log output

