CUDA Kernel Compatibility Error with Tesla V100 (Volta, sm_70) GPUs #1390

@deepseven

Description

I'm experiencing CUDA compatibility issues when trying to run KoboldCpp with Tesla V100 GPUs (Volta architecture, CUDA compute capability 7.0). Despite the log suggesting arch 700 is supported, the kernel fails to load.

Environment
Hardware: NVIDIA DGX node with 8x Tesla V100-SXM2-32GB GPUs (Volta architecture)
GPU Compute Capability: 7.0 (sm_70)
OS: Ubuntu 22.04.5 LTS
KoboldCpp versions tried: 1.84.2 and 1.83.1
CUDA version: 12.4.131 (from OpenCL info)

Error Message
When using mmq with CUDA, I get the following error:

ERROR: CUDA kernel mul_mat_q has no device code compatible with CUDA arch 700. ggml-cuda.cu was compiled for: 500,520,530,600,610,620,700,720,750,800,860,870,890,900

This is confusing because the error message states that the binary was compiled for arch 700 (among others), yet it claims there's no compatible device code for arch 700.
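To illustrate why this looks contradictory, here is a minimal Python sketch of how a compiled-arch list is typically matched against a device's compute capability: the runtime picks the highest compiled arch that does not exceed the device arch. This is a hypothetical model of the check, not KoboldCpp's or ggml's actual code; with 700 in the compiled list, a device reporting arch 700 should match.

```python
# Hypothetical sketch of arch selection (NOT KoboldCpp/ggml source code):
# choose the highest compiled arch that is <= the device's arch.
COMPILED_ARCHS = [500, 520, 530, 600, 610, 620, 700, 720,
                  750, 800, 860, 870, 890, 900]

def best_compiled_arch(device_arch, compiled):
    """Return the highest compiled arch usable on device_arch, or None."""
    usable = [a for a in compiled if a <= device_arch]
    return max(usable) if usable else None

print(best_compiled_arch(700, COMPILED_ARCHS))  # a V100 (sm_70) should match 700
```

Under this model the lookup succeeds, so the "no device code" error suggests the mul_mat_q kernel itself was somehow excluded from the sm_70 device code at build time, despite the arch appearing in the overall compile list.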

Steps to Reproduce

  • Configure KoboldCpp with CUDA backend using "usecublas": ["normal", "0", "mmq", "1"]
  • Attempt to load a DeepSeek R1 model
  • During inference, the error occurs repeatedly

Additional Information

  • I've tried multiple context sizes from 4096 to 8192
  • Using nommq instead of mmq avoids this specific error, but performance drops to about 2 tokens/sec
