Description
I'm experiencing CUDA compatibility issues when trying to run KoboldCpp with Tesla V100 GPUs (Volta architecture, CUDA compute capability 7.0). Despite the log suggesting arch 700 is supported, the kernel fails to load.
Environment
Hardware: NVIDIA DGX node with 8x Tesla V100-SXM2-32GB GPUs (Volta architecture)
GPU Compute Capability: 7.0 (sm_70)
OS: Ubuntu 22.04.5 LTS
KoboldCpp versions tried: 1.84.2 and 1.83.1
CUDA version: 12.4.131 (as reported in the OpenCL device info)
Error Message
When using mmq with CUDA, I get the following error:
ERROR: CUDA kernel mul_mat_q has no device code compatible with CUDA arch 700. ggml-cuda.cu was compiled for: 500,520,530,600,610,620,700,720,750,800,860,870,890,900
This is confusing because the error message states that the binary was compiled for arch 700 (among others), yet it simultaneously claims there is no compatible device code for that same arch.
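To make the contradiction concrete, here is a small sketch that parses the compiled-arch list out of the error string itself and checks whether the reported arch is present (the parsing logic is mine for illustration, not anything KoboldCpp does internally):

```python
# Parse the compiled-arch list from the error message and check
# whether the arch the kernel complains about (700) is in that list.
error = ("ERROR: CUDA kernel mul_mat_q has no device code compatible "
         "with CUDA arch 700. ggml-cuda.cu was compiled for: "
         "500,520,530,600,610,620,700,720,750,800,860,870,890,900")

# everything after the final ": " is the comma-separated arch list
compiled_for = [int(a) for a in error.rsplit(": ", 1)[1].split(",")]
# the arch mentioned in the complaint ("CUDA arch 700.")
reported_arch = int(error.split("CUDA arch ")[1].split(".")[0])

# arch 700 IS in the compiled list, yet the kernel refuses to load
print(reported_arch, reported_arch in compiled_for)  # -> 700 True
```

So by the binary's own account, sm_70 device code should exist; the failure suggests the runtime is not finding or selecting that code object rather than it being absent from the build list.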
Steps to Reproduce
- Configure KoboldCpp with CUDA backend using "usecublas": ["normal", "0", "mmq", "1"]
- Attempt to load a DeepSeek R1 model
- During inference, the error occurs repeatedly
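For reference, the same reproduction expressed as a launch command; the `--usecublas` flag spelling follows KoboldCpp's documented CLI, but treat the exact argument order and the model path as assumptions:

```shell
# Hypothetical launch matching the config above: CUDA backend,
# GPU device 0, MMQ kernels enabled, loading a DeepSeek R1 GGUF model.
# (model path is illustrative, not my actual path)
python koboldcpp.py --usecublas normal 0 mmq \
    --model DeepSeek-R1-GGUF/model.gguf \
    --contextsize 8192
```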
Additional Information
- I've tried multiple context sizes from 4096 to 8192
- Setting nommq instead of mmq avoids this specific error, but performance drops to roughly 2 tokens/sec