Description
I'm experiencing CUDA compatibility issues when trying to run KoboldCpp with Tesla V100 GPUs (Volta architecture, CUDA compute capability 7.0). Despite the log suggesting arch 700 is supported, the kernel fails to load.
Environment
Hardware: NVIDIA DGX node with 8x Tesla V100-SXM2-32GB GPUs (Volta architecture)
GPU Compute Capability: 7.0 (sm_70)
OS: Ubuntu 22.04.5 LTS
KoboldCpp versions tried: 1.84.2 and 1.83.1
CUDA version: 12.4.131 (as reported in the OpenCL device info)
Error Message
When using mmq with CUDA, I get the following error:
ERROR: CUDA kernel mul_mat_q has no device code compatible with CUDA arch 700. ggml-cuda.cu was compiled for: 500,520,530,600,610,620,700,720,750,800,860,870,890,900
This is confusing because the error message states that the binary was compiled for arch 700 (among others), yet it simultaneously claims there is no compatible device code for that same arch.
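To make the contradiction concrete, here is a small sketch that parses the compiled-arch list out of the error string itself and checks whether the reported arch is present (the parsing logic is mine for illustration, not anything KoboldCpp does internally):

```python
# Parse the compiled-arch list from the error message and check
# whether the arch the kernel complains about (700) is in that list.
error = ("ERROR: CUDA kernel mul_mat_q has no device code compatible "
         "with CUDA arch 700. ggml-cuda.cu was compiled for: "
         "500,520,530,600,610,620,700,720,750,800,860,870,890,900")

# everything after the final ": " is the comma-separated arch list
compiled_for = [int(a) for a in error.rsplit(": ", 1)[1].split(",")]
# the arch mentioned in the complaint ("CUDA arch 700.")
reported_arch = int(error.split("CUDA arch ")[1].split(".")[0])

# arch 700 IS in the compiled list, yet the kernel refuses to load
print(reported_arch, reported_arch in compiled_for)  # -> 700 True
```

So by the binary's own account, sm_70 device code should exist; the failure suggests the runtime is not finding or selecting that code object rather than it being absent from the build list.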
Steps to Reproduce
- Configure KoboldCpp with CUDA backend using "usecublas": ["normal", "0", "mmq", "1"]
- Attempt to load a DeepSeek R1 model
- During inference, the error occurs repeatedly
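For reference, the same reproduction expressed as a launch command; the `--usecublas` flag spelling follows KoboldCpp's documented CLI, but treat the exact argument order and the model path as assumptions:

```shell
# Hypothetical launch matching the config above: CUDA backend,
# GPU device 0, MMQ kernels enabled, loading a DeepSeek R1 GGUF model.
# (model path is illustrative, not my actual path)
python koboldcpp.py --usecublas normal 0 mmq \
    --model DeepSeek-R1-GGUF/model.gguf \
    --contextsize 8192
```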
Additional Information
- I've tried multiple context sizes from 4096 to 8192
- Setting nommq instead of mmq avoids this specific error, but performance drops to roughly 2 tokens/sec