Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Is there a way to use this code on a legacy Tesla P40?
Motivation
Is there a way to use this repo on an old Tesla P40? I tried to disable flash attention and build with:
cmake -B build ^
-DGGML_CUDA=ON ^
-DGGML_BLAS=OFF ^
-DGGML_CUDA_ARCH=61 ^
-DGGML_CUDA_GRAPH=OFF ^
-DGGML_CUDA_FORCE_MMQ=OFF ^
-DGGML_CUDA_DMMV_X=32 ^
-DGGML_CUDA_MMQ_ENABLE=OFF ^
according to ChatGPT. Is there a way to compile it for older devices?
I only get CUDA errors like:
CUDA error: an illegal memory access was encountered
current device: 0, in function launch_mul_mat_q at D:\ik_llama.cpp\ggml\src\ggml-cuda\template-instances../mmq.cuh:4008
cudaFuncSetAttribute(mul_mat_q<type, mmq_x, 8, false>, cudaFuncAttributeMaxDynamicSharedMemorySize, shmem)
D:\ik_llama.cpp\ggml\src\ggml-cuda.cu:110: CUDA error
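For comparison, upstream llama.cpp selects the target GPU generation through the standard CMake variable CMAKE_CUDA_ARCHITECTURES rather than a GGML_CUDA_ARCH option, so several of the flags above may simply be ignored. A sketch of a configure step for compute capability 6.1 (the P40's Pascal architecture), assuming this fork inherits upstream's build options, would be:

```shell
:: Configure for a Pascal-class GPU (compute capability 6.1, e.g. Tesla P40).
:: CMAKE_CUDA_ARCHITECTURES is the standard CMake mechanism for picking the
:: CUDA architecture; the GGML_* flags are assumptions carried over from
:: upstream llama.cpp and may not all exist in this fork.
cmake -B build ^
    -DGGML_CUDA=ON ^
    -DCMAKE_CUDA_ARCHITECTURES=61 ^
    -DGGML_CUDA_FORCE_MMQ=OFF
cmake --build build --config Release
```

Note that in upstream llama.cpp, flash attention is toggled at runtime (the `-fa` option of the CLI tools) rather than at build time, so leaving it off may just be a matter of not passing that flag.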
Possible Implementation
No response