Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Is there a way to use this code on a legacy Tesla P40?
Motivation
Is there a way to use this repo on an old Tesla P40? I tried to disable flash attention and build with:
cmake -B build ^
-DGGML_CUDA=ON ^
-DGGML_BLAS=OFF ^
-DGGML_CUDA_ARCH=61 ^
-DGGML_CUDA_GRAPH=OFF ^
-DGGML_CUDA_FORCE_MMQ=OFF ^
-DGGML_CUDA_DMMV_X=32 ^
-DGGML_CUDA_MMQ_ENABLE=OFF ^
according to ChatGPT. Is there a way to compile it for older devices?
I only get CUDA errors like:
CUDA error: an illegal memory access was encountered
current device: 0, in function launch_mul_mat_q at D:\ik_llama.cpp\ggml\src\ggml-cuda\template-instances../mmq.cuh:4008
cudaFuncSetAttribute(mul_mat_q<type, mmq_x, 8, false>, cudaFuncAttributeMaxDynamicSharedMemorySize, shmem)
D:\ik_llama.cpp\ggml\src\ggml-cuda.cu:110: CUDA error
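For comparison, upstream llama.cpp selects the target GPU generation through the standard CMake variable CMAKE_CUDA_ARCHITECTURES rather than a GGML_CUDA_ARCH option, so several of the flags above may simply be ignored. A sketch of a configure step for compute capability 6.1 (the P40's Pascal architecture), assuming this fork inherits upstream's build options, would be:

```shell
:: Configure for a Pascal-class GPU (compute capability 6.1, e.g. Tesla P40).
:: CMAKE_CUDA_ARCHITECTURES is the standard CMake mechanism for picking the
:: CUDA architecture; the GGML_* flags are assumptions carried over from
:: upstream llama.cpp and may not all exist in this fork.
cmake -B build ^
    -DGGML_CUDA=ON ^
    -DCMAKE_CUDA_ARCHITECTURES=61 ^
    -DGGML_CUDA_FORCE_MMQ=OFF
cmake --build build --config Release
```

Note that in upstream llama.cpp, flash attention is toggled at runtime (the `-fa` option of the CLI tools) rather than at build time, so leaving it off may just be a matter of not passing that flag.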
Possible Implementation
No response