Bug: GGML_ASSERT when running quantized K cache on CUDA with no fa #679

Description

@saood06

What happened?

I managed to reproduce the bug mentioned in #645 without a draft model at all. On my 3090 on Windows (note that -fa is not passed, so flash attention is disabled):

.\llama-server.exe -m "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf" -t 6 -ngl 99 -ctk q8_0

Name and Version

version: 3766 (cac763f)
built with MSVC 19.28.29335.0 for x64

and

version: 3852 (a694d7d) AKA #645
built with MSVC 19.28.29335.0 for x64

What operating system are you seeing the problem on?

Windows

Relevant log output

ik_llama.cpp\ggml\src\ggml-cuda\mmvq.cu:595: GGML_ASSERT(src0->ne[2] == src1->ne[2] && src0->ne[2] == dst->ne[2]) failed
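For context, this assertion guards the batched quantized mat-vec path: src0 (here the quantized K cache), src1, and dst must all have the same third dimension ne[2], because the kernel does not broadcast over it. One plausible way for ne[2] to diverge on this path is grouped-query attention, where the K cache holds fewer heads than Q; that is an inference from the shapes, not confirmed from the logs. The sketch below is a simplified, stand-alone illustration of the check; the tensor struct, function name, and head counts are assumptions for illustration, not code from ik_llama.cpp.

#include <cassert>
#include <cstdint>

// Simplified stand-in for ggml_tensor; only the dimension array ne[] matters here.
struct tensor {
    int64_t ne[4]; // ne[0] = row length, ne[1] = rows, ne[2]/ne[3] = batch dims
};

// Mirrors the failing guard in mmvq.cu: the quantized mat-vec kernel assumes
// matching batch counts and does not broadcast src0 over ne[2].
static void mul_mat_vec_q_check(const tensor *src0, const tensor *src1, const tensor *dst) {
    assert(src0->ne[2] == src1->ne[2] && src0->ne[2] == dst->ne[2]);
}

int main() {
    // Hypothetical GQA-style K*Q shapes: 4 KV heads in the quantized K cache
    // vs. 32 query heads, so ne[2] differs and the check fails.
    tensor k  = {{64, 512,  4, 1}}; // quantized K cache (src0)
    tensor q  = {{64,   1, 32, 1}}; // current query vector (src1)
    tensor kq = {{512,  1, 32, 1}}; // attention scores (dst)
    mul_mat_vec_q_check(&k, &q, &kq); // aborts here
    return 0;
}

Built with assertions enabled (no -DNDEBUG), this aborts at the check, which is the same shape mismatch the log above reports.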
