
Eval bug: b5335 breaks flash attention on 4070 #13430

@steampunque


Name and Version

b5335 server

Operating systems

Linux

GGML backends

CUDA

Hardware

4070

Models

any (tested with Qwen3 8B)

Problem description & steps to reproduce

Gibberish is generated when flash attention (FA) is turned on.
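For context, a repro along these lines should show the behavior (a sketch: the model path, port, and prompt are placeholders; `-fa` / `--flash-attn` is the server's flash-attention toggle):

```sh
# Hypothetical repro sketch -- model path and port are placeholders.
# Start the server with flash attention enabled:
./llama-server -m ./Qwen3-8B-Q4_K_M.gguf -fa --port 8080 &

# Any completion request then returns gibberish; rerun without -fa and output is normal.
curl http://localhost:8080/completion -d '{"prompt": "Hello", "n_predict": 32}'
```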

The problem goes away after making the following change in the CUDA source file fattn-mma-f16.cuh (line 550 at b5335):

```
//constexpr bool use_cp_async = nstages == 1;
constexpr bool use_cp_async = 0;
```
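For reference, the workaround corresponds to a one-line patch along these lines (the file lives under ggml/src/ggml-cuda/ in the llama.cpp tree; `false` is the idiomatic equivalent of the `0` above). Since `use_cp_async` gates the cp.async asynchronous global-to-shared copy path when `nstages == 1`, forcing it off suggests that path is where the corruption occurs:

```diff
 // ggml/src/ggml-cuda/fattn-mma-f16.cuh, around line 550 at b5335
-constexpr bool use_cp_async = nstages == 1;
+constexpr bool use_cp_async = false;
```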

First Bad Commit

Unknown

Relevant log output

flash attention on:

bash-5.1$ lm Hello
郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦^Cbash-5.1$ 

flash attention off:

bash-5.1$ 
bash-5.1$ 
bash-5.1$ 
bash-5.1$ lm Hello
<think>
Okay, the user said "Hello". I need to respond appropriately. Since it's a greeting, I should acknowledge it and offer assistance. Let me keep it friendly and open-ended. Maybe ask how I can help them today. That way, they know I'm here to assist with any questions or tasks they might have. I should make sure the response is welcoming and not too formal. Let me check for any typos or errors. Alright, that should work.
</think>
