Closed
Description
Name and Version
b5335 server
Operating systems
Linux
GGML backends
CUDA
Hardware
4070
Models
any (tested with Qwen3 8B)
Problem description & steps to reproduce
Gibberish is generated when flash attention (FA) is turned on.
The problem goes away after making the following change in the CUDA source file:
fattn-mma-f16.cuh
line 550 at b5335
//constexpr bool use_cp_async = nstages == 1;
constexpr bool use_cp_async = 0;
First Bad Commit
Unknown
Relevant log output
flash attention on:
bash-5.1$ lm Hello
郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦郦^Cbash-5.1$
flash attention off:
bash-5.1$
bash-5.1$
bash-5.1$
bash-5.1$ lm Hello
<think>
Okay, the user said "Hello". I need to respond appropriately. Since it's a greeting, I should acknowledge it and offer assistance. Let me keep it friendly and open-ended. Maybe ask how I can help them today. That way, they know I'm here to assist with any questions or tasks they might have. I should make sure the response is welcoming and not too formal. Let me check for any typos or errors. Alright, that should work.
</think>