llama : enable flash attn automatically when supported (WIP) #10101

slaren · 2024-10-30T22:32:13Z

Currently, the CUDA backend's supports_op function is too inaccurate for flash attention. Until this is fixed, I don't think this can be implemented reliably.
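To illustrate why the accuracy of the support query matters, here is a minimal sketch of how an automatic mode could resolve to on or off. It is not the code in this PR and does not use the ggml API: `device_stub`, `flash_attn_mode`, and `resolve_flash_attn` are hypothetical names, and the per-device callback only stands in for a supports_op-style query.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend device; NOT the ggml API.
struct device_stub {
    std::string name;
    // Stand-in for a supports_op-style query: does this device claim to
    // support the flash attention op for the given head size?
    std::function<bool(int head_size)> supports_flash_attn;
};

// Hypothetical three-state user setting.
enum class flash_attn_mode { off, on, automatic };

// Resolve the final setting. In automatic mode, flash attention is enabled
// only if every device reports support. An explicit on/off is passed through
// unchanged.
inline bool resolve_flash_attn(flash_attn_mode mode,
                               const std::vector<device_stub> & devices,
                               int head_size) {
    if (mode == flash_attn_mode::off) {
        return false;
    }
    if (mode == flash_attn_mode::on) {
        return true;
    }
    for (const auto & dev : devices) {
        if (!dev.supports_flash_attn(head_size)) {
            return false; // one unsupported device disables it for all
        }
    }
    return true;
}
```

The decision is only as good as the answers it receives: if a device reports support for a configuration it cannot actually run, automatic mode would enable a path that fails later, which is the reliability concern raised above.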

llama : enable flash attn automatically when supported

afc4a7d

slaren mentioned this pull request Aug 19, 2025

llama: use FA + max. GPU layers by default #15434

Merged
