support GQA and realize flash attention v2 with Causal masking #10

FoolyAndCooly · 2025-03-10T02:36:39Z

This PR adds GQA (grouped-query attention) support to flash attention and implements flash attention v2 with causal masking.
I got these results on my RTX 1650 (flash attention v2) with:
batch_size = 16
n_q_head = 16
n_kv_head = 8
seq_len = 256
head_embd = 64
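
For reference, here is a minimal, unoptimized sketch of what GQA with causal masking computes for a single batch element. It assumes the common convention that consecutive query heads share one KV head, so `n_q_head = 16` and `n_kv_head = 8` would give groups of 2. The function name `gqa_causal_attention` and the nested-list layout are illustrative and not taken from this PR; the CUDA kernels here may map heads or lay out memory differently.

```python
import math

def gqa_causal_attention(q, k, v, causal=True):
    """Naive reference attention (hypothetical helper, not the PR's kernel).

    q:    [n_q_head][seq_len][head_embd]
    k, v: [n_kv_head][seq_len][head_embd]
    """
    n_q_head, n_kv_head = len(q), len(k)
    assert n_q_head % n_kv_head == 0, "n_q_head must be a multiple of n_kv_head"
    group_size = n_q_head // n_kv_head  # query heads per shared KV head (assumed mapping)
    seq_len, head_embd = len(q[0]), len(q[0][0])
    scale = 1.0 / math.sqrt(head_embd)

    out = []
    for h in range(n_q_head):
        kv_h = h // group_size  # KV head shared by this query head
        head_out = []
        for i in range(seq_len):
            # Causal masking: query position i only sees keys 0..i.
            last = i + 1 if causal else seq_len
            scores = [
                scale * sum(a * b for a, b in zip(q[h][i], k[kv_h][j]))
                for j in range(last)
            ]
            # Numerically stable softmax over the visible keys.
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            denom = sum(weights)
            head_out.append([
                sum(w * v[kv_h][j][d] for j, w in enumerate(weights)) / denom
                for d in range(head_embd)
            ])
        out.append(head_out)
    return out
```

A reference like this is only practical for small shapes, but it can be used to check a kernel's output within a floating-point tolerance.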

FoolyAndCooly added 4 commits March 8, 2025 20:03

Support GQA, implement flash_v2

41dd5d1

Optimize away redundant synchronization

ec06878

flash_v2 with Causal masking

3d17ef7

add v1 benchmark

8320993
