Commit a0475b2


Separates mask and bias memory operations in attention kernel

Refactors combined mask-bias memory operations into separate dedicated operations to improve performance and maintainability. Introduces specialized copy functions for mask and bias operations with proper bounds checking and OR-reduction for mask activity detection. Removes redundant synchronization points by leveraging built-in synchronization in the new copy functions. Adds predicate tensor allocation for proper boundary handling in both regular and split-KV attention kernels.
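The separation described above can be illustrated with a host-side C++ sketch. It is not the code from `flash_fwd_kernel.h`: the names `copy_tile_predicated`, `copy_mask_tile`, and `copy_bias_tile` are hypothetical, the tiles are plain row-major arrays rather than GPU tensors, and zero-filling out-of-bounds elements is an assumption. It only shows the idea of a bounds-checked (predicated) copy per operand, with the mask copy also reporting whether any element is active via an OR-reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a tile_rows x tile_cols tile starting at
// (row0, col0) from a row-major src of shape src_rows x src_cols.
// Elements outside src are not read; dst holds zero there (assumed policy).
template <typename T>
void copy_tile_predicated(const std::vector<T>& src,
                          std::size_t src_rows, std::size_t src_cols,
                          std::size_t row0, std::size_t col0,
                          std::size_t tile_rows, std::size_t tile_cols,
                          std::vector<T>& dst) {
    assert(src.size() == src_rows * src_cols);
    dst.assign(tile_rows * tile_cols, T(0));
    for (std::size_t r = 0; r < tile_rows; ++r) {
        for (std::size_t c = 0; c < tile_cols; ++c) {
            // Predicate: true only for coordinates inside the source.
            const bool in_bounds = (row0 + r < src_rows) && (col0 + c < src_cols);
            if (in_bounds) {
                dst[r * tile_cols + c] = src[(row0 + r) * src_cols + (col0 + c)];
            }
        }
    }
}

// Hypothetical mask copy: copies the tile, then OR-reduces it.
// Returns true if at least one in-bounds mask element is nonzero.
bool copy_mask_tile(const std::vector<unsigned char>& src,
                    std::size_t src_rows, std::size_t src_cols,
                    std::size_t row0, std::size_t col0,
                    std::size_t tile_rows, std::size_t tile_cols,
                    std::vector<unsigned char>& dst) {
    copy_tile_predicated(src, src_rows, src_cols, row0, col0,
                         tile_rows, tile_cols, dst);
    bool any_active = false;
    for (unsigned char v : dst) {
        any_active = any_active || (v != 0);  // OR-reduction over the tile
    }
    return any_active;
}

// Hypothetical bias copy: the same predicated copy, with no reduction.
void copy_bias_tile(const std::vector<float>& src,
                    std::size_t src_rows, std::size_t src_cols,
                    std::size_t row0, std::size_t col0,
                    std::size_t tile_rows, std::size_t tile_cols,
                    std::vector<float>& dst) {
    copy_tile_predicated(src, src_rows, src_cols, row0, col0,
                         tile_rows, tile_cols, dst);
}
```

In this sketch a caller could skip the work for a tile whenever `copy_mask_tile` returns `false`. On the GPU the reduction would run across the threads of a block; CUDA's `__syncthreads_or` is one primitive that combines a block-wide OR with a barrier, though whether the commit uses it is not stated here.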

1 parent a148a3a, commit a0475b2

1 file changed, +195 -136 lines changed

csrc/flash_dmattn/src/flash_fwd_kernel.h

