Commit a0475b2
Separates mask and bias memory operations in attention kernel
Refactors combined mask-bias memory operations into separate dedicated operations to improve performance and maintainability.
Introduces specialized copy functions for mask and bias operations with proper bounds checking and OR-reduction for mask activity detection.
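The commit's actual copy functions are not shown here, so the following is only a minimal host-side C++ sketch of the idea: copy one tile of a mask with per-element bounds checks, and OR-reduce the copied values to learn whether the tile contains any active entry. The function name `copy_mask_tile_with_or_reduce`, its parameters, and the row-major layout are all hypothetical. In a real CUDA kernel the loops would be spread across threads, and the reduction would be block-wide (for example via an intrinsic such as `__syncthreads_or`, which is an assumption and not something the commit message states).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a bounds-checked mask tile copy.
// Copies a tile_rows x tile_cols tile whose top-left corner is (row0, col0)
// from a row-major rows x cols mask into `tile`. Out-of-bounds elements are
// zero-filled. Returns true if any in-bounds element is nonzero
// (the OR-reduction used to detect mask activity).
bool copy_mask_tile_with_or_reduce(const std::vector<unsigned char>& mask,
                                   std::size_t rows, std::size_t cols,
                                   std::size_t row0, std::size_t col0,
                                   std::size_t tile_rows, std::size_t tile_cols,
                                   std::vector<unsigned char>& tile) {
    assert(mask.size() == rows * cols);
    tile.assign(tile_rows * tile_cols, 0);  // zero-fill, covers the out-of-bounds case
    bool any_active = false;
    for (std::size_t r = 0; r < tile_rows; ++r) {
        for (std::size_t c = 0; c < tile_cols; ++c) {
            // Plays the role of a predicate tensor entry: is this element valid?
            const bool in_bounds = (row0 + r < rows) && (col0 + c < cols);
            if (in_bounds) {
                const unsigned char v = mask[(row0 + r) * cols + (col0 + c)];
                tile[r * tile_cols + c] = v;
                any_active = any_active || (v != 0);  // OR-reduce over the tile
            }
        }
    }
    return any_active;
}
```

A caller could use the returned flag to skip work on a tile that is entirely masked out, which is one plausible reason to detect mask activity during the copy.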
Removes redundant synchronization points by leveraging built-in synchronization in the new copy functions.
Adds predicate tensor allocation for proper boundary handling in both regular and split-KV attention kernels.
1 parent: a148a3a
1 file changed: 195 additions, 136 deletions