Commit 8e4d703

Jacchang/pa ragged experimental (ROCm#1479)
Files: Upload experimental pa_ragged kernels and unit test

Technical Details:
1. Added double buffering for K-cache loading.
2. Used 64 threads to load the contiguous K-cache into LDS, then distributed the data to thread registers to match the MFMA layout.
3. Turned on non-temporal loads for the KV cache.
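The double buffering in item 1 can be illustrated with a small host-side sketch. This is not the kernel from this commit: the function name `qk_scores_double_buffered`, the helper `load_tile`, and the `tile_rows` parameter are hypothetical, and plain Python runs the load and the compute one after the other. It only shows the ping-pong schedule, in which tile t+1 is staged into the spare buffer before tile t is consumed. On a GPU, that load would presumably be in flight while the MFMA work on the current tile runs.

```python
def qk_scores_double_buffered(q, k_cache, tile_rows):
    """Return dot(q, k_row) for every row of k_cache, processed tile by tile.

    Hypothetical illustration only: two staging buffers are used in a
    ping-pong schedule, standing in for two LDS regions on the GPU.
    """
    num_tiles = (len(k_cache) + tile_rows - 1) // tile_rows

    def load_tile(t):
        # Stand-in for copying one K-cache tile from global memory into LDS.
        return [row[:] for row in k_cache[t * tile_rows:(t + 1) * tile_rows]]

    scores = []
    if num_tiles == 0:
        return scores

    buffers = [None, None]
    buffers[0] = load_tile(0)  # prologue: fill the first buffer

    for t in range(num_tiles):
        cur, nxt = t % 2, (t + 1) % 2
        if t + 1 < num_tiles:
            # Stage the next tile into the spare buffer before computing.
            buffers[nxt] = load_tile(t + 1)
        # Consume the current buffer (stand-in for the QK^T compute).
        for k_row in buffers[cur]:
            scores.append(sum(a * b for a, b in zip(q, k_row)))
    return scores
```

The result is the same as a single pass over the K-cache. Only the order of loads relative to compute changes, which is why the technique trades extra buffer space for the chance to hide load latency.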
1 parent 43b924a commit 8e4d703

5 files changed, +1726 -5 lines changed

0 commit comments
