Commit 8e4d703
Jacchang/pa ragged experimental (ROCm#1479)
Files: Upload experimental pa_ragged kernels and unit test
Technical Details:
1. Added double buffering for K-cache loading.
2. Used 64 threads to load the contiguous K-cache into LDS, then distributed the data to thread registers to match the MFMA layout.
3. Turned on non-temporal loads for the KV cache.
1 parent 43b924a commit 8e4d703
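The double buffering in item 1 can be illustrated with a small host-side sketch. This is not the kernel code from this commit: the function name `qk_scores_double_buffered`, the `load_tile` helper, and the `tile_rows` parameter are all hypothetical, and plain Python lists stand in for LDS. It only shows the ping-pong pattern, where the load for tile t+1 is issued before the math on tile t.

```python
def qk_scores_double_buffered(q, k_cache, tile_rows):
    """Compute dot(q, k_row) for every K row using two staging buffers.

    Hypothetical illustration only: lists stand in for LDS, and the
    "prefetch" is sequential here, whereas on a GPU it would overlap
    with the compute on the current tile.
    """
    num_tiles = (len(k_cache) + tile_rows - 1) // tile_rows

    def load_tile(t):
        # Stand-in for copying one K-cache tile from global memory to LDS.
        return [row[:] for row in k_cache[t * tile_rows:(t + 1) * tile_rows]]

    scores = []
    if num_tiles == 0:
        return scores

    buffers = [None, None]       # ping-pong staging buffers
    buffers[0] = load_tile(0)    # prologue: fill the first buffer

    for t in range(num_tiles):
        cur = t % 2
        if t + 1 < num_tiles:
            # Issue the next tile's load into the other buffer first,
            # so it never overwrites the tile being consumed.
            buffers[1 - cur] = load_tile(t + 1)
        for row in buffers[cur]:
            scores.append(sum(a * b for a, b in zip(q, row)))
    return scores


if __name__ == "__main__":
    q = [1, 2]
    k_cache = [[1, 0], [0, 1], [1, 1], [2, 2], [3, 0]]
    print(qk_scores_double_buffered(q, k_cache, tile_rows=2))
```

The result matches a plain row-by-row dot product. Two buffers are enough because each tile is fully consumed before its buffer is reused two iterations later.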
File tree
5 files changed, +1726 -5 lines changed
- csrc/cpp_itfs/pa
- op_tests