Commit fe45cb1
[AMDGPU] Identify vector idiom to unlock SROA
HIP vector types often lower to aggregates and get copied with memcpy.
When the source or destination is chosen via a pointer select, SROA
cannot split the aggregate. This keeps data in stack slots and increases
scratch traffic. By rewriting these memcpy idioms, we enable SROA to
promote values, reducing stack usage and improving occupancy and
bandwidth on AMD GPUs.
For example:
    %p = select i1 %cond, ptr %A, ptr %B
    call void @llvm.memcpy.p0.p0.i32(ptr %dst, ptr %p, i32 16, i1 false)
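A C++ analogue of the IR above, offered as a sketch: `Vec4` is a hypothetical stand-in for a 16-byte HIP vector type such as `float4`, and `copy_selected` is an illustrative name. The copy source is chosen through a pointer select; after early simplification, such code typically appears in IR as the `select` plus `llvm.memcpy` pair shown above.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a HIP vector type such as float4: four floats in
// a plain aggregate, 16 bytes on common targets (matching `i32 16` above).
struct Vec4 {
  float x, y, z, w;
};
static_assert(sizeof(Vec4) == 16, "expected a 16-byte aggregate");

// The copy source is picked through a pointer select. After early
// simplification this typically becomes `select` + `llvm.memcpy` in IR,
// the pattern SROA cannot split on its own.
void copy_selected(bool cond, const Vec4 &a, const Vec4 &b, Vec4 &dst) {
  const Vec4 *p = cond ? &a : &b;
  std::memcpy(&dst, p, sizeof(Vec4));
}
```

A plain aggregate assignment such as `dst = *p;` usually lowers to the same `llvm.memcpy`; the explicit call just makes the idiom visible.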
When the source is a pointer select and both loads can be safely
speculated, the pass replaces the memcpy with two aligned loads, a
value-level select of the loaded vectors, and one aligned store. If it
is not safe to speculate both loads, it splits control flow and emits a
memcpy in each arm. When the destination is a select, it always splits
control flow to avoid speculative stores. Vector element types are
chosen based on the copy size and the minimum proven alignment to
minimize the number of operations.
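The three rewrite strategies can be modeled in plain C++ as a sketch. This is not the pass's code: all function names are hypothetical, and plain struct copies stand in for the aligned vector loads, stores, and memcpys the pass emits in IR. `speculative_rewrite` reads from both pointers unconditionally, which is only valid when both are dereferenceable; that is the speculation-safety condition the pass has to establish before taking this path.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical 16-byte payload standing in for the copied aggregate.
struct Vec4 {
  float v[4];
};

// Original idiom: a memcpy whose source operand is a pointer select.
void memcpy_from_select(bool cond, const Vec4 *a, const Vec4 *b, Vec4 *dst) {
  const Vec4 *src = cond ? a : b;
  std::memcpy(dst, src, sizeof(Vec4));
}

// Source select, speculation is safe: load from BOTH candidates, select the
// loaded value, then store once. Reading *a and *b unconditionally is only
// valid when both pointers are dereferenceable.
void speculative_rewrite(bool cond, const Vec4 *a, const Vec4 *b, Vec4 *dst) {
  // A null check is only a weak stand-in for a dereferenceability proof.
  assert(a != nullptr && b != nullptr);
  Vec4 va = *a;           // models the first aligned vector load
  Vec4 vb = *b;           // models the second aligned vector load
  *dst = cond ? va : vb;  // value-level select followed by a single store
}

// Source select, speculation is not safe: branch and copy in each arm, so
// only the selected pointer is ever read.
void split_source(bool cond, const Vec4 *a, const Vec4 *b, Vec4 *dst) {
  if (cond)
    std::memcpy(dst, a, sizeof(Vec4));
  else
    std::memcpy(dst, b, sizeof(Vec4));
}

// Destination select: always branch. Writing to both candidates would
// clobber the unselected buffer, so stores are never speculated.
void split_dest(bool cond, const Vec4 *src, Vec4 *a, Vec4 *b) {
  if (cond)
    std::memcpy(a, src, sizeof(Vec4));
  else
    std::memcpy(b, src, sizeof(Vec4));
}
```

All three source-side variants produce the same bytes in `dst`; they differ only in which memory they are allowed to touch, which is why the speculative form needs the extra safety proof.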
The pass handles only non-volatile, constant-length memcpy calls up to a
small size cap whose source and destination are in the same address
space; volatile and cross-address-space memcpys are skipped. It runs
early, after inlining and before InferAddressSpaces and SROA.
The size cap is controlled by -amdgpu-vector-idiom-max-bytes (default:
32 bytes), allowing tuning for different workloads.
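The eligibility rules and the size cap can be summarized as a predicate that also picks a vector shape. This is a sketch under stated assumptions: `MemcpyInfo`, `VectorPlan`, and `planVectorCopy` are hypothetical, rejecting zero-length copies is an assumption, and the element-width heuristic (the widest of 4, 2, or 1 bytes that divides the size and does not exceed the proven alignment) is an illustration, not necessarily the rule the pass implements.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <optional>

// Hypothetical summary of one memcpy call site (field names are illustrative).
struct MemcpyInfo {
  bool isVolatile;
  std::optional<std::uint64_t> constLength;  // empty when the length is not constant
  unsigned srcAddrSpace;
  unsigned dstAddrSpace;
  std::uint64_t minAlign;  // minimum proven alignment of src and dst, in bytes
};

// Hypothetical lowering plan: `count` vector elements of `elemBytes` each.
struct VectorPlan {
  std::uint64_t elemBytes;
  std::uint64_t count;
};

// Applies the eligibility rules from the text. `maxBytes` plays the role of
// -amdgpu-vector-idiom-max-bytes (default 32).
std::optional<VectorPlan> planVectorCopy(const MemcpyInfo &m,
                                         std::uint64_t maxBytes = 32) {
  assert(m.minAlign >= 1 && "alignment is at least one byte");
  if (m.isVolatile)
    return std::nullopt;  // volatile copies are skipped
  // Non-constant lengths and lengths over the cap are skipped; treating a
  // zero-length copy as ineligible is an assumption of this sketch.
  if (!m.constLength || *m.constLength == 0 || *m.constLength > maxBytes)
    return std::nullopt;
  if (m.srcAddrSpace != m.dstAddrSpace)
    return std::nullopt;  // cross-address-space copies are skipped
  const std::uint64_t size = *m.constLength;
  // Assumed heuristic: the widest element (4, 2, or 1 bytes) that divides the
  // size and is covered by the proven alignment, giving fewer, wider elements.
  for (std::uint64_t elem : {4u, 2u, 1u}) {
    if (size % elem == 0 && m.minAlign >= elem)
      return VectorPlan{elem, size / elem};
  }
  return std::nullopt;  // not reached when minAlign >= 1
}
```

For example, a 16-byte copy with 4-byte alignment yields four 4-byte elements (a `<4 x i32>`-like shape), while a 64-byte copy is rejected under the default cap and accepted once the cap is raised to 64.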
Fixes: SWDEV-5501341
Parent: f4087f6
6 files changed (+965, -0) under llvm/:
- lib/Target/AMDGPU
- test/CodeGen/AMDGPU