Open
Labels
backend:AMDGPU, good first issue, llvm:globalisel
Description
GISel fails to use the max3 (and probably min3) instruction on AMDGPU. Instead it uses a sequence of 2 max instructions.
SDISel gets this right.
I believe the AMDGPU GlobalISel port is missing a port of the SITargetLowering::performMinMaxCombine optimization.
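To make the missing fold concrete, here is a minimal sketch of the rewrite on a toy expression tree. It does not use any LLVM API: `Node`, `leaf`, `makeMax`, `combineMax3` and `toString` are hypothetical names invented for illustration, and the single-use check on the inner max is an assumption about when such a fold is profitable, not a statement of what the SDISel combine actually checks.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node; NOT an LLVM data structure.
struct Node {
    std::string op;                          // "max", "max3", or a leaf name such as "x" or "0"
    std::vector<std::shared_ptr<Node>> ops;  // operands (empty for leaves)
    int uses = 1;                            // hypothetical use count of this value
};

using NodePtr = std::shared_ptr<Node>;

// Build a leaf (an input value or a constant).
NodePtr leaf(const std::string& name) {
    auto n = std::make_shared<Node>();
    n->op = name;
    return n;
}

// Build a two-operand max node.
NodePtr makeMax(NodePtr a, NodePtr b) {
    auto n = std::make_shared<Node>();
    n->op = "max";
    n->ops = {std::move(a), std::move(b)};
    return n;
}

// Fold max(max(a, b), c) or max(c, max(a, b)) into max3(a, b, c).
// Assumption: the fold only fires when the inner max has a single use,
// so that the inner instruction really disappears.
NodePtr combineMax3(const NodePtr& n) {
    if (n->op != "max" || n->ops.size() != 2) return n;
    for (int i = 0; i < 2; ++i) {
        const NodePtr& inner = n->ops[i];
        const NodePtr& other = n->ops[1 - i];
        if (inner->op == "max" && inner->ops.size() == 2 && inner->uses == 1) {
            auto fused = std::make_shared<Node>();
            fused->op = "max3";
            fused->ops = {inner->ops[0], inner->ops[1], other};
            return fused;
        }
    }
    return n;  // no fold possible; return the node unchanged
}

// Print a tree as op(arg, arg, ...).
std::string toString(const NodePtr& n) {
    if (n->ops.empty()) return n->op;
    std::string s = n->op + "(";
    for (std::size_t i = 0; i < n->ops.size(); ++i) {
        if (i) s += ", ";
        s += toString(n->ops[i]);
    }
    return s + ")";
}
```

Applied to the shape of the reproducer, `combineMax3(makeMax(makeMax(leaf("x"), leaf("0")), leaf("0")))` yields a tree that prints as `max3(x, 0, 0)`, which mirrors the operands of the `v_max3_f32` that SDISel emits.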
To Reproduce
Download the attached IR or copy/paste it from below.
Then run:
llc -O3 -march=amdgcn -mcpu=gfx942 -mtriple amdgcn-amd-amdhsa -global-isel=<0|1> repro.ll -o -
Result
GISel:
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
v_max_f32_e32 v0, v0, v0
v_max_f32_e32 v0, 0, v0
v_mov_b32_e32 v4, v1
v_mov_b32_e32 v5, v2
v_max_f32_e32 v0, 0, v0
flat_store_dword v[4:5], v0
s_waitcnt vmcnt(0) lgkmcnt(0)
s_setpc_b64 s[30:31]
SDISel:
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
v_mov_b32_e32 v3, v2
v_mov_b32_e32 v2, v1
v_max3_f32 v0, v0, 0, 0
flat_store_dword v[2:3], v0
s_waitcnt vmcnt(0) lgkmcnt(0)
s_setpc_b64 s[30:31]
GISel uses 3 max instructions where SDISel manages to do the same thing with just one max3 instruction.
Note: The test case was automatically reduced, so the constant input values are not representative of the real workload.
Input IR:
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
target triple = "amdgcn-amd-amdhsa"
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <1 x float> @llvm.maxnum.v1f32(<1 x float>, <1 x float>) #0
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare float @llvm.maxnum.f32(float, float) #0
define void @foo.bb374(<1 x float> %i466, ptr %out) {
newFuncRoot:
%i497 = tail call <1 x float> @llvm.maxnum.v1f32(<1 x float> %i466, <1 x float> zeroinitializer)
%i503 = extractelement <1 x float> %i497, i64 0
%i507 = tail call float @llvm.maxnum.f32(float %i503, float 0.000000e+00)
store float %i507, ptr %out
ret void
}
attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }