[AMDGPU][GISel] Missing FMAX3 use #123079

@qcolombet

Description

GISel fails to use the max3 (and probably min3) instruction on AMDGPU. Instead, it emits a chain of separate max instructions.
SDISel gets this right.

I believe the AMDGPU GlobalISel path is missing a port of the SITargetLowering::performMinMaxCombine optimization.

To Reproduce

Download the attached IR or copy/paste it from below.
Then run:

llc -O3 -march=amdgcn -mcpu=gfx942 -mtriple amdgcn-amd-amdhsa -global-isel=<0|1> repro.ll -o -

repro.ll.txt

Result

GISel:

	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_max_f32_e32 v0, v0, v0
	v_max_f32_e32 v0, 0, v0
	v_mov_b32_e32 v4, v1
	v_mov_b32_e32 v5, v2
	v_max_f32_e32 v0, 0, v0
	flat_store_dword v[4:5], v0
	s_waitcnt vmcnt(0) lgkmcnt(0)
	s_setpc_b64 s[30:31]

SDISel:

	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_mov_b32_e32 v3, v2
	v_mov_b32_e32 v2, v1
	v_max3_f32 v0, v0, 0, 0
	flat_store_dword v[2:3], v0
	s_waitcnt vmcnt(0) lgkmcnt(0)
	s_setpc_b64 s[30:31]

GISel emits three v_max_f32 instructions, whereas SDISel does the same work with a single v_max3_f32 instruction.

Note: The test case was automatically reduced, hence the constant input values are not representative of the real workload.

Note

Input IR:

target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
target triple = "amdgcn-amd-amdhsa"

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <1 x float> @llvm.maxnum.v1f32(<1 x float>, <1 x float>) #0

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare float @llvm.maxnum.f32(float, float) #0

define void @foo.bb374(<1 x float> %i466, ptr %out) {
newFuncRoot:
  %i497 = tail call <1 x float> @llvm.maxnum.v1f32(<1 x float> %i466, <1 x float> zeroinitializer)
  %i503 = extractelement <1 x float> %i497, i64 0
  %i507 = tail call float @llvm.maxnum.f32(float %i503, float 0.000000e+00)
  store float %i507, ptr %out
  ret void
}

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
