@dhruvachak
Contributor

dhruvachak commented Nov 22, 2025

Addresses SWDEV-523888.

The LIT tests have generally been updated in one of the following ways:

  • If the test had no explicit amdgpu-tracker option and was auto-generated, it has been regenerated.
  • If the test was not auto-generated, the explicit option -amdgpu-use-amdgpu-trackers=0 was added so that no other update was needed.
  • If the test already had the tracker option, the option was updated to reflect the change in the default.
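
The rules above can be restated as a small self-contained sketch. This is a toy model, not LLVM code: `BoolFlag`, `TestUpdate`, `classify`, and `behaviorChanges` are hypothetical names, and the only facts taken from the patch are that the default of the option changes from false to true and that an explicit `=0` keeps the old behavior.

```cpp
#include <cassert>
#include <optional>

// Toy model of a boolean command-line flag; this is not the LLVM cl::opt class.
struct BoolFlag {
  bool Default;

  // An explicit value on the command line overrides the built-in default.
  bool resolve(std::optional<bool> Explicit) const {
    return Explicit.value_or(Default);
  }
};

// The three kinds of test update listed above (hypothetical names).
enum class TestUpdate { Regenerate, PinOldDefault, UpdateExplicitOption };

// Restates the three rules as a decision function.
TestUpdate classify(bool HasExplicitOption, bool IsAutoGenerated) {
  if (HasExplicitOption)
    return TestUpdate::UpdateExplicitOption;
  return IsAutoGenerated ? TestUpdate::Regenerate : TestUpdate::PinOldDefault;
}

// True if a test's effective setting differs between the old and new defaults.
bool behaviorChanges(std::optional<bool> Explicit) {
  const BoolFlag Before{false}; // default before this patch
  const BoolFlag After{true};   // default after this patch
  return Before.resolve(Explicit) != After.resolve(Explicit);
}
```

In this model, only a test that leaves the option unset sees a change when the default flips, which is why pinning `=0` on a hand-written test avoids touching its checks.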

There is one exception to the above:

  • materialize-frame-index-sgpr.ll: This test uses inline assembly to limit the number of registers available to other instructions. GCN trackers do not account for physical registers, which leads to an out-of-registers error during register allocation (RA). GCN trackers have therefore been disabled for this test.
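
The failure mode in this exception can be illustrated with a toy pressure model. It is only a sketch under simplifying assumptions and does not reflect how the GCN trackers are implemented: `Point`, `peakVirtualOnly`, `peakWithPhysical`, and `fitsBudget` are hypothetical names, and register pressure is reduced to a single count per program point.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model only; these types are hypothetical and are not LLVM classes.
struct Point {
  unsigned VirtLive; // virtual registers live at this program point
  unsigned PhysLive; // physical registers occupied here, e.g. named by inline asm
};

// Peak pressure as seen by a tracker that counts virtual registers only.
unsigned peakVirtualOnly(const std::vector<Point> &Region) {
  unsigned Peak = 0;
  for (const Point &P : Region)
    Peak = std::max(Peak, P.VirtLive);
  return Peak;
}

// Peak pressure when occupied physical registers are counted as well.
unsigned peakWithPhysical(const std::vector<Point> &Region) {
  unsigned Peak = 0;
  for (const Point &P : Region)
    Peak = std::max(Peak, P.VirtLive + P.PhysLive);
  return Peak;
}

// In this model a region is allocatable when the peak fits the register budget.
bool fitsBudget(unsigned Peak, unsigned Budget) { return Peak <= Budget; }
```

For a region with points {10, 0}, {12, 90}, {8, 0} and a budget of 100 registers, the virtual-only peak is 12 and looks safe, while the combined peak is 102 and does not fit. A test that deliberately occupies most physical registers lands in exactly this gap.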

@llvmbot
Member

llvmbot commented Nov 22, 2025

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Dhruva Chakrabarti (dhruvachak)

Patch is 20.46 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169187.diff

127 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll (+105-105)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll (+45-45)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll (+231-231)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll (+245-245)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement-stack-lower.ll (+80-80)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll (+138-138)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/load-uniform-in-vgpr.ll (+26-27)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll (+189-183)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll (+676-676)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll (+196-196)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll (+660-660)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll (+243-243)
  • (modified) llvm/test/CodeGen/AMDGPU/a-v-ds-atomicrmw.ll (+136-136)
  • (modified) llvm/test/CodeGen/AMDGPU/a-v-flat-atomicrmw.ll (+307-313)
  • (modified) llvm/test/CodeGen/AMDGPU/a-v-global-atomicrmw.ll (+307-313)
  • (modified) llvm/test/CodeGen/AMDGPU/add.ll (+32-32)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll (+80328-80770)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.128bit.ll (+130-132)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.224bit.ll (+11-10)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.256bit.ll (+3565-3543)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.288bit.ll (+155-154)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.320bit.ll (+1020-1030)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.352bit.ll (+135-133)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.384bit.ll (+332-328)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.448bit.ll (+292-293)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll (+10234-10445)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.576bit.ll (+808-804)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.640bit.ll (+2136-2131)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.704bit.ll (+2563-2568)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll (+2515-2541)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll (+3478-3523)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll (+4282-4333)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll (+5650-5665)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.96bit.ll (+28-28)
  • (modified) llvm/test/CodeGen/AMDGPU/av-split-dead-valno-crash.ll (+38-38)
  • (modified) llvm/test/CodeGen/AMDGPU/bf16.ll (+1197-1200)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll (+38-22)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointers-contents-legalization.ll (+65-61)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointers-memcpy.ll (+36-36)
  • (modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+112-112)
  • (modified) llvm/test/CodeGen/AMDGPU/debug-value-scheduler-crash.mir (+21-21)
  • (modified) llvm/test/CodeGen/AMDGPU/div_i128.ll (+163-163)
  • (modified) llvm/test/CodeGen/AMDGPU/div_v2i128.ll (+1218-1218)
  • (modified) llvm/test/CodeGen/AMDGPU/ds_permute_a_v.ll (+142-6)
  • (modified) llvm/test/CodeGen/AMDGPU/ds_write2_a_v.ll (+138-138)
  • (modified) llvm/test/CodeGen/AMDGPU/extract-subvector.ll (+28-28)
  • (modified) llvm/test/CodeGen/AMDGPU/extract_vector_elt-f16.ll (+27-26)
  • (modified) llvm/test/CodeGen/AMDGPU/fceil64.ll (+311-313)
  • (modified) llvm/test/CodeGen/AMDGPU/freeze.ll (+81-78)
  • (modified) llvm/test/CodeGen/AMDGPU/function-args.ll (+47-47)
  • (modified) llvm/test/CodeGen/AMDGPU/function-returns.ll (+237-237)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll (+268-233)
  • (modified) llvm/test/CodeGen/AMDGPU/half.ll (+205-201)
  • (modified) llvm/test/CodeGen/AMDGPU/high-RP-reschedule.mir (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll (+49-82)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll (+480-485)
  • (modified) llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll (+5-6)
  • (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2bf16.ll (+112-112)
  • (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll (+101-101)
  • (modified) llvm/test/CodeGen/AMDGPU/integer-mad-patterns.ll (+114-114)
  • (modified) llvm/test/CodeGen/AMDGPU/live-interval-bug-in-rename-independent-subregs.mir (+38-34)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir (+808-807)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.small.mir (+361-361)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.single.2c.mir (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll (+10-12)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.tensor.load.store.ll (+87-172)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll (+39-40)
  • (modified) llvm/test/CodeGen/AMDGPU/load-constant-i1.ll (+2071-1745)
  • (modified) llvm/test/CodeGen/AMDGPU/load-constant-i16.ll (+1545-1550)
  • (modified) llvm/test/CodeGen/AMDGPU/load-constant-i32.ll (+415-409)
  • (modified) llvm/test/CodeGen/AMDGPU/load-constant-i64.ll (+48-47)
  • (modified) llvm/test/CodeGen/AMDGPU/load-constant-i8.ll (+1113-1112)
  • (modified) llvm/test/CodeGen/AMDGPU/load-global-i16.ll (+1662-1762)
  • (modified) llvm/test/CodeGen/AMDGPU/load-global-i32.ll (+649-789)
  • (modified) llvm/test/CodeGen/AMDGPU/load-global-i8.ll (+1634-1710)
  • (modified) llvm/test/CodeGen/AMDGPU/load-local-i16.ll (+2717-2924)
  • (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll (+12-8)
  • (modified) llvm/test/CodeGen/AMDGPU/memcpy-libcall.ll (+92-90)
  • (modified) llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll (+1281-1278)
  • (modified) llvm/test/CodeGen/AMDGPU/mfma-cd-select.ll (+30-36)
  • (modified) llvm/test/CodeGen/AMDGPU/mul.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/packed-fp32.ll (+364-355)
  • (modified) llvm/test/CodeGen/AMDGPU/pr51516.mir (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+16-16)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll (+104-103)
  • (modified) llvm/test/CodeGen/AMDGPU/rem_i128.ll (+107-107)
  • (modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll (+15-18)
  • (modified) llvm/test/CodeGen/AMDGPU/rsq.f64.ll (+108-106)
  • (modified) llvm/test/CodeGen/AMDGPU/sched-assert-dead-def-subreg-use-other-subreg.mir (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-subreg-def-across-subreg-def.mir (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/schedule-amdgpu-trackers.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/schedule-barrier.mir (+13-13)
  • (modified) llvm/test/CodeGen/AMDGPU/schedule-regpressure-ilp-metric-spills.mir (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/schedule-relaxed-occupancy.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/scratch-simple.ll (+644-616)
  • (modified) llvm/test/CodeGen/AMDGPU/sdiv.ll (+210-210)
  • (modified) llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll (+7-7)
  • (modified) llvm/test/CodeGen/AMDGPU/sema-v-unsched-bundle.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/shift-i128.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/shl.ll (+11-11)
  • (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i64.v4i64.ll (+7-7)
  • (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i64.v8i64.ll (+423-423)
  • (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2p0.v4p0.ll (+7-7)
  • (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v3i64.v4i64.ll (+69-69)
  • (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v3p0.v4p0.ll (+69-69)
  • (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll (+146-145)
  • (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll (+146-145)
  • (modified) llvm/test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-agpr.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/sra.ll (+31-31)
  • (modified) llvm/test/CodeGen/AMDGPU/srem.ll (+113-113)
  • (modified) llvm/test/CodeGen/AMDGPU/srl.ll (+11-11)
  • (modified) llvm/test/CodeGen/AMDGPU/udiv.ll (+20-20)
  • (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-mul.ll (+318-318)
  • (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-smax.ll (+82-82)
  • (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-smin.ll (+82-82)
  • (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-umax.ll (+82-82)
  • (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-umin.ll (+82-82)
  • (modified) llvm/test/CodeGen/AMDGPU/vni8-across-blocks.ll (+36-35)
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index c8ce3aab3f303..e03ce8c06fed5 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -67,7 +67,7 @@ static cl::opt<bool>
 static cl::opt<bool> GCNTrackers(
     "amdgpu-use-amdgpu-trackers", cl::Hidden,
     cl::desc("Use the AMDGPU specific RPTrackers during scheduling"),
-    cl::init(false));
+    cl::init(true));
 
 static cl::opt<unsigned> PendingQueueLimit(
     "amdgpu-scheduler-pending-queue-limit", cl::Hidden,
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll
index b67080bd4798d..cb4db0bac4730 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll
@@ -356,62 +356,62 @@ define void @addv_7i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addrs
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
 ; GFX8-NEXT:    flat_load_ushort v17, v[6:7]
 ; GFX8-NEXT:    flat_load_ushort v18, v[8:9]
-; GFX8-NEXT:    flat_load_ushort v19, v[10:11]
-; GFX8-NEXT:    flat_load_ushort v20, v[12:13]
-; GFX8-NEXT:    flat_load_ushort v21, v[14:15]
-; GFX8-NEXT:    flat_load_ushort v22, v[0:1]
+; GFX8-NEXT:    flat_load_ushort v10, v[10:11]
+; GFX8-NEXT:    flat_load_ushort v11, v[12:13]
+; GFX8-NEXT:    flat_load_ushort v12, v[14:15]
+; GFX8-NEXT:    flat_load_ushort v13, v[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 2, v2
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v3, vcc
 ; GFX8-NEXT:    v_add_u32_e32 v6, vcc, 4, v2
 ; GFX8-NEXT:    v_addc_u32_e32 v7, vcc, 0, v3, vcc
-; GFX8-NEXT:    v_add_u32_e32 v8, vcc, 6, v2
+; GFX8-NEXT:    flat_load_ushort v14, v[2:3]
+; GFX8-NEXT:    flat_load_ushort v15, v[0:1]
+; GFX8-NEXT:    flat_load_ushort v19, v[6:7]
+; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 6, v2
+; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT:    v_add_u32_e32 v6, vcc, 8, v2
+; GFX8-NEXT:    v_addc_u32_e32 v7, vcc, 0, v3, vcc
+; GFX8-NEXT:    v_add_u32_e32 v8, vcc, 10, v2
 ; GFX8-NEXT:    v_addc_u32_e32 v9, vcc, 0, v3, vcc
-; GFX8-NEXT:    v_add_u32_e32 v10, vcc, 8, v2
-; GFX8-NEXT:    v_addc_u32_e32 v11, vcc, 0, v3, vcc
-; GFX8-NEXT:    v_add_u32_e32 v12, vcc, 10, v2
-; GFX8-NEXT:    v_addc_u32_e32 v13, vcc, 0, v3, vcc
-; GFX8-NEXT:    v_add_u32_e32 v14, vcc, 12, v2
-; GFX8-NEXT:    v_addc_u32_e32 v15, vcc, 0, v3, vcc
-; GFX8-NEXT:    flat_load_ushort v2, v[2:3]
-; GFX8-NEXT:    flat_load_ushort v3, v[0:1]
+; GFX8-NEXT:    v_add_u32_e32 v2, vcc, 12, v2
+; GFX8-NEXT:    v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GFX8-NEXT:    flat_load_ushort v20, v[0:1]
 ; GFX8-NEXT:    flat_load_ushort v6, v[6:7]
 ; GFX8-NEXT:    flat_load_ushort v7, v[8:9]
-; GFX8-NEXT:    flat_load_ushort v8, v[10:11]
-; GFX8-NEXT:    flat_load_ushort v9, v[12:13]
-; GFX8-NEXT:    flat_load_ushort v10, v[14:15]
+; GFX8-NEXT:    flat_load_ushort v2, v[2:3]
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 2, v4
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v5, vcc
 ; GFX8-NEXT:    s_waitcnt vmcnt(6)
-; GFX8-NEXT:    v_add_u16_e32 v2, v16, v2
+; GFX8-NEXT:    v_add_u16_e32 v3, v16, v14
 ; GFX8-NEXT:    s_waitcnt vmcnt(5)
-; GFX8-NEXT:    v_add_u16_e32 v3, v17, v3
-; GFX8-NEXT:    flat_store_short v[4:5], v2
-; GFX8-NEXT:    flat_store_short v[0:1], v3
+; GFX8-NEXT:    v_add_u16_e32 v8, v17, v15
+; GFX8-NEXT:    flat_store_short v[4:5], v3
+; GFX8-NEXT:    flat_store_short v[0:1], v8
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 4, v4
 ; GFX8-NEXT:    s_waitcnt vmcnt(6)
-; GFX8-NEXT:    v_add_u16_e32 v6, v18, v6
+; GFX8-NEXT:    v_add_u16_e32 v9, v18, v19
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT:    flat_store_short v[0:1], v6
+; GFX8-NEXT:    flat_store_short v[0:1], v9
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 6, v4
-; GFX8-NEXT:    s_waitcnt vmcnt(6)
-; GFX8-NEXT:    v_add_u16_e32 v7, v19, v7
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT:    flat_store_short v[0:1], v7
+; GFX8-NEXT:    s_waitcnt vmcnt(6)
+; GFX8-NEXT:    v_add_u16_e32 v10, v10, v20
+; GFX8-NEXT:    flat_store_short v[0:1], v10
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 8, v4
 ; GFX8-NEXT:    s_waitcnt vmcnt(6)
-; GFX8-NEXT:    v_add_u16_e32 v8, v20, v8
+; GFX8-NEXT:    v_add_u16_e32 v6, v11, v6
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT:    flat_store_short v[0:1], v8
+; GFX8-NEXT:    flat_store_short v[0:1], v6
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 10, v4
 ; GFX8-NEXT:    s_waitcnt vmcnt(6)
-; GFX8-NEXT:    v_add_u16_e32 v9, v21, v9
+; GFX8-NEXT:    v_add_u16_e32 v7, v12, v7
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT:    flat_store_short v[0:1], v9
+; GFX8-NEXT:    flat_store_short v[0:1], v7
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 12, v4
 ; GFX8-NEXT:    s_waitcnt vmcnt(6)
-; GFX8-NEXT:    v_add_u16_e32 v10, v22, v10
+; GFX8-NEXT:    v_add_u16_e32 v2, v13, v2
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT:    flat_store_short v[0:1], v10
+; GFX8-NEXT:    flat_store_short v[0:1], v2
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -513,29 +513,29 @@ define void @add_v9i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addrs
 ; GFX8-NEXT:    flat_load_dwordx4 v[10:13], v[2:3]
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 16, v0
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; GFX8-NEXT:    flat_load_ushort v14, v[0:1]
+; GFX8-NEXT:    flat_load_ushort v16, v[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 16, v2
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v3, vcc
 ; GFX8-NEXT:    flat_load_ushort v0, v[0:1]
+; GFX8-NEXT:    v_add_u32_e32 v14, vcc, 16, v4
+; GFX8-NEXT:    v_addc_u32_e32 v15, vcc, 0, v5, vcc
 ; GFX8-NEXT:    s_waitcnt vmcnt(2)
 ; GFX8-NEXT:    v_add_u16_e32 v1, v6, v10
 ; GFX8-NEXT:    v_add_u16_sdwa v2, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
 ; GFX8-NEXT:    v_add_u16_e32 v3, v7, v11
-; GFX8-NEXT:    v_add_u16_sdwa v10, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    v_add_u16_e32 v11, v8, v12
+; GFX8-NEXT:    v_add_u16_sdwa v6, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_add_u16_e32 v7, v8, v12
 ; GFX8-NEXT:    v_add_u16_sdwa v8, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    v_add_u16_e32 v12, v9, v13
+; GFX8-NEXT:    v_add_u16_e32 v10, v9, v13
 ; GFX8-NEXT:    v_add_u16_sdwa v9, v9, v13 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    v_add_u32_e32 v6, vcc, 16, v4
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_add_u16_e32 v13, v14, v0
+; GFX8-NEXT:    v_add_u16_e32 v11, v16, v0
 ; GFX8-NEXT:    v_or_b32_e32 v0, v1, v2
-; GFX8-NEXT:    v_or_b32_e32 v1, v3, v10
-; GFX8-NEXT:    v_or_b32_e32 v2, v11, v8
-; GFX8-NEXT:    v_or_b32_e32 v3, v12, v9
-; GFX8-NEXT:    v_addc_u32_e32 v7, vcc, 0, v5, vcc
+; GFX8-NEXT:    v_or_b32_e32 v1, v3, v6
+; GFX8-NEXT:    v_or_b32_e32 v2, v7, v8
+; GFX8-NEXT:    v_or_b32_e32 v3, v10, v9
 ; GFX8-NEXT:    flat_store_dwordx4 v[4:5], v[0:3]
-; GFX8-NEXT:    flat_store_short v[6:7], v13
+; GFX8-NEXT:    flat_store_short v[14:15], v11
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -661,55 +661,55 @@ define void @add_v11i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addr
 ; GFX8-LABEL: add_v11i16:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    v_add_u32_e32 v14, vcc, 16, v0
+; GFX8-NEXT:    v_addc_u32_e32 v15, vcc, 0, v1, vcc
+; GFX8-NEXT:    v_add_u32_e32 v16, vcc, 18, v0
+; GFX8-NEXT:    v_addc_u32_e32 v17, vcc, 0, v1, vcc
 ; GFX8-NEXT:    flat_load_dwordx4 v[6:9], v[0:1]
-; GFX8-NEXT:    flat_load_dwordx4 v[10:13], v[2:3]
-; GFX8-NEXT:    v_add_u32_e32 v14, vcc, 16, v2
-; GFX8-NEXT:    v_addc_u32_e32 v15, vcc, 0, v3, vcc
-; GFX8-NEXT:    v_add_u32_e32 v16, vcc, 18, v2
-; GFX8-NEXT:    v_addc_u32_e32 v17, vcc, 0, v3, vcc
-; GFX8-NEXT:    v_add_u32_e32 v2, vcc, 20, v2
-; GFX8-NEXT:    v_addc_u32_e32 v3, vcc, 0, v3, vcc
-; GFX8-NEXT:    flat_load_ushort v14, v[14:15]
-; GFX8-NEXT:    flat_load_ushort v15, v[16:17]
-; GFX8-NEXT:    flat_load_ushort v16, v[2:3]
-; GFX8-NEXT:    v_add_u32_e32 v2, vcc, 16, v0
-; GFX8-NEXT:    v_addc_u32_e32 v3, vcc, 0, v1, vcc
-; GFX8-NEXT:    s_waitcnt vmcnt(3)
-; GFX8-NEXT:    v_add_u16_e32 v17, v6, v10
-; GFX8-NEXT:    v_add_u16_sdwa v10, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    v_add_u32_e32 v6, vcc, 18, v0
-; GFX8-NEXT:    v_add_u16_e32 v18, v7, v11
-; GFX8-NEXT:    v_add_u16_sdwa v11, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    v_addc_u32_e32 v7, vcc, 0, v1, vcc
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 20, v0
-; GFX8-NEXT:    flat_load_ushort v2, v[2:3]
-; GFX8-NEXT:    flat_load_ushort v3, v[6:7]
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; GFX8-NEXT:    flat_load_ushort v21, v[0:1]
-; GFX8-NEXT:    v_add_u32_e32 v6, vcc, 16, v4
-; GFX8-NEXT:    v_addc_u32_e32 v7, vcc, 0, v5, vcc
-; GFX8-NEXT:    v_add_u16_e32 v19, v8, v12
+; GFX8-NEXT:    flat_load_dwordx4 v[10:13], v[2:3]
+; GFX8-NEXT:    flat_load_ushort v18, v[14:15]
+; GFX8-NEXT:    flat_load_ushort v16, v[16:17]
+; GFX8-NEXT:    flat_load_ushort v17, v[0:1]
+; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 16, v2
+; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT:    v_add_u32_e32 v14, vcc, 18, v2
+; GFX8-NEXT:    v_addc_u32_e32 v15, vcc, 0, v3, vcc
+; GFX8-NEXT:    flat_load_ushort v19, v[0:1]
+; GFX8-NEXT:    flat_load_ushort v20, v[14:15]
+; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 20, v2
+; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT:    flat_load_ushort v0, v[0:1]
+; GFX8-NEXT:    v_add_u32_e32 v14, vcc, 16, v4
+; GFX8-NEXT:    v_addc_u32_e32 v15, vcc, 0, v5, vcc
+; GFX8-NEXT:    s_waitcnt vmcnt(6)
+; GFX8-NEXT:    v_add_u16_e32 v1, v6, v10
+; GFX8-NEXT:    v_add_u16_sdwa v2, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_add_u32_e32 v6, vcc, 18, v4
+; GFX8-NEXT:    v_add_u16_e32 v3, v7, v11
+; GFX8-NEXT:    v_add_u16_sdwa v10, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_add_u16_e32 v11, v8, v12
 ; GFX8-NEXT:    v_add_u16_sdwa v12, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    v_add_u32_e32 v8, vcc, 18, v4
-; GFX8-NEXT:    v_add_u16_e32 v20, v9, v13
+; GFX8-NEXT:    v_add_u16_e32 v21, v9, v13
 ; GFX8-NEXT:    v_add_u16_sdwa v13, v9, v13 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    v_addc_u32_e32 v9, vcc, 0, v5, vcc
-; GFX8-NEXT:    v_or_b32_e32 v0, v17, v10
-; GFX8-NEXT:    v_or_b32_e32 v1, v18, v11
-; GFX8-NEXT:    v_add_u32_e32 v10, vcc, 20, v4
-; GFX8-NEXT:    v_addc_u32_e32 v11, vcc, 0, v5, vcc
+; GFX8-NEXT:    v_addc_u32_e32 v7, vcc, 0, v5, vcc
+; GFX8-NEXT:    v_add_u32_e32 v8, vcc, 20, v4
 ; GFX8-NEXT:    s_waitcnt vmcnt(2)
-; GFX8-NEXT:    v_add_u16_e32 v14, v2, v14
+; GFX8-NEXT:    v_add_u16_e32 v18, v18, v19
 ; GFX8-NEXT:    s_waitcnt vmcnt(1)
-; GFX8-NEXT:    v_add_u16_e32 v15, v3, v15
-; GFX8-NEXT:    v_or_b32_e32 v2, v19, v12
-; GFX8-NEXT:    v_or_b32_e32 v3, v20, v13
+; GFX8-NEXT:    v_add_u16_e32 v16, v16, v20
+; GFX8-NEXT:    v_addc_u32_e32 v9, vcc, 0, v5, vcc
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
-; GFX8-NEXT:    v_add_u16_e32 v16, v21, v16
+; GFX8-NEXT:    v_add_u16_e32 v17, v17, v0
+; GFX8-NEXT:    v_or_b32_e32 v0, v1, v2
+; GFX8-NEXT:    v_or_b32_e32 v1, v3, v10
+; GFX8-NEXT:    v_or_b32_e32 v2, v11, v12
+; GFX8-NEXT:    v_or_b32_e32 v3, v21, v13
 ; GFX8-NEXT:    flat_store_dwordx4 v[4:5], v[0:3]
-; GFX8-NEXT:    flat_store_short v[6:7], v14
-; GFX8-NEXT:    flat_store_short v[8:9], v15
-; GFX8-NEXT:    flat_store_short v[10:11], v16
+; GFX8-NEXT:    flat_store_short v[14:15], v18
+; GFX8-NEXT:    flat_store_short v[6:7], v16
+; GFX8-NEXT:    flat_store_short v[8:9], v17
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -794,34 +794,34 @@ define void @add_v12i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addr
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    flat_load_dwordx4 v[6:9], v[0:1]
 ; GFX8-NEXT:    flat_load_dwordx4 v[10:13], v[2:3]
-; GFX8-NEXT:    v_add_u32_e32 v2, vcc, 16, v2
-; GFX8-NEXT:    v_addc_u32_e32 v3, vcc, 0, v3, vcc
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 16, v0
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; GFX8-NEXT:    flat_load_dwordx2 v[14:15], v[2:3]
-; GFX8-NEXT:    s_waitcnt vmcnt(1)
-; GFX8-NEXT:    v_add_u16_e32 v2, v6, v10
-; GFX8-NEXT:    v_add_u16_sdwa v3, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    v_add_u16_e32 v10, v7, v11
-; GFX8-NEXT:    v_add_u16_sdwa v11, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    flat_load_dwordx2 v[6:7], v[0:1]
-; GFX8-NEXT:    v_add_u16_e32 v16, v8, v12
-; GFX8-NEXT:    v_add_u16_sdwa v8, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    v_add_u16_e32 v12, v9, v13
+; GFX8-NEXT:    flat_load_dwordx2 v[14:15], v[0:1]
+; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 16, v2
+; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT:    flat_load_dwordx2 v[16:17], v[0:1]
+; GFX8-NEXT:    s_waitcnt vmcnt(2)
+; GFX8-NEXT:    v_add_u16_e32 v0, v6, v10
+; GFX8-NEXT:    v_add_u16_sdwa v1, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_add_u16_e32 v2, v7, v11
+; GFX8-NEXT:    v_add_u16_sdwa v3, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_add_u16_e32 v6, v8, v12
+; GFX8-NEXT:    v_add_u16_sdwa v7, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_add_u16_e32 v8, v9, v13
 ; GFX8-NEXT:    v_add_u16_sdwa v9, v9, v13 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    v_or_b32_e32 v0, v2, v3
-; GFX8-NEXT:    v_or_b32_e32 v1, v10, v11
-; GFX8-NEXT:    v_or_b32_e32 v2, v16, v8
-; GFX8-NEXT:    v_or_b32_e32 v3, v12, v9
+; GFX8-NEXT:    v_or_b32_e32 v0, v0, v1
+; GFX8-NEXT:    v_or_b32_e32 v1, v2, v3
+; GFX8-NEXT:    v_or_b32_e32 v2, v6, v7
+; GFX8-NEXT:    v_or_b32_e32 v3, v8, v9
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
+; GFX8-NEXT:    v_add_u16_e32 v6, v14, v16
+; GFX8-NEXT:    v_add_u16_sdwa v7, v14, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_add_u16_e32 v8, v15, v17
+; GFX8-NEXT:    v_add_u16_sdwa v9, v15, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
 ; GFX8-NEXT:    flat_store_dwordx4 v[4:5], v[0:3]
-; GFX8-NEXT:    s_waitcnt vmcnt(1)
-; GFX8-NEXT:    v_add_u16_e32 v8, v6, v14
-; GFX8-NEXT:    v_add_u16_sdwa v6, v6, v14 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT:    v_add_u16_e32 v9, v7, v15
-; GFX8-NEXT:    v_add_u16_sdwa v7, v7, v15 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT:    v_or_b32_e32 v6, v6, v7
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 16, v4
-; GFX8-NEXT:    v_or_b32_e32 v6, v8, v6
-; GFX8-NEXT:    v_or_b32_e32 v7, v9, v7
+; GFX8-NEXT:    v_or_b32_e32 v7, v8, v9
 ; GFX8-NEXT:    v_addc_u32_e32 v1, vcc, 0, v5, vcc
 ; GFX8-NEXT:    flat_store_dwordx2 v[0:1], v[6:7]
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll
index 6ea0a9446ff9d..fc42801fd3642 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll
@@ -711,8 +711,8 @@ define <4 x double> @test_f64_add_mul(<4 x double> %a, <4 x double> %b, <4 x dou
 ; GFX9-CONTRACT-NEXT:    v_fma_f64 v[2:3], v[2:3], v[10:11], v[18:19]
 ; GFX9-CONTRACT-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-CONTRACT-NEXT:    v_fma_f64 v[20:21], v[20:21], v[28:29], v[24:25]
-; GFX9-CONTRACT-NEXT:    buffer_load_dword v31, off, s[0:3], s32
 ; GFX9-CONTRACT-NEXT:    buffer_load_dword v24, off, s[0:3], s32 offset:28
+; GFX9-CONTRACT-NEXT:    buffer_load_dword v31, off, s[0:3], s32
 ; GFX9-CONTRACT-NEXT:    buffer_load_dword v25, off, s[0:3], s32 offset:32
 ; GFX9-CONTRACT-NEXT:    v_fma_f64 v[4:5], v[4:5], v[12:13], v[20:21]
 ; GFX9-CONTRACT-NEXT:    s_waitcnt vmcnt(0)
@@ -737,8 +737,8 @@ define <4 x double> @test_f64_add_mul(<4 x double> %a, <4 x double> %b, <4 x dou
 ; GFX9-DENORM-NEXT:    v_fma_f64 v[2:3], v[2:3], v[10:11], v[18:19]
 ; GFX9-DENORM-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-DENORM-NEXT:    v_fma_f64 v[20:21], v[20:21], v[28:29], v[24:25]
-; GFX9-DENORM-NEXT:    buffer_load_dword v31, off, s[0:3], s32
 ; GFX9-DENORM-NEXT:    buffer_load_dword v24, off, s[0:3], s32 offset:28
+; GFX9-DENORM-NEXT:    buffer_load_dword v31, off, s[0:3], s32
 ; GFX9-DENORM-NEXT:    buffer_load_dword v25, off, s[0:3], s32 offset:32
 ; GFX9-DENORM-NEXT:    v_fma_f64 v[4:5], v[4:5], v[12:13], v[20:21]
 ; GFX9-DENORM-NEXT:    s_waitcnt vmcnt(0)
@@ -882,8 +882,8 @@ define <4 x double> @test_f64_add_mul_rhs(<4 x double> %a, <4 x double> %b, <4 x
 ; GFX9-CONTRACT-NEXT:    v_fma_f64 v[2:3], v[2:3], v[10:11], v[18:19]
 ; GFX9-CONTRACT-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-CONTRACT-NEXT:    v_fma_f64 v[20:21], v[20:21], v[28:29], v[24:25]
-; GFX9-CONTRACT-NEXT:    buffer_load_dword v31, off, s[0:3], s32
 ; GFX9-CONTRACT-NEXT:    buffer_load_dword v24, off, s[0:3], s32 offset:28
+; GFX9-CONTRACT-NEXT:    buffer_load_dword v31, off, s[0:3], s32
 ; GFX9-CONTRACT-NEXT:    buffer_load_dword v25, off, s[0:3], s32 offset:32
 ; GFX9-CONTRACT-NEXT:    v_fma_f64 v[4:5], v[4:5], v[12:13], v[20:21]
 ; GFX9-CONTRACT-NEXT:    s_waitcnt vmcnt(0)
@@ -908,8 +908,8 @@ define <4 x double> @test_f64_add_mul_rhs(<4 x double> %a, <4 x double> %b, <4 x
 ; GFX9-DENORM-NEXT:    v_fma_f64 v[2:3], v[2:3], v[10:11], v[18:19]
 ; GFX9-DENORM-NEXT:    s_waitcnt vmcnt(0)
 ; GFX9-DENORM-NEXT:    v_fma_f64 v[20:21], v[20:21], v[28:29], v[24:25]
-; GFX9-DENORM-NEXT:    buffer_load_dword v31, off, s[0:3], s32
 ; GFX9-DENORM-NEXT:    buffer_load_dword v24, off, s[0:3], s32 offset:28
+; GFX9-DENORM-NEXT:    buffer_load_dword v31, off, s[0:3], s32
 ; GFX9-DENORM-NEXT:    buffer_load_dword v25, off, s[0:3], s32 offset:32
 ; GFX9-DENORM-NEXT:    v_fma_f64 v[4:5], v[4:5], v[12:13], v[20:21]
 ; GFX9-DENORM-NEXT:    s_waitcnt vmcnt(0)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll
index ea149cc2f4a9e..f6396be103ae5 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll
@@ -717,9 +717,9 @@ define <2 x double> @v_fdiv_v2f64(<2 x double> %a, <2 x double> %b) {
 ; GFX6-NEXT:    v_div_scale_f64 v[18:19], s[4:5], v[0:1], v[4:5], v[0:1]
 ; GFX6-NEXT:    v_rcp_f64_e32 v[16:17], v[14:15]
 ; GFX6-NEXT:    v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
-; GFX6-NEXT:    v_cmp_eq_u32_e32 vcc, v1, v19
-; GFX6-NEXT:    v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
 ; GFX6-NEXT:    v_cmp_eq_u32_e64 s[4:5], v5, v9
+; GFX6-NEXT:    v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
+; GFX6-NEXT:    v_cmp_eq_u32_e32 vcc, v1, v19
 ; GFX6-NEXT:    v_fma_f64 v[12:13], -v[8:9], v[10:11], 1.0
 ; GFX6-NEXT:    s_xor_b64 vcc, vcc, s[4:5]
 ; GFX6-NEXT:    v_fma_f64 v[10:11], v[10:11], v[12:13], v[10:11]
@@ -727,9 +727,9 @@ define <2 x double> @v_fdiv_v2f64(<2 x double> %a, <2 x double> %b) {
 ; GFX6-NEXT:    v_cmp_eq_u32_e64 s[4:5], v7, v15
 ; GFX6-NEXT:    v_fma_f64 v[12:13], v[16:17], v[12:13], v[16:17]
 ; GFX6-NEXT:    v_mul_f64 v[16:17], v[18:19], v[10:11]
-; GFX6-NEXT:    v_fma_f64 v[18:19], -v[8:9], v[16:17], v[18:19]
+; GFX6-NEXT:    v_fma_f64 v[20:21], -v[8:9], v[16:17], v[18:19]
 ; GFX6-NEXT:    v_fma_f64 v[8:9], -v[14:15], v[12:13], 1.0
-; GFX6-NEXT:    v_div_fmas_f64 v[10:11], v[18:19], v[10:11], v[16:17]
+; GFX6-NEXT:    v_div_fmas_f64 v[10:11], v[20:21], v[10:11], v[16:17]
 ; GFX6-NEXT:    v_fma_f64 v[8:9], v[12:13], v[8:9], v[12:13]
 ; GFX6-NEXT:    v_div_scale_f64 v[12:13], s[6:7], v[2:3], v[6:7], v[2:3]
 ; GFX6-NEXT:    v_div_fixup_f64 v[0:1], v[10:11], v[4:5], v[0:1]
@@ -950,9 +950,9 @@ define <2 x double>...
[truncated]

@jayfoad
Contributor

jayfoad commented Nov 24, 2025

  • materialize-frame-index-sgpr.ll: This test uses inline assembly but GCN trackers do not account for physical registers, leading to out of registers error during RA. Hence GCN trackers have been disabled for this test.

What about real-world code that uses inline assembly? Will that also fail to compile after this change?

@dhruvachak
Contributor Author

  • materialize-frame-index-sgpr.ll: This test uses inline assembly but GCN trackers do not account for physical registers, leading to out of registers error during RA. Hence GCN trackers have been disabled for this test.

What about real-world code that uses inline assembly? Will that also fail to compile after this change?

Real-world code should be fine. This test intentionally limits the available registers, which usually does not happen in real code. Other LIT tests contain inline asm and compile fine.

I updated my comment in the summary to reflect this aspect.

@arsenm
Contributor

arsenm commented Nov 24, 2025

  • materialize-frame-index-sgpr.ll: This test uses inline assembly but GCN trackers do not account for physical registers, leading to out of registers error during RA. Hence GCN trackers have been disabled for this test.

This is a showstopper. The scheduler cannot make physical register constraints worse than in the original MIR if it cannot guarantee the function will be allocatable after
