
Commit 9f0ba94

jrbyrnes authored and arsenm committed
[AMDGPU] Allocate AVRegClass last (llvm#146606)
This changes the RC priorities such that AVRegClass is the least prioritized. These registers are less constrained than the VRegClass and ARegClass as they can be either agpr or vgpr. Thus, assigning them last removes unnecessary constraints from VRegClass and ARegClass assignments, and allows the RA to make smarter decisions about whether to use vgpr / agpr for AVRegClass.

We only have 5 bits for RC priorities, and we still want to prioritize larger RCs over smaller ones. Since this new prioritization uses the 5th bit for AVRegClass vs ARegClass / VRegClass, we only have 4 bits to encode the size priorities. Previously, each RC with a distinct size had a distinct priority. However, this PR groups together multiple sizes to the same priority. Currently, this will have no effect on prioritization in practice because we only have one actually defined RC per group per vector register type. For example, a register class with 15 or 16 32-bit registers will have the same size priority (14). However, we only have VReg_512 (VReg_480 doesn't exist), so only one actual RC in VRegClass has this priority. Similarly, we give register classes with 17-32+ 32-bit registers a size priority of 15, but we only have VReg_1024.

The effect of this PR is to prioritize first the vector register type (VReg & AReg have top priority, then AVReg), with the size of the register class having second priority.

Passes PSDB.

---------

Co-authored-by: Matt Arsenault <[email protected]>
Change-Id: Ia78c720445346c9800549c05f5b85b32434f2b0d
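The priority layout described in the commit message can be illustrated with a small Python sketch. This is not the TableGen code from the patch: the function names, the string tags for register kinds, and the exact bit placement are assumptions made for illustration. The message only confirms that 15 or 16 registers map to size priority 14 and 17 or more map to 15; mapping smaller sizes to `size - 1` is an assumption.

```python
def size_priority(num_dwords: int) -> int:
    """Map a register-class size (in 32-bit registers) to a 4-bit size priority."""
    if num_dwords < 1:
        raise ValueError("a register class needs at least one 32-bit register")
    if num_dwords >= 17:
        return 15  # stated in the commit message (e.g. VReg_1024)
    if num_dwords >= 15:
        return 14  # stated in the commit message (e.g. VReg_512)
    return num_dwords - 1  # ASSUMPTION: one distinct priority per smaller size


def allocation_priority(kind: str, num_dwords: int) -> int:
    """Combine the register kind and size into a 5-bit priority.

    ASSUMPTION: the top (5th) bit is set for the VGPR-only ("V") and
    AGPR-only ("A") classes and clear for "AV" classes, so that AV
    classes always sort last. Higher values are allocated first.
    """
    if kind not in ("V", "A", "AV"):
        raise ValueError(f"unknown register kind: {kind}")
    type_bit = 0 if kind == "AV" else 1
    return (type_bit << 4) | size_priority(num_dwords)


if __name__ == "__main__":
    for kind, size in [("V", 1), ("V", 16), ("A", 32), ("AV", 32)]:
        print(kind, size, allocation_priority(kind, size))
```

Under these assumptions, even the smallest V or A class outranks the largest AV class, which matches the stated intent that the register type is compared first and the size second.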
1 parent e427f3e commit 9f0ba94

7 files changed: +1407 -455 lines changed

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx90a.ll

Lines changed: 9 additions & 0 deletions
@@ -829,6 +829,7 @@ define amdgpu_kernel void @test_mfma_f64_16x16x4f64(ptr addrspace(1) %arg, doubl
 ; GFX942-NEXT: global_store_dwordx4 v0, a[4:7], s[8:9] offset:16
 ; GFX942-NEXT: global_store_dwordx4 v0, a[0:3], s[8:9]
 ; GFX942-NEXT: s_endpgm
+<<<<<<< HEAD
 ;
 ; GFX90A-VGPR-LABEL: test_mfma_f64_16x16x4f64:
 ; GFX90A-VGPR: ; %bb.0: ; %bb
@@ -877,6 +878,8 @@ define amdgpu_kernel void @test_mfma_f64_16x16x4f64(ptr addrspace(1) %arg, doubl
 ; GFX942-VGPR-NEXT: global_store_dwordx4 v8, v[4:7], s[8:9] offset:16
 ; GFX942-VGPR-NEXT: global_store_dwordx4 v8, v[0:3], s[8:9]
 ; GFX942-VGPR-NEXT: s_endpgm
+=======
+>>>>>>> 9819653bbb9b ([AMDGPU] Allocate AVRegClass last (#146606))
 bb:
 %in.1 = load <4 x double>, ptr addrspace(1) %arg
 %mai.1 = tail call <4 x double> @llvm.amdgcn.mfma.f64.16x16x4f64(double %a, double %b, <4 x double> %in.1, i32 1, i32 2, i32 3)
@@ -1672,6 +1675,7 @@ define amdgpu_kernel void @test_mfma_f64_16x16x4f64_imm(ptr addrspace(1) %arg, d
 ; GFX942-NEXT: global_store_dwordx4 v0, a[4:7], s[0:1] offset:16
 ; GFX942-NEXT: global_store_dwordx4 v0, a[0:3], s[0:1]
 ; GFX942-NEXT: s_endpgm
+<<<<<<< HEAD
 ;
 ; GFX90A-VGPR-LABEL: test_mfma_f64_16x16x4f64_imm:
 ; GFX90A-VGPR: ; %bb.0: ; %bb
@@ -1730,6 +1734,8 @@ define amdgpu_kernel void @test_mfma_f64_16x16x4f64_imm(ptr addrspace(1) %arg, d
 ; GFX942-VGPR-NEXT: global_store_dwordx4 v0, v[6:9], s[0:1] offset:16
 ; GFX942-VGPR-NEXT: global_store_dwordx4 v0, v[2:5], s[0:1]
 ; GFX942-VGPR-NEXT: s_endpgm
+=======
+>>>>>>> 9819653bbb9b ([AMDGPU] Allocate AVRegClass last (#146606))
 bb:
 %mai.1 = tail call <4 x double> @llvm.amdgcn.mfma.f64.16x16x4f64(double %a, double %b, <4 x double> <double 0.0, double 0.0, double 0.0, double 1.0>, i32 0, i32 0, i32 0)
 store <4 x double> %mai.1, ptr addrspace(1) %arg
@@ -1790,6 +1796,7 @@ define amdgpu_kernel void @test_mfma_f64_16x16x4f64_splat_lit(ptr addrspace(1) %
 ; GFX942-NEXT: global_store_dwordx4 v0, a[4:7], s[0:1] offset:16
 ; GFX942-NEXT: global_store_dwordx4 v0, a[0:3], s[0:1]
 ; GFX942-NEXT: s_endpgm
+<<<<<<< HEAD
 ;
 ; GFX90A-VGPR-LABEL: test_mfma_f64_16x16x4f64_splat_lit:
 ; GFX90A-VGPR: ; %bb.0: ; %bb
@@ -1848,6 +1855,8 @@ define amdgpu_kernel void @test_mfma_f64_16x16x4f64_splat_lit(ptr addrspace(1) %
 ; GFX942-VGPR-NEXT: global_store_dwordx4 v0, v[6:9], s[0:1] offset:16
 ; GFX942-VGPR-NEXT: global_store_dwordx4 v0, v[2:5], s[0:1]
 ; GFX942-VGPR-NEXT: s_endpgm
+=======
+>>>>>>> 9819653bbb9b ([AMDGPU] Allocate AVRegClass last (#146606))
 bb:
 %mai.1 = tail call <4 x double> @llvm.amdgcn.mfma.f64.16x16x4f64(double %a, double %b, <4 x double> <double 123.0, double 123.0, double 123.0, double 123.0>, i32 0, i32 0, i32 0)
 store <4 x double> %mai.1, ptr addrspace(1) %arg

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.bf16.ll

Lines changed: 118 additions & 0 deletions
@@ -252,6 +252,7 @@ define amdgpu_kernel void @test_mfma_f32_32x32x16_bf16__vgprcd(<8 x bfloat> %arg
 ; GCN-NEXT: s_load_dwordx8 s[24:31], s[4:5], 0x24
 ; GCN-NEXT: s_load_dwordx16 s[8:23], s[4:5], 0x64
 ; GCN-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0xa4
+<<<<<<< HEAD
 ; GCN-NEXT: v_mov_b32_e32 v44, 0
 ; GCN-NEXT: s_waitcnt lgkmcnt(0)
 ; GCN-NEXT: v_mov_b64_e32 v[34:35], s[26:27]
@@ -301,6 +302,64 @@ define amdgpu_kernel void @test_mfma_f32_32x32x16_bf16__vgprcd(<8 x bfloat> %arg
 ; GCN-NEXT: global_store_dwordx4 v44, v[0:3], s[0:1] sc0 sc1
 ; GCN-NEXT: s_waitcnt vmcnt(0)
 ; GCN-NEXT: global_store_dwordx4 v44, v[4:7], s[0:1] offset:16 sc0 sc1
+=======
+; GCN-NEXT: v_mov_b32_e32 v8, 0
+; GCN-NEXT: s_waitcnt lgkmcnt(0)
+; GCN-NEXT: v_mov_b64_e32 v[0:1], s[24:25]
+; GCN-NEXT: v_mov_b64_e32 v[2:3], s[26:27]
+; GCN-NEXT: v_mov_b64_e32 v[4:5], s[28:29]
+; GCN-NEXT: v_accvgpr_write_b32 a31, s23
+; GCN-NEXT: v_mov_b64_e32 v[6:7], s[30:31]
+; GCN-NEXT: v_accvgpr_write_b32 a30, s22
+; GCN-NEXT: v_accvgpr_write_b32 a29, s21
+; GCN-NEXT: v_accvgpr_write_b32 a28, s20
+; GCN-NEXT: v_accvgpr_write_b32 a27, s19
+; GCN-NEXT: v_accvgpr_write_b32 a26, s18
+; GCN-NEXT: v_accvgpr_write_b32 a25, s17
+; GCN-NEXT: v_accvgpr_write_b32 a24, s16
+; GCN-NEXT: v_accvgpr_write_b32 a23, s15
+; GCN-NEXT: v_accvgpr_write_b32 a22, s14
+; GCN-NEXT: v_accvgpr_write_b32 a21, s13
+; GCN-NEXT: v_accvgpr_write_b32 a20, s12
+; GCN-NEXT: v_accvgpr_write_b32 a19, s11
+; GCN-NEXT: v_accvgpr_write_b32 a18, s10
+; GCN-NEXT: v_accvgpr_write_b32 a17, s9
+; GCN-NEXT: v_accvgpr_write_b32 a16, s8
+; GCN-NEXT: v_mov_b32_e32 v10, s20
+; GCN-NEXT: v_mov_b32_e32 v11, s21
+; GCN-NEXT: v_mfma_f32_32x32x16_bf16 a[0:15], v[0:3], v[4:7], a[16:31]
+; GCN-NEXT: v_mov_b32_e32 v12, s22
+; GCN-NEXT: v_mov_b32_e32 v13, s23
+; GCN-NEXT: v_mov_b32_e32 v0, s16
+; GCN-NEXT: v_mov_b32_e32 v1, s17
+; GCN-NEXT: v_mov_b32_e32 v2, s18
+; GCN-NEXT: v_mov_b32_e32 v3, s19
+; GCN-NEXT: global_store_dwordx4 v8, v[10:13], s[0:1] offset:48 sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: global_store_dwordx4 v8, v[0:3], s[0:1] offset:32 sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_nop 0
+; GCN-NEXT: v_mov_b32_e32 v0, s12
+; GCN-NEXT: v_mov_b32_e32 v1, s13
+; GCN-NEXT: v_mov_b32_e32 v2, s14
+; GCN-NEXT: v_mov_b32_e32 v3, s15
+; GCN-NEXT: global_store_dwordx4 v8, v[0:3], s[0:1] offset:16 sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_nop 0
+; GCN-NEXT: v_mov_b32_e32 v0, s8
+; GCN-NEXT: v_mov_b32_e32 v1, s9
+; GCN-NEXT: v_mov_b32_e32 v2, s10
+; GCN-NEXT: v_mov_b32_e32 v3, s11
+; GCN-NEXT: global_store_dwordx4 v8, v[0:3], s[0:1] sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: global_store_dwordx4 v8, a[8:11], s[0:1] offset:32 sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: global_store_dwordx4 v8, a[12:15], s[0:1] offset:48 sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: global_store_dwordx4 v8, a[0:3], s[0:1] sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: global_store_dwordx4 v8, a[4:7], s[0:1] offset:16 sc0 sc1
+>>>>>>> 9819653bbb9b ([AMDGPU] Allocate AVRegClass last (#146606))
 ; GCN-NEXT: s_waitcnt vmcnt(0)
 ; GCN-NEXT: s_endpgm
 %result = call <16 x float> @llvm.amdgcn.mfma.f32.32x32x16.bf16(<8 x bfloat> %arg0, <8 x bfloat> %arg1, <16 x float> %arg2, i32 0, i32 0, i32 0)
@@ -315,6 +374,7 @@ define amdgpu_kernel void @test_mfma_f32_32x32x16_bf16__vgprcd__flags(<8 x bfloa
 ; GCN-NEXT: s_load_dwordx8 s[24:31], s[4:5], 0x24
 ; GCN-NEXT: s_load_dwordx16 s[8:23], s[4:5], 0x64
 ; GCN-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0xa4
+<<<<<<< HEAD
 ; GCN-NEXT: v_mov_b32_e32 v44, 0
 ; GCN-NEXT: s_waitcnt lgkmcnt(0)
 ; GCN-NEXT: v_mov_b64_e32 v[34:35], s[26:27]
@@ -364,6 +424,64 @@ define amdgpu_kernel void @test_mfma_f32_32x32x16_bf16__vgprcd__flags(<8 x bfloa
 ; GCN-NEXT: global_store_dwordx4 v44, v[0:3], s[0:1] sc0 sc1
 ; GCN-NEXT: s_waitcnt vmcnt(0)
 ; GCN-NEXT: global_store_dwordx4 v44, v[4:7], s[0:1] offset:16 sc0 sc1
+=======
+; GCN-NEXT: v_mov_b32_e32 v8, 0
+; GCN-NEXT: s_waitcnt lgkmcnt(0)
+; GCN-NEXT: v_mov_b64_e32 v[0:1], s[24:25]
+; GCN-NEXT: v_mov_b64_e32 v[2:3], s[26:27]
+; GCN-NEXT: v_mov_b64_e32 v[4:5], s[28:29]
+; GCN-NEXT: v_accvgpr_write_b32 a31, s23
+; GCN-NEXT: v_mov_b64_e32 v[6:7], s[30:31]
+; GCN-NEXT: v_accvgpr_write_b32 a30, s22
+; GCN-NEXT: v_accvgpr_write_b32 a29, s21
+; GCN-NEXT: v_accvgpr_write_b32 a28, s20
+; GCN-NEXT: v_accvgpr_write_b32 a27, s19
+; GCN-NEXT: v_accvgpr_write_b32 a26, s18
+; GCN-NEXT: v_accvgpr_write_b32 a25, s17
+; GCN-NEXT: v_accvgpr_write_b32 a24, s16
+; GCN-NEXT: v_accvgpr_write_b32 a23, s15
+; GCN-NEXT: v_accvgpr_write_b32 a22, s14
+; GCN-NEXT: v_accvgpr_write_b32 a21, s13
+; GCN-NEXT: v_accvgpr_write_b32 a20, s12
+; GCN-NEXT: v_accvgpr_write_b32 a19, s11
+; GCN-NEXT: v_accvgpr_write_b32 a18, s10
+; GCN-NEXT: v_accvgpr_write_b32 a17, s9
+; GCN-NEXT: v_accvgpr_write_b32 a16, s8
+; GCN-NEXT: v_mov_b32_e32 v10, s20
+; GCN-NEXT: v_mov_b32_e32 v11, s21
+; GCN-NEXT: v_mfma_f32_32x32x16_bf16 a[0:15], v[0:3], v[4:7], a[16:31] cbsz:1 abid:2 blgp:3
+; GCN-NEXT: v_mov_b32_e32 v12, s22
+; GCN-NEXT: v_mov_b32_e32 v13, s23
+; GCN-NEXT: v_mov_b32_e32 v0, s16
+; GCN-NEXT: v_mov_b32_e32 v1, s17
+; GCN-NEXT: v_mov_b32_e32 v2, s18
+; GCN-NEXT: v_mov_b32_e32 v3, s19
+; GCN-NEXT: global_store_dwordx4 v8, v[10:13], s[0:1] offset:48 sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: global_store_dwordx4 v8, v[0:3], s[0:1] offset:32 sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_nop 0
+; GCN-NEXT: v_mov_b32_e32 v0, s12
+; GCN-NEXT: v_mov_b32_e32 v1, s13
+; GCN-NEXT: v_mov_b32_e32 v2, s14
+; GCN-NEXT: v_mov_b32_e32 v3, s15
+; GCN-NEXT: global_store_dwordx4 v8, v[0:3], s[0:1] offset:16 sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: s_nop 0
+; GCN-NEXT: v_mov_b32_e32 v0, s8
+; GCN-NEXT: v_mov_b32_e32 v1, s9
+; GCN-NEXT: v_mov_b32_e32 v2, s10
+; GCN-NEXT: v_mov_b32_e32 v3, s11
+; GCN-NEXT: global_store_dwordx4 v8, v[0:3], s[0:1] sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: global_store_dwordx4 v8, a[8:11], s[0:1] offset:32 sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: global_store_dwordx4 v8, a[12:15], s[0:1] offset:48 sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: global_store_dwordx4 v8, a[0:3], s[0:1] sc0 sc1
+; GCN-NEXT: s_waitcnt vmcnt(0)
+; GCN-NEXT: global_store_dwordx4 v8, a[4:7], s[0:1] offset:16 sc0 sc1
+>>>>>>> 9819653bbb9b ([AMDGPU] Allocate AVRegClass last (#146606))
 ; GCN-NEXT: s_waitcnt vmcnt(0)
 ; GCN-NEXT: s_endpgm
 %result = call <16 x float> @llvm.amdgcn.mfma.f32.32x32x16.bf16(<8 x bfloat> %arg0, <8 x bfloat> %arg1, <16 x float> %arg2, i32 1, i32 2, i32 3)
