Skip to content

Conversation

@broxigarchen
Copy link
Contributor

@broxigarchen broxigarchen commented May 21, 2025

select vgpr16 for asm inline 16bit vreg in true16 mode

@broxigarchen broxigarchen changed the title 16bit for asm inline reg [AMDGPU][True16][CodeGen] select vgpr16 for asm inline 16bit reg May 21, 2025
@broxigarchen broxigarchen marked this pull request as ready for review May 21, 2025 18:49
@broxigarchen broxigarchen requested review from Sisyph and rampitec May 21, 2025 18:49
@broxigarchen broxigarchen requested review from arsenm and kosarev May 21, 2025 18:49
@llvmbot
Copy link
Member

llvmbot commented May 21, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

select vgpr16 for asm inline 16bit reg


Full diff: https://github.com/llvm/llvm-project/pull/140946.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+4-1)
  • (added) llvm/test/CodeGen/AMDGPU/inlineasm-16-fake16.ll (+48)
  • (added) llvm/test/CodeGen/AMDGPU/inlineasm-16-true16.ll (+48)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index ba7e11a853347..a4b62454e782d 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -16062,7 +16062,10 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
     case 'v':
       switch (BitWidth) {
       case 16:
-        RC = &AMDGPU::VGPR_32RegClass;
+        if (Subtarget->useRealTrue16Insts())
+          RC = &AMDGPU::VGPR_16RegClass;
+        else
+          RC = &AMDGPU::VGPR_32RegClass;
         break;
       default:
         RC = TRI->getVGPRClassForBitWidth(BitWidth);
diff --git a/llvm/test/CodeGen/AMDGPU/inlineasm-16-fake16.ll b/llvm/test/CodeGen/AMDGPU/inlineasm-16-fake16.ll
new file mode 100644
index 0000000000000..0f268c796c695
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/inlineasm-16-fake16.ll
@@ -0,0 +1,48 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=-real-true16 -verify-machineinstrs < %s 2>&1 | FileCheck -enable-var-scope -check-prefixes=GFX11 %s
+
+; GFX11-LABEL: {{^}}s_input_output_i16:
+; GFX11: s_mov_b32 s[[REG:[0-9]+]], -1
+; GFX11: ; use s[[REG]]
+define amdgpu_kernel void @s_input_output_i16() #0 {
+  %v = tail call i16 asm sideeffect "s_mov_b32 $0, -1", "=s"()
+  tail call void asm sideeffect "; use $0", "s"(i16 %v) #0
+  ret void
+}
+
+; GFX11-LABEL: {{^}}s_input_output_f16:
+; GFX11: s_mov_b32 s[[REG:[0-9]+]], -1
+; GFX11: ; use s[[REG]]
+define amdgpu_kernel void @s_input_output_f16() #0 {
+  %v = tail call half asm sideeffect "s_mov_b32 $0, -1", "=s"() #0
+  tail call void asm sideeffect "; use $0", "s"(half %v)
+  ret void
+}
+
+; GFX11-LABEL: {{^}}v_input_output_f16:
+; GFX11: v_mov_b32 v[[REG:[0-9]+]], -1
+; GFX11: ; use v[[REG]]
+define amdgpu_kernel void @v_input_output_f16() #0 {
+  %v = tail call half asm sideeffect "v_mov_b32 $0, -1", "=v"() #0
+  tail call void asm sideeffect "; use $0", "v"(half %v)
+  ret void
+}
+
+; GFX11-LABEL: {{^}}v_input_output_i16:
+; GFX11: v_mov_b32 v[[REG:[0-9]+]], -1
+; GFX11: ; use v[[REG]]
+define amdgpu_kernel void @v_input_output_i16() #0 {
+  %v = tail call i16 asm sideeffect "v_mov_b32 $0, -1", "=v"() #0
+  tail call void asm sideeffect "; use $0", "v"(i16 %v)
+  ret void
+}
+
+; GFX11-LABEL: {{^}}i16_imm_input_phys_vgpr:
+; GFX11: v_mov_b32_e32 v0, 0xffff
+; GFX11: ; use v0
+define amdgpu_kernel void @i16_imm_input_phys_vgpr() {
+entry:
+  call void asm sideeffect "; use $0 ", "{v0}"(i16 65535)
+  ret void
+}
+
+attributes #0 = { nounwind }
diff --git a/llvm/test/CodeGen/AMDGPU/inlineasm-16-true16.ll b/llvm/test/CodeGen/AMDGPU/inlineasm-16-true16.ll
new file mode 100644
index 0000000000000..908fb840e8d2c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/inlineasm-16-true16.ll
@@ -0,0 +1,48 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=+real-true16 -verify-machineinstrs < %s 2>&1 | FileCheck -enable-var-scope -check-prefixes=GFX11 %s
+
+; GFX11-LABEL: {{^}}s_input_output_i16:
+; GFX11: s_mov_b32 s[[REG:[0-9]+]], -1
+; GFX11: ; use s[[REG]]
+define amdgpu_kernel void @s_input_output_i16() #0 {
+  %v = tail call i16 asm sideeffect "s_mov_b32 $0, -1", "=s"()
+  tail call void asm sideeffect "; use $0", "s"(i16 %v) #0
+  ret void
+}
+
+; GFX11-LABEL: {{^}}s_input_output_f16:
+; GFX11: s_mov_b32 s[[REG:[0-9]+]], -1
+; GFX11: ; use s[[REG]]
+define amdgpu_kernel void @s_input_output_f16() #0 {
+  %v = tail call half asm sideeffect "s_mov_b32 $0, -1", "=s"() #0
+  tail call void asm sideeffect "; use $0", "s"(half %v)
+  ret void
+}
+
+; GFX11-LABEL: {{^}}v_input_output_f16:
+; GFX11: v_mov_b16 v[[REG:[0-9]+.(l|h)]], -1
+; GFX11: ; use v[[REG]]
+define amdgpu_kernel void @v_input_output_f16() #0 {
+  %v = tail call half asm sideeffect "v_mov_b16 $0, -1", "=v"() #0
+  tail call void asm sideeffect "; use $0", "v"(half %v)
+  ret void
+}
+
+; GFX11-LABEL: {{^}}v_input_output_i16:
+; GFX11: v_mov_b16 v[[REG:[0-9]+.(l|h)]], -1
+; GFX11: ; use v[[REG]]
+define amdgpu_kernel void @v_input_output_i16() #0 {
+  %v = tail call i16 asm sideeffect "v_mov_b16 $0, -1", "=v"() #0
+  tail call void asm sideeffect "; use $0", "v"(i16 %v)
+  ret void
+}
+
+; GFX11-LABEL: {{^}}i16_imm_input_phys_vgpr:
+; GFX11: v_mov_b16_e32 v0.l, -1
+; GFX11: ; use v0
+define amdgpu_kernel void @i16_imm_input_phys_vgpr() {
+entry:
+  call void asm sideeffect "; use $0 ", "{v0.l}"(i16 65535)
+  ret void
+}
+
+attributes #0 = { nounwind }

@broxigarchen broxigarchen force-pushed the main-fix-asm-inline-1 branch from 47427c7 to d77dc1f Compare May 21, 2025 18:50
@broxigarchen broxigarchen changed the title [AMDGPU][True16][CodeGen] select vgpr16 for asm inline 16bit reg [AMDGPU][True16][CodeGen] select vgpr16 for asm inline 16bit vreg May 21, 2025
@@ -0,0 +1,48 @@
; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=+real-true16 -verify-machineinstrs < %s 2>&1 | FileCheck -enable-var-scope -check-prefixes=GFX11 %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these need to be separate files? Also don't need -verify-machineinstrs, or to redirect stderr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The v_mov_b16 vs v_mov_b32 in asm

; GFX11: ; use v0.l
define amdgpu_kernel void @i16_imm_input_phys_vgpr() {
entry:
call void asm sideeffect "; use $0 ", "{v0.l}"(i16 65535)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add one more test with v0.h?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And do we need a constraint to specify specifically an l or h for a virtual register?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a .h case

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And do we need a constraint to specify specifically an l or h for a virtual register?

I do not think it is practically needed. At least it is not needed for correctness at this point.

@broxigarchen broxigarchen force-pushed the main-fix-asm-inline-1 branch from d77dc1f to f3a655a Compare May 21, 2025 20:22
@broxigarchen broxigarchen force-pushed the main-fix-asm-inline-1 branch from f3a655a to c1b6d35 Compare May 21, 2025 20:27
Copy link
Collaborator

@rampitec rampitec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

; GFX11: ; use v0.l
define amdgpu_kernel void @i16_imm_input_phys_vgpr() {
entry:
call void asm sideeffect "; use $0 ", "{v0.l}"(i16 65535)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And do we need a constraint to specify specifically an l or h for a virtual register?

I do not think it is practically needed. At least it is not needed for correctness at this point.

@@ -0,0 +1,48 @@
; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -mattr=-real-true16 < %s | FileCheck -enable-var-scope -check-prefixes=GFX11 %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you auto generate check lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@broxigarchen broxigarchen merged commit 7f62800 into llvm:main May 21, 2025
8 of 10 checks passed
@@ -16062,7 +16062,8 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
case 'v':
switch (BitWidth) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch seems overkill here. Could just handle it with:

  if (BitWidth == 16 && !Subtarget->useRealTrue16Insts())
    BitWidth = 32;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even move the handling for the BitWidth == 16 case inside getVGPRClassForBitWidth?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me we need constraints for the aligned and unaligned versions of register classes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants