Skip to content

Conversation

@GrumpyPigSkin
Copy link
Contributor

Added missing checks for casting half to/from i16 with global-isel enabled.

Please let me know if there is a better place for the tests to live and if you want additional checks for other sizes as I could not find any tests for these.

@RKSimon please could you review :)

Fixes #166557

@llvmbot
Copy link
Member

llvmbot commented Nov 17, 2025

@llvm/pr-subscribers-backend-x86

Author: None (GrumpyPigSkin)

Changes

Added missing checks for casting half to/from i16 with global-isel enabled.

Please let me know if there is a better place for the tests to live and if you want additional checks for other sizes as I could not find any tests for these.

@RKSimon please could you review :)

Fixes #166557


Full diff: https://github.com/llvm/llvm-project/pull/168456.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+37-3)
  • (added) llvm/test/CodeGen/X86/GlobalISel/fp-bitcast.ll (+46)
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index cb0208a4a5f32..30c2e535a9a35 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -4294,10 +4294,28 @@ static unsigned CopyToFromAsymmetricReg(Register DestReg, Register SrcReg,
 
   if (X86::VR128XRegClass.contains(DestReg) &&
       X86::GR32RegClass.contains(SrcReg))
-    // Copy from a VR128 register to a VR128 register.
+    // Copy from a GR32 register to a VR128 register.
     return HasAVX512 ? X86::VMOVDI2PDIZrr
            : HasAVX  ? X86::VMOVDI2PDIrr
                      : X86::MOVDI2PDIrr;
+
+  // SrcReg(VR128) -> DestReg(GR16)
+  // SrcReg(GR16)  -> DestReg(VR128)
+
+  if (X86::GR16RegClass.contains(DestReg) &&
+      X86::VR128XRegClass.contains(SrcReg))
+    // Copy from a VR128 register to a GR16 register.
+    return HasAVX512 ? X86::VPEXTRWZrri
+           : HasAVX  ? X86::VPEXTRWrri
+                     : X86::PEXTRWrri;
+
+  if (X86::VR128XRegClass.contains(DestReg) &&
+      X86::GR16RegClass.contains(SrcReg))
+    // Copy from a GR16 register to a VR128 register.
+    return HasAVX512 ? X86::VPINSRWZrri
+           : HasAVX  ? X86::VPINSRWrri
+                     : X86::PINSRWrri;
+
   return 0;
 }
 
@@ -4370,8 +4388,24 @@ void X86InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
     Opc = CopyToFromAsymmetricReg(DestReg, SrcReg, Subtarget);
 
   if (Opc) {
-    BuildMI(MBB, MI, DL, get(Opc), DestReg)
-        .addReg(SrcReg, getKillRegState(KillSrc));
+    auto MIB = BuildMI(MBB, MI, DL, get(Opc), DestReg);
+    switch (Opc) {
+    case X86::VPINSRWZrri:
+    case X86::VPINSRWrri:
+    case X86::PINSRWrri:
+      MIB.addReg(DestReg, RegState::Undef)
+          .addReg(SrcReg, getKillRegState(KillSrc))
+          .addImm(0);
+      break;
+    case X86::VPEXTRWZrri:
+    case X86::VPEXTRWrri:
+    case X86::PEXTRWrri:
+      MIB.addReg(SrcReg, getKillRegState(KillSrc)).addImm(0);
+      break;
+    default:
+      MIB.addReg(SrcReg, getKillRegState(KillSrc));
+      break;
+    }
     return;
   }
 
diff --git a/llvm/test/CodeGen/X86/GlobalISel/fp-bitcast.ll b/llvm/test/CodeGen/X86/GlobalISel/fp-bitcast.ll
new file mode 100644
index 0000000000000..1d2bd50dbc368
--- /dev/null
+++ b/llvm/test/CodeGen/X86/GlobalISel/fp-bitcast.ll
@@ -0,0 +1,46 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -global-isel | FileCheck %s --check-prefixes=CHECK,SSE2
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -global-isel -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -global-isel -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512
+
+define dso_local noundef half @bar(i16 %0) {
+; SSE2-LABEL: bar:
+; SSE2:       # %bb.0: # %entry
+; SSE2-NEXT:    pinsrw $0, %di, %xmm0
+; SSE2-NEXT:    retq
+;
+; AVX-LABEL: bar:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vpinsrw $0, %di, %xmm0, %xmm0
+; AVX-NEXT:    retq
+;
+; AVX512-LABEL: bar:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpinsrw $0, %di, %xmm0, %xmm0
+; AVX512-NEXT:    retq
+entry:
+  %2 = bitcast i16 %0 to half
+  ret half %2
+}
+
+define dso_local noundef i16 @test_half_to_i16(half %0) {
+; SSE2-LABEL: test_half_to_i16:
+; SSE2:       # %bb.0: # %entry
+; SSE2-NEXT:    pextrw $0, %xmm0, %ax
+; SSE2-NEXT:    retq
+;
+; AVX-LABEL: test_half_to_i16:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vpextrw $0, %xmm0, %ax
+; AVX-NEXT:    retq
+;
+; AVX512-LABEL: test_half_to_i16:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpextrw $0, %xmm0, %ax
+; AVX512-NEXT:    retq
+entry:
+  %2 = bitcast half %0 to i16
+  ret i16 %2
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}

@llvmbot
Copy link
Member

llvmbot commented Nov 17, 2025

@llvm/pr-subscribers-llvm-globalisel

Author: None (GrumpyPigSkin)

Changes

Added missing checks for casting half to/from i16 with global-isel enabled.

Please let me know if there is a better place for the tests to live and if you want additional checks for other sizes as I could not find any tests for these.

@RKSimon please could you review :)

Fixes #166557


Full diff: https://github.com/llvm/llvm-project/pull/168456.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+37-3)
  • (added) llvm/test/CodeGen/X86/GlobalISel/fp-bitcast.ll (+46)
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index cb0208a4a5f32..30c2e535a9a35 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -4294,10 +4294,28 @@ static unsigned CopyToFromAsymmetricReg(Register DestReg, Register SrcReg,
 
   if (X86::VR128XRegClass.contains(DestReg) &&
       X86::GR32RegClass.contains(SrcReg))
-    // Copy from a VR128 register to a VR128 register.
+    // Copy from a GR32 register to a VR128 register.
     return HasAVX512 ? X86::VMOVDI2PDIZrr
            : HasAVX  ? X86::VMOVDI2PDIrr
                      : X86::MOVDI2PDIrr;
+
+  // SrcReg(VR128) -> DestReg(GR16)
+  // SrcReg(GR16)  -> DestReg(VR128)
+
+  if (X86::GR16RegClass.contains(DestReg) &&
+      X86::VR128XRegClass.contains(SrcReg))
+    // Copy from a VR128 register to a GR16 register.
+    return HasAVX512 ? X86::VPEXTRWZrri
+           : HasAVX  ? X86::VPEXTRWrri
+                     : X86::PEXTRWrri;
+
+  if (X86::VR128XRegClass.contains(DestReg) &&
+      X86::GR16RegClass.contains(SrcReg))
+    // Copy from a GR16 register to a VR128 register.
+    return HasAVX512 ? X86::VPINSRWZrri
+           : HasAVX  ? X86::VPINSRWrri
+                     : X86::PINSRWrri;
+
   return 0;
 }
 
@@ -4370,8 +4388,24 @@ void X86InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
     Opc = CopyToFromAsymmetricReg(DestReg, SrcReg, Subtarget);
 
   if (Opc) {
-    BuildMI(MBB, MI, DL, get(Opc), DestReg)
-        .addReg(SrcReg, getKillRegState(KillSrc));
+    auto MIB = BuildMI(MBB, MI, DL, get(Opc), DestReg);
+    switch (Opc) {
+    case X86::VPINSRWZrri:
+    case X86::VPINSRWrri:
+    case X86::PINSRWrri:
+      MIB.addReg(DestReg, RegState::Undef)
+          .addReg(SrcReg, getKillRegState(KillSrc))
+          .addImm(0);
+      break;
+    case X86::VPEXTRWZrri:
+    case X86::VPEXTRWrri:
+    case X86::PEXTRWrri:
+      MIB.addReg(SrcReg, getKillRegState(KillSrc)).addImm(0);
+      break;
+    default:
+      MIB.addReg(SrcReg, getKillRegState(KillSrc));
+      break;
+    }
     return;
   }
 
diff --git a/llvm/test/CodeGen/X86/GlobalISel/fp-bitcast.ll b/llvm/test/CodeGen/X86/GlobalISel/fp-bitcast.ll
new file mode 100644
index 0000000000000..1d2bd50dbc368
--- /dev/null
+++ b/llvm/test/CodeGen/X86/GlobalISel/fp-bitcast.ll
@@ -0,0 +1,46 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -global-isel | FileCheck %s --check-prefixes=CHECK,SSE2
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -global-isel -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -global-isel -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512
+
+define dso_local noundef half @bar(i16 %0) {
+; SSE2-LABEL: bar:
+; SSE2:       # %bb.0: # %entry
+; SSE2-NEXT:    pinsrw $0, %di, %xmm0
+; SSE2-NEXT:    retq
+;
+; AVX-LABEL: bar:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vpinsrw $0, %di, %xmm0, %xmm0
+; AVX-NEXT:    retq
+;
+; AVX512-LABEL: bar:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpinsrw $0, %di, %xmm0, %xmm0
+; AVX512-NEXT:    retq
+entry:
+  %2 = bitcast i16 %0 to half
+  ret half %2
+}
+
+define dso_local noundef i16 @test_half_to_i16(half %0) {
+; SSE2-LABEL: test_half_to_i16:
+; SSE2:       # %bb.0: # %entry
+; SSE2-NEXT:    pextrw $0, %xmm0, %ax
+; SSE2-NEXT:    retq
+;
+; AVX-LABEL: test_half_to_i16:
+; AVX:       # %bb.0: # %entry
+; AVX-NEXT:    vpextrw $0, %xmm0, %ax
+; AVX-NEXT:    retq
+;
+; AVX512-LABEL: test_half_to_i16:
+; AVX512:       # %bb.0: # %entry
+; AVX512-NEXT:    vpextrw $0, %xmm0, %ax
+; AVX512-NEXT:    retq
+entry:
+  %2 = bitcast half %0 to i16
+  ret i16 %2
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}

@github-actions
Copy link

github-actions bot commented Nov 17, 2025

🐧 Linux x64 Test Results

  • 186387 tests passed
  • 4855 tests skipped

@RKSimon RKSimon requested review from RKSimon, arsenm and e-kud November 18, 2025 06:49
@e-kud
Copy link
Contributor

e-kud commented Nov 18, 2025

I tend to consider that GlobalISel needs to emit quite similar code to SDAG on such small examples.

Optimized legalized selection DAG: %bb.0 'test_i16_to_half:entry'
SelectionDAG has 9 nodes:
  t0: ch,glue = EntryToken
        t2: i32,ch = CopyFromReg t0, Register:i32 %0
      t3: i16 = truncate t2
    t4: f16 = bitcast t3
  t7: ch,glue = CopyToReg t0, Register:f16 $xmm0, t4
  t8: ch = X86ISD::RET_GLUE t7, TargetConstant:i32<0>, Register:f16 $xmm0, t7:1
# *** IR Dump Before InstructionSelect (instruction-select) ***:
# Machine code for function test_i16_to_half: IsSSA, TracksLiveness, Legalized, RegBankSelected
Function Live Ins: $edi

bb.1.entry:
  liveins: $edi
  %1:gpr(s32) = COPY $edi
  %0:gpr(s16) = G_TRUNC %1:gpr(s32)
  $xmm0 = COPY %0:gpr(s16)
  RET 0, implicit $xmm0

I'd think that the problem is in GlobalISel that bitcast hasn't been inserted by IRTranslator. But probably for natural insertion of this bitcast #155107 is required.

UPD: Looked closer at CopyToFromAsymmetricReg and it looks like a right place for the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[X86][GISel] fatal error: error in backend: Cannot emit physreg copy instruction

6 participants