Skip to content

Conversation

abhishek-kaushik22
Copy link
Contributor

@abhishek-kaushik22 abhishek-kaushik22 commented Oct 14, 2025

In buildClearRegister use the correct pseudo-opcode for each register class:

  • For VR128, use V_SET0
  • For VR256, use AVX_SET0
  • For VR512, use AVX512_512_SET0
  • For VK*, use KSET0Q/KSET0W

This avoids illegal register/opcode pairings and machine verifier errors when clearing call-used registers under -fzero-call-used-regs=used.

Fixes: #163053

Ensure buildClearRegister uses the correct opcode for each register class:

- For YMM (VR256), use VPXORYrr
- For ZMM (VR512), use VPXORDZrr

This avoids illegal register/opcode pairings and MachineVerifier errors when clearing call-used registers under -fzero-call-used-regs.

Fixes: llvm#163053
@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2025

@llvm/pr-subscribers-backend-x86

Author: Abhishek Kaushik (abhishek-kaushik22)

Changes

Ensure buildClearRegister uses the correct opcode for each register class:

  • For YMM (VR256), use VPXORYrr
  • For ZMM (VR512), use VPXORDZrr

This avoids illegal register/opcode pairings and machine verifier errors when clearing call-used registers under -fzero-call-used-regs=used.

Fixes: #163053


Full diff: https://github.com/llvm/llvm-project/pull/163358.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+4-4)
  • (added) llvm/test/CodeGen/X86/pr163053.ll (+22)
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 1d2cd39951bf4..671363d87db1b 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -10818,8 +10818,8 @@ void X86InstrInfo::buildClearRegister(Register Reg, MachineBasicBlock &MBB,
     if (!ST.hasAVX())
       return;
 
-    // VPXOR is safe to use because it doesn't affect flags.
-    BuildMI(MBB, Iter, DL, get(X86::VPXORrr), Reg)
+    // VPXORY is safe to use because it doesn't affect flags.
+    BuildMI(MBB, Iter, DL, get(X86::VPXORYrr), Reg)
         .addReg(Reg, RegState::Undef)
         .addReg(Reg, RegState::Undef);
   } else if (X86::VR512RegClass.contains(Reg)) {
@@ -10827,8 +10827,8 @@ void X86InstrInfo::buildClearRegister(Register Reg, MachineBasicBlock &MBB,
     if (!ST.hasAVX512())
       return;
 
-    // VPXORY is safe to use because it doesn't affect flags.
-    BuildMI(MBB, Iter, DL, get(X86::VPXORYrr), Reg)
+    // VPXORDZ is safe to use because it doesn't affect flags.
+    BuildMI(MBB, Iter, DL, get(X86::VPXORDZrr), Reg)
         .addReg(Reg, RegState::Undef)
         .addReg(Reg, RegState::Undef);
   } else if (X86::VK1RegClass.contains(Reg) || X86::VK2RegClass.contains(Reg) ||
diff --git a/llvm/test/CodeGen/X86/pr163053.ll b/llvm/test/CodeGen/X86/pr163053.ll
new file mode 100644
index 0000000000000..0fdb4d4be3896
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr163053.ll
@@ -0,0 +1,22 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=x86_64-- -mcpu=x86-64-v4 -verify-machineinstrs | FileCheck %s
+
+define void @pr163053(ptr %arg, ptr %arg1) #0 {
+; CHECK-LABEL: pr163053:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vxorps %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps %ymm0, (%rdi)
+; CHECK-NEXT:    vxorps %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovaps %zmm0, (%rsi)
+; CHECK-NEXT:    xorl %edi, %edi
+; CHECK-NEXT:    xorl %esi, %esi
+; CHECK-NEXT:    vxorps %ymm0, %ymm0, %ymm0
+; CHECK-NEXT:    vxorps %zmm0, %zmm0, %zmm0
+; CHECK-NEXT:    vzeroupper
+; CHECK-NEXT:    retq
+  store <16 x i16> zeroinitializer, ptr %arg, align 32
+  store <32 x i16> zeroinitializer, ptr %arg1, align 64
+  ret void
+}
+
+attributes #0 = { "zero-call-used-regs"="used" }

@abhishek-kaushik22
Copy link
Contributor Author

@phoebewang @e-kud @RKSimon should this function also handle the VR128X and VR256X register classes? I feel that might also be a verifier failure, but I couldn't write a test that triggers that.

@abhishek-kaushik22
Copy link
Contributor Author

Thanks @phoebewang for the suggestion!
@RKSimon should I change all the cases to use pseudo instructions? Also, I notice there are no tests for SIMD registers in the original commit deaf22b

@RKSimon
Copy link
Collaborator

RKSimon commented Oct 14, 2025

@RKSimon should I change all the cases to use pseudo instructions?

Yes please.

Also, I notice there are no tests for SIMD registers in the original commit deaf22b

Yes, we need better coverage - please can you try adding x86-64 and x86-64-v3 RUNs to pr163053.ll? Maybe a AVX1 target such as sandybridge or btver2 as well.

@abhishek-kaushik22 abhishek-kaushik22 changed the title [X86] Fix zeroing idioms for YMM and ZMM in buildClearRegister [X86] Use pseudo instructions to zero registers in buildClearRegister Oct 16, 2025
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - cheers

@abhishek-kaushik22 abhishek-kaushik22 enabled auto-merge (squash) October 17, 2025 09:28
@RKSimon
Copy link
Collaborator

RKSimon commented Oct 17, 2025

@abhishek-kaushik22 the CI failures are unrelated if you want to skip auto-merge and just push

@abhishek-kaushik22 abhishek-kaushik22 merged commit 228dae7 into llvm:main Oct 17, 2025
9 of 10 checks passed
@iucoen
Copy link

iucoen commented Oct 17, 2025

Can this be cherry-picked to the release/21.x branch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[clang] Miscompiles OpenSSH 10.x's mlkem768 key-exchange protocol when avx512 is enabled.

6 participants