[X86][APX] Try to replace non-NF with NF instructions when optimizeCompareInstr #130488

phoebewang · 2025-03-09T13:13:58Z

https://godbolt.org/z/rWYdqnjjx

llvmbot · 2025-03-09T13:14:35Z

@llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)

Changes

https://godbolt.org/z/rWYdqnjjx

Full diff: https://github.com/llvm/llvm-project/pull/130488.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+17-1)
(modified) llvm/test/CodeGen/X86/apx/cf.ll (+23-5)

diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 5fe7203c052d8..d83adf99ce6d3 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -5366,6 +5366,7 @@ bool X86InstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, Register SrcReg,
   MachineBasicBlock::reverse_iterator From =
       std::next(MachineBasicBlock::reverse_iterator(CmpInstr));
   for (MachineBasicBlock *MBB = &CmpMBB;;) {
+    SmallVector<MachineInstr *, 4> NDDInsts;
     for (MachineInstr &Inst : make_range(From, MBB->rend())) {
       // Try to use EFLAGS from the instruction defining %SrcReg. Example:
       //     %eax = addl ...
@@ -5441,13 +5442,28 @@ bool X86InstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, Register SrcReg,
           continue;
         }
 
+        // Try to replace NDD with NF instructions.
+        if (Subtarget.hasNF() &&
+            X86II::hasNewDataDest(Inst.getDesc().TSFlags) &&
+            Inst.registerDefIsDead(X86::EFLAGS, TRI)) {
+          NDDInsts.push_back(&Inst);
+          continue;
+        }
+
+        NDDInsts.clear();
+
         // Cannot do anything for any other EFLAG changes.
         return false;
       }
     }
 
-    if (MI || Sub)
+    if (MI || Sub) {
+      for (MachineInstr *NDD : NDDInsts) {
+        NDD->setDesc(get(X86::getNFVariant(NDD->getOpcode())));
+        NDD->removeOperand(NDD->getNumOperands() - 1);
+      }
       break;
+    }
 
     // Reached begin of basic block. Continue in predecessor if there is
     // exactly one.
diff --git a/llvm/test/CodeGen/X86/apx/cf.ll b/llvm/test/CodeGen/X86/apx/cf.ll
index a64d7df11a4d0..fc170ca5f2b2e 100644
--- a/llvm/test/CodeGen/X86/apx/cf.ll
+++ b/llvm/test/CodeGen/X86/apx/cf.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64 -mattr=+cf,+avx512f -verify-machineinstrs | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64 -mattr=+cf,+nf,+ndd,+avx512f -verify-machineinstrs | FileCheck %s
 
 define void @basic(i32 %a, ptr %b, ptr %p, ptr %q) {
 ; CHECK-LABEL: basic:
@@ -57,9 +57,8 @@ entry:
 define i64 @reduced_data_dependency(i64 %a, i64 %b, ptr %c) {
 ; CHECK-LABEL: reduced_data_dependency:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    movq %rdi, %rcx
-; CHECK-NEXT:    subq %rsi, %rcx
-; CHECK-NEXT:    cfcmovnsq (%rdx), %rdi, %rax
+; CHECK-NEXT:    subq %rsi, %rdi, %rax
+; CHECK-NEXT:    cfcmovnsq (%rdx), %rdi, %rcx
 ; CHECK-NEXT:    addq %rcx, %rax
 ; CHECK-NEXT:    retq
 entry:
@@ -125,7 +124,7 @@ entry:
   ret void
 }
 
-define void @single_cmp(i32 %a, i32 %b, ptr %c, ptr %d) #2 {
+define void @single_cmp(i32 %a, i32 %b, ptr %c, ptr %d) {
 ; CHECK-LABEL: single_cmp:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    cmpl %esi, %edi
@@ -139,3 +138,22 @@ entry:
   tail call void @llvm.masked.store.v1i16.p0(<1 x i16> %2, ptr %d, i32 2, <1 x i1> %1)
   ret void
 }
+
+define void @load_add_store(i32 %a, i32 %b, ptr %p) {
+; CHECK-LABEL: load_add_store:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    cmpl %esi, %edi
+; CHECK-NEXT:    cfcmovnew (%rdx), %ax
+; CHECK-NEXT:    {nf} incw %ax
+; CHECK-NEXT:    cfcmovnew %ax, (%rdx)
+; CHECK-NEXT:    retq
+entry:
+  %0 = icmp ne i32 %a, %b
+  %1 = insertelement <1 x i1> poison, i1 %0, i64 0
+  %2 = tail call <1 x i16> @llvm.masked.load.v1i16.p0(ptr %p, i32 2, <1 x i1> %1, <1 x i16> poison)
+  %3 = extractelement <1 x i16> %2, i64 0
+  %4 = add i16 %3, 1
+  %5 = insertelement <1 x i16> poison, i16 %4, i64 0
+  tail call void @llvm.masked.store.v1i16.p0(<1 x i16> %5, ptr %p, i32 2, <1 x i1> %1)
+  ret void
+}
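
The collect-then-commit shape of the X86InstrInfo.cpp change above can be sketched outside LLVM. The following is a toy model only: `ToyInst`, `toyNFVariant`, and `toyOptimizeCompare` are hypothetical names, not LLVM APIs, and the convertibility check is simplified to a string lookup. It shows candidates being remembered during the backward scan and rewritten only once a flag-producing definition is found.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; none of these names are LLVM APIs.
struct ToyInst {
  std::string Opcode;
  bool DefinesSrcReg;  // produces the value the compare tests
  bool WritesFlags;    // writes the flags register
  bool FlagsDefIsDead; // nothing reads the flags it writes
};

// Hypothetical table: returns the flag-free opcode, or "" if there is none.
std::string toyNFVariant(const std::string &Opcode) {
  if (Opcode == "inc") return "inc_nf";
  if (Opcode == "add") return "add_nf";
  return "";
}

// Walks backward from the compare at CmpIdx. Returns true if an earlier
// instruction already produces the flags the compare would set, after
// rewriting every flag-clobbering instruction in between to its NF form.
bool toyOptimizeCompare(std::vector<ToyInst> &Block, std::size_t CmpIdx) {
  std::vector<std::size_t> Pending; // candidates, rewritten only on success
  for (std::size_t I = CmpIdx; I-- > 0;) {
    ToyInst &Inst = Block[I];
    if (Inst.DefinesSrcReg) {
      if (!Inst.WritesFlags)
        return false; // the definition does not set flags; nothing to reuse
      // Found the flag producer: commit the pending rewrites.
      for (std::size_t Idx : Pending) {
        Block[Idx].Opcode = toyNFVariant(Block[Idx].Opcode);
        Block[Idx].WritesFlags = false;
      }
      return true;
    }
    if (!Inst.WritesFlags)
      continue; // harmless instruction, keep scanning
    if (Inst.FlagsDefIsDead && !toyNFVariant(Inst.Opcode).empty()) {
      Pending.push_back(I); // clobber can be removed by switching to NF
      continue;
    }
    return false; // any other flag change blocks the optimization
  }
  return false; // reached the start of the block without a producer
}
```

Deferring the rewrite means that a scan which later fails leaves every instruction untouched, which is the reason the candidates are collected rather than converted on sight.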


KanRobert · 2025-03-10T02:05:49Z

llvm/lib/Target/X86/X86InstrInfo.cpp

  bool ClearsOverflowFlag = false;
  bool ShouldUpdateCC = false;
  bool IsSwapped = false;
+  bool HasCF = Subtarget.hasNF();


HasCF->HasNF

Good catch! Done.

KanRobert · 2025-03-10T02:08:02Z

It seems the optimization does not rely on NDD?

phoebewang · 2025-03-10T08:32:50Z

> It seems the optimization does not rely on NDD?

You are correct! I thought only NDD instructions support NF. Thanks!

KanRobert · 2025-03-10T08:40:14Z

llvm/lib/Target/X86/X86InstrInfo.cpp


+        // Try to replace non-NF with NF instructions.
+        if (HasNF && Inst.registerDefIsDead(X86::EFLAGS, TRI)) {
+          unsigned NewOp = X86::getNFVariant(Inst.getOpcode());


It seems we don't need to store the opcodes of NF variants. Just setDesc(X86::getNFVariant(Inst.getOpcode())); at line 5654?

I think a table lookup is more expensive than a little extra memory.

Would it introduce more table lookups? The number of lookups seems equal to me.

Yes, we would need to query twice if we don't store the value. We need to query here to check whether the instruction can be turned into an NF variant.
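
The lookup-count argument in this thread can be checked with a small stand-alone sketch. Everything below is hypothetical (`ToyOpcode`, `toyGetNFVariant`, and `ToyLookupCount` are illustration names, and the assumption that a missing variant is reported as 0 is mine, not confirmed by the thread). It compares storing the looked-up opcode against querying again at rewrite time.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical opcode numbers; 0 is assumed to mean "no NF variant".
enum ToyOpcode : unsigned {
  TOY_INVALID = 0, TOY_ADD, TOY_ADD_NF, TOY_INC, TOY_INC_NF, TOY_CMP
};

static unsigned ToyLookupCount = 0; // counts table queries for comparison

unsigned toyGetNFVariant(unsigned Opcode) {
  ++ToyLookupCount;
  switch (Opcode) {
  case TOY_ADD: return TOY_ADD_NF;
  case TOY_INC: return TOY_INC_NF;
  default:      return TOY_INVALID;
  }
}

// Stores the looked-up opcode: one query per scanned instruction.
void convertCached(std::vector<unsigned> &Opcodes) {
  std::vector<std::pair<std::size_t, unsigned>> Pending;
  for (std::size_t I = 0; I < Opcodes.size(); ++I)
    if (unsigned NewOp = toyGetNFVariant(Opcodes[I]))
      Pending.emplace_back(I, NewOp);
  for (const auto &P : Pending)
    Opcodes[P.first] = P.second;
}

// Stores only the position: each convertible instruction is queried twice,
// once to test convertibility and once more when rewriting.
void convertRequery(std::vector<unsigned> &Opcodes) {
  std::vector<std::size_t> Pending;
  for (std::size_t I = 0; I < Opcodes.size(); ++I)
    if (toyGetNFVariant(Opcodes[I]) != TOY_INVALID)
      Pending.push_back(I);
  for (std::size_t I : Pending)
    Opcodes[I] = toyGetNFVariant(Opcodes[I]);
}
```

In this model the extra queries only appear because the lookup doubles as the convertibility check, which matches the point made in the last reply.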

KanRobert

LGTM

KanRobert · 2025-03-10T10:21:57Z

llvm/test/CodeGen/X86/apx/cf.ll

BTW, this test should be added in nf.ll instead of cf.ll

There's no nf.ll. The NF tests are scattered across add/or/sub/...ll, so putting it in cf.ll is OK since we happen to have NF condition lowering here :)

phoebewang requested review from KanRobert and RKSimon March 9, 2025 13:13

llvmbot added the backend:X86 label Mar 9, 2025

phoebewang force-pushed the APX branch from f9624b6 to 50bb638 Compare March 10, 2025 00:40

[X86][APX] Try to replace NDD with NF instructions when optimizeCompa…

3f0314d

…reInstr https://godbolt.org/z/rWYdqnjjx

phoebewang force-pushed the APX branch from 50bb638 to 3f0314d Compare March 10, 2025 00:56

KanRobert reviewed Mar 10, 2025

Address review comments

1c4bc9b

phoebewang force-pushed the APX branch from 527af4e to 1c4bc9b Compare March 10, 2025 08:34

KanRobert reviewed Mar 10, 2025

phoebewang changed the title ~~[X86][APX] Try to replace NDD with NF instructions when optimizeCompareInstr~~ [X86][APX] Try to replace non-NF with NF instructions when optimizeCompareInstr Mar 10, 2025

KanRobert approved these changes Mar 10, 2025

KanRobert reviewed Mar 10, 2025

phoebewang merged commit 507e0c3 into llvm:main Mar 10, 2025
11 checks passed

phoebewang deleted the APX branch March 10, 2025 13:08

shiltian mentioned this pull request Mar 10, 2025

[AMDGPU] Fix test failures when expensive checks are enabled #130644

Merged

phoebewang mentioned this pull request Mar 12, 2025

[X86][APX] Add NF instructions to convertToThreeAddress functions #130969

Merged

phoebewang added a commit to phoebewang/llvm-project that referenced this pull request Mar 12, 2025

[X86][APX] Add NF instructions to convertToThreeAddress functions

5f86891

Since llvm#130488, we have NF instructions when converting to three address instructions.

phoebewang added a commit that referenced this pull request Mar 13, 2025

[X86][APX] Add NF instructions to convertToThreeAddress functions (#1…

bc4b2c7

…30969) Since #130488, we have NF instructions when converting to three address instructions.
