-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[X86][APX] Try to replace non-NF with NF instructions when optimizeCompareInstr #130488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-backend-x86 Author: Phoebe Wang (phoebewang) Changeshttps://godbolt.org/z/rWYdqnjjx Full diff: https://github.com/llvm/llvm-project/pull/130488.diff 2 Files Affected:
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 5fe7203c052d8..d83adf99ce6d3 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -5366,6 +5366,7 @@ bool X86InstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, Register SrcReg,
MachineBasicBlock::reverse_iterator From =
std::next(MachineBasicBlock::reverse_iterator(CmpInstr));
for (MachineBasicBlock *MBB = &CmpMBB;;) {
+ SmallVector<MachineInstr *, 4> NDDInsts;
for (MachineInstr &Inst : make_range(From, MBB->rend())) {
// Try to use EFLAGS from the instruction defining %SrcReg. Example:
// %eax = addl ...
@@ -5441,13 +5442,28 @@ bool X86InstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, Register SrcReg,
continue;
}
+ // Try to replace NDD with NF instructions.
+ if (Subtarget.hasNF() &&
+ X86II::hasNewDataDest(Inst.getDesc().TSFlags) &&
+ Inst.registerDefIsDead(X86::EFLAGS, TRI)) {
+ NDDInsts.push_back(&Inst);
+ continue;
+ }
+
+ NDDInsts.clear();
+
// Cannot do anything for any other EFLAG changes.
return false;
}
}
- if (MI || Sub)
+ if (MI || Sub) {
+ for (MachineInstr *NDD : NDDInsts) {
+ NDD->setDesc(get(X86::getNFVariant(NDD->getOpcode())));
+ NDD->removeOperand(NDD->getNumOperands() - 1);
+ }
break;
+ }
// Reached begin of basic block. Continue in predecessor if there is
// exactly one.
diff --git a/llvm/test/CodeGen/X86/apx/cf.ll b/llvm/test/CodeGen/X86/apx/cf.ll
index a64d7df11a4d0..fc170ca5f2b2e 100644
--- a/llvm/test/CodeGen/X86/apx/cf.ll
+++ b/llvm/test/CodeGen/X86/apx/cf.ll
@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64 -mattr=+cf,+avx512f -verify-machineinstrs | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64 -mattr=+cf,+nf,+ndd,+avx512f -verify-machineinstrs | FileCheck %s
define void @basic(i32 %a, ptr %b, ptr %p, ptr %q) {
; CHECK-LABEL: basic:
@@ -57,9 +57,8 @@ entry:
define i64 @reduced_data_dependency(i64 %a, i64 %b, ptr %c) {
; CHECK-LABEL: reduced_data_dependency:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: movq %rdi, %rcx
-; CHECK-NEXT: subq %rsi, %rcx
-; CHECK-NEXT: cfcmovnsq (%rdx), %rdi, %rax
+; CHECK-NEXT: subq %rsi, %rdi, %rax
+; CHECK-NEXT: cfcmovnsq (%rdx), %rdi, %rcx
; CHECK-NEXT: addq %rcx, %rax
; CHECK-NEXT: retq
entry:
@@ -125,7 +124,7 @@ entry:
ret void
}
-define void @single_cmp(i32 %a, i32 %b, ptr %c, ptr %d) #2 {
+define void @single_cmp(i32 %a, i32 %b, ptr %c, ptr %d) {
; CHECK-LABEL: single_cmp:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: cmpl %esi, %edi
@@ -139,3 +138,22 @@ entry:
tail call void @llvm.masked.store.v1i16.p0(<1 x i16> %2, ptr %d, i32 2, <1 x i1> %1)
ret void
}
+
+define void @load_add_store(i32 %a, i32 %b, ptr %p) {
+; CHECK-LABEL: load_add_store:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: cmpl %esi, %edi
+; CHECK-NEXT: cfcmovnew (%rdx), %ax
+; CHECK-NEXT: {nf} incw %ax
+; CHECK-NEXT: cfcmovnew %ax, (%rdx)
+; CHECK-NEXT: retq
+entry:
+ %0 = icmp ne i32 %a, %b
+ %1 = insertelement <1 x i1> poison, i1 %0, i64 0
+ %2 = tail call <1 x i16> @llvm.masked.load.v1i16.p0(ptr %p, i32 2, <1 x i1> %1, <1 x i16> poison)
+ %3 = extractelement <1 x i16> %2, i64 0
+ %4 = add i16 %3, 1
+ %5 = insertelement <1 x i16> poison, i16 %4, i64 0
+ tail call void @llvm.masked.store.v1i16.p0(<1 x i16> %5, ptr %p, i32 2, <1 x i1> %1)
+ ret void
+}
|
llvm/lib/Target/X86/X86InstrInfo.cpp
Outdated
| bool ClearsOverflowFlag = false; | ||
| bool ShouldUpdateCC = false; | ||
| bool IsSwapped = false; | ||
| bool HasCF = Subtarget.hasNF(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HasCF->HasNF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! done.
|
It seems the optimization does not rely on NDD? |
You are correct! I thought only NDD instructions support NF. Thanks! |
|
|
||
| // Try to replace non-NF with NF instructions. | ||
| if (HasNF && Inst.registerDefIsDead(X86::EFLAGS, TRI)) { | ||
| unsigned NewOp = X86::getNFVariant(Inst.getOpcode()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we don't need to store the opcodes of NF variants. Just setDesc(X86::getNFVariant(Inst.getOpcode())); at line 5654?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the table lookup is more expensive than a little memory space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it introduce more times of table lookup? The times seem equal for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to query twice if we don't store the value. We need query here to check if the instruction can turn into NF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay
KanRobert
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, this test should be added in nf.ll instead of cf.ll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no nf.ll. The nf tests are sacttered in add/or/sub/...ll, so put it in cf.ll is ok since we happen to have nf condition lowering here :)
Since llvm#130488, we have NF instructions when converting to three address instructions.
https://godbolt.org/z/rWYdqnjjx