[DAG] Recognise AVGFLOOR (((A >> 1) + (B >> 1)) + (A & B & 1)) patterns #169644
Conversation
Recognise 'LSB' style AVGFLOOR patterns. I attempted to use the m_Reassociatable* pattern matchers, but hit an issue: we can't correctly match m_Value/m_Deferred pairs in the same reassociation, as there appear to be no guarantees on the order of matching. I'll raise a bug for this; in the meantime the test_lsb_i32 tests keep the pattern to show the missed matching opportunity. Fixes llvm#53648
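For reference, a minimal standalone check (plain C++, not part of the patch) of the identity this fold relies on: for i8, ((a >> 1) + (b >> 1)) + (a & b & 1) equals the flooring average of a and b.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustive i8 check of the identity behind the fold (sketch only):
//   ((a >> 1) + (b >> 1)) + (a & b & 1) == floor((a + b) / 2)
// The LSB form never overflows the narrow type, which is what lets the DAG
// rewrite it as AVGFLOORS (arithmetic shifts) / AVGFLOORU (logical shifts);
// the reference result below is computed in a wider type.
int main() {
  for (int a = -128; a < 128; ++a) {
    for (int b = -128; b < 128; ++b) {
      // Signed form: arithmetic shifts, division flooring toward -infinity.
      int8_t SA = static_cast<int8_t>(a), SB = static_cast<int8_t>(b);
      int8_t SignedLsb =
          static_cast<int8_t>((SA >> 1) + (SB >> 1) + (SA & SB & 1));
      assert(SignedLsb == static_cast<int8_t>((a + b) >> 1));

      // Unsigned form: logical shifts on the same bit patterns.
      uint8_t UA = static_cast<uint8_t>(a), UB = static_cast<uint8_t>(b);
      uint8_t UnsignedLsb =
          static_cast<uint8_t>((UA >> 1) + (UB >> 1) + (UA & UB & 1));
      assert(UnsignedLsb == static_cast<uint8_t>((UA + UB) / 2));
    }
  }
  return 0;
}
```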
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-x86
Author: Simon Pilgrim (RKSimon)
Full diff: https://github.com/llvm/llvm-project/pull/169644.diff
3 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 6b79dbb46cadc..813cbeafeaec9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -3154,19 +3154,31 @@ SDValue DAGCombiner::visitADDLike(SDNode *N) {
}
// Attempt to form avgfloor(A, B) from (A & B) + ((A ^ B) >> 1)
+// Attempt to form avgfloor(A, B) from ((A >> 1) + (B >> 1)) + (A & B & 1)
SDValue DAGCombiner::foldAddToAvg(SDNode *N, const SDLoc &DL) {
SDValue N0 = N->getOperand(0);
EVT VT = N0.getValueType();
SDValue A, B;
+ // FIXME: m_ReassociatableAdd can't handle m_Value/m_Deferred mixing.
if ((!LegalOperations || hasOperation(ISD::AVGFLOORU, VT)) &&
- sd_match(N, m_Add(m_And(m_Value(A), m_Value(B)),
- m_Srl(m_Xor(m_Deferred(A), m_Deferred(B)), m_One())))) {
+ (sd_match(N,
+ m_Add(m_And(m_Value(A), m_Value(B)),
+ m_Srl(m_Xor(m_Deferred(A), m_Deferred(B)), m_One()))) ||
+ sd_match(N, m_Add(m_Add(m_Srl(m_Value(A), m_One()),
+ m_Srl(m_Value(B), m_One())),
+ m_ReassociatableAnd(m_Deferred(A), m_Deferred(B),
+ m_One()))))) {
return DAG.getNode(ISD::AVGFLOORU, DL, VT, A, B);
}
if ((!LegalOperations || hasOperation(ISD::AVGFLOORS, VT)) &&
- sd_match(N, m_Add(m_And(m_Value(A), m_Value(B)),
- m_Sra(m_Xor(m_Deferred(A), m_Deferred(B)), m_One())))) {
+ (sd_match(N,
+ m_Add(m_And(m_Value(A), m_Value(B)),
+ m_Sra(m_Xor(m_Deferred(A), m_Deferred(B)), m_One()))) ||
+ sd_match(N, m_Add(m_Add(m_Sra(m_Value(A), m_One()),
+ m_Sra(m_Value(B), m_One())),
+ m_ReassociatableAnd(m_Deferred(A), m_Deferred(B),
+ m_One()))))) {
return DAG.getNode(ISD::AVGFLOORS, DL, VT, A, B);
}
diff --git a/llvm/test/CodeGen/X86/avgfloors-scalar.ll b/llvm/test/CodeGen/X86/avgfloors-scalar.ll
index fd303192e6c50..c8bbc875834d1 100644
--- a/llvm/test/CodeGen/X86/avgfloors-scalar.ll
+++ b/llvm/test/CodeGen/X86/avgfloors-scalar.ll
@@ -38,26 +38,20 @@ define i8 @test_fixed_i8(i8 %a0, i8 %a1) nounwind {
define i8 @test_lsb_i8(i8 %a0, i8 %a1) nounwind {
; X86-LABEL: test_lsb_i8:
; X86: # %bb.0:
-; X86-NEXT: movzbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %eax, %edx
-; X86-NEXT: sarb %dl
-; X86-NEXT: andb %cl, %al
-; X86-NEXT: sarb %cl
-; X86-NEXT: addb %dl, %cl
-; X86-NEXT: andb $1, %al
-; X86-NEXT: addb %cl, %al
+; X86-NEXT: movsbl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT: movsbl {{[0-9]+}}(%esp), %eax
+; X86-NEXT: addl %ecx, %eax
+; X86-NEXT: shrl %eax
+; X86-NEXT: # kill: def $al killed $al killed $eax
; X86-NEXT: retl
;
; X64-LABEL: test_lsb_i8:
; X64: # %bb.0:
-; X64-NEXT: movl %edi, %eax
-; X64-NEXT: sarb %al
-; X64-NEXT: andb %sil, %dil
-; X64-NEXT: sarb %sil
-; X64-NEXT: addb %sil, %al
-; X64-NEXT: andb $1, %dil
-; X64-NEXT: addb %dil, %al
+; X64-NEXT: movsbl %sil, %ecx
+; X64-NEXT: movsbl %dil, %eax
+; X64-NEXT: addl %ecx, %eax
+; X64-NEXT: shrl %eax
+; X64-NEXT: # kill: def $al killed $al killed $eax
; X64-NEXT: retq
%s0 = ashr i8 %a0, 1
%s1 = ashr i8 %a1, 1
@@ -124,26 +118,17 @@ define i16 @test_lsb_i16(i16 %a0, i16 %a1) nounwind {
; X86: # %bb.0:
; X86-NEXT: movswl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movswl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %eax, %edx
-; X86-NEXT: sarl %edx
-; X86-NEXT: andl %ecx, %eax
-; X86-NEXT: sarl %ecx
-; X86-NEXT: addl %edx, %ecx
-; X86-NEXT: andl $1, %eax
; X86-NEXT: addl %ecx, %eax
+; X86-NEXT: shrl %eax
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: retl
;
; X64-LABEL: test_lsb_i16:
; X64: # %bb.0:
-; X64-NEXT: movswl %si, %eax
-; X64-NEXT: movswl %di, %ecx
-; X64-NEXT: sarl %ecx
-; X64-NEXT: sarl %eax
+; X64-NEXT: movswl %si, %ecx
+; X64-NEXT: movswl %di, %eax
; X64-NEXT: addl %ecx, %eax
-; X64-NEXT: andl %esi, %edi
-; X64-NEXT: andl $1, %edi
-; X64-NEXT: addl %edi, %eax
+; X64-NEXT: shrl %eax
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
%s0 = ashr i16 %a0, 1
@@ -316,21 +301,19 @@ define i64 @test_lsb_i64(i64 %a0, i64 %a1) nounwind {
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl %edi, %ebx
-; X86-NEXT: sarl %ebx
-; X86-NEXT: shldl $31, %eax, %edi
+; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT: movl %eax, %ebx
+; X86-NEXT: xorl %esi, %ebx
; X86-NEXT: movl %ecx, %edx
+; X86-NEXT: xorl %edi, %edx
+; X86-NEXT: shrdl $1, %edx, %ebx
+; X86-NEXT: andl %edi, %ecx
; X86-NEXT: sarl %edx
-; X86-NEXT: shldl $31, %esi, %ecx
-; X86-NEXT: addl %edi, %ecx
-; X86-NEXT: adcl %ebx, %edx
; X86-NEXT: andl %esi, %eax
-; X86-NEXT: andl $1, %eax
-; X86-NEXT: addl %ecx, %eax
-; X86-NEXT: adcl $0, %edx
+; X86-NEXT: addl %ebx, %eax
+; X86-NEXT: adcl %ecx, %edx
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
@@ -338,14 +321,11 @@ define i64 @test_lsb_i64(i64 %a0, i64 %a1) nounwind {
;
; X64-LABEL: test_lsb_i64:
; X64: # %bb.0:
-; X64-NEXT: movq %rdi, %rcx
-; X64-NEXT: sarq %rcx
-; X64-NEXT: andl %esi, %edi
; X64-NEXT: movq %rsi, %rax
-; X64-NEXT: sarq %rax
-; X64-NEXT: addq %rcx, %rax
-; X64-NEXT: andl $1, %edi
-; X64-NEXT: addq %rdi, %rax
+; X64-NEXT: andq %rdi, %rax
+; X64-NEXT: xorq %rdi, %rsi
+; X64-NEXT: sarq %rsi
+; X64-NEXT: addq %rsi, %rax
; X64-NEXT: retq
%s0 = ashr i64 %a0, 1
%s1 = ashr i64 %a1, 1
diff --git a/llvm/test/CodeGen/X86/avgflooru-scalar.ll b/llvm/test/CodeGen/X86/avgflooru-scalar.ll
index 9ae4492bb4cd4..7ad10164ad484 100644
--- a/llvm/test/CodeGen/X86/avgflooru-scalar.ll
+++ b/llvm/test/CodeGen/X86/avgflooru-scalar.ll
@@ -40,24 +40,18 @@ define i8 @test_lsb_i8(i8 %a0, i8 %a1) nounwind {
; X86: # %bb.0:
; X86-NEXT: movzbl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %eax, %edx
-; X86-NEXT: shrb %dl
-; X86-NEXT: andb %cl, %al
-; X86-NEXT: shrb %cl
-; X86-NEXT: addb %dl, %cl
-; X86-NEXT: andb $1, %al
-; X86-NEXT: addb %cl, %al
+; X86-NEXT: addl %ecx, %eax
+; X86-NEXT: shrl %eax
+; X86-NEXT: # kill: def $al killed $al killed $eax
; X86-NEXT: retl
;
; X64-LABEL: test_lsb_i8:
; X64: # %bb.0:
-; X64-NEXT: movl %edi, %eax
-; X64-NEXT: shrb %al
-; X64-NEXT: andb %sil, %dil
-; X64-NEXT: shrb %sil
-; X64-NEXT: addb %sil, %al
-; X64-NEXT: andb $1, %dil
-; X64-NEXT: addb %dil, %al
+; X64-NEXT: movzbl %sil, %ecx
+; X64-NEXT: movzbl %dil, %eax
+; X64-NEXT: addl %ecx, %eax
+; X64-NEXT: shrl %eax
+; X64-NEXT: # kill: def $al killed $al killed $eax
; X64-NEXT: retq
%s0 = lshr i8 %a0, 1
%s1 = lshr i8 %a1, 1
@@ -124,26 +118,17 @@ define i16 @test_lsb_i16(i16 %a0, i16 %a1) nounwind {
; X86: # %bb.0:
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %eax, %edx
-; X86-NEXT: shrl %edx
-; X86-NEXT: andl %ecx, %eax
-; X86-NEXT: shrl %ecx
-; X86-NEXT: addl %edx, %ecx
-; X86-NEXT: andl $1, %eax
; X86-NEXT: addl %ecx, %eax
+; X86-NEXT: shrl %eax
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: retl
;
; X64-LABEL: test_lsb_i16:
; X64: # %bb.0:
-; X64-NEXT: movzwl %si, %eax
-; X64-NEXT: movzwl %di, %ecx
-; X64-NEXT: shrl %ecx
-; X64-NEXT: shrl %eax
+; X64-NEXT: movzwl %si, %ecx
+; X64-NEXT: movzwl %di, %eax
; X64-NEXT: addl %ecx, %eax
-; X64-NEXT: andl %esi, %edi
-; X64-NEXT: andl $1, %edi
-; X64-NEXT: addl %edi, %eax
+; X64-NEXT: shrl %eax
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
%s0 = lshr i16 %a0, 1
@@ -300,40 +285,23 @@ define i64 @test_fixed_i64(i64 %a0, i64 %a1) nounwind {
define i64 @test_lsb_i64(i64 %a0, i64 %a1) nounwind {
; X86-LABEL: test_lsb_i64:
; X86: # %bb.0:
-; X86-NEXT: pushl %ebx
-; X86-NEXT: pushl %edi
-; X86-NEXT: pushl %esi
-; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT: movl %edi, %ebx
-; X86-NEXT: shrl %ebx
-; X86-NEXT: shldl $31, %eax, %edi
-; X86-NEXT: movl %ecx, %edx
-; X86-NEXT: shrl %edx
-; X86-NEXT: shldl $31, %esi, %ecx
-; X86-NEXT: addl %edi, %ecx
-; X86-NEXT: adcl %ebx, %edx
-; X86-NEXT: andl %esi, %eax
-; X86-NEXT: andl $1, %eax
-; X86-NEXT: addl %ecx, %eax
-; X86-NEXT: adcl $0, %edx
-; X86-NEXT: popl %esi
-; X86-NEXT: popl %edi
-; X86-NEXT: popl %ebx
+; X86-NEXT: addl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT: adcl {{[0-9]+}}(%esp), %eax
+; X86-NEXT: setb %dl
+; X86-NEXT: movzbl %dl, %edx
+; X86-NEXT: shldl $31, %eax, %edx
+; X86-NEXT: shldl $31, %ecx, %eax
; X86-NEXT: retl
;
; X64-LABEL: test_lsb_i64:
; X64: # %bb.0:
-; X64-NEXT: movq %rdi, %rcx
-; X64-NEXT: shrq %rcx
-; X64-NEXT: andl %esi, %edi
; X64-NEXT: movq %rsi, %rax
-; X64-NEXT: shrq %rax
-; X64-NEXT: addq %rcx, %rax
-; X64-NEXT: andl $1, %edi
-; X64-NEXT: addq %rdi, %rax
+; X64-NEXT: andq %rdi, %rax
+; X64-NEXT: xorq %rdi, %rsi
+; X64-NEXT: shrq %rsi
+; X64-NEXT: addq %rsi, %rax
; X64-NEXT: retq
%s0 = lshr i64 %a0, 1
%s1 = lshr i64 %a1, 1
ping?
EVT VT = N0.getValueType();
SDValue A, B;
// FIXME: m_ReassociatableAdd can't handle m_Value/m_Deferred mixing.
Just a general comment / thinking out loud on this issue: I guess the reason it doesn't handle m_Value / m_Deferred in the same reassociatable expression right now is that reassociatable matching is done in two phases, in which the individual m_Value and m_Deferred matchers are completely detached.
That being said, even if we somehow make them not detached, so that all m_Value and m_Deferred matchers are triggered in the same pattern-matching expression/chain, how should we make sense of the permutation case where m_Deferred might appear before its m_Value (I guess this is related to your comment about there being "no guarantee on the order of matching")? More specifically, if m_Deferred appears before m_Value, the preceding m_Deferred might accidentally grab the bound value populated by the previous permutation.
I think we might be able to solve this problem by imposing some partial order during permutation for reassociatable expressions that contain m_Value and m_Deferred.
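To make the hazard concrete, here is a rough sketch (assuming the matchers from llvm/CodeGen/SDPatternMatch.h; this is not code from the patch) of the single-reassociatable-add form of the fold that currently can't be matched reliably:

```cpp
#include "llvm/CodeGen/SDPatternMatch.h"

// Hypothetical form of the fold written as one reassociatable add (sketch
// only). Because the matcher tries operand permutations, the
// m_Deferred(A)/m_Deferred(B) uses inside the AND term may be evaluated
// before m_Value(A)/m_Value(B) have bound anything, or may read a binding
// left over from a previously attempted permutation - the ordering problem
// discussed above.
static bool matchLsbAvgFloorU(llvm::SDNode *N, llvm::SDValue &A,
                              llvm::SDValue &B) {
  using namespace llvm::SDPatternMatch;
  return sd_match(
      N, m_ReassociatableAdd(
             m_Srl(m_Value(A), m_One()), m_Srl(m_Value(B), m_One()),
             m_ReassociatableAnd(m_Deferred(A), m_Deferred(B), m_One())));
}
```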
Not sure - hence I punted this to #169645 - I can find a workaround if necessary as a followup, but I'd much prefer to handle this completely with m_Reassociatable matchers if possible.
mshockwave left a comment:
LGTM
Surprisingly, this change seems to have measurable compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=a108881b24ecfea8d194b33dd9fb211943065bca&to=804e768bda1697ce0449eec844bd9ac4f29aed64&stat=instructions:u
I suspect that ReassociatableOpc_match is very expensive. Might it make sense to make this less generic and just support 3-operand reassociation? That's all you need here, and in that case you only need to match two variants without any complex processing.
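As a rough illustration of that idea (hypothetical helper, not an existing API, and assuming m_And already tries both operand orders like the other commutative matchers), a fixed 3-operand form only has to enumerate which sub-pattern sits outside the inner node, with no generic permutation machinery:

```cpp
#include "llvm/CodeGen/SDPatternMatch.h"

// Hypothetical fixed-arity alternative to ReassociatableOpc_match (sketch
// only; matchAnd3 is not an existing helper). Assuming m_And handles
// commutativity at each binary node, the only remaining freedom for three
// sub-patterns is which operand ends up outside the inner AND, so three
// plain nested matches cover every association.
template <typename P0, typename P1, typename P2>
static bool matchAnd3(llvm::SDNode *N, const P0 &Op0, const P1 &Op1,
                      const P2 &Op2) {
  using namespace llvm::SDPatternMatch;
  return sd_match(N, m_And(m_And(Op0, Op1), Op2)) ||
         sd_match(N, m_And(m_And(Op0, Op2), Op1)) ||
         sd_match(N, m_And(m_And(Op1, Op2), Op0));
}
```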
CC @bermondd @mshockwave - thoughts?
The use of nested m_Reassociatable matchers by llvm#169644 can result in high compile times as the inner m_Reassociatable call is being repeated a lot while the outer call is trying to match
I'll start working to make this more efficient tonight. These results look pretty bad. My first guess is that the template recursion bears some of the blame for the compile-time impact here: the number of functions instantiated grows linearly with the number of patterns provided, so a different approach where that isn't necessary could help. That being said, I personally think reducing the scope of ReassociatableOpc_match to only 3 patterns wouldn't be the right approach. The struct has supported matching any number of patterns since before #170061, so reducing the domain could break any usages that involve 4 or more, if there are any.
I've seen errors like this when compiling for my out-of-tree target with this patch. When it fails (after a long time) there are a lot of calls on the stack.
That can happen in cases where SmallVector<> screws up the CalculateSmallVectorDefaultInlinedElements calculation - please can you try with this local change: (note @bermondd is working on removing the SmallVector entirely) |
Thanks, unfortunately it didn't help.
Sounds good. Maybe I'll just wait and see how it behaves after that improvement then.
I'll open a PR for this ASAP. I have further changes in mind to improve compile time, but maybe it's better to separate this into two PRs, so at least this reported issue is fixed sooner.
Recognise 'LSB' style AVGFLOOR patterns.
Alive2: https://alive2.llvm.org/ce/z/nfSSk_
Fixes #53648