Conversation

RKSimon (Collaborator) commented Nov 26, 2025

Recognise 'LSB' style AVGFLOOR patterns.

Alive2: https://alive2.llvm.org/ce/z/nfSSk_

I've attempted to use the m_Reassociatable* pattern matchers, but hit an issue: we can't correctly match m_Value/m_Deferred pairs within the same reassociation, as there appear to be no guarantees on the order of matching. I'll raise a bug for this; in the meantime, the pattern in the test_lsb_i32 tests shows the missed matching opportunity.

Fixes #53648
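
Not part of the patch, but for readers: a standalone brute-force check (plain C++, nothing beyond the standard library) that the 'LSB' form matched here agrees with the existing XOR form for both the unsigned and signed i8 cases, mirroring the Alive2 proof above.

```cpp
// Exhaustive i8 check: 'LSB' style floor-average vs. the existing XOR form.
// Relies on arithmetic right shift and modular narrowing for signed ints
// (guaranteed since C++20, and the universal behaviour in practice).
#include <cassert>
#include <cstdint>

int main() {
  for (int I = 0; I < 256; ++I) {
    for (int J = 0; J < 256; ++J) {
      uint8_t UA = uint8_t(I), UB = uint8_t(J);
      // AVGFLOORU: (A & B) + ((A ^ B) >> 1) == (A >> 1) + (B >> 1) + (A & B & 1)
      assert(uint8_t((UA & UB) + ((UA ^ UB) >> 1)) ==
             uint8_t((UA >> 1) + (UB >> 1) + (UA & UB & 1)));
      int8_t SA = int8_t(I), SB = int8_t(J);
      // AVGFLOORS: the same identity, with arithmetic shifts.
      assert(int8_t((SA & SB) + ((SA ^ SB) >> 1)) ==
             int8_t((SA >> 1) + (SB >> 1) + (SA & SB & 1)));
    }
  }
  return 0;
}
```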
llvmbot (Member) commented Nov 26, 2025

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

Recognise 'LSB' style AVGFLOOR patterns.

I've attempted to use the m_Reassociatable* pattern matchers, but hit an issue: we can't correctly match m_Value/m_Deferred pairs within the same reassociation, as there appear to be no guarantees on the order of matching. I'll raise a bug for this; in the meantime, the pattern in the test_lsb_i32 tests shows the missed matching opportunity.

Fixes #53648


Full diff: https://github.com/llvm/llvm-project/pull/169644.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+16-4)
  • (modified) llvm/test/CodeGen/X86/avgfloors-scalar.ll (+27-47)
  • (modified) llvm/test/CodeGen/X86/avgflooru-scalar.ll (+22-54)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 6b79dbb46cadc..813cbeafeaec9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -3154,19 +3154,31 @@ SDValue DAGCombiner::visitADDLike(SDNode *N) {
 }
 
 // Attempt to form avgfloor(A, B) from (A & B) + ((A ^ B) >> 1)
+// Attempt to form avgfloor(A, B) from ((A >> 1) + (B >> 1)) + (A & B & 1)
 SDValue DAGCombiner::foldAddToAvg(SDNode *N, const SDLoc &DL) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N0.getValueType();
   SDValue A, B;
 
+  // FIXME: m_ReassociatableAdd can't handle m_Value/m_Deferred mixing.
   if ((!LegalOperations || hasOperation(ISD::AVGFLOORU, VT)) &&
-      sd_match(N, m_Add(m_And(m_Value(A), m_Value(B)),
-                        m_Srl(m_Xor(m_Deferred(A), m_Deferred(B)), m_One())))) {
+      (sd_match(N,
+                m_Add(m_And(m_Value(A), m_Value(B)),
+                      m_Srl(m_Xor(m_Deferred(A), m_Deferred(B)), m_One()))) ||
+       sd_match(N, m_Add(m_Add(m_Srl(m_Value(A), m_One()),
+                               m_Srl(m_Value(B), m_One())),
+                         m_ReassociatableAnd(m_Deferred(A), m_Deferred(B),
+                                             m_One()))))) {
     return DAG.getNode(ISD::AVGFLOORU, DL, VT, A, B);
   }
   if ((!LegalOperations || hasOperation(ISD::AVGFLOORS, VT)) &&
-      sd_match(N, m_Add(m_And(m_Value(A), m_Value(B)),
-                        m_Sra(m_Xor(m_Deferred(A), m_Deferred(B)), m_One())))) {
+      (sd_match(N,
+                m_Add(m_And(m_Value(A), m_Value(B)),
+                      m_Sra(m_Xor(m_Deferred(A), m_Deferred(B)), m_One()))) ||
+       sd_match(N, m_Add(m_Add(m_Sra(m_Value(A), m_One()),
+                               m_Sra(m_Value(B), m_One())),
+                         m_ReassociatableAnd(m_Deferred(A), m_Deferred(B),
+                                             m_One()))))) {
     return DAG.getNode(ISD::AVGFLOORS, DL, VT, A, B);
   }
 
diff --git a/llvm/test/CodeGen/X86/avgfloors-scalar.ll b/llvm/test/CodeGen/X86/avgfloors-scalar.ll
index fd303192e6c50..c8bbc875834d1 100644
--- a/llvm/test/CodeGen/X86/avgfloors-scalar.ll
+++ b/llvm/test/CodeGen/X86/avgfloors-scalar.ll
@@ -38,26 +38,20 @@ define i8 @test_fixed_i8(i8 %a0, i8 %a1) nounwind {
 define i8 @test_lsb_i8(i8 %a0, i8 %a1) nounwind {
 ; X86-LABEL: test_lsb_i8:
 ; X86:       # %bb.0:
-; X86-NEXT:    movzbl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movzbl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %eax, %edx
-; X86-NEXT:    sarb %dl
-; X86-NEXT:    andb %cl, %al
-; X86-NEXT:    sarb %cl
-; X86-NEXT:    addb %dl, %cl
-; X86-NEXT:    andb $1, %al
-; X86-NEXT:    addb %cl, %al
+; X86-NEXT:    movsbl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    movsbl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    addl %ecx, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    # kill: def $al killed $al killed $eax
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: test_lsb_i8:
 ; X64:       # %bb.0:
-; X64-NEXT:    movl %edi, %eax
-; X64-NEXT:    sarb %al
-; X64-NEXT:    andb %sil, %dil
-; X64-NEXT:    sarb %sil
-; X64-NEXT:    addb %sil, %al
-; X64-NEXT:    andb $1, %dil
-; X64-NEXT:    addb %dil, %al
+; X64-NEXT:    movsbl %sil, %ecx
+; X64-NEXT:    movsbl %dil, %eax
+; X64-NEXT:    addl %ecx, %eax
+; X64-NEXT:    shrl %eax
+; X64-NEXT:    # kill: def $al killed $al killed $eax
 ; X64-NEXT:    retq
   %s0 = ashr i8 %a0, 1
   %s1 = ashr i8 %a1, 1
@@ -124,26 +118,17 @@ define i16 @test_lsb_i16(i16 %a0, i16 %a1) nounwind {
 ; X86:       # %bb.0:
 ; X86-NEXT:    movswl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movswl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %eax, %edx
-; X86-NEXT:    sarl %edx
-; X86-NEXT:    andl %ecx, %eax
-; X86-NEXT:    sarl %ecx
-; X86-NEXT:    addl %edx, %ecx
-; X86-NEXT:    andl $1, %eax
 ; X86-NEXT:    addl %ecx, %eax
+; X86-NEXT:    shrl %eax
 ; X86-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: test_lsb_i16:
 ; X64:       # %bb.0:
-; X64-NEXT:    movswl %si, %eax
-; X64-NEXT:    movswl %di, %ecx
-; X64-NEXT:    sarl %ecx
-; X64-NEXT:    sarl %eax
+; X64-NEXT:    movswl %si, %ecx
+; X64-NEXT:    movswl %di, %eax
 ; X64-NEXT:    addl %ecx, %eax
-; X64-NEXT:    andl %esi, %edi
-; X64-NEXT:    andl $1, %edi
-; X64-NEXT:    addl %edi, %eax
+; X64-NEXT:    shrl %eax
 ; X64-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X64-NEXT:    retq
   %s0 = ashr i16 %a0, 1
@@ -316,21 +301,19 @@ define i64 @test_lsb_i64(i64 %a0, i64 %a1) nounwind {
 ; X86-NEXT:    pushl %edi
 ; X86-NEXT:    pushl %esi
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, %ebx
-; X86-NEXT:    sarl %ebx
-; X86-NEXT:    shldl $31, %eax, %edi
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    movl %eax, %ebx
+; X86-NEXT:    xorl %esi, %ebx
 ; X86-NEXT:    movl %ecx, %edx
+; X86-NEXT:    xorl %edi, %edx
+; X86-NEXT:    shrdl $1, %edx, %ebx
+; X86-NEXT:    andl %edi, %ecx
 ; X86-NEXT:    sarl %edx
-; X86-NEXT:    shldl $31, %esi, %ecx
-; X86-NEXT:    addl %edi, %ecx
-; X86-NEXT:    adcl %ebx, %edx
 ; X86-NEXT:    andl %esi, %eax
-; X86-NEXT:    andl $1, %eax
-; X86-NEXT:    addl %ecx, %eax
-; X86-NEXT:    adcl $0, %edx
+; X86-NEXT:    addl %ebx, %eax
+; X86-NEXT:    adcl %ecx, %edx
 ; X86-NEXT:    popl %esi
 ; X86-NEXT:    popl %edi
 ; X86-NEXT:    popl %ebx
@@ -338,14 +321,11 @@ define i64 @test_lsb_i64(i64 %a0, i64 %a1) nounwind {
 ;
 ; X64-LABEL: test_lsb_i64:
 ; X64:       # %bb.0:
-; X64-NEXT:    movq %rdi, %rcx
-; X64-NEXT:    sarq %rcx
-; X64-NEXT:    andl %esi, %edi
 ; X64-NEXT:    movq %rsi, %rax
-; X64-NEXT:    sarq %rax
-; X64-NEXT:    addq %rcx, %rax
-; X64-NEXT:    andl $1, %edi
-; X64-NEXT:    addq %rdi, %rax
+; X64-NEXT:    andq %rdi, %rax
+; X64-NEXT:    xorq %rdi, %rsi
+; X64-NEXT:    sarq %rsi
+; X64-NEXT:    addq %rsi, %rax
 ; X64-NEXT:    retq
   %s0 = ashr i64 %a0, 1
   %s1 = ashr i64 %a1, 1
diff --git a/llvm/test/CodeGen/X86/avgflooru-scalar.ll b/llvm/test/CodeGen/X86/avgflooru-scalar.ll
index 9ae4492bb4cd4..7ad10164ad484 100644
--- a/llvm/test/CodeGen/X86/avgflooru-scalar.ll
+++ b/llvm/test/CodeGen/X86/avgflooru-scalar.ll
@@ -40,24 +40,18 @@ define i8 @test_lsb_i8(i8 %a0, i8 %a1) nounwind {
 ; X86:       # %bb.0:
 ; X86-NEXT:    movzbl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movzbl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %eax, %edx
-; X86-NEXT:    shrb %dl
-; X86-NEXT:    andb %cl, %al
-; X86-NEXT:    shrb %cl
-; X86-NEXT:    addb %dl, %cl
-; X86-NEXT:    andb $1, %al
-; X86-NEXT:    addb %cl, %al
+; X86-NEXT:    addl %ecx, %eax
+; X86-NEXT:    shrl %eax
+; X86-NEXT:    # kill: def $al killed $al killed $eax
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: test_lsb_i8:
 ; X64:       # %bb.0:
-; X64-NEXT:    movl %edi, %eax
-; X64-NEXT:    shrb %al
-; X64-NEXT:    andb %sil, %dil
-; X64-NEXT:    shrb %sil
-; X64-NEXT:    addb %sil, %al
-; X64-NEXT:    andb $1, %dil
-; X64-NEXT:    addb %dil, %al
+; X64-NEXT:    movzbl %sil, %ecx
+; X64-NEXT:    movzbl %dil, %eax
+; X64-NEXT:    addl %ecx, %eax
+; X64-NEXT:    shrl %eax
+; X64-NEXT:    # kill: def $al killed $al killed $eax
 ; X64-NEXT:    retq
   %s0 = lshr i8 %a0, 1
   %s1 = lshr i8 %a1, 1
@@ -124,26 +118,17 @@ define i16 @test_lsb_i16(i16 %a0, i16 %a1) nounwind {
 ; X86:       # %bb.0:
 ; X86-NEXT:    movzwl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl %eax, %edx
-; X86-NEXT:    shrl %edx
-; X86-NEXT:    andl %ecx, %eax
-; X86-NEXT:    shrl %ecx
-; X86-NEXT:    addl %edx, %ecx
-; X86-NEXT:    andl $1, %eax
 ; X86-NEXT:    addl %ecx, %eax
+; X86-NEXT:    shrl %eax
 ; X86-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: test_lsb_i16:
 ; X64:       # %bb.0:
-; X64-NEXT:    movzwl %si, %eax
-; X64-NEXT:    movzwl %di, %ecx
-; X64-NEXT:    shrl %ecx
-; X64-NEXT:    shrl %eax
+; X64-NEXT:    movzwl %si, %ecx
+; X64-NEXT:    movzwl %di, %eax
 ; X64-NEXT:    addl %ecx, %eax
-; X64-NEXT:    andl %esi, %edi
-; X64-NEXT:    andl $1, %edi
-; X64-NEXT:    addl %edi, %eax
+; X64-NEXT:    shrl %eax
 ; X64-NEXT:    # kill: def $ax killed $ax killed $eax
 ; X64-NEXT:    retq
   %s0 = lshr i16 %a0, 1
@@ -300,40 +285,23 @@ define i64 @test_fixed_i64(i64 %a0, i64 %a1) nounwind {
 define i64 @test_lsb_i64(i64 %a0, i64 %a1) nounwind {
 ; X86-LABEL: test_lsb_i64:
 ; X86:       # %bb.0:
-; X86-NEXT:    pushl %ebx
-; X86-NEXT:    pushl %edi
-; X86-NEXT:    pushl %esi
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %esi
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movl {{[0-9]+}}(%esp), %edi
-; X86-NEXT:    movl %edi, %ebx
-; X86-NEXT:    shrl %ebx
-; X86-NEXT:    shldl $31, %eax, %edi
-; X86-NEXT:    movl %ecx, %edx
-; X86-NEXT:    shrl %edx
-; X86-NEXT:    shldl $31, %esi, %ecx
-; X86-NEXT:    addl %edi, %ecx
-; X86-NEXT:    adcl %ebx, %edx
-; X86-NEXT:    andl %esi, %eax
-; X86-NEXT:    andl $1, %eax
-; X86-NEXT:    addl %ecx, %eax
-; X86-NEXT:    adcl $0, %edx
-; X86-NEXT:    popl %esi
-; X86-NEXT:    popl %edi
-; X86-NEXT:    popl %ebx
+; X86-NEXT:    addl {{[0-9]+}}(%esp), %ecx
+; X86-NEXT:    adcl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    setb %dl
+; X86-NEXT:    movzbl %dl, %edx
+; X86-NEXT:    shldl $31, %eax, %edx
+; X86-NEXT:    shldl $31, %ecx, %eax
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: test_lsb_i64:
 ; X64:       # %bb.0:
-; X64-NEXT:    movq %rdi, %rcx
-; X64-NEXT:    shrq %rcx
-; X64-NEXT:    andl %esi, %edi
 ; X64-NEXT:    movq %rsi, %rax
-; X64-NEXT:    shrq %rax
-; X64-NEXT:    addq %rcx, %rax
-; X64-NEXT:    andl $1, %edi
-; X64-NEXT:    addq %rdi, %rax
+; X64-NEXT:    andq %rdi, %rax
+; X64-NEXT:    xorq %rdi, %rsi
+; X64-NEXT:    shrq %rsi
+; X64-NEXT:    addq %rsi, %rax
 ; X64-NEXT:    retq
   %s0 = lshr i64 %a0, 1
   %s1 = lshr i64 %a1, 1

RKSimon (Collaborator, Author) commented Dec 1, 2025

ping?

Review thread on llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:

EVT VT = N0.getValueType();
SDValue A, B;

// FIXME: m_ReassociatableAdd can't handle m_Value/m_Deferred mixing.
Member:

Just a general comment / thinking out loud on this issue: I guess the reason it doesn't handle m_Value/m_Deferred in the same reassociatable expression right now is that reassociatable matching is done in two phases, in which the individual m_Value and m_Deferred matchers are completely detached.

That being said, even if we somehow made them not detached, so that all m_Value and m_Deferred matchers are triggered in the same pattern-matching expression/chain, how should we make sense of the permutation case where m_Deferred might appear before its m_Value (I guess this is related to your comment of "no guarantee on the order of matching")? More specifically, if m_Deferred appears before m_Value, the preceding m_Deferred might accidentally grab the bound value populated by the previous permutation.

I think we might be able to solve this problem by imposing some partial order during permutation for reassociatable expressions that contain m_Value and m_Deferred.
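
To make that hazard concrete, here is a toy model (entirely hypothetical; the real SDPatternMatch code is structured differently) of permutation-based matching where bindings are not reset between permutations:

```cpp
// Toy permutation matcher: a m_Value-style matcher binds unconditionally,
// a m_Deferred-style matcher compares against the existing binding.
#include <cassert>
#include <optional>

struct State { std::optional<int> A; };

bool matchValue(State &S, int Leaf) { S.A = Leaf; return true; }        // ~ m_Value(A)
bool matchDeferred(State &S, int Leaf) { return S.A && *S.A == Leaf; }  // ~ m_Deferred(A)

int main() {
  // Leaves of a reassociated add of (1, 2); the pattern (m_Value(A),
  // m_Deferred(A)) should never match, because the two leaves are unequal.
  int Leaf0 = 1, Leaf1 = 2;
  State S;

  // Permutation 1: Value matches Leaf0 (binding A = 1), Deferred rejects Leaf1.
  bool P1 = matchValue(S, Leaf0) && matchDeferred(S, Leaf1);
  assert(!P1); // correctly fails, but the stale binding A = 1 survives

  // Permutation 2 swaps the assignment; if evaluation follows leaf order, the
  // Deferred matcher runs first and accepts Leaf0 against the stale A = 1.
  bool P2 = matchDeferred(S, Leaf0) && matchValue(S, Leaf1);
  assert(P2); // bogus "match": exactly the ordering problem described above
  return 0;
}
```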

RKSimon (Collaborator, Author) replied:
Not sure - hence I punted this to #169645. I can find a workaround as a follow-up if necessary, but I'd much prefer to handle this entirely with m_Reassociatable matchers if possible.

RKSimon (Collaborator, Author) added:
It looks like @bermondd is working on this at #170061 (new contributors not being able to assign reviewers is REALLY annoying).

RKSimon requested a review from mshockwave, December 9, 2025 15:20
mshockwave (Member) left a comment:

LGTM

RKSimon enabled auto-merge (squash) December 10, 2025 08:12
RKSimon merged commit 804e768 into llvm:main Dec 10, 2025 (9 of 10 checks passed)
RKSimon deleted the dag-avgfloor-reassoc branch December 10, 2025 13:51
nikic (Contributor) commented Dec 10, 2025

Surprisingly, this change seems to have measurable compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=a108881b24ecfea8d194b33dd9fb211943065bca&to=804e768bda1697ce0449eec844bd9ac4f29aed64&stat=instructions:u

I suspect that ReassociatableOpc_match is very expensive. Might it make sense to make this less generic and just support 3-operand reassociation? That's all you need here, and in that case you only need to match two variants without any complex processing.
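
For illustration only - a fragment in the same style as the patch above (variable names X/Y/Z are placeholders, and this glosses over the commutativity the real m_Add matchers already provide), showing how the 3-operand special case would reduce to matching the two possible tree shapes directly:

```cpp
// The two shapes a three-leaf add tree can take: (X + Y) + Z and X + (Y + Z).
SDValue X, Y, Z;
bool Matched =
    sd_match(N, m_Add(m_Add(m_Value(X), m_Value(Y)), m_Value(Z))) ||
    sd_match(N, m_Add(m_Value(X), m_Add(m_Value(Y), m_Value(Z))));
```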

RKSimon (Collaborator, Author) commented Dec 10, 2025

CC @bermondd @mshockwave - thoughts?

RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Dec 10, 2025
The use of nested m_Reassociatable matchers by llvm#169644 can result in high compile times as the inner m_Reassociatable call is being repeated a lot while the outer call is trying to match
bermondd (Contributor) commented Dec 10, 2025

I'll start working on making this more efficient tonight; these results look pretty bad.

My first guess is that the template recursion may bear some of the blame for the compile-time impact here: the number of functions instantiated grows linearly with the number of patterns provided, so a different approach that avoids it could help.

That being said, I personally think reducing the scope of ReassociatableOpc_match to only 3 patterns wouldn't be the right approach. The struct has been able to match any number of patterns since before #170061, so reducing the domain could break any usages that involve 4 or more, if there are any.

mikaelholmen (Collaborator) commented Dec 11, 2025

I've seen errors like

LLVM ERROR: SmallVector unable to grow. Requested capacity (4294967296) is larger than maximum value for size type (4294967295)

when compiling for my out-of-tree target with this patch. When it fails (after a long time), there are a lot of

#117 0x0000560facf1edc5 llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::Ones_match>>::collectLeaves(llvm::SDValue, llvm::SmallVector<llvm::SDValue, 3u>&) DAGCombiner.cpp:0:0
#118 0x0000560facf1edc5 llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::BinaryOpc_match<llvm::SDPatternMatch::Value_bind, llvm::SDPatternMatch::Ones_match, false, false>, llvm::SDPatternMatch::ReassociatableOpc_match<llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::DeferredValue_match, llvm::SDPatternMatch::Ones_match>>::collectLeaves(llvm::SDValue, llvm::SmallVector<llvm::SDValue, 3u>&) DAGCombiner.cpp:0:0

calls on the stack.
Unfortunately I don't have a reproducer to share (at the moment at least).

RKSimon (Collaborator, Author) commented Dec 11, 2025

That can happen in cases where SmallVector<> gets the CalculateSmallVectorDefaultInlinedElements calculation wrong. Please can you try this local change:

diff --git a/llvm/include/llvm/CodeGen/SDPatternMatch.h b/llvm/include/llvm/CodeGen/SDPatternMatch.h
index dda3b3827c7a..4fe4fdcc023e 100644
--- a/llvm/include/llvm/CodeGen/SDPatternMatch.h
+++ b/llvm/include/llvm/CodeGen/SDPatternMatch.h
@@ -1310,7 +1310,7 @@ template <typename... PatternTs> struct ReassociatableOpc_match {
   bool match(const MatchContext &Ctx, SDValue N) {
     constexpr size_t NumPatterns = std::tuple_size_v<std::tuple<PatternTs...>>;

-    SmallVector<SDValue> Leaves;
+    SmallVector<SDValue, 4> Leaves;
     collectLeaves(N, Leaves);
     if (Leaves.size() != NumPatterns)
       return false;
@@ -1323,7 +1323,7 @@ template <typename... PatternTs> struct ReassociatableOpc_match {
         Patterns);
   }

-  void collectLeaves(SDValue V, SmallVector<SDValue> &Leaves) {
+  void collectLeaves(SDValue V, SmallVectorImpl<SDValue> &Leaves) {
     if (V->getOpcode() == Opcode) {
       for (size_t I = 0, N = V->getNumOperands(); I < N; I++)
         collectLeaves(V->getOperand(I), Leaves);

(note @bermondd is working on removing the SmallVector entirely)
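
For context, a standalone sketch of the idiom the diff above applies (general LLVM ADT usage, not code from the patch): a callee taking SmallVectorImpl<T>& is decoupled from the caller's inline capacity, and the caller picks an explicit small size rather than leaving it to CalculateSmallVectorDefaultInlinedElements.

```cpp
#include "llvm/ADT/SmallVector.h"
using namespace llvm;

// Callee accepts a SmallVector of any inline capacity via SmallVectorImpl.
static void collectThings(SmallVectorImpl<int> &Out) {
  for (int I = 0; I < 3; ++I)
    Out.push_back(I); // heap allocation only once the inline buffer is full
}

int demo() {
  SmallVector<int, 4> Storage; // explicit inline capacity of 4 elements
  collectThings(Storage);
  return static_cast<int>(Storage.size());
}
```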

mikaelholmen (Collaborator) replied:
> That can happen in cases where SmallVector<> gets the CalculateSmallVectorDefaultInlinedElements calculation wrong. Please can you try this local change:

Thanks, unfortunately it didn't help.

> (note @bermondd is working on removing the SmallVector entirely)

Sounds good. Maybe I'll just wait and see how it behaves after that improvement, then.
(We reverted the patch downstream, so we're not suffering right now; I just hope there won't be other conflicting changes to that code before the problem is fixed.)

bermondd (Contributor) replied:
> (note @bermondd is working on removing the SmallVector entirely)

I'll open a PR for this ASAP. I have further changes in mind to improve compile time, but maybe it's better to separate this into two PRs, so at least this reported issue is fixed sooner.


Labels: backend:X86, llvm:SelectionDAG

Successfully merging this pull request may close these issues: Suboptimal code generation for unsigned non-overflowing average (#53648)