[DAG] Fold (umin (sub a b) a) -> (usubo a b); (select usubo.1 a usubo.0) #161651
/cc @RKSimon
|
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-backend-x86
Author: Chaitanya Koparkar (ckoparkar)
Changes
Fixes #161036.
Full diff: https://github.com/llvm/llvm-project/pull/161651.diff
2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 558c5a0390228..99d7000c3b62e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -6199,6 +6199,28 @@ SDValue DAGCombiner::visitIMINMAX(SDNode *N) {
                                         SDLoc(N), VT, N0, N1))
     return SD;
 
+  // (umin (sub a, b) a) -> (usubo a, b); (select usubo.1, a, usubo.0)
+  //
+  // IR:
+  //   %sub = sub %a, %b
+  //   %cond = umin %sub, %a
+  //   ->
+  //   %usubo = usubo %a, %b
+  //   %overflow = extractvalue %usubo, 1
+  //   %sub = extractvalue %usubo, 0
+  //   %cond = select %overflow, %a, %sub
+  if (N0.getOpcode() == ISD::SUB) {
+    SDValue A, B, C;
+    if (sd_match(N, m_UMin(m_Sub(m_Value(A), m_Value(B)), m_Value(C)))) {
+      EVT AVT = A.getValueType();
+      if (A == C && TLI.isOperationLegalOrCustom(ISD::USUBO, AVT)) {
+        SDVTList VTs = DAG.getVTList(AVT, MVT::i1);
+        SDValue USO = DAG.getNode(ISD::USUBO, DL, VTs, A, B);
+        return DAG.getSelect(DL, VT, USO.getValue(1), A, USO.getValue(0));
+      }
+    }
+  }
+
   // Simplify the operands using demanded-bits information.
   if (SimplifyDemandedBits(SDValue(N, 0)))
     return SDValue(N, 0);
diff --git a/llvm/test/CodeGen/X86/underflow-compare-fold.ll b/llvm/test/CodeGen/X86/underflow-compare-fold.ll
new file mode 100644
index 0000000000000..2416bcb909485
--- /dev/null
+++ b/llvm/test/CodeGen/X86/underflow-compare-fold.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -mtriple=x86_64 | FileCheck %s
+
+; GitHub issue #161036
+
+define i64 @subIfNoUnderflow_umin(i64 %a, i64 %b) {
+; CHECK-LABEL: subIfNoUnderflow_umin
+; CHECK-LABEL: %bb.0
+; CHECK-NEXT: movq %rdi, %rax
+; CHECK-NEXT: subq %rsi, %rax
+; CHECK-NEXT: cmovbq %rdi, %rax
+; CHECK-NEXT: retq
+entry:
+  %sub = sub i64 %a, %b
+  %cond = tail call i64 @llvm.umin.i64(i64 %sub, i64 %a)
+  ret i64 %cond
+}
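The arithmetic behind the fold can be checked outside of LLVM. The following is a minimal sketch, not code from the patch: the helper names `uminOfSub`, `selectOnBorrow` and `foldHoldsForAllU8` are hypothetical, and the unsigned borrow that USUBO reports is modelled here simply as `a < b`. It compares both forms exhaustively at 8 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Original pattern: umin(a - b, a), where the subtraction wraps modulo 2^8.
inline std::uint8_t uminOfSub(std::uint8_t a, std::uint8_t b) {
  const std::uint8_t sub = static_cast<std::uint8_t>(a - b);
  return std::min(sub, a);
}

// Folded pattern: usubo gives (a - b, borrow); select a when a borrow occurred.
// Assumption: the borrow bit is modelled as (a < b).
inline std::uint8_t selectOnBorrow(std::uint8_t a, std::uint8_t b) {
  const std::uint8_t sub = static_cast<std::uint8_t>(a - b);
  const bool borrow = a < b;
  return borrow ? a : sub;
}

// Exhaustively compare both forms over every pair of 8-bit inputs.
inline bool foldHoldsForAllU8() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      if (uminOfSub(static_cast<std::uint8_t>(a), static_cast<std::uint8_t>(b)) !=
          selectOnBorrow(static_cast<std::uint8_t>(a), static_cast<std::uint8_t>(b)))
        return false;
  return true;
}
```

Calling `foldHoldsForAllU8()` from a test or `main` exercises all 65536 input pairs. This only covers the scalar identity for this exact operand order; it says nothing about other patterns the matcher might accept.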
|
RKSimon
left a comment
Please add AArch64 test coverage.
|
@RKSimon this is ready for another review.
|
Apologies, I was traveling for a few days. I'll update this today.
RKSimon
left a comment
Please can you update against the latest trunk and investigate the CI failure?
Previously the code was: Once I removed the
I've added some tests with vector types, incorrect patterns, etc. @arsenm also mentioned a test for multiple use. I didn't guard against multiple uses here since I thought the results of |
|
To fix the test failure I need to update the patterns in |
RKSimon
left a comment
LGTM
|
This change broke a significant number of cases in ffmpeg, across multiple architectures. One way of reproducing is this:
$ git clone https://github.com/ffmpeg/ffmpeg
$ mkdir ffmpeg-build
$ cd ffmpeg-build
$ ../ffmpeg/configure --cc=clang
$ make -j$(nproc)
$ make fate-vsynth1-flv
This produces a binary that crashes for this testcase. On x86_64, the object file where the breakage is, is in Compilation of that particular object file (the one on x86) can be done standalone with https://martin.st/temp/videodsp_init-preproc.c - with If comparing the difference in generated code caused by this patch, I get this:
--- out-good.s 2025-11-13 11:30:40.037589941 +0200
+++ out-bad.s 2025-11-13 11:30:33.665583286 +0200
@@ -132,15 +132,13 @@
  addq %r10, %r11
  movq %rax, %rbx
 .LBB1_12: # %if.end29.i
+ xorl %eax, %eax
  subq %r8, %r9
  negq %r8
- xorl %eax, %eax
- testq %r8, %r8
- cmovleq %rax, %r8
+ cmovbq %rax, %r8
  movq %rbx, %r13
  negq %r13
- testq %r13, %r13
- cmovleq %rax, %r13
+ cmovbq %rax, %r13
  cmpq %r9, %r14
  cmovlq %r14, %r9
  subq %rbx, %rbp
@@ -374,15 +372,13 @@
  addq %r10, %r11
  movq %rax, %rbx
 .LBB2_12: # %if.end29.i
+ xorl %eax, %eax
  subq %r8, %r9
  negq %r8
- xorl %eax, %eax
- testq %r8, %r8
- cmovleq %rax, %r8
+ cmovbq %rax, %r8
  movq %rbx, %r13
  negq %r13
- testq %r13, %r13
- cmovleq %rax, %r13
+ cmovbq %rax, %r13
  cmpq %r9, %r14
  cmovlq %r14, %r9
  subq %rbx, %rbp
I would suggest reverting this commit for now.
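To make the difference in the assembly above concrete, here is a rough scalar model of the two instruction sequences. It is a sketch based only on a reading of the diff, not on the patch source: the function names `clampNegBefore` and `clampNegAfter` are hypothetical, and it relies on the x86 rule that `neg` sets the carry flag whenever its operand is nonzero. The old `neg; test; cmovle` sequence behaves like a signed `max(-x, 0)`, while the new `neg; cmovb` sequence selects zero for every nonzero input.

```cpp
#include <cassert>
#include <cstdint>

// Model of the "good" sequence: negq; testq; cmovleq zero.
// cmovle after test fires when the negated value is <= 0 (signed).
inline std::int64_t clampNegBefore(std::int64_t x) {
  const std::int64_t neg =
      static_cast<std::int64_t>(0 - static_cast<std::uint64_t>(x));
  return neg <= 0 ? 0 : neg;
}

// Model of the "bad" sequence: negq; cmovbq zero.
// Assumption: CF after neg is set exactly when the operand was nonzero.
inline std::int64_t clampNegAfter(std::int64_t x) {
  const std::uint64_t ux = static_cast<std::uint64_t>(x);
  const std::uint64_t neg = 0 - ux;
  const bool carry = ux != 0;
  return carry ? 0 : static_cast<std::int64_t>(neg);
}
```

For example, with `x = -5` the first model yields 5 and the second yields 0, which is consistent with the changed code producing different results. The model does not identify which part of the combine matched this input.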
…ect usubo.1 a usubo.0)" (#167854)
Reverts llvm/llvm-project#161651 due to downstream bad codegen reports.
Baseline tests from llvm#161651 that were reverted in llvm#167854.
Still missing test coverage for the ffmpeg regression failures.
Fixes #161036.