[X86] Generate `vpmuludq` instead of `vpmullq` #121456

abhishek-kaushik22 · 2025-01-02T07:37:34Z

When lowering _mm512_mul_epu32 intrinsic if the generated value if later used in a vector shuffle we generate vpmullq instead of vpmuludq (https://godbolt.org/z/WbaGMqs8e) because SimplifyDemandedVectorElts simplifies the arguments and we fail the combine to PMULDQ.

Added an override to shouldSimplifyDemandedVectorElts in X86TargetLowering to check if we can combine the MUL to PMULUDQ first.

When lowering `_mm512_mul_epu32` intrinsic if the generated value if later used in a vector shuffle we generate `vpmullq` instead of `vpmuludq` (https://godbolt.org/z/WbaGMqs8e) because `SimplifyDemandedVectorElts` simplifies the arguments and we fail the combine to `PMULDQ`. Added an override to `shouldSimplifyDemandedVectorElts` in `X86TargetLowering` to check if we can combine the `MUL` to `PMULDQ` first.

llvmbot · 2025-01-02T07:38:05Z

@llvm/pr-subscribers-backend-x86

Author: None (abhishek-kaushik22)

Changes

When lowering _mm512_mul_epu32 intrinsic if the generated value if later used in a vector shuffle we generate vpmullq instead of vpmuludq (https://godbolt.org/z/WbaGMqs8e) because SimplifyDemandedVectorElts simplifies the arguments and we fail the combine to PMULDQ.

Added an override to shouldSimplifyDemandedVectorElts in X86TargetLowering to check if we can combine the MUL to PMULDQ first.

Full diff: https://github.com/llvm/llvm-project/pull/121456.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+21)
(modified) llvm/lib/Target/X86/X86ISelLowering.h (+3)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a0514e93d6598b..e104264bcbf918 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -60832,3 +60832,24 @@ Align X86TargetLowering::getPrefLoopAlignment(MachineLoop *ML) const {
     return Align(1ULL << ExperimentalPrefInnermostLoopAlignment);
   return TargetLowering::getPrefLoopAlignment();
 }
+
+bool X86TargetLowering::shouldSimplifyDemandedVectorElts(
+    SDValue Op, const TargetLoweringOpt &TLO) const {
+  if (Op.getOpcode() == ISD::VECTOR_SHUFFLE) {
+    SDValue V0 = peekThroughBitcasts(Op.getOperand(0));
+    SDValue V1 = peekThroughBitcasts(Op.getOperand(1));
+
+    if (V0.getOpcode() == ISD::MUL || V1.getOpcode() == ISD::MUL) {
+      SDNode *Mul = V0.getOpcode() == ISD::MUL ? V0.getNode() : V1.getNode();
+      SelectionDAG &DAG = TLO.DAG;
+      const X86Subtarget &Subtarget = DAG.getSubtarget<X86Subtarget>();
+      const SDLoc DL(Mul);
+
+      if (SDValue V = combineMulToPMULDQ(Mul, DL, DAG, Subtarget)) {
+        DAG.ReplaceAllUsesWith(Mul, V.getNode());
+        return false;
+      }
+    }
+  }
+  return true;
+}
diff --git a/llvm/lib/Target/X86/X86ISelLowering.h b/llvm/lib/Target/X86/X86ISelLowering.h
index 2b7a8eaf249d83..0a6cd53f557bb2 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.h
+++ b/llvm/lib/Target/X86/X86ISelLowering.h
@@ -1207,6 +1207,9 @@ namespace llvm {
 
     bool hasBitTest(SDValue X, SDValue Y) const override;
 
+    bool shouldSimplifyDemandedVectorElts(
+        SDValue Op, const TargetLoweringOpt &TLO) const override;
+
     bool shouldProduceAndByConstByHoistingConstFromShiftsLHSOfAnd(
         SDValue X, ConstantSDNode *XC, ConstantSDNode *CC, SDValue Y,
         unsigned OldShiftOpcode, unsigned NewShiftOpcode,

abhishek-kaushik22 · 2025-01-02T07:43:26Z

@phoebewang @e-kud @RKSimon can you please review?

llvm/test/CodeGen/X86/pr121456.ll

phoebewang · 2025-01-02T08:19:55Z

because SimplifyDemandedVectorElts simplifies the arguments and we fail the combine to PMULDQ.

Is it possible to combine the new patten instead of disble SimplifyDemandedVectorElts?

abhishek-kaushik22 · 2025-01-02T08:31:43Z

because SimplifyDemandedVectorElts simplifies the arguments and we fail the combine to PMULDQ.

Is it possible to combine the new patten instead of disble SimplifyDemandedVectorElts?

I'm not sure. We start with

t2: v8i64,ch = CopyFromReg t0, Register:v8i64 %0
t4: v8i64,ch = CopyFromReg t0, Register:v8i64 %1
t6: v8i64 = BUILD_VECTOR Constant:i64<4294967295>, Constant:i64<4294967295>, Constant:i64<4294967295>, Constant:i64<4294967295>, Constant:i64<4294967295>, Constant:i64<4294967295>, Constant:i64<4294967295>, Constant:i64<4294967295>
t7: v8i64 = and t2, t6
t8: v8i64 = and t4, t6
t9: v8i64 = mul nuw t8, t7

and this gets replaced by

t2: v8i64,ch = CopyFromReg t0, Register:v8i64 %0
t4: v8i64,ch = CopyFromReg t0, Register:v8i64 %1
t9: v8i64 = mul t4, t2

Is this safe to replace with PMULUDQ? We don't have any info about the mul being unsigned here.

RKSimon

I'm not sure shouldSimplifyDemandedVectorElts is the best way to handle this tbh

RKSimon · 2025-01-02T11:26:50Z

llvm/test/CodeGen/X86/pr121456.ll

+  %4 = shufflevector <16 x i32> <i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison, i32 0, i32 poison>, <16 x i32> %3, <16 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30>
+  %5 = bitcast <16 x i32> %4 to <8 x i64>
+  ret <8 x i64> %5
+}


Won't this be happening for v2i64/v4i64 as well on avx512dq targets? Please can you add test coverage for those as well.

RKSimon · 2025-01-02T11:29:39Z

llvm/test/CodeGen/X86/pr121456.ll

+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx512dq | FileCheck %s
+
+define <8 x i64> @pr121456(<8 x i64> %a, <8 x i64> %b) {


Don't name the test file after a pull request - "pr" is the old llvm term for problem report - the number should be based off a reported issue number (is there one?) - otherwise I'd probably suggest adding these tests to combine-pmuldq.ll instead

RKSimon · 2025-01-02T11:33:23Z

llvm/lib/Target/X86/X86ISelLowering.cpp

+
+bool X86TargetLowering::shouldSimplifyDemandedVectorElts(
+    SDValue Op, const TargetLoweringOpt &TLO) const {
+  if (Op.getOpcode() == ISD::VECTOR_SHUFFLE) {


This isn't going to work in the general case, it will just help shuffles:
https://llvm.godbolt.org/z/no1Gc6fzT

llvmbot added the backend:X86 label Jan 2, 2025

abhishek-kaushik22 added 2 commits January 2, 2025 13:11

Add test

a8fc420

Merge branch 'main' into pmul

58775ec

phoebewang reviewed Jan 2, 2025

View reviewed changes

llvm/test/CodeGen/X86/pr121456.ll Outdated Show resolved Hide resolved

Remove -O3 from lit test

225b0c4

RKSimon self-requested a review January 2, 2025 11:22

RKSimon reviewed Jan 2, 2025

View reviewed changes

abhishek-kaushik22 closed this Jun 27, 2025

abhishek-kaushik22 deleted the pmul branch June 27, 2025 08:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[X86] Generate `vpmuludq` instead of `vpmullq` #121456

[X86] Generate `vpmuludq` instead of `vpmullq` #121456

Uh oh!

abhishek-kaushik22 commented Jan 2, 2025 •

edited

Loading

Uh oh!

llvmbot commented Jan 2, 2025

Uh oh!

abhishek-kaushik22 commented Jan 2, 2025

Uh oh!

Uh oh!

phoebewang commented Jan 2, 2025

Uh oh!

abhishek-kaushik22 commented Jan 2, 2025

Uh oh!

RKSimon left a comment

Uh oh!

RKSimon Jan 2, 2025

Uh oh!

RKSimon Jan 2, 2025 •

edited

Loading

Uh oh!

RKSimon Jan 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[X86] Generate vpmuludq instead of vpmullq #121456

[X86] Generate vpmuludq instead of vpmullq #121456

Uh oh!

Conversation

abhishek-kaushik22 commented Jan 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 2, 2025

Uh oh!

abhishek-kaushik22 commented Jan 2, 2025

Uh oh!

Uh oh!

phoebewang commented Jan 2, 2025

Uh oh!

abhishek-kaushik22 commented Jan 2, 2025

Uh oh!

RKSimon left a comment

Choose a reason for hiding this comment

Uh oh!

RKSimon Jan 2, 2025

Choose a reason for hiding this comment

Uh oh!

RKSimon Jan 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RKSimon Jan 2, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[X86] Generate `vpmuludq` instead of `vpmullq` #121456

[X86] Generate `vpmuludq` instead of `vpmullq` #121456

abhishek-kaushik22 commented Jan 2, 2025 •

edited

Loading

RKSimon Jan 2, 2025 •

edited

Loading