[Hexagon] Passes for widening vector operations and shuffle opt #169559

fhossein-quic · 2025-11-25T19:58:23Z

Introduce Hexagon-specific passes to generate widening vector instructions for integer and floating-point operations using generic LLVM intrinsics. This enables widening operations for short vectors and improves type legalization by allowing operands to be widened to appropriate types. The patch also includes a shuffle optimization pass to relocate and validate shufflevector instructions during widening legalization.

Co-authored-by: Jyotsna Verma [email protected]
Co-authored-by: Yashas Andaluri [email protected]
Co-authored-by: Fateme Hosseini [email protected]
Co-authored-by: Muntasir Mallick [email protected]
Co-authored-by: Tatiana Larina [email protected]
Co-authored-by: Kaushik Kulkarni [email protected]

Change-Id: I1f6c146bd70ffd1ea42b614fa22fad04d16d6c35
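
For readers new to the term, a widening operation extends each narrow input lane to a wider type before operating, so the result cannot wrap at the input width. The following is a minimal scalar sketch of that idea in portable C++, not code from this patch. The function names `widenAddUnsigned` and `widenMulSigned` are hypothetical, the 8-bit to 16-bit widths are chosen only for illustration, and nothing here models HVX registers or the actual intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: zero-extend each u8 lane to u16, then add.
// The sum of two u8 values is at most 510, so it always fits in u16.
std::vector<uint16_t> widenAddUnsigned(const std::vector<uint8_t> &A,
                                       const std::vector<uint8_t> &B) {
  assert(A.size() == B.size() && "operands must have the same lane count");
  std::vector<uint16_t> R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = static_cast<uint16_t>(static_cast<uint16_t>(A[I]) +
                                 static_cast<uint16_t>(B[I]));
  return R;
}

// Hypothetical model: sign-extend each i8 lane to i16, then multiply.
// The product of two i8 values lies in [-16256, 16384], which fits in i16.
std::vector<int16_t> widenMulSigned(const std::vector<int8_t> &A,
                                    const std::vector<int8_t> &B) {
  assert(A.size() == B.size() && "operands must have the same lane count");
  std::vector<int16_t> R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = static_cast<int16_t>(static_cast<int16_t>(A[I]) *
                                static_cast<int16_t>(B[I]));
  return R;
}
```

In IR terms this corresponds to a `zext`/`sext` of both operands followed by the arithmetic instruction, which is the kind of pattern the description says the new passes recognize.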

github-actions · 2025-11-25T20:00:09Z

✅ With the latest revision this PR passed the undef deprecator.

github-actions · 2025-11-25T20:00:10Z

✅ With the latest revision this PR passed the C/C++ code formatter.

github-actions · 2025-11-25T21:09:21Z

🐧 Linux x64 Test Results

186828 tests passed
4893 tests skipped

llvmbot · 2025-11-28T23:15:03Z

@llvm/pr-subscribers-llvm-ir

Author: Fateme Hosseini (fhossein-quic)

Changes

Patch is 255.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169559.diff

37 Files Affected:

(modified) llvm/include/llvm/IR/IntrinsicsHexagon.td (+79-1)
(modified) llvm/include/llvm/IR/IntrinsicsHexagonDep.td (-14)
(modified) llvm/lib/Target/Hexagon/CMakeLists.txt (+3)
(modified) llvm/lib/Target/Hexagon/Hexagon.h (+4)
(modified) llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp (+1-2)
(added) llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp (+565)
(added) llvm/lib/Target/Hexagon/HexagonGenWideningVecInstr.cpp (+1184)
(modified) llvm/lib/Target/Hexagon/HexagonISelLowering.h (+1)
(modified) llvm/lib/Target/Hexagon/HexagonISelLoweringHVX.cpp (+110)
(modified) llvm/lib/Target/Hexagon/HexagonIntrinsics.td (+114)
(modified) llvm/lib/Target/Hexagon/HexagonNewValueJump.cpp (+1-1)
(added) llvm/lib/Target/Hexagon/HexagonOptShuffleVector.cpp (+713)
(modified) llvm/lib/Target/Hexagon/HexagonPatternsHVX.td (+12)
(modified) llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp (+17)
(modified) llvm/lib/Target/Hexagon/HexagonVectorCombine.cpp (+21-22)
(modified) llvm/lib/Target/Hexagon/MCTargetDesc/HexagonMCELFStreamer.cpp (+5)
(modified) llvm/test/CodeGen/Hexagon/autohvx/isel-vpackew.ll (+11-15)
(modified) llvm/test/CodeGen/Hexagon/autohvx/widen-setcc.ll (+1-3)
(added) llvm/test/CodeGen/Hexagon/bug54537-vavg.ll (+20)
(added) llvm/test/CodeGen/Hexagon/extend-multiply-for-output-fpext.ll (+16)
(added) llvm/test/CodeGen/Hexagon/no_widening_of_bf16_vecmul.ll (+60)
(added) llvm/test/CodeGen/Hexagon/shortvec-vasrsat.ll (+68)
(added) llvm/test/CodeGen/Hexagon/shortvec-vavg.ll (+20)
(added) llvm/test/CodeGen/Hexagon/shortvec-vmpy.ll (+27)
(added) llvm/test/CodeGen/Hexagon/vadd-const.ll (+114)
(added) llvm/test/CodeGen/Hexagon/vasr-sat.ll (+66)
(added) llvm/test/CodeGen/Hexagon/vavg.ll (+33)
(added) llvm/test/CodeGen/Hexagon/vec-shuff-invalid-operand.ll (+32)
(added) llvm/test/CodeGen/Hexagon/vec-shuff-multi-uses.ll (+290)
(added) llvm/test/CodeGen/Hexagon/vec-shuff2.ll (+106)
(added) llvm/test/CodeGen/Hexagon/vmpa.ll (+64)
(added) llvm/test/CodeGen/Hexagon/vmpy-const.ll (+273)
(added) llvm/test/CodeGen/Hexagon/vmpy-qfp-const.ll (+71)
(added) llvm/test/CodeGen/Hexagon/vsub-const.ll (+112)
(added) llvm/test/CodeGen/Hexagon/widening-float-vec.ll (+15)
(added) llvm/test/CodeGen/Hexagon/widening-vec.ll (+96)
(added) llvm/test/CodeGen/Hexagon/widening-vec2.ll (+23)

diff --git a/llvm/include/llvm/IR/IntrinsicsHexagon.td b/llvm/include/llvm/IR/IntrinsicsHexagon.td
index 20ba51ade35a7..2c945d2399b25 100644
--- a/llvm/include/llvm/IR/IntrinsicsHexagon.td
+++ b/llvm/include/llvm/IR/IntrinsicsHexagon.td
@@ -14,7 +14,7 @@
 //
 // All Hexagon intrinsics start with "llvm.hexagon.".
 let TargetPrefix = "hexagon" in {
-  /// Hexagon_Intrinsic - Base class for the majority of Hexagon intrinsics.
+  /// Hexagon_Intrinsic - Base class for majority of Hexagon intrinsics.
   class Hexagon_Intrinsic<string GCCIntSuffix, list<LLVMType> ret_types,
                               list<LLVMType> param_types,
                               list<IntrinsicProperty> properties>
@@ -435,6 +435,84 @@ def int_hexagon_V6_vmaskedstorenq_128B: Hexagon_custom_vms_Intrinsic_128B;
 def int_hexagon_V6_vmaskedstorentq_128B: Hexagon_custom_vms_Intrinsic_128B;
 def int_hexagon_V6_vmaskedstorentnq_128B: Hexagon_custom_vms_Intrinsic_128B;
 
+// Carryo
+// The script can't autogenerate clang builtins for vaddcarryo/vsubcarryo,
+// and they are marked in HexagonIset.py as not having intrinsics at all.
+// The script could generate intrinsics, but instead of doing intrinsics
+// without builtins, just put the intrinsics here.
+
+// tag : V6_vaddcarryo
+class Hexagon_custom_v16i32v64i1_v16i32v16i32_Intrinsic<
+      list<IntrinsicProperty> intr_properties = [IntrNoMem]>
+  : Hexagon_NonGCC_Intrinsic<
+       [llvm_v16i32_ty,llvm_v64i1_ty], [llvm_v16i32_ty,llvm_v16i32_ty],
+       intr_properties>;
+
+// tag : V6_vaddcarryo
+class Hexagon_custom_v32i32v128i1_v32i32v32i32_Intrinsic_128B<
+      list<IntrinsicProperty> intr_properties = [IntrNoMem]>
+  : Hexagon_NonGCC_Intrinsic<
+       [llvm_v32i32_ty,llvm_v128i1_ty], [llvm_v32i32_ty,llvm_v32i32_ty],
+       intr_properties>;
+
+// Pseudo intrinsics for widening vector instructions that
+// get replaced with the real Hexagon instructions during
+// instruction lowering.
+class Hexagon_widenvec_Intrinsic
+  : Hexagon_NonGCC_Intrinsic<
+       [llvm_anyvector_ty],
+       [LLVMTruncatedType<0>, LLVMTruncatedType<0>],
+       [IntrNoMem]>;
+
+class Hexagon_non_widenvec_Intrinsic
+  : Hexagon_NonGCC_Intrinsic<
+       [llvm_anyvector_ty],
+       [LLVMMatchType<0>, LLVMMatchType<0>],
+       [IntrNoMem]>;
+
+// Widening vector add
+def int_hexagon_vadd_su: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vadd_uu: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vadd_ss: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vadd_us: Hexagon_widenvec_Intrinsic;
+
+
+// Widening vector subtract
+def int_hexagon_vsub_su: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vsub_uu: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vsub_ss: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vsub_us: Hexagon_widenvec_Intrinsic;
+
+// Widening vector multiply
+def int_hexagon_vmpy_su: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vmpy_uu: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vmpy_ss: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vmpy_us: Hexagon_widenvec_Intrinsic;
+
+def int_hexagon_vavgu: Hexagon_non_widenvec_Intrinsic;
+def int_hexagon_vavgs: Hexagon_non_widenvec_Intrinsic;
+
+class Hexagon_vasr_Intrinsic
+  : Hexagon_NonGCC_Intrinsic<
+       [LLVMSubdivide2VectorType<0>],
+       [llvm_anyvector_ty, LLVMMatchType<0>, llvm_i32_ty],
+       [IntrNoMem]>;
+
+def int_hexagon_vasrsat_su: Hexagon_vasr_Intrinsic;
+def int_hexagon_vasrsat_uu: Hexagon_vasr_Intrinsic;
+def int_hexagon_vasrsat_ss: Hexagon_vasr_Intrinsic;
+
+class Hexagon_widen_vec_scalar_Intrinsic
+  : Hexagon_NonGCC_Intrinsic<
+       [llvm_anyvector_ty],
+       [LLVMTruncatedType<0>, llvm_i32_ty],
+       [IntrNoMem]>;
+
+// Widening vector scalar multiply
+def int_hexagon_vmpy_ub_b: Hexagon_widen_vec_scalar_Intrinsic;
+def int_hexagon_vmpy_ub_ub: Hexagon_widen_vec_scalar_Intrinsic;
+def int_hexagon_vmpy_uh_uh: Hexagon_widen_vec_scalar_Intrinsic;
+def int_hexagon_vmpy_h_h: Hexagon_widen_vec_scalar_Intrinsic;
 
 // Intrinsic for instrumentation based profiling using a custom handler. The
 // name of the handler is passed as the first operand to the intrinsic. The
diff --git a/llvm/include/llvm/IR/IntrinsicsHexagonDep.td b/llvm/include/llvm/IR/IntrinsicsHexagonDep.td
index dde4132791f06..2a673603e4e03 100644
--- a/llvm/include/llvm/IR/IntrinsicsHexagonDep.td
+++ b/llvm/include/llvm/IR/IntrinsicsHexagonDep.td
@@ -491,20 +491,6 @@ class Hexagon_custom_v32i32v128i1_v32i32v32i32v128i1_Intrinsic_128B<
        [llvm_v32i32_ty,llvm_v128i1_ty], [llvm_v32i32_ty,llvm_v32i32_ty,llvm_v128i1_ty],
        intr_properties>;
 
-// tag : V6_vaddcarryo
-class Hexagon_custom_v16i32v64i1_v16i32v16i32_Intrinsic<
-      list<IntrinsicProperty> intr_properties = [IntrNoMem]>
-  : Hexagon_NonGCC_Intrinsic<
-       [llvm_v16i32_ty,llvm_v64i1_ty], [llvm_v16i32_ty,llvm_v16i32_ty],
-       intr_properties>;
-
-// tag : V6_vaddcarryo
-class Hexagon_custom_v32i32v128i1_v32i32v32i32_Intrinsic_128B<
-      list<IntrinsicProperty> intr_properties = [IntrNoMem]>
-  : Hexagon_NonGCC_Intrinsic<
-       [llvm_v32i32_ty,llvm_v128i1_ty], [llvm_v32i32_ty,llvm_v32i32_ty],
-       intr_properties>;
-
 // tag : V6_vaddcarrysat
 class Hexagon_v16i32_v16i32v16i32v64i1_Intrinsic<string GCCIntSuffix,
       list<IntrinsicProperty> intr_properties = [IntrNoMem]>
diff --git a/llvm/lib/Target/Hexagon/CMakeLists.txt b/llvm/lib/Target/Hexagon/CMakeLists.txt
index 1a5f09642ea66..eddab5a235dab 100644
--- a/llvm/lib/Target/Hexagon/CMakeLists.txt
+++ b/llvm/lib/Target/Hexagon/CMakeLists.txt
@@ -37,6 +37,8 @@ add_llvm_target(HexagonCodeGen
   HexagonGenMemAbsolute.cpp
   HexagonGenMux.cpp
   HexagonGenPredicate.cpp
+  HexagonGenWideningVecFloatInstr.cpp
+  HexagonGenWideningVecInstr.cpp
   HexagonHardwareLoops.cpp
   HexagonHazardRecognizer.cpp
   HexagonInstrInfo.cpp
@@ -53,6 +55,7 @@ add_llvm_target(HexagonCodeGen
   HexagonNewValueJump.cpp
   HexagonOptAddrMode.cpp
   HexagonOptimizeSZextends.cpp
+  HexagonOptShuffleVector.cpp
   HexagonPeephole.cpp
   HexagonQFPOptimizer.cpp
   HexagonRDFOpt.cpp
diff --git a/llvm/lib/Target/Hexagon/Hexagon.h b/llvm/lib/Target/Hexagon/Hexagon.h
index 422ab20891b94..b98369d1b3e30 100644
--- a/llvm/lib/Target/Hexagon/Hexagon.h
+++ b/llvm/lib/Target/Hexagon/Hexagon.h
@@ -92,6 +92,9 @@ FunctionPass *createHexagonGenInsert();
 FunctionPass *createHexagonGenMemAbsolute();
 FunctionPass *createHexagonGenMux();
 FunctionPass *createHexagonGenPredicate();
+FunctionPass *
+createHexagonGenWideningVecFloatInstr(const HexagonTargetMachine &);
+FunctionPass *createHexagonGenWideningVecInstr(const HexagonTargetMachine &);
 FunctionPass *createHexagonHardwareLoops();
 FunctionPass *createHexagonISelDag(HexagonTargetMachine &TM,
                                    CodeGenOptLevel OptLevel);
@@ -102,6 +105,7 @@ FunctionPass *createHexagonMergeActivateWeight();
 FunctionPass *createHexagonNewValueJump();
 FunctionPass *createHexagonOptAddrMode();
 FunctionPass *createHexagonOptimizeSZextends();
+FunctionPass *createHexagonOptShuffleVector(const HexagonTargetMachine &);
 FunctionPass *createHexagonPacketizer(bool Minimal);
 FunctionPass *createHexagonPeephole();
 FunctionPass *createHexagonRDFOpt();
diff --git a/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp b/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
index 5344ed8446efc..412d58743df94 100644
--- a/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
@@ -51,8 +51,7 @@ struct PrintRegister {
 };
 
 [[maybe_unused]] raw_ostream &operator<<(raw_ostream &OS,
-                                         const PrintRegister &PR);
-raw_ostream &operator<<(raw_ostream &OS, const PrintRegister &PR) {
+                                         const PrintRegister &PR) {
   return OS << printReg(PR.Reg.Reg, &PR.TRI, PR.Reg.SubReg);
 }
 
diff --git a/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp b/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp
new file mode 100644
index 0000000000000..7271f1f839d69
--- /dev/null
+++ b/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp
@@ -0,0 +1,565 @@
+//===------------------- HexagonGenWideningVecFloatInstr.cpp --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Replace widening vector float operations with hexagon intrinsics.
+//
+//===----------------------------------------------------------------------===//
+//
+// Brief overview of how the GenWideningVecFloatInstr pass works.
+// This pass mirrors the existing pass that replaces widening vector integer
+// operations with their respective intrinsics. Here we generate Hexagon
+// intrinsics for widening vector float instructions.
+//
+// Example1(64 vector-width widening):
+// %wide.load = load <64 x half>, <64 x half>* %0, align 2
+// %wide.load53 = load <64 x half>, <64 x half>* %2, align 2
+// %1 = fpext <64 x half> %wide.load to <64 x float>
+// %3 = fpext <64 x half> %wide.load53 to <64 x float>
+// %4 = fmul <64 x float> %1, %3
+//
+// When this pass runs on the above example, it first finds the fmul
+// instruction and then checks whether its operands (%1 and %3) fall into one
+// of these categories: [%1 ->fpext, %3 ->fpext], [%1 ->fpext,
+// %3 ->constant_vector], or [%1 ->constant_vector, %3 ->fpext].
+// If it sees such a pattern, the pass replaces it with the appropriate
+// Hexagon intrinsics.
+//
+// After replacement:
+// %wide.load = load <64 x half>, <64 x half>* %0, align 2
+// %wide.load53 = load <64 x half>, <64 x half>* %2, align 2
+// %3 = bitcast <64 x half> %wide.load to <32 x i32>
+// %4 = bitcast <64 x half> %wide.load53 to <32 x i32>
+// %5 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%3, %4)
+// %6 = shufflevector <64 x i32> %5, <64 x i32> poison, <64 x i32> ShuffMask1
+// %7 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %6)
+// %8 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %6)
+// %9 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %7)
+// %10 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %8)
+// %11 = bitcast <32 x i32> %9 to <32 x float>
+// %12 = bitcast <32 x i32> %10 to <32 x float>
+// %13 = shufflevector <32 x float> %12, <32 x float> %11, <64 x i32> ShuffMask2
+//
+//
+//
+// Example2(128 vector-width widening):
+// %0 = bitcast half* %a to <128 x half>*
+// %wide.load = load <128 x half>, <128 x half>* %0, align 2
+// %1 = fpext <128 x half> %wide.load to <128 x float>
+// %2 = bitcast half* %b to <128 x half>*
+// %wide.load2 = load <128 x half>, <128 x half>* %2, align 2
+// %3 = fpext <128 x half> %wide.load2 to <128 x float>
+// %4 = fmul <128 x float> %1, %3
+//
+// After replacement:
+// %0 = bitcast half* %a to <128 x half>*
+// %wide.load = load <128 x half>, <128 x half>* %0, align 2
+// %1 = bitcast half* %b to <128 x half>*
+// %wide.load2 = load <128 x half>, <128 x half>* %1, align 2
+// %2 = bitcast <128 x half> %wide.load to <64 x i32>
+// %3 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %2)
+// %4 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %2)
+// %5 = bitcast <128 x half> %wide.load2 to <64 x i32>
+// %6 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %5)
+// %7 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %5)
+// %8 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%3, %6)
+// %9 = shufflevector <64 x i32> %8, <64 x i32> poison, <64 x i32> Mask1
+// %10 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %9)
+// %11 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %9)
+// %12 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %10)
+// %13 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %11)
+// %14 = bitcast <32 x i32> %12 to <32 x float>
+// %15 = bitcast <32 x i32> %13 to <32 x float>
+// %16 = shufflevector <32 x float> %15, <32 x float> %14, <64 x i32> Mask2
+// %17 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%4, %7)
+// %18 = shufflevector <64 x i32> %17, <64 x i32> poison, <64 x i32> Mask1
+// %19 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %18)
+// %20 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %18)
+// %21 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %19)
+// %22 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %20)
+// %23 = bitcast <32 x i32> %21 to <32 x float>
+// %24 = bitcast <32 x i32> %22 to <32 x float>
+// %25 = shufflevector <32 x float> %24, <32 x float> %23, <64 x i32> Mask2
+// %26 = shufflevector <64 x float> %25, <64 x float> %16, <128 x i32> Mask3
+//
+//
+//===----------------------------------------------------------------------===//
+#include "HexagonTargetMachine.h"
+#include "llvm/ADT/APInt.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicsHexagon.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/IR/Type.h"
+#include "llvm/IR/Value.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include <algorithm>
+#include <utility>
+
+using namespace llvm;
+
+namespace llvm {
+void initializeHexagonGenWideningVecFloatInstrPass(PassRegistry &);
+FunctionPass *
+createHexagonGenWideningVecFloatInstr(const HexagonTargetMachine &);
+} // end namespace llvm
+
+namespace {
+
+class HexagonGenWideningVecFloatInstr : public FunctionPass {
+public:
+  static char ID;
+
+  HexagonGenWideningVecFloatInstr() : FunctionPass(ID) {
+    initializeHexagonGenWideningVecFloatInstrPass(
+        *PassRegistry::getPassRegistry());
+  }
+
+  HexagonGenWideningVecFloatInstr(const HexagonTargetMachine *TM)
+      : FunctionPass(ID), TM(TM) {
+    initializeHexagonGenWideningVecFloatInstrPass(
+        *PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+    return "Hexagon generate widening vector float instructions";
+  }
+
+  bool runOnFunction(Function &F) override;
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    FunctionPass::getAnalysisUsage(AU);
+  }
+
+private:
+  Module *M = nullptr;
+  const HexagonTargetMachine *TM = nullptr;
+  const HexagonSubtarget *HST = nullptr;
+  unsigned HwVLen;
+  unsigned NumHalfEltsInFullVec;
+
+  struct OPInfo {
+    Value *OP;
+    Value *ExtInOP;
+    unsigned ExtInSize;
+  };
+
+  bool visitBlock(BasicBlock *B);
+  bool processInstruction(Instruction *Inst);
+  bool replaceWithIntrinsic(Instruction *Inst, OPInfo &OP1Info,
+                            OPInfo &OP2Info);
+
+  bool getOperandInfo(Value *V, OPInfo &OPI);
+  bool isExtendedConstant(Constant *C);
+  unsigned getElementSizeInBits(Value *V);
+  Type *getElementTy(unsigned size, IRBuilder<> &IRB);
+
+  Value *adjustExtensionForOp(OPInfo &OPI, IRBuilder<> &IRB,
+                              unsigned NewEltsize, unsigned NumElts);
+
+  std::pair<Value *, Value *> opSplit(Value *OP, Instruction *Inst);
+
+  Value *createIntrinsic(Intrinsic::ID IntId, Instruction *Inst, Value *NewOP1,
+                         Value *NewOP2, FixedVectorType *ResType,
+                         unsigned NumElts, bool BitCastOp);
+};
+
+} // end anonymous namespace
+
+char HexagonGenWideningVecFloatInstr::ID = 0;
+
+INITIALIZE_PASS_BEGIN(HexagonGenWideningVecFloatInstr, "widening-vec-float",
+                      "Hexagon generate "
+                      "widening vector float instructions",
+                      false, false)
+INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
+INITIALIZE_PASS_END(HexagonGenWideningVecFloatInstr, "widening-vec-float",
+                    "Hexagon generate "
+                    "widening vector float instructions",
+                    false, false)
+
+bool HexagonGenWideningVecFloatInstr::isExtendedConstant(Constant *C) {
+  if (Value *SplatV = C->getSplatValue()) {
+    if (auto *CFP = dyn_cast<ConstantFP>(SplatV)) {
+      bool Ignored;
+      APFloat APF = CFP->getValueAPF();
+      APFloat::opStatus sts = APF.convert(
+          APFloat::IEEEhalf(), APFloat::rmNearestTiesToEven, &Ignored);
+      if (sts == APFloat::opStatus::opOK || sts == APFloat::opStatus::opInexact)
+        return true;
+    }
+    return false;
+  }
+  unsigned NumElts = cast<FixedVectorType>(C->getType())->getNumElements();
+  for (unsigned i = 0, e = NumElts; i != e; ++i) {
+    if (auto *CFP = dyn_cast<ConstantFP>(C->getAggregateElement(i))) {
+      bool Ignored;
+      APFloat APF = CFP->getValueAPF();
+      APFloat::opStatus sts = APF.convert(
+          APFloat::IEEEhalf(), APFloat::rmNearestTiesToEven, &Ignored);
+      if (sts != APFloat::opStatus::opOK && sts != APFloat::opStatus::opInexact)
+        return false;
+      continue;
+    }
+    return false;
+  }
+  return true;
+}
+
+unsigned HexagonGenWideningVecFloatInstr::getElementSizeInBits(Value *V) {
+  Type *ValTy = V->getType();
+  Type *EltTy = ValTy;
+  if (dyn_cast<Constant>(V)) {
+    unsigned EltSize =
+        cast<VectorType>(EltTy)->getElementType()->getPrimitiveSizeInBits();
+    unsigned ReducedSize = EltSize / 2;
+
+    return ReducedSize;
+  }
+
+  if (ValTy->isVectorTy())
+    EltTy = cast<VectorType>(ValTy)->getElementType();
+  return EltTy->getPrimitiveSizeInBits();
+}
+
+bool HexagonGenWideningVecFloatInstr::getOperandInfo(Value *V, OPInfo &OPI) {
+  using namespace PatternMatch;
+  OPI.OP = V;
+  Value *ExtV = nullptr;
+  Constant *C = nullptr;
+
+  if (match(V, (m_FPExt(m_Value(ExtV)))) ||
+      match(V,
+            m_Shuffle(m_InsertElt(m_Poison(), m_FPExt(m_Value(ExtV)), m_Zero()),
+                      m_Poison(), m_ZeroMask()))) {
+
+    if (auto *ExtVType = dyn_cast<VectorType>(ExtV->getType())) {
+      // Matches the first branch.
+      if (ExtVType->getElementType()->isBFloatTy())
+        // do not confuse bf16 with ieee-fp16.
+        return false;
+    } else {
+      // Matches the second branch (insert element branch)
+      if (ExtV->getType()->isBFloatTy())
+        return false;
+    }
+
+    OPI.ExtInOP = ExtV;
+    OPI.ExtInSize = getElementSizeInBits(OPI.ExtInOP);
+    return true;
+  }
+
+  if (match(V, m_Constant(C))) {
+    if (!isExtendedConstant(C))
+      return false;
+    OPI.ExtInOP = C;
+    OPI.ExtInSize = getElementSizeInBits(OPI.ExtInOP);
+    return true;
+  }
+
+  return false;
+}
+
+Type *HexagonGenWideningVecFloatInstr::getElementTy(unsigned size,
+                                                    IRBuilder<> &IRB) {
+  switch (size) {
+  case 16:
+    return IRB.getHalfTy();
+  case 32:
+    return IRB.getFloatTy();
+  default:
+    llvm_unreachable("Unhandled Element size");
+  }
+}
+
+Value *HexagonGenWideningVecFloatInstr::adjustExtensionForOp(
+    OPInfo &OPI, IRBuilder<> &IRB, unsigned NewExtSize, unsigned NumElts) {
+  Value *V = OPI.ExtInOP;
+  unsigned EltSize = getElementSizeInBits(OPI.ExtInOP);
+  assert(NewExtSize >= EltSize);
+  Type *EltType = getElementTy(NewExtSize, IRB);
+  auto *NewOpTy = FixedVectorType::get(EltType, NumElts);
+
+  if (auto *C = dyn_cast<Constant>(V))
+    return IRB.CreateFPTrunc(C, NewOpTy);
+
+  if (V->getType()->isVectorTy())
+    if (NewExtSize == EltSize)
+      return V;
+
+  return nullptr;
+}
+
+std::pair<Value *, Value *>
+HexagonGenWideningVecFloatInstr::opSplit(Value *OP, Instruction *Inst) {
+  Type *InstTy = Inst->getType();
+  unsigned NumElts = cast<FixedVectorType>(InstTy)->getNumElements();
+  IRBuilder<> IRB(Inst);
+  Intrinsic::ID IntHi = Intrinsic::hexagon_V6_hi_128B;
+  Intrinsic::ID IntLo = Intrinsic::hexagon_V6_lo_128B;
+  Function *ExtFHi = Intrinsic::getOrInsertDeclaration(M, IntHi);
+  Function *ExtFLo = Intrinsic::getOrInsertDeclaration(M, IntLo);
+  if (NumElts == 128) {
+    auto *InType = FixedVectorType::get(IRB.getInt32Ty(), 64);
+    OP = IRB.CreateBitCast(OP, InType);
+  }
+  Value *OP1Hi = IRB.CreateCall(ExtFHi, {OP});
+  Value *OP1Lo = IRB.CreateCall(ExtFLo, {OP});
+  return std::pair<Value *, Value *>(OP1Hi, OP1Lo);
+}
+
+Value *HexagonGenWideningVecFloatInstr::createIntrinsic(
+    Intrinsic::ID IntId, Instruction *Inst, Value *NewOP1, Value *NewOP2,
+    FixedVectorType *ResType, unsigned NumElts, bool BitCastO...
[truncated]
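
The 128-element example in the pass overview comment above splits each operand into high and low halves with `V6.hi`/`V6.lo`, processes the halves separately, and joins the results with a final `shufflevector`. The sketch below models only that split-and-rejoin shape on plain `std::vector`s. It is an illustration, not the patch's code: `splitHiLo` and `concatLoHi` are hypothetical names, the sketch assumes the low half holds the lower-indexed elements, and it assumes the final shuffle is a plain concatenation, since the shuffle masks are not shown in the truncated diff.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of splitting a vector into (hi, lo) halves, returned in
// the same (hi, lo) order that opSplit uses in the diff.
// Assumption: "lo" is the half with the lower element indices.
template <typename T>
std::pair<std::vector<T>, std::vector<T>> splitHiLo(const std::vector<T> &V) {
  assert(V.size() % 2 == 0 && "vector must have an even number of elements");
  auto Mid = V.begin() + static_cast<std::ptrdiff_t>(V.size() / 2);
  std::vector<T> Lo(V.begin(), Mid);
  std::vector<T> Hi(Mid, V.end());
  return {Hi, Lo};
}

// Hypothetical model of rejoining the per-half results, low half first.
// Assumption: the joining shuffle mask is an identity concatenation.
template <typename T>
std::vector<T> concatLoHi(const std::vector<T> &Lo, const std::vector<T> &Hi) {
  std::vector<T> R(Lo);
  R.insert(R.end(), Hi.begin(), Hi.end());
  return R;
}
```

Splitting and then rejoining without touching the halves returns the original vector, which is a quick sanity check for any per-half transformation slotted in between.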

llvmbot · 2025-11-28T23:15:03Z

@llvm/pr-subscribers-backend-hexagon

+  HexagonGenWideningVecInstr.cpp
   HexagonHardwareLoops.cpp
   HexagonHazardRecognizer.cpp
   HexagonInstrInfo.cpp
@@ -53,6 +55,7 @@ add_llvm_target(HexagonCodeGen
   HexagonNewValueJump.cpp
   HexagonOptAddrMode.cpp
   HexagonOptimizeSZextends.cpp
+  HexagonOptShuffleVector.cpp
   HexagonPeephole.cpp
   HexagonQFPOptimizer.cpp
   HexagonRDFOpt.cpp
diff --git a/llvm/lib/Target/Hexagon/Hexagon.h b/llvm/lib/Target/Hexagon/Hexagon.h
index 422ab20891b94..b98369d1b3e30 100644
--- a/llvm/lib/Target/Hexagon/Hexagon.h
+++ b/llvm/lib/Target/Hexagon/Hexagon.h
@@ -92,6 +92,9 @@ FunctionPass *createHexagonGenInsert();
 FunctionPass *createHexagonGenMemAbsolute();
 FunctionPass *createHexagonGenMux();
 FunctionPass *createHexagonGenPredicate();
+FunctionPass *
+createHexagonGenWideningVecFloatInstr(const HexagonTargetMachine &);
+FunctionPass *createHexagonGenWideningVecInstr(const HexagonTargetMachine &);
 FunctionPass *createHexagonHardwareLoops();
 FunctionPass *createHexagonISelDag(HexagonTargetMachine &TM,
                                    CodeGenOptLevel OptLevel);
@@ -102,6 +105,7 @@ FunctionPass *createHexagonMergeActivateWeight();
 FunctionPass *createHexagonNewValueJump();
 FunctionPass *createHexagonOptAddrMode();
 FunctionPass *createHexagonOptimizeSZextends();
+FunctionPass *createHexagonOptShuffleVector(const HexagonTargetMachine &);
 FunctionPass *createHexagonPacketizer(bool Minimal);
 FunctionPass *createHexagonPeephole();
 FunctionPass *createHexagonRDFOpt();
diff --git a/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp b/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
index 5344ed8446efc..412d58743df94 100644
--- a/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
@@ -51,8 +51,7 @@ struct PrintRegister {
 };
 
 [[maybe_unused]] raw_ostream &operator<<(raw_ostream &OS,
-                                         const PrintRegister &PR);
-raw_ostream &operator<<(raw_ostream &OS, const PrintRegister &PR) {
+                                         const PrintRegister &PR) {
   return OS << printReg(PR.Reg.Reg, &PR.TRI, PR.Reg.SubReg);
 }
 
diff --git a/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp b/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp
new file mode 100644
index 0000000000000..7271f1f839d69
--- /dev/null
+++ b/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp
@@ -0,0 +1,565 @@
+//===------------------- HexagonGenWideningVecFloatInstr.cpp --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Replace widening vector float operations with hexagon intrinsics.
+//
+//===----------------------------------------------------------------------===//
+//
+// Brief overview of how the GenWideningVecFloatInstr pass works.
+// This pass mirrors the existing integer pass (which replaces widening
+// vector integer operations with their respective intrinsics). Here we
+// generate Hexagon intrinsics for widening vector float operations.
+//
+// Example1(64 vector-width widening):
+// %wide.load = load <64 x half>, <64 x half>* %0, align 2
+// %wide.load53 = load <64 x half>, <64 x half>* %2, align 2
+// %1 = fpext <64 x half> %wide.load to <64 x float>
+// %3 = fpext <64 x half> %wide.load53 to <64 x float>
+// %4 = fmul <64 x float> %1, %3
+//
+// If we run this pass on the above example, it first finds the fmul
+// instruction and then checks whether its operands (%1 and %3) belong to
+// one of these categories: [%1 ->fpext, %3 ->fpext],
+// [%1 ->fpext, %3 ->constant_vector] or [%1 ->constant_vector, %3 ->fpext].
+// If it sees such a pattern, the pass replaces it with the appropriate
+// Hexagon intrinsics.
+//
+// After replacement:
+// %wide.load = load <64 x half>, <64 x half>* %0, align 2
+// %wide.load53 = load <64 x half>, <64 x half>* %2, align 2
+// %3 = bitcast <64 x half> %wide.load to <32 x i32>
+// %4 = bitcast <64 x half> %wide.load53 to <32 x i32>
+// %5 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%3, %4)
+// %6 = shufflevector <64 x i32> %5, <64 x i32> poison, <64 x i32> ShuffMask1
+// %7 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %6)
+// %8 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %6)
+// %9 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %7)
+// %10 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %8)
+// %11 = bitcast <32 x i32> %9 to <32 x float>
+// %12 = bitcast <32 x i32> %10 to <32 x float>
+// %13 = shufflevector <32 x float> %12, <32 x float> %11, <64 x i32> ShuffMask2
+//
+//
+//
+// Example2(128 vector-width widening):
+// %0 = bitcast half* %a to <128 x half>*
+// %wide.load = load <128 x half>, <128 x half>* %0, align 2
+// %1 = fpext <128 x half> %wide.load to <128 x float>
+// %2 = bitcast half* %b to <128 x half>*
+// %wide.load2 = load <128 x half>, <128 x half>* %2, align 2
+// %3 = fpext <128 x half> %wide.load2 to <128 x float>
+// %4 = fmul <128 x float> %1, %3
+//
+// After replacement:
+// %0 = bitcast half* %a to <128 x half>*
+// %wide.load = load <128 x half>, <128 x half>* %0, align 2
+// %1 = bitcast half* %b to <128 x half>*
+// %wide.load2 = load <128 x half>, <128 x half>* %1, align 2
+// %2 = bitcast <128 x half> %wide.load to <64 x i32>
+// %3 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %2)
+// %4 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %2)
+// %5 = bitcast <128 x half> %wide.load2 to <64 x i32>
+// %6 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %5)
+// %7 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %5)
+// %8 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%3, %6)
+// %9 = shufflevector <64 x i32> %8, <64 x i32> poison, <64 x i32> Mask1
+// %10 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %9)
+// %11 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %9)
+// %12 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %10)
+// %13 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %11)
+// %14 = bitcast <32 x i32> %12 to <32 x float>
+// %15 = bitcast <32 x i32> %13 to <32 x float>
+// %16 = shufflevector <32 x float> %15, <32 x float> %14, <64 x i32> Mask2
+// %17 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%4, %7)
+// %18 = shufflevector <64 x i32> %17, <64 x i32> poison, <64 x i32> Mask1
+// %19 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %18)
+// %20 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %18)
+// %21 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %19)
+// %22 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %20)
+// %23 = bitcast <32 x i32> %21 to <32 x float>
+// %24 = bitcast <32 x i32> %22 to <32 x float>
+// %25 = shufflevector <32 x float> %24, <32 x float> %23, <64 x i32> Mask2
+// %26 = shufflevector <64 x float> %25, <64 x float> %16, <128 x i32> Mask3
+//
+//
+//===----------------------------------------------------------------------===//
+#include "HexagonTargetMachine.h"
+#include "llvm/ADT/APInt.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicsHexagon.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/IR/Type.h"
+#include "llvm/IR/Value.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include <algorithm>
+#include <utility>
+
+using namespace llvm;
+
+namespace llvm {
+void initializeHexagonGenWideningVecFloatInstrPass(PassRegistry &);
+FunctionPass *
+createHexagonGenWideningVecFloatInstr(const HexagonTargetMachine &);
+} // end namespace llvm
+
+namespace {
+
+class HexagonGenWideningVecFloatInstr : public FunctionPass {
+public:
+  static char ID;
+
+  HexagonGenWideningVecFloatInstr() : FunctionPass(ID) {
+    initializeHexagonGenWideningVecFloatInstrPass(
+        *PassRegistry::getPassRegistry());
+  }
+
+  HexagonGenWideningVecFloatInstr(const HexagonTargetMachine *TM)
+      : FunctionPass(ID), TM(TM) {
+    initializeHexagonGenWideningVecFloatInstrPass(
+        *PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+    return "Hexagon generate widening vector float instructions";
+  }
+
+  bool runOnFunction(Function &F) override;
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    FunctionPass::getAnalysisUsage(AU);
+  }
+
+private:
+  Module *M = nullptr;
+  const HexagonTargetMachine *TM = nullptr;
+  const HexagonSubtarget *HST = nullptr;
+  unsigned HwVLen;
+  unsigned NumHalfEltsInFullVec;
+
+  struct OPInfo {
+    Value *OP;
+    Value *ExtInOP;
+    unsigned ExtInSize;
+  };
+
+  bool visitBlock(BasicBlock *B);
+  bool processInstruction(Instruction *Inst);
+  bool replaceWithIntrinsic(Instruction *Inst, OPInfo &OP1Info,
+                            OPInfo &OP2Info);
+
+  bool getOperandInfo(Value *V, OPInfo &OPI);
+  bool isExtendedConstant(Constant *C);
+  unsigned getElementSizeInBits(Value *V);
+  Type *getElementTy(unsigned size, IRBuilder<> &IRB);
+
+  Value *adjustExtensionForOp(OPInfo &OPI, IRBuilder<> &IRB,
+                              unsigned NewEltsize, unsigned NumElts);
+
+  std::pair<Value *, Value *> opSplit(Value *OP, Instruction *Inst);
+
+  Value *createIntrinsic(Intrinsic::ID IntId, Instruction *Inst, Value *NewOP1,
+                         Value *NewOP2, FixedVectorType *ResType,
+                         unsigned NumElts, bool BitCastOp);
+};
+
+} // end anonymous namespace
+
+char HexagonGenWideningVecFloatInstr::ID = 0;
+
+INITIALIZE_PASS_BEGIN(HexagonGenWideningVecFloatInstr, "widening-vec-float",
+                      "Hexagon generate "
+                      "widening vector float instructions",
+                      false, false)
+INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
+INITIALIZE_PASS_END(HexagonGenWideningVecFloatInstr, "widening-vec-float",
+                    "Hexagon generate "
+                    "widening vector float instructions",
+                    false, false)
+
+bool HexagonGenWideningVecFloatInstr::isExtendedConstant(Constant *C) {
+  if (Value *SplatV = C->getSplatValue()) {
+    if (auto *CFP = dyn_cast<ConstantFP>(SplatV)) {
+      bool Ignored;
+      APFloat APF = CFP->getValueAPF();
+      APFloat::opStatus sts = APF.convert(
+          APFloat::IEEEhalf(), APFloat::rmNearestTiesToEven, &Ignored);
+      if (sts == APFloat::opStatus::opOK || sts == APFloat::opStatus::opInexact)
+        return true;
+    }
+    return false;
+  }
+  unsigned NumElts = cast<FixedVectorType>(C->getType())->getNumElements();
+  for (unsigned i = 0, e = NumElts; i != e; ++i) {
+    if (auto *CFP = dyn_cast<ConstantFP>(C->getAggregateElement(i))) {
+      bool Ignored;
+      APFloat APF = CFP->getValueAPF();
+      APFloat::opStatus sts = APF.convert(
+          APFloat::IEEEhalf(), APFloat::rmNearestTiesToEven, &Ignored);
+      if (sts != APFloat::opStatus::opOK && sts != APFloat::opStatus::opInexact)
+        return false;
+      continue;
+    }
+    return false;
+  }
+  return true;
+}
+
+unsigned HexagonGenWideningVecFloatInstr::getElementSizeInBits(Value *V) {
+  Type *ValTy = V->getType();
+  Type *EltTy = ValTy;
+  if (dyn_cast<Constant>(V)) {
+    unsigned EltSize =
+        cast<VectorType>(EltTy)->getElementType()->getPrimitiveSizeInBits();
+    unsigned ReducedSize = EltSize / 2;
+
+    return ReducedSize;
+  }
+
+  if (ValTy->isVectorTy())
+    EltTy = cast<VectorType>(ValTy)->getElementType();
+  return EltTy->getPrimitiveSizeInBits();
+}
+
+bool HexagonGenWideningVecFloatInstr::getOperandInfo(Value *V, OPInfo &OPI) {
+  using namespace PatternMatch;
+  OPI.OP = V;
+  Value *ExtV = nullptr;
+  Constant *C = nullptr;
+
+  if (match(V, (m_FPExt(m_Value(ExtV)))) ||
+      match(V,
+            m_Shuffle(m_InsertElt(m_Poison(), m_FPExt(m_Value(ExtV)), m_Zero()),
+                      m_Poison(), m_ZeroMask()))) {
+
+    if (auto *ExtVType = dyn_cast<VectorType>(ExtV->getType())) {
+      // Matches the first branch.
+      if (ExtVType->getElementType()->isBFloatTy())
+        // do not confuse bf16 with ieee-fp16.
+        return false;
+    } else {
+      // Matches the second branch (insert element branch)
+      if (ExtV->getType()->isBFloatTy())
+        return false;
+    }
+
+    OPI.ExtInOP = ExtV;
+    OPI.ExtInSize = getElementSizeInBits(OPI.ExtInOP);
+    return true;
+  }
+
+  if (match(V, m_Constant(C))) {
+    if (!isExtendedConstant(C))
+      return false;
+    OPI.ExtInOP = C;
+    OPI.ExtInSize = getElementSizeInBits(OPI.ExtInOP);
+    return true;
+  }
+
+  return false;
+}
+
+Type *HexagonGenWideningVecFloatInstr::getElementTy(unsigned size,
+                                                    IRBuilder<> &IRB) {
+  switch (size) {
+  case 16:
+    return IRB.getHalfTy();
+  case 32:
+    return IRB.getFloatTy();
+  default:
+    llvm_unreachable("Unhandled Element size");
+  }
+}
+
+Value *HexagonGenWideningVecFloatInstr::adjustExtensionForOp(
+    OPInfo &OPI, IRBuilder<> &IRB, unsigned NewExtSize, unsigned NumElts) {
+  Value *V = OPI.ExtInOP;
+  unsigned EltSize = getElementSizeInBits(OPI.ExtInOP);
+  assert(NewExtSize >= EltSize);
+  Type *EltType = getElementTy(NewExtSize, IRB);
+  auto *NewOpTy = FixedVectorType::get(EltType, NumElts);
+
+  if (auto *C = dyn_cast<Constant>(V))
+    return IRB.CreateFPTrunc(C, NewOpTy);
+
+  if (V->getType()->isVectorTy())
+    if (NewExtSize == EltSize)
+      return V;
+
+  return nullptr;
+}
+
+std::pair<Value *, Value *>
+HexagonGenWideningVecFloatInstr::opSplit(Value *OP, Instruction *Inst) {
+  Type *InstTy = Inst->getType();
+  unsigned NumElts = cast<FixedVectorType>(InstTy)->getNumElements();
+  IRBuilder<> IRB(Inst);
+  Intrinsic::ID IntHi = Intrinsic::hexagon_V6_hi_128B;
+  Intrinsic::ID IntLo = Intrinsic::hexagon_V6_lo_128B;
+  Function *ExtFHi = Intrinsic::getOrInsertDeclaration(M, IntHi);
+  Function *ExtFLo = Intrinsic::getOrInsertDeclaration(M, IntLo);
+  if (NumElts == 128) {
+    auto *InType = FixedVectorType::get(IRB.getInt32Ty(), 64);
+    OP = IRB.CreateBitCast(OP, InType);
+  }
+  Value *OP1Hi = IRB.CreateCall(ExtFHi, {OP});
+  Value *OP1Lo = IRB.CreateCall(ExtFLo, {OP});
+  return std::pair<Value *, Value *>(OP1Hi, OP1Lo);
+}
+
+Value *HexagonGenWideningVecFloatInstr::createIntrinsic(
+    Intrinsic::ID IntId, Instruction *Inst, Value *NewOP1, Value *NewOP2,
+    FixedVectorType *ResType, unsigned NumElts, bool BitCastO...
[truncated]
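
A note on the `Hexagon_widenvec_Intrinsic` signature in the diff above: the result is an overloaded vector type and both operands use `LLVMTruncatedType<0>`, meaning the same lane count with elements half as wide as the result. The sketch below is a scalar reference model of that shape in plain C++, not code from the patch. The function names `widenAddSU` and `widenMulUU` are hypothetical, and reading the `_su` / `_uu` suffixes as the signedness of the first and second operand is an assumption that the diff does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a widening add (not part of the patch).
// Assumption: "_su" means operand 1 is signed and operand 2 is unsigned.
// Each 8-bit lane is extended to 16 bits before adding, so the sum cannot wrap.
std::vector<std::int16_t> widenAddSU(const std::vector<std::int8_t> &A,
                                     const std::vector<std::uint8_t> &B) {
  assert(A.size() == B.size() && "operands must have the same lane count");
  std::vector<std::int16_t> R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = static_cast<std::int16_t>(static_cast<std::int32_t>(A[I]) +
                                     static_cast<std::int32_t>(B[I]));
  return R;
}

// Hypothetical scalar model of a widening multiply with two unsigned operands.
// The largest product, 255 * 255 = 65025, still fits in a 16-bit unsigned lane.
std::vector<std::uint16_t> widenMulUU(const std::vector<std::uint8_t> &A,
                                      const std::vector<std::uint8_t> &B) {
  assert(A.size() == B.size() && "operands must have the same lane count");
  std::vector<std::uint16_t> R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = static_cast<std::uint16_t>(static_cast<std::uint32_t>(A[I]) *
                                      static_cast<std::uint32_t>(B[I]));
  return R;
}
```

With 8-bit inputs, `widenAddSU({-128, 127}, {255, 255})` yields `{127, 382}`. The second lane would not fit in 8 bits, which is why the result lanes are twice as wide as the operand lanes.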

iajbar · 2025-12-01T20:14:38Z

This line is not needed: "Change-Id: I1f6c146bd70ffd1ea42b614fa22fad04d16d6c35"

Introduce Hexagon-specific passes to generate widening vector instructions for integer and floating-point operations using generic LLVM intrinsics. This enables widening operations for short vectors and improves type legalization by allowing operands to be widened to appropriate types. The patch also includes a shuffle optimization pass to relocate and validate shufflevector instructions during widening legalization.

Co-authored-by: Jyotsna Verma <[email protected]>
Co-authored-by: Yashas Andaluri <[email protected]>
Co-authored-by: Fateme Hosseini <[email protected]>
Co-authored-by: Muntasir Mallick <[email protected]>
Co-authored-by: Tatiana Larina <[email protected]>
Co-authored-by: Kaushik Kulkarni <[email protected]>

fhossein-quic force-pushed the PR_GenVecWid branch 2 times, most recently from 74b49be to 90d294a Compare November 25, 2025 20:39

fhossein-quic force-pushed the PR_GenVecWid branch from 90d294a to 7fa86a5 Compare November 28, 2025 19:31

fhossein-quic marked this pull request as ready for review November 28, 2025 23:14

llvmbot added backend:Hexagon llvm:ir labels Nov 28, 2025

fhossein-quic requested review from SergeiYLarin, aankit-ca, iajbar and jverma-quic November 28, 2025 23:15

fhossein-quic changed the title ~~Passes for widening vector operations and shuffle opt~~ [Hexagon] Passes for widening vector operations and shuffle opt Dec 1, 2025

SergeiYLarin approved these changes Dec 1, 2025

fhossein-quic force-pushed the PR_GenVecWid branch 2 times, most recently from 441c18b to d7b5116 Compare December 1, 2025 20:49

jverma-quic approved these changes Dec 1, 2025

fhossein-quic force-pushed the PR_GenVecWid branch from d7b5116 to 1e24119 Compare December 1, 2025 21:47
