@fhossein-quic
Contributor

Introduce Hexagon-specific passes to generate widening vector instructions for integer and floating-point operations using generic LLVM intrinsics. This enables widening operations for short vectors and improves type legalization by allowing operands to be widened to appropriate types. The patch also includes a shuffle optimization pass to relocate and validate shufflevector instructions during widening legalization.

Co-authored-by: Jyotsna Verma [email protected]
Co-authored-by: Yashas Andaluri [email protected]
Co-authored-by: Fateme Hosseini [email protected]
Co-authored-by: Muntasir Mallick [email protected]
Co-authored-by: Tatiana Larina [email protected]
Co-authored-by: Kaushik Kulkarni [email protected]

Change-Id: I1f6c146bd70ffd1ea42b614fa22fad04d16d6c35
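To make "widening" concrete, here is a minimal standalone sketch, not taken from the patch, of an element-wise widening multiply: each narrow lane is extended to twice its width before the multiply, so the product always fits in the result lane. The helper name `wideningMul` is hypothetical, and the sketch assumes both inputs have the same length; how the actual intrinsics map signedness combinations onto Hexagon instructions is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the patch): multiply two vectors of
// narrow integers lane by lane, producing lanes of twice the width.
// Assumes a.size() == b.size().
template <typename Narrow, typename Wide>
std::vector<Wide> wideningMul(const std::vector<Narrow> &a,
                              const std::vector<Narrow> &b) {
  static_assert(sizeof(Wide) == 2 * sizeof(Narrow),
                "result lanes must be twice as wide as operand lanes");
  std::vector<Wide> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Extend first, then multiply: the product of two N-bit values
    // always fits in 2N bits.
    out[i] = static_cast<Wide>(static_cast<Wide>(a[i]) *
                               static_cast<Wide>(b[i]));
  }
  return out;
}
```

With 8-bit operands, for instance, `wideningMul<uint8_t, uint16_t>` keeps the full product of 255 and 255, which a plain 8-bit multiply would truncate.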

@github-actions

github-actions bot commented Nov 25, 2025

✅ With the latest revision this PR passed the undef deprecator.

@github-actions

github-actions bot commented Nov 25, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@fhossein-quic force-pushed the PR_GenVecWid branch 2 times, most recently from 74b49be to 90d294a (November 25, 2025 20:39)
@github-actions

github-actions bot commented Nov 25, 2025

🐧 Linux x64 Test Results

  • 186828 tests passed
  • 4893 tests skipped

@llvmbot
Member

llvmbot commented Nov 28, 2025

@llvm/pr-subscribers-llvm-ir

Author: Fateme Hosseini (fhossein-quic)

Changes

Patch is 255.46 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169559.diff

37 Files Affected:

  • (modified) llvm/include/llvm/IR/IntrinsicsHexagon.td (+79-1)
  • (modified) llvm/include/llvm/IR/IntrinsicsHexagonDep.td (-14)
  • (modified) llvm/lib/Target/Hexagon/CMakeLists.txt (+3)
  • (modified) llvm/lib/Target/Hexagon/Hexagon.h (+4)
  • (modified) llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp (+1-2)
  • (added) llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp (+565)
  • (added) llvm/lib/Target/Hexagon/HexagonGenWideningVecInstr.cpp (+1184)
  • (modified) llvm/lib/Target/Hexagon/HexagonISelLowering.h (+1)
  • (modified) llvm/lib/Target/Hexagon/HexagonISelLoweringHVX.cpp (+110)
  • (modified) llvm/lib/Target/Hexagon/HexagonIntrinsics.td (+114)
  • (modified) llvm/lib/Target/Hexagon/HexagonNewValueJump.cpp (+1-1)
  • (added) llvm/lib/Target/Hexagon/HexagonOptShuffleVector.cpp (+713)
  • (modified) llvm/lib/Target/Hexagon/HexagonPatternsHVX.td (+12)
  • (modified) llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp (+17)
  • (modified) llvm/lib/Target/Hexagon/HexagonVectorCombine.cpp (+21-22)
  • (modified) llvm/lib/Target/Hexagon/MCTargetDesc/HexagonMCELFStreamer.cpp (+5)
  • (modified) llvm/test/CodeGen/Hexagon/autohvx/isel-vpackew.ll (+11-15)
  • (modified) llvm/test/CodeGen/Hexagon/autohvx/widen-setcc.ll (+1-3)
  • (added) llvm/test/CodeGen/Hexagon/bug54537-vavg.ll (+20)
  • (added) llvm/test/CodeGen/Hexagon/extend-multiply-for-output-fpext.ll (+16)
  • (added) llvm/test/CodeGen/Hexagon/no_widening_of_bf16_vecmul.ll (+60)
  • (added) llvm/test/CodeGen/Hexagon/shortvec-vasrsat.ll (+68)
  • (added) llvm/test/CodeGen/Hexagon/shortvec-vavg.ll (+20)
  • (added) llvm/test/CodeGen/Hexagon/shortvec-vmpy.ll (+27)
  • (added) llvm/test/CodeGen/Hexagon/vadd-const.ll (+114)
  • (added) llvm/test/CodeGen/Hexagon/vasr-sat.ll (+66)
  • (added) llvm/test/CodeGen/Hexagon/vavg.ll (+33)
  • (added) llvm/test/CodeGen/Hexagon/vec-shuff-invalid-operand.ll (+32)
  • (added) llvm/test/CodeGen/Hexagon/vec-shuff-multi-uses.ll (+290)
  • (added) llvm/test/CodeGen/Hexagon/vec-shuff2.ll (+106)
  • (added) llvm/test/CodeGen/Hexagon/vmpa.ll (+64)
  • (added) llvm/test/CodeGen/Hexagon/vmpy-const.ll (+273)
  • (added) llvm/test/CodeGen/Hexagon/vmpy-qfp-const.ll (+71)
  • (added) llvm/test/CodeGen/Hexagon/vsub-const.ll (+112)
  • (added) llvm/test/CodeGen/Hexagon/widening-float-vec.ll (+15)
  • (added) llvm/test/CodeGen/Hexagon/widening-vec.ll (+96)
  • (added) llvm/test/CodeGen/Hexagon/widening-vec2.ll (+23)
diff --git a/llvm/include/llvm/IR/IntrinsicsHexagon.td b/llvm/include/llvm/IR/IntrinsicsHexagon.td
index 20ba51ade35a7..2c945d2399b25 100644
--- a/llvm/include/llvm/IR/IntrinsicsHexagon.td
+++ b/llvm/include/llvm/IR/IntrinsicsHexagon.td
@@ -14,7 +14,7 @@
 //
 // All Hexagon intrinsics start with "llvm.hexagon.".
 let TargetPrefix = "hexagon" in {
-  /// Hexagon_Intrinsic - Base class for the majority of Hexagon intrinsics.
+  /// Hexagon_Intrinsic - Base class for majority of Hexagon intrinsics.
   class Hexagon_Intrinsic<string GCCIntSuffix, list<LLVMType> ret_types,
                               list<LLVMType> param_types,
                               list<IntrinsicProperty> properties>
@@ -435,6 +435,84 @@ def int_hexagon_V6_vmaskedstorenq_128B: Hexagon_custom_vms_Intrinsic_128B;
 def int_hexagon_V6_vmaskedstorentq_128B: Hexagon_custom_vms_Intrinsic_128B;
 def int_hexagon_V6_vmaskedstorentnq_128B: Hexagon_custom_vms_Intrinsic_128B;
 
+// Carryo
+// The script can't autogenerate clang builtins for vaddcarryo/vsubcarryo,
+// and they are marked in HexagonIset.py as not having intrinsics at all.
+// The script could generate intrinsics, but instead of doing intrinsics
+// without builtins, just put the intrinsics here.
+
+// tag : V6_vaddcarryo
+class Hexagon_custom_v16i32v64i1_v16i32v16i32_Intrinsic<
+      list<IntrinsicProperty> intr_properties = [IntrNoMem]>
+  : Hexagon_NonGCC_Intrinsic<
+       [llvm_v16i32_ty,llvm_v64i1_ty], [llvm_v16i32_ty,llvm_v16i32_ty],
+       intr_properties>;
+
+// tag : V6_vaddcarryo
+class Hexagon_custom_v32i32v128i1_v32i32v32i32_Intrinsic_128B<
+      list<IntrinsicProperty> intr_properties = [IntrNoMem]>
+  : Hexagon_NonGCC_Intrinsic<
+       [llvm_v32i32_ty,llvm_v128i1_ty], [llvm_v32i32_ty,llvm_v32i32_ty],
+       intr_properties>;
+
+// Pseudo intrinsics for widening vector instructions that
+// get replaced with the real Hexagon instructions during
+// instruction lowering.
+class Hexagon_widenvec_Intrinsic
+  : Hexagon_NonGCC_Intrinsic<
+       [llvm_anyvector_ty],
+       [LLVMTruncatedType<0>, LLVMTruncatedType<0>],
+       [IntrNoMem]>;
+
+class Hexagon_non_widenvec_Intrinsic
+  : Hexagon_NonGCC_Intrinsic<
+       [llvm_anyvector_ty],
+       [LLVMMatchType<0>, LLVMMatchType<0>],
+       [IntrNoMem]>;
+
+// Widening vector add
+def int_hexagon_vadd_su: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vadd_uu: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vadd_ss: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vadd_us: Hexagon_widenvec_Intrinsic;
+
+
+// Widening vector subtract
+def int_hexagon_vsub_su: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vsub_uu: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vsub_ss: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vsub_us: Hexagon_widenvec_Intrinsic;
+
+// Widening vector multiply
+def int_hexagon_vmpy_su: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vmpy_uu: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vmpy_ss: Hexagon_widenvec_Intrinsic;
+def int_hexagon_vmpy_us: Hexagon_widenvec_Intrinsic;
+
+def int_hexagon_vavgu: Hexagon_non_widenvec_Intrinsic;
+def int_hexagon_vavgs: Hexagon_non_widenvec_Intrinsic;
+
+class Hexagon_vasr_Intrinsic
+  : Hexagon_NonGCC_Intrinsic<
+       [LLVMSubdivide2VectorType<0>],
+       [llvm_anyvector_ty, LLVMMatchType<0>, llvm_i32_ty],
+       [IntrNoMem]>;
+
+def int_hexagon_vasrsat_su: Hexagon_vasr_Intrinsic;
+def int_hexagon_vasrsat_uu: Hexagon_vasr_Intrinsic;
+def int_hexagon_vasrsat_ss: Hexagon_vasr_Intrinsic;
+
+class Hexagon_widen_vec_scalar_Intrinsic
+  : Hexagon_NonGCC_Intrinsic<
+       [llvm_anyvector_ty],
+       [LLVMTruncatedType<0>, llvm_i32_ty],
+       [IntrNoMem]>;
+
+// Widening vector scalar multiply
+def int_hexagon_vmpy_ub_b: Hexagon_widen_vec_scalar_Intrinsic;
+def int_hexagon_vmpy_ub_ub: Hexagon_widen_vec_scalar_Intrinsic;
+def int_hexagon_vmpy_uh_uh: Hexagon_widen_vec_scalar_Intrinsic;
+def int_hexagon_vmpy_h_h: Hexagon_widen_vec_scalar_Intrinsic;
 
 // Intrinsic for instrumentation based profiling using a custom handler. The
 // name of the handler is passed as the first operand to the intrinsic. The
diff --git a/llvm/include/llvm/IR/IntrinsicsHexagonDep.td b/llvm/include/llvm/IR/IntrinsicsHexagonDep.td
index dde4132791f06..2a673603e4e03 100644
--- a/llvm/include/llvm/IR/IntrinsicsHexagonDep.td
+++ b/llvm/include/llvm/IR/IntrinsicsHexagonDep.td
@@ -491,20 +491,6 @@ class Hexagon_custom_v32i32v128i1_v32i32v32i32v128i1_Intrinsic_128B<
        [llvm_v32i32_ty,llvm_v128i1_ty], [llvm_v32i32_ty,llvm_v32i32_ty,llvm_v128i1_ty],
        intr_properties>;
 
-// tag : V6_vaddcarryo
-class Hexagon_custom_v16i32v64i1_v16i32v16i32_Intrinsic<
-      list<IntrinsicProperty> intr_properties = [IntrNoMem]>
-  : Hexagon_NonGCC_Intrinsic<
-       [llvm_v16i32_ty,llvm_v64i1_ty], [llvm_v16i32_ty,llvm_v16i32_ty],
-       intr_properties>;
-
-// tag : V6_vaddcarryo
-class Hexagon_custom_v32i32v128i1_v32i32v32i32_Intrinsic_128B<
-      list<IntrinsicProperty> intr_properties = [IntrNoMem]>
-  : Hexagon_NonGCC_Intrinsic<
-       [llvm_v32i32_ty,llvm_v128i1_ty], [llvm_v32i32_ty,llvm_v32i32_ty],
-       intr_properties>;
-
 // tag : V6_vaddcarrysat
 class Hexagon_v16i32_v16i32v16i32v64i1_Intrinsic<string GCCIntSuffix,
       list<IntrinsicProperty> intr_properties = [IntrNoMem]>
diff --git a/llvm/lib/Target/Hexagon/CMakeLists.txt b/llvm/lib/Target/Hexagon/CMakeLists.txt
index 1a5f09642ea66..eddab5a235dab 100644
--- a/llvm/lib/Target/Hexagon/CMakeLists.txt
+++ b/llvm/lib/Target/Hexagon/CMakeLists.txt
@@ -37,6 +37,8 @@ add_llvm_target(HexagonCodeGen
   HexagonGenMemAbsolute.cpp
   HexagonGenMux.cpp
   HexagonGenPredicate.cpp
+  HexagonGenWideningVecFloatInstr.cpp
+  HexagonGenWideningVecInstr.cpp
   HexagonHardwareLoops.cpp
   HexagonHazardRecognizer.cpp
   HexagonInstrInfo.cpp
@@ -53,6 +55,7 @@ add_llvm_target(HexagonCodeGen
   HexagonNewValueJump.cpp
   HexagonOptAddrMode.cpp
   HexagonOptimizeSZextends.cpp
+  HexagonOptShuffleVector.cpp
   HexagonPeephole.cpp
   HexagonQFPOptimizer.cpp
   HexagonRDFOpt.cpp
diff --git a/llvm/lib/Target/Hexagon/Hexagon.h b/llvm/lib/Target/Hexagon/Hexagon.h
index 422ab20891b94..b98369d1b3e30 100644
--- a/llvm/lib/Target/Hexagon/Hexagon.h
+++ b/llvm/lib/Target/Hexagon/Hexagon.h
@@ -92,6 +92,9 @@ FunctionPass *createHexagonGenInsert();
 FunctionPass *createHexagonGenMemAbsolute();
 FunctionPass *createHexagonGenMux();
 FunctionPass *createHexagonGenPredicate();
+FunctionPass *
+createHexagonGenWideningVecFloatInstr(const HexagonTargetMachine &);
+FunctionPass *createHexagonGenWideningVecInstr(const HexagonTargetMachine &);
 FunctionPass *createHexagonHardwareLoops();
 FunctionPass *createHexagonISelDag(HexagonTargetMachine &TM,
                                    CodeGenOptLevel OptLevel);
@@ -102,6 +105,7 @@ FunctionPass *createHexagonMergeActivateWeight();
 FunctionPass *createHexagonNewValueJump();
 FunctionPass *createHexagonOptAddrMode();
 FunctionPass *createHexagonOptimizeSZextends();
+FunctionPass *createHexagonOptShuffleVector(const HexagonTargetMachine &);
 FunctionPass *createHexagonPacketizer(bool Minimal);
 FunctionPass *createHexagonPeephole();
 FunctionPass *createHexagonRDFOpt();
diff --git a/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp b/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
index 5344ed8446efc..412d58743df94 100644
--- a/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
@@ -51,8 +51,7 @@ struct PrintRegister {
 };
 
 [[maybe_unused]] raw_ostream &operator<<(raw_ostream &OS,
-                                         const PrintRegister &PR);
-raw_ostream &operator<<(raw_ostream &OS, const PrintRegister &PR) {
+                                         const PrintRegister &PR) {
   return OS << printReg(PR.Reg.Reg, &PR.TRI, PR.Reg.SubReg);
 }
 
diff --git a/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp b/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp
new file mode 100644
index 0000000000000..7271f1f839d69
--- /dev/null
+++ b/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp
@@ -0,0 +1,565 @@
+//===------------------- HexagonGenWideningVecFloatInstr.cpp --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Replace widening vector float operations with hexagon intrinsics.
+//
+//===----------------------------------------------------------------------===//
+//
+// Brief overview of how the GenWideningVecFloatInstr pass works.
+// This pass mirrors the existing pass that replaces widening vector integer
+// operations with their respective intrinsics. This pass generates Hexagon
+// intrinsics for widening vector float instructions.
+//
+// Example1(64 vector-width widening):
+// %wide.load = load <64 x half>, <64 x half>* %0, align 2
+// %wide.load53 = load <64 x half>, <64 x half>* %2, align 2
+// %1 = fpext <64 x half> %wide.load to <64 x float>
+// %3 = fpext <64 x half> %wide.load53 to <64 x float>
+// %4 = fmul <64 x float> %1, %3
+//
+// When this pass runs on the example above, it first finds the fmul
+// instruction and then checks whether its operands (%1 and %3) belong to
+// one of these categories: [%1 ->fpext, %3 ->fpext],
+// [%1 ->fpext, %3 ->constant_vector] or [%1 ->constant_vector, %3 ->fpext].
+// If it finds such a pattern, the pass replaces it with the appropriate
+// Hexagon intrinsics.
+//
+// After replacement:
+// %wide.load = load <64 x half>, <64 x half>* %0, align 2
+// %wide.load53 = load <64 x half>, <64 x half>* %2, align 2
+// %3 = bitcast <64 x half> %wide.load to <32 x i32>
+// %4 = bitcast <64 x half> %wide.load53 to <32 x i32>
+// %5 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%3, %4)
+// %6 = shufflevector <64 x i32> %5, <64 x i32> poison, <64 x i32> ShuffMask1
+// %7 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %6)
+// %8 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %6)
+// %9 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %7)
+// %10 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %8)
+// %11 = bitcast <32 x i32> %9 to <32 x float>
+// %12 = bitcast <32 x i32> %10 to <32 x float>
+// %13 = shufflevector <32 x float> %12, <32 x float> %11, <64 x i32> ShuffMask2
+//
+//
+//
+// Example2(128 vector-width widening):
+// %0 = bitcast half* %a to <128 x half>*
+// %wide.load = load <128 x half>, <128 x half>* %0, align 2
+// %1 = fpext <128 x half> %wide.load to <128 x float>
+// %2 = bitcast half* %b to <128 x half>*
+// %wide.load2 = load <128 x half>, <128 x half>* %2, align 2
+// %3 = fpext <128 x half> %wide.load2 to <128 x float>
+// %4 = fmul <128 x float> %1, %3
+//
+// After replacement:
+// %0 = bitcast half* %a to <128 x half>*
+// %wide.load = load <128 x half>, <128 x half>* %0, align 2
+// %1 = bitcast half* %b to <128 x half>*
+// %wide.load2 = load <128 x half>, <128 x half>* %1, align 2
+// %2 = bitcast <128 x half> %wide.load to <64 x i32>
+// %3 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %2)
+// %4 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %2)
+// %5 = bitcast <128 x half> %wide.load2 to <64 x i32>
+// %6 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %5)
+// %7 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %5)
+// %8 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%3, %6)
+// %9 = shufflevector <64 x i32> %8, <64 x i32> poison, <64 x i32> Mask1
+// %10 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %9)
+// %11 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %9)
+// %12 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %10)
+// %13 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %11)
+// %14 = bitcast <32 x i32> %12 to <32 x float>
+// %15 = bitcast <32 x i32> %13 to <32 x float>
+// %16 = shufflevector <32 x float> %15, <32 x float> %14, <64 x i32> Mask2
+// %17 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%4, %7)
+// %18 = shufflevector <64 x i32> %17, <64 x i32> poison, <64 x i32> Mask1
+// %19 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %18)
+// %20 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %18)
+// %21 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %19)
+// %22 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %20)
+// %23 = bitcast <32 x i32> %21 to <32 x float>
+// %24 = bitcast <32 x i32> %22 to <32 x float>
+// %25 = shufflevector <32 x float> %24, <32 x float> %23, <64 x i32> Mask2
+// %26 = shufflevector <64 x float> %25, <64 x float> %16, <128 x i32> Mask3
+//
+//
+//===----------------------------------------------------------------------===//
+#include "HexagonTargetMachine.h"
+#include "llvm/ADT/APInt.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicsHexagon.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/IR/Type.h"
+#include "llvm/IR/Value.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include <algorithm>
+#include <utility>
+
+using namespace llvm;
+
+namespace llvm {
+void initializeHexagonGenWideningVecFloatInstrPass(PassRegistry &);
+FunctionPass *
+createHexagonGenWideningVecFloatInstr(const HexagonTargetMachine &);
+} // end namespace llvm
+
+namespace {
+
+class HexagonGenWideningVecFloatInstr : public FunctionPass {
+public:
+  static char ID;
+
+  HexagonGenWideningVecFloatInstr() : FunctionPass(ID) {
+    initializeHexagonGenWideningVecFloatInstrPass(
+        *PassRegistry::getPassRegistry());
+  }
+
+  HexagonGenWideningVecFloatInstr(const HexagonTargetMachine *TM)
+      : FunctionPass(ID), TM(TM) {
+    initializeHexagonGenWideningVecFloatInstrPass(
+        *PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+    return "Hexagon generate widening vector float instructions";
+  }
+
+  bool runOnFunction(Function &F) override;
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    FunctionPass::getAnalysisUsage(AU);
+  }
+
+private:
+  Module *M = nullptr;
+  const HexagonTargetMachine *TM = nullptr;
+  const HexagonSubtarget *HST = nullptr;
+  unsigned HwVLen;
+  unsigned NumHalfEltsInFullVec;
+
+  struct OPInfo {
+    Value *OP;
+    Value *ExtInOP;
+    unsigned ExtInSize;
+  };
+
+  bool visitBlock(BasicBlock *B);
+  bool processInstruction(Instruction *Inst);
+  bool replaceWithIntrinsic(Instruction *Inst, OPInfo &OP1Info,
+                            OPInfo &OP2Info);
+
+  bool getOperandInfo(Value *V, OPInfo &OPI);
+  bool isExtendedConstant(Constant *C);
+  unsigned getElementSizeInBits(Value *V);
+  Type *getElementTy(unsigned size, IRBuilder<> &IRB);
+
+  Value *adjustExtensionForOp(OPInfo &OPI, IRBuilder<> &IRB,
+                              unsigned NewEltsize, unsigned NumElts);
+
+  std::pair<Value *, Value *> opSplit(Value *OP, Instruction *Inst);
+
+  Value *createIntrinsic(Intrinsic::ID IntId, Instruction *Inst, Value *NewOP1,
+                         Value *NewOP2, FixedVectorType *ResType,
+                         unsigned NumElts, bool BitCastOp);
+};
+
+} // end anonymous namespace
+
+char HexagonGenWideningVecFloatInstr::ID = 0;
+
+INITIALIZE_PASS_BEGIN(HexagonGenWideningVecFloatInstr, "widening-vec-float",
+                      "Hexagon generate "
+                      "widening vector float instructions",
+                      false, false)
+INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
+INITIALIZE_PASS_END(HexagonGenWideningVecFloatInstr, "widening-vec-float",
+                    "Hexagon generate "
+                    "widening vector float instructions",
+                    false, false)
+
+bool HexagonGenWideningVecFloatInstr::isExtendedConstant(Constant *C) {
+  if (Value *SplatV = C->getSplatValue()) {
+    if (auto *CFP = dyn_cast<ConstantFP>(SplatV)) {
+      bool Ignored;
+      APFloat APF = CFP->getValueAPF();
+      APFloat::opStatus sts = APF.convert(
+          APFloat::IEEEhalf(), APFloat::rmNearestTiesToEven, &Ignored);
+      if (sts == APFloat::opStatus::opOK || sts == APFloat::opStatus::opInexact)
+        return true;
+    }
+    return false;
+  }
+  unsigned NumElts = cast<FixedVectorType>(C->getType())->getNumElements();
+  for (unsigned i = 0, e = NumElts; i != e; ++i) {
+    if (auto *CFP = dyn_cast<ConstantFP>(C->getAggregateElement(i))) {
+      bool Ignored;
+      APFloat APF = CFP->getValueAPF();
+      APFloat::opStatus sts = APF.convert(
+          APFloat::IEEEhalf(), APFloat::rmNearestTiesToEven, &Ignored);
+      if (sts != APFloat::opStatus::opOK && sts != APFloat::opStatus::opInexact)
+        return false;
+      continue;
+    }
+    return false;
+  }
+  return true;
+}
+
+unsigned HexagonGenWideningVecFloatInstr::getElementSizeInBits(Value *V) {
+  Type *ValTy = V->getType();
+  Type *EltTy = ValTy;
+  if (isa<Constant>(V)) {
+    unsigned EltSize =
+        cast<VectorType>(EltTy)->getElementType()->getPrimitiveSizeInBits();
+    unsigned ReducedSize = EltSize / 2;
+
+    return ReducedSize;
+  }
+
+  if (ValTy->isVectorTy())
+    EltTy = cast<VectorType>(ValTy)->getElementType();
+  return EltTy->getPrimitiveSizeInBits();
+}
+
+bool HexagonGenWideningVecFloatInstr::getOperandInfo(Value *V, OPInfo &OPI) {
+  using namespace PatternMatch;
+  OPI.OP = V;
+  Value *ExtV = nullptr;
+  Constant *C = nullptr;
+
+  if (match(V, (m_FPExt(m_Value(ExtV)))) ||
+      match(V,
+            m_Shuffle(m_InsertElt(m_Poison(), m_FPExt(m_Value(ExtV)), m_Zero()),
+                      m_Poison(), m_ZeroMask()))) {
+
+    if (auto *ExtVType = dyn_cast<VectorType>(ExtV->getType())) {
+      // Matches the first branch.
+      if (ExtVType->getElementType()->isBFloatTy())
+        // do not confuse bf16 with ieee-fp16.
+        return false;
+    } else {
+      // Matches the second branch (insert element branch)
+      if (ExtV->getType()->isBFloatTy())
+        return false;
+    }
+
+    OPI.ExtInOP = ExtV;
+    OPI.ExtInSize = getElementSizeInBits(OPI.ExtInOP);
+    return true;
+  }
+
+  if (match(V, m_Constant(C))) {
+    if (!isExtendedConstant(C))
+      return false;
+    OPI.ExtInOP = C;
+    OPI.ExtInSize = getElementSizeInBits(OPI.ExtInOP);
+    return true;
+  }
+
+  return false;
+}
+
+Type *HexagonGenWideningVecFloatInstr::getElementTy(unsigned size,
+                                                    IRBuilder<> &IRB) {
+  switch (size) {
+  case 16:
+    return IRB.getHalfTy();
+  case 32:
+    return IRB.getFloatTy();
+  default:
+    llvm_unreachable("Unhandled Element size");
+  }
+}
+
+Value *HexagonGenWideningVecFloatInstr::adjustExtensionForOp(
+    OPInfo &OPI, IRBuilder<> &IRB, unsigned NewExtSize, unsigned NumElts) {
+  Value *V = OPI.ExtInOP;
+  unsigned EltSize = getElementSizeInBits(OPI.ExtInOP);
+  assert(NewExtSize >= EltSize);
+  Type *EltType = getElementTy(NewExtSize, IRB);
+  auto *NewOpTy = FixedVectorType::get(EltType, NumElts);
+
+  if (auto *C = dyn_cast<Constant>(V))
+    return IRB.CreateFPTrunc(C, NewOpTy);
+
+  if (V->getType()->isVectorTy())
+    if (NewExtSize == EltSize)
+      return V;
+
+  return nullptr;
+}
+
+std::pair<Value *, Value *>
+HexagonGenWideningVecFloatInstr::opSplit(Value *OP, Instruction *Inst) {
+  Type *InstTy = Inst->getType();
+  unsigned NumElts = cast<FixedVectorType>(InstTy)->getNumElements();
+  IRBuilder<> IRB(Inst);
+  Intrinsic::ID IntHi = Intrinsic::hexagon_V6_hi_128B;
+  Intrinsic::ID IntLo = Intrinsic::hexagon_V6_lo_128B;
+  Function *ExtFHi = Intrinsic::getOrInsertDeclaration(M, IntHi);
+  Function *ExtFLo = Intrinsic::getOrInsertDeclaration(M, IntLo);
+  if (NumElts == 128) {
+    auto *InType = FixedVectorType::get(IRB.getInt32Ty(), 64);
+    OP = IRB.CreateBitCast(OP, InType);
+  }
+  Value *OP1Hi = IRB.CreateCall(ExtFHi, {OP});
+  Value *OP1Lo = IRB.CreateCall(ExtFLo, {OP});
+  return std::pair<Value *, Value *>(OP1Hi, OP1Lo);
+}
+
+Value *HexagonGenWideningVecFloatInstr::createIntrinsic(
+    Intrinsic::ID IntId, Instruction *Inst, Value *NewOP1, Value *NewOP2,
+    FixedVectorType *ResType, unsigned NumElts, bool BitCastO...
[truncated]
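The operand check described in the header comment of HexagonGenWideningVecFloatInstr.cpp can be sketched outside LLVM as follows. This is an illustration only: the `OperandKind` enum and both function names are hypothetical, and the rejection of a constant/constant pair is inferred from the three categories the comment lists, since the part of the pass that makes this decision is cut off in the truncated diff above. The bf16 exclusion mirrors the `isBFloatTy()` checks in `getOperandInfo`.

```cpp
#include <cassert>

// Hypothetical classification of an fmul operand (not from the patch).
enum class OperandKind {
  FPExtFromHalf,   // fpext of an IEEE half value
  FPExtFromBFloat, // fpext of a bf16 value; must not be treated as fp16
  HalfConstant,    // constant vector that converts to half without
                   // overflow or underflow
  Other
};

// An operand is usable if it is an fpext from half or a suitable constant.
bool isUsableOperand(OperandKind K) {
  return K == OperandKind::FPExtFromHalf || K == OperandKind::HalfConstant;
}

// Accept fpext/fpext, fpext/constant and constant/fpext; reject the rest.
bool isWideningFMulCandidate(OperandKind LHS, OperandKind RHS) {
  if (!isUsableOperand(LHS) || !isUsableOperand(RHS))
    return false;
  // Assumption: at least one side must be an fpext, because the header
  // comment lists no constant/constant category.
  return LHS == OperandKind::FPExtFromHalf ||
         RHS == OperandKind::FPExtFromHalf;
}
```

Keeping the bf16 case as its own enumerator makes the exclusion explicit instead of folding it into `Other`, which matches the intent of the "do not confuse bf16 with ieee-fp16" comment in the patch.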

@llvmbot
Member

llvmbot commented Nov 28, 2025

@llvm/pr-subscribers-backend-hexagon

Author: Fateme Hosseini (fhossein-quic)

-       intr_properties>;
-
 // tag : V6_vaddcarrysat
 class Hexagon_v16i32_v16i32v16i32v64i1_Intrinsic<string GCCIntSuffix,
       list<IntrinsicProperty> intr_properties = [IntrNoMem]>
diff --git a/llvm/lib/Target/Hexagon/CMakeLists.txt b/llvm/lib/Target/Hexagon/CMakeLists.txt
index 1a5f09642ea66..eddab5a235dab 100644
--- a/llvm/lib/Target/Hexagon/CMakeLists.txt
+++ b/llvm/lib/Target/Hexagon/CMakeLists.txt
@@ -37,6 +37,8 @@ add_llvm_target(HexagonCodeGen
   HexagonGenMemAbsolute.cpp
   HexagonGenMux.cpp
   HexagonGenPredicate.cpp
+  HexagonGenWideningVecFloatInstr.cpp
+  HexagonGenWideningVecInstr.cpp
   HexagonHardwareLoops.cpp
   HexagonHazardRecognizer.cpp
   HexagonInstrInfo.cpp
@@ -53,6 +55,7 @@ add_llvm_target(HexagonCodeGen
   HexagonNewValueJump.cpp
   HexagonOptAddrMode.cpp
   HexagonOptimizeSZextends.cpp
+  HexagonOptShuffleVector.cpp
   HexagonPeephole.cpp
   HexagonQFPOptimizer.cpp
   HexagonRDFOpt.cpp
diff --git a/llvm/lib/Target/Hexagon/Hexagon.h b/llvm/lib/Target/Hexagon/Hexagon.h
index 422ab20891b94..b98369d1b3e30 100644
--- a/llvm/lib/Target/Hexagon/Hexagon.h
+++ b/llvm/lib/Target/Hexagon/Hexagon.h
@@ -92,6 +92,9 @@ FunctionPass *createHexagonGenInsert();
 FunctionPass *createHexagonGenMemAbsolute();
 FunctionPass *createHexagonGenMux();
 FunctionPass *createHexagonGenPredicate();
+FunctionPass *
+createHexagonGenWideningVecFloatInstr(const HexagonTargetMachine &);
+FunctionPass *createHexagonGenWideningVecInstr(const HexagonTargetMachine &);
 FunctionPass *createHexagonHardwareLoops();
 FunctionPass *createHexagonISelDag(HexagonTargetMachine &TM,
                                    CodeGenOptLevel OptLevel);
@@ -102,6 +105,7 @@ FunctionPass *createHexagonMergeActivateWeight();
 FunctionPass *createHexagonNewValueJump();
 FunctionPass *createHexagonOptAddrMode();
 FunctionPass *createHexagonOptimizeSZextends();
+FunctionPass *createHexagonOptShuffleVector(const HexagonTargetMachine &);
 FunctionPass *createHexagonPacketizer(bool Minimal);
 FunctionPass *createHexagonPeephole();
 FunctionPass *createHexagonRDFOpt();
diff --git a/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp b/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
index 5344ed8446efc..412d58743df94 100644
--- a/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonGenPredicate.cpp
@@ -51,8 +51,7 @@ struct PrintRegister {
 };
 
 [[maybe_unused]] raw_ostream &operator<<(raw_ostream &OS,
-                                         const PrintRegister &PR);
-raw_ostream &operator<<(raw_ostream &OS, const PrintRegister &PR) {
+                                         const PrintRegister &PR) {
   return OS << printReg(PR.Reg.Reg, &PR.TRI, PR.Reg.SubReg);
 }
 
diff --git a/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp b/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp
new file mode 100644
index 0000000000000..7271f1f839d69
--- /dev/null
+++ b/llvm/lib/Target/Hexagon/HexagonGenWideningVecFloatInstr.cpp
@@ -0,0 +1,565 @@
+//===------------------- HexagonGenWideningVecFloatInstr.cpp --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Replace widening vector float operations with hexagon intrinsics.
+//
+//===----------------------------------------------------------------------===//
+//
+// Brief overview of the GenWideningVecFloatInstr pass.
+// This pass mirrors the existing pass that replaces widening vector integer
+// operations with their respective intrinsics. Here we generate Hexagon
+// intrinsics for widening vector float instructions.
+//
+// Example1(64 vector-width widening):
+// %wide.load = load <64 x half>, <64 x half>* %0, align 2
+// %wide.load53 = load <64 x half>, <64 x half>* %2, align 2
+// %1 = fpext <64 x half> %wide.load to <64 x float>
+// %3 = fpext <64 x half> %wide.load53 to <64 x float>
+// %4 = fmul <64 x float> %1, %3
+//
+// When run on the example above, the pass first finds the fmul instruction
+// and then checks whether its operands (%1 and %3) fall into one of these
+// categories: [%1 -> fpext, %3 -> fpext],
+// [%1 -> fpext, %3 -> constant_vector], or
+// [%1 -> constant_vector, %3 -> fpext].
+// If it finds such a pattern, the pass replaces it with the appropriate
+// Hexagon intrinsics.
+//
+// After replacement:
+// %wide.load = load <64 x half>, <64 x half>* %0, align 2
+// %wide.load53 = load <64 x half>, <64 x half>* %2, align 2
+// %3 = bitcast <64 x half> %wide.load to <32 x i32>
+// %4 = bitcast <64 x half> %wide.load53 to <32 x i32>
+// %5 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%3, %4)
+// %6 = shufflevector <64 x i32> %5, <64 x i32> poison, <64 x i32> ShuffMask1
+// %7 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %6)
+// %8 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %6)
+// %9 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %7)
+// %10 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %8)
+// %11 = bitcast <32 x i32> %9 to <32 x float>
+// %12 = bitcast <32 x i32> %10 to <32 x float>
+// %13 = shufflevector <32 x float> %12, <32 x float> %11, <64 x i32> ShuffMask2
+//
+//
+//
+// Example2(128 vector-width widening):
+// %0 = bitcast half* %a to <128 x half>*
+// %wide.load = load <128 x half>, <128 x half>* %0, align 2
+// %1 = fpext <128 x half> %wide.load to <128 x float>
+// %2 = bitcast half* %b to <128 x half>*
+// %wide.load2 = load <128 x half>, <128 x half>* %2, align 2
+// %3 = fpext <128 x half> %wide.load2 to <128 x float>
+// %4 = fmul <128 x float> %1, %3
+//
+// After replacement:
+// %0 = bitcast half* %a to <128 x half>*
+// %wide.load = load <128 x half>, <128 x half>* %0, align 2
+// %1 = bitcast half* %b to <128 x half>*
+// %wide.load2 = load <128 x half>, <128 x half>* %1, align 2
+// %2 = bitcast <128 x half> %wide.load to <64 x i32>
+// %3 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %2)
+// %4 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %2)
+// %5 = bitcast <128 x half> %wide.load2 to <64 x i32>
+// %6 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %5)
+// %7 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %5)
+// %8 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%3, %6)
+// %9 = shufflevector <64 x i32> %8, <64 x i32> poison, <64 x i32> Mask1
+// %10 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %9)
+// %11 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %9)
+// %12 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %10)
+// %13 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %11)
+// %14 = bitcast <32 x i32> %12 to <32 x float>
+// %15 = bitcast <32 x i32> %13 to <32 x float>
+// %16 = shufflevector <32 x float> %15, <32 x float> %14, <64 x i32> Mask2
+// %17 = call <64 x i32> @llvm.hexagon.V6.vmpy.qf32.hf.128B(%4, %7)
+// %18 = shufflevector <64 x i32> %17, <64 x i32> poison, <64 x i32> Mask1
+// %19 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %18)
+// %20 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %18)
+// %21 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %19)
+// %22 = call <32 x i32> @llvm.hexagon.V6.vconv.sf.qf32.128B(<32 x i32> %20)
+// %23 = bitcast <32 x i32> %21 to <32 x float>
+// %24 = bitcast <32 x i32> %22 to <32 x float>
+// %25 = shufflevector <32 x float> %24, <32 x float> %23, <64 x i32> Mask2
+// %26 = shufflevector <64 x float> %25, <64 x float> %16, <128 x i32> Mask3
+//
+//
+//===----------------------------------------------------------------------===//
+#include "HexagonTargetMachine.h"
+#include "llvm/ADT/APInt.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicsHexagon.h"
+#include "llvm/IR/PatternMatch.h"
+#include "llvm/IR/Type.h"
+#include "llvm/IR/Value.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include <algorithm>
+#include <utility>
+
+using namespace llvm;
+
+namespace llvm {
+void initializeHexagonGenWideningVecFloatInstrPass(PassRegistry &);
+FunctionPass *
+createHexagonGenWideningVecFloatInstr(const HexagonTargetMachine &);
+} // end namespace llvm
+
+namespace {
+
+class HexagonGenWideningVecFloatInstr : public FunctionPass {
+public:
+  static char ID;
+
+  HexagonGenWideningVecFloatInstr() : FunctionPass(ID) {
+    initializeHexagonGenWideningVecFloatInstrPass(
+        *PassRegistry::getPassRegistry());
+  }
+
+  HexagonGenWideningVecFloatInstr(const HexagonTargetMachine *TM)
+      : FunctionPass(ID), TM(TM) {
+    initializeHexagonGenWideningVecFloatInstrPass(
+        *PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+    return "Hexagon generate widening vector float instructions";
+  }
+
+  bool runOnFunction(Function &F) override;
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    FunctionPass::getAnalysisUsage(AU);
+  }
+
+private:
+  Module *M = nullptr;
+  const HexagonTargetMachine *TM = nullptr;
+  const HexagonSubtarget *HST = nullptr;
+  unsigned HwVLen;
+  unsigned NumHalfEltsInFullVec;
+
+  struct OPInfo {
+    Value *OP;
+    Value *ExtInOP;
+    unsigned ExtInSize;
+  };
+
+  bool visitBlock(BasicBlock *B);
+  bool processInstruction(Instruction *Inst);
+  bool replaceWithIntrinsic(Instruction *Inst, OPInfo &OP1Info,
+                            OPInfo &OP2Info);
+
+  bool getOperandInfo(Value *V, OPInfo &OPI);
+  bool isExtendedConstant(Constant *C);
+  unsigned getElementSizeInBits(Value *V);
+  Type *getElementTy(unsigned size, IRBuilder<> &IRB);
+
+  Value *adjustExtensionForOp(OPInfo &OPI, IRBuilder<> &IRB,
+                              unsigned NewEltsize, unsigned NumElts);
+
+  std::pair<Value *, Value *> opSplit(Value *OP, Instruction *Inst);
+
+  Value *createIntrinsic(Intrinsic::ID IntId, Instruction *Inst, Value *NewOP1,
+                         Value *NewOP2, FixedVectorType *ResType,
+                         unsigned NumElts, bool BitCastOp);
+};
+
+} // end anonymous namespace
+
+char HexagonGenWideningVecFloatInstr::ID = 0;
+
+INITIALIZE_PASS_BEGIN(HexagonGenWideningVecFloatInstr, "widening-vec-float",
+                      "Hexagon generate "
+                      "widening vector float instructions",
+                      false, false)
+INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
+INITIALIZE_PASS_END(HexagonGenWideningVecFloatInstr, "widening-vec-float",
+                    "Hexagon generate "
+                    "widening vector float instructions",
+                    false, false)
+
+bool HexagonGenWideningVecFloatInstr::isExtendedConstant(Constant *C) {
+  if (Value *SplatV = C->getSplatValue()) {
+    if (auto *CFP = dyn_cast<ConstantFP>(SplatV)) {
+      bool Ignored;
+      APFloat APF = CFP->getValueAPF();
+      APFloat::opStatus sts = APF.convert(
+          APFloat::IEEEhalf(), APFloat::rmNearestTiesToEven, &Ignored);
+      if (sts == APFloat::opStatus::opOK || sts == APFloat::opStatus::opInexact)
+        return true;
+    }
+    return false;
+  }
+  unsigned NumElts = cast<FixedVectorType>(C->getType())->getNumElements();
+  for (unsigned i = 0, e = NumElts; i != e; ++i) {
+    if (auto *CFP = dyn_cast<ConstantFP>(C->getAggregateElement(i))) {
+      bool Ignored;
+      APFloat APF = CFP->getValueAPF();
+      APFloat::opStatus sts = APF.convert(
+          APFloat::IEEEhalf(), APFloat::rmNearestTiesToEven, &Ignored);
+      if (sts != APFloat::opStatus::opOK && sts != APFloat::opStatus::opInexact)
+        return false;
+      continue;
+    }
+    return false;
+  }
+  return true;
+}
+
+unsigned HexagonGenWideningVecFloatInstr::getElementSizeInBits(Value *V) {
+  Type *ValTy = V->getType();
+  Type *EltTy = ValTy;
+  if (isa<Constant>(V)) {
+    unsigned EltSize =
+        cast<VectorType>(EltTy)->getElementType()->getPrimitiveSizeInBits();
+    unsigned ReducedSize = EltSize / 2;
+
+    return ReducedSize;
+  }
+
+  if (ValTy->isVectorTy())
+    EltTy = cast<VectorType>(ValTy)->getElementType();
+  return EltTy->getPrimitiveSizeInBits();
+}
+
+bool HexagonGenWideningVecFloatInstr::getOperandInfo(Value *V, OPInfo &OPI) {
+  using namespace PatternMatch;
+  OPI.OP = V;
+  Value *ExtV = nullptr;
+  Constant *C = nullptr;
+
+  if (match(V, (m_FPExt(m_Value(ExtV)))) ||
+      match(V,
+            m_Shuffle(m_InsertElt(m_Poison(), m_FPExt(m_Value(ExtV)), m_Zero()),
+                      m_Poison(), m_ZeroMask()))) {
+
+    if (auto *ExtVType = dyn_cast<VectorType>(ExtV->getType())) {
+      // Matches the first branch.
+      if (ExtVType->getElementType()->isBFloatTy())
+        // do not confuse bf16 with ieee-fp16.
+        return false;
+    } else {
+      // Matches the second branch (insert element branch)
+      if (ExtV->getType()->isBFloatTy())
+        return false;
+    }
+
+    OPI.ExtInOP = ExtV;
+    OPI.ExtInSize = getElementSizeInBits(OPI.ExtInOP);
+    return true;
+  }
+
+  if (match(V, m_Constant(C))) {
+    if (!isExtendedConstant(C))
+      return false;
+    OPI.ExtInOP = C;
+    OPI.ExtInSize = getElementSizeInBits(OPI.ExtInOP);
+    return true;
+  }
+
+  return false;
+}
+
+Type *HexagonGenWideningVecFloatInstr::getElementTy(unsigned size,
+                                                    IRBuilder<> &IRB) {
+  switch (size) {
+  case 16:
+    return IRB.getHalfTy();
+  case 32:
+    return IRB.getFloatTy();
+  default:
+    llvm_unreachable("Unhandled Element size");
+  }
+}
+
+Value *HexagonGenWideningVecFloatInstr::adjustExtensionForOp(
+    OPInfo &OPI, IRBuilder<> &IRB, unsigned NewExtSize, unsigned NumElts) {
+  Value *V = OPI.ExtInOP;
+  unsigned EltSize = getElementSizeInBits(OPI.ExtInOP);
+  assert(NewExtSize >= EltSize);
+  Type *EltType = getElementTy(NewExtSize, IRB);
+  auto *NewOpTy = FixedVectorType::get(EltType, NumElts);
+
+  if (auto *C = dyn_cast<Constant>(V))
+    return IRB.CreateFPTrunc(C, NewOpTy);
+
+  if (V->getType()->isVectorTy())
+    if (NewExtSize == EltSize)
+      return V;
+
+  return nullptr;
+}
+
+std::pair<Value *, Value *>
+HexagonGenWideningVecFloatInstr::opSplit(Value *OP, Instruction *Inst) {
+  Type *InstTy = Inst->getType();
+  unsigned NumElts = cast<FixedVectorType>(InstTy)->getNumElements();
+  IRBuilder<> IRB(Inst);
+  Intrinsic::ID IntHi = Intrinsic::hexagon_V6_hi_128B;
+  Intrinsic::ID IntLo = Intrinsic::hexagon_V6_lo_128B;
+  Function *ExtFHi = Intrinsic::getOrInsertDeclaration(M, IntHi);
+  Function *ExtFLo = Intrinsic::getOrInsertDeclaration(M, IntLo);
+  if (NumElts == 128) {
+    auto *InType = FixedVectorType::get(IRB.getInt32Ty(), 64);
+    OP = IRB.CreateBitCast(OP, InType);
+  }
+  Value *OP1Hi = IRB.CreateCall(ExtFHi, {OP});
+  Value *OP1Lo = IRB.CreateCall(ExtFLo, {OP});
+  return std::pair<Value *, Value *>(OP1Hi, OP1Lo);
+}
+
+Value *HexagonGenWideningVecFloatInstr::createIntrinsic(
+    Intrinsic::ID IntId, Instruction *Inst, Value *NewOP1, Value *NewOP2,
+    FixedVectorType *ResType, unsigned NumElts, bool BitCastO...
[truncated]
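
A note on the type constraints in the IntrinsicsHexagon.td hunk above: `LLVMTruncatedType<0>` ties an operand to overloaded type 0 with the same element count and elements half as wide, while `LLVMSubdivide2VectorType<0>` halves the element width and doubles the element count. The following is a minimal standalone C++ sketch of that shape arithmetic only. `VecShape`, `truncated`, and `subdivide2` are hypothetical names chosen for illustration; they are not LLVM APIs and do not appear in the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for a fixed vector type <NumElts x iEltBits>.
struct VecShape {
  unsigned NumElts;
  unsigned EltBits;
};

// Models the LLVMTruncatedType constraint: same element count, half the
// element width.
VecShape truncated(VecShape V) {
  assert(V.EltBits % 2 == 0 && "element width must be divisible by two");
  return {V.NumElts, V.EltBits / 2};
}

// Models the LLVMSubdivide2VectorType constraint: half the element width,
// twice the element count, so the total bit width is unchanged.
VecShape subdivide2(VecShape V) {
  assert(V.EltBits % 2 == 0 && "element width must be divisible by two");
  return {V.NumElts * 2, V.EltBits / 2};
}
```

Under this model, a `Hexagon_widenvec_Intrinsic` whose result is `<64 x i32>` takes two `<64 x i16>` operands, and a `Hexagon_vasr_Intrinsic` whose vector operands are `<32 x i32>` returns `<64 x i16>`.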

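The two examples in the header comment of HexagonGenWideningVecFloatInstr.cpp differ mainly in how many HVX vectors each half-precision operand fills. With the 128B intrinsics, `<64 x half>` fits one vector and is multiplied with a single `vmpy.qf32.hf` call, while `<128 x half>` is split with `V6.hi`/`V6.lo` and multiplied in two calls. A rough standalone C++ sketch of that arithmetic follows. The helper names are hypothetical, and the 128-byte vector length is an assumption read off the `.128B` intrinsic names rather than taken from the truncated pass code.

```cpp
#include <cassert>

// Hypothetical helper: how many HVX-vector-sized pieces a <NumHalfElts x half>
// operand occupies when one HVX vector is HwVLenBytes bytes wide.
unsigned numVectorPieces(unsigned NumHalfElts, unsigned HwVLenBytes) {
  const unsigned HalfBits = 16;
  unsigned TotalBits = NumHalfElts * HalfBits;
  unsigned VecBits = HwVLenBytes * 8;
  assert(TotalBits % VecBits == 0 && "operand must fill whole HVX vectors");
  return TotalBits / VecBits;
}

// Hypothetical helper: number of i32 lanes in one HVX vector, which is the
// shape the operands are bitcast to before the intrinsic call.
unsigned i32LanesPerVector(unsigned HwVLenBytes) { return HwVLenBytes / 4; }
```

With these assumptions, `numVectorPieces(64, 128)` is 1 and `numVectorPieces(128, 128)` is 2, matching the single and double multiply sequences in Example1 and Example2, and `i32LanesPerVector(128)` is 32, matching the `<32 x i32>` operands.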
@fhossein-quic fhossein-quic changed the title Passes for widening vector operations and shuffle opt [Hexagon] Passes for widening vector operations and shuffle opt Dec 1, 2025
@iajbar
Contributor

iajbar commented Dec 1, 2025

This is not needed "Change-Id: I1f6c146bd70ffd1ea42b614fa22fad04d16d6c35"

@fhossein-quic fhossein-quic force-pushed the PR_GenVecWid branch 2 times, most recently from 441c18b to d7b5116 Compare December 1, 2025 20:49
Introduce Hexagon-specific passes to generate widening vector
instructions for integer and floating-point operations using generic
LLVM intrinsics. This enables widening operations for short vectors
and improves type legalization by allowing operands to be widened
to appropriate types. The patch also includes a shuffle optimization
pass to relocate and validate shufflevector instructions during
widening legalization.

Co-authored-by: Jyotsna Verma <[email protected]>
Co-authored-by: Yashas Andaluri <[email protected]>
Co-authored-by: Fateme Hosseini <[email protected]>
Co-authored-by: Muntasir Mallick <[email protected]>
Co-authored-by: Tatiana Larina <[email protected]>
Co-authored-by: Kaushik Kulkarni <[email protected]>