[SPIR-V] Legalize vector arithmetic and intrinsics for large vectors #170668
This patch improves the legalization of vector operations, particularly focusing on vectors that exceed the maximum supported size (e.g., 4 elements for shaders). This includes better handling for insert and extract element operations, which facilitates the legalization of loads and stores for long vectors, a common pattern when compiling HLSL matrices with Clang.

Key changes include:
- Adding legalization rules for G_FMA, G_INSERT_VECTOR_ELT, and various arithmetic operations to handle splitting of large vectors.
- Updating G_CONCAT_VECTORS and G_SPLAT_VECTOR to be legal for allowed types.
- Implementing custom legalization for G_INSERT_VECTOR_ELT using the spv_insertelt intrinsic.
- Enhancing SPIRVPostLegalizer to deduce types for arithmetic instructions and vector element intrinsics (spv_insertelt, spv_extractelt).
- Refactoring legalizeIntrinsic to uniformly handle vector legalization requirements.

The strategy for insert and extract operations mirrors that of bitcasts: incoming intrinsics are converted to generic MIR instructions (G_INSERT_VECTOR_ELT and G_EXTRACT_VECTOR_ELT) to leverage standard legalization rules (like splitting). After legalization, they are converted back to their respective SPIR-V intrinsics (spv_insertelt, spv_extractelt) because later passes in the backend expect these intrinsics rather than the generic instructions.

This ensures that operations on large vectors (e.g., <16 x float>) are correctly broken down into legal sub-vectors.
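To make the "broken down into legal sub-vectors" step concrete, here is a minimal standalone sketch of the arithmetic behind padding to the next power of two and then splitting at the maximum vector size. It models the idea only and does not use the LLVM API; the names `nextPow2`, `numSubVectors`, `Lane`, and `mapElement` are hypothetical and do not appear in the patch.

```cpp
// Hypothetical model of the splitting arithmetic; not LLVM code.

// Smallest power of two that is >= N (for N >= 1).
constexpr unsigned nextPow2(unsigned N) {
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Number of legal pieces after padding the element count to a power of two
// and splitting at MaxVectorSize (assumed to be a power of two, e.g. 4 or 16).
constexpr unsigned numSubVectors(unsigned NumElements, unsigned MaxVectorSize) {
  unsigned Padded = nextPow2(NumElements);
  return Padded <= MaxVectorSize ? 1 : Padded / MaxVectorSize;
}

// Position of an element of the wide vector after splitting.
struct Lane {
  unsigned SubVector; // which legal piece holds the element
  unsigned Index;     // lane inside that piece
};

constexpr Lane mapElement(unsigned Elt, unsigned MaxVectorSize) {
  return {Elt / MaxVectorSize, Elt % MaxVectorSize};
}
```

Under this model, with a maximum of 4 elements, a <16 x float> becomes four pieces and a <5 x double> is padded to 8 elements and becomes two pieces, with element 4 in lane 0 of the second piece.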
@llvm/pr-subscribers-backend-spir-v
Author: Steven Perron (s-perron)
Patch is 29.59 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170668.diff
4 Files Affected:
diff --git a/llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp b/llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp
index b5912c27316c9..4d83649c0f84f 100644
--- a/llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVLegalizerInfo.cpp
@@ -113,6 +113,8 @@ SPIRVLegalizerInfo::SPIRVLegalizerInfo(const SPIRVSubtarget &ST) {
v3s1, v3s8, v3s16, v3s32, v3s64,
v4s1, v4s8, v4s16, v4s32, v4s64};
+ auto allScalars = {s1, s8, s16, s32};
+
auto allScalarsAndVectors = {
s1, s8, s16, s32, s64, v2s1, v2s8, v2s16, v2s32, v2s64,
v3s1, v3s8, v3s16, v3s32, v3s64, v4s1, v4s8, v4s16, v4s32, v4s64,
@@ -172,9 +174,25 @@ SPIRVLegalizerInfo::SPIRVLegalizerInfo(const SPIRVSubtarget &ST) {
for (auto Opc : getTypeFoldingSupportedOpcodes()) {
if (Opc != G_EXTRACT_VECTOR_ELT)
- getActionDefinitionsBuilder(Opc).custom();
+ getActionDefinitionsBuilder(Opc)
+ .customFor(allScalars)
+ .customFor(allowedVectorTypes)
+ .moreElementsToNextPow2(0)
+ .fewerElementsIf(vectorElementCountIsGreaterThan(0, MaxVectorSize),
+ LegalizeMutations::changeElementCountTo(
+ 0, ElementCount::getFixed(MaxVectorSize)))
+ .custom();
}
+ getActionDefinitionsBuilder(TargetOpcode::G_FMA)
+ .legalFor(allScalars)
+ .legalFor(allowedVectorTypes)
+ .moreElementsToNextPow2(0)
+ .fewerElementsIf(vectorElementCountIsGreaterThan(0, MaxVectorSize),
+ LegalizeMutations::changeElementCountTo(
+ 0, ElementCount::getFixed(MaxVectorSize)))
+ .alwaysLegal();
+
getActionDefinitionsBuilder(G_INTRINSIC_W_SIDE_EFFECTS).custom();
getActionDefinitionsBuilder(G_SHUFFLE_VECTOR)
@@ -192,6 +210,13 @@ SPIRVLegalizerInfo::SPIRVLegalizerInfo(const SPIRVSubtarget &ST) {
1, ElementCount::getFixed(MaxVectorSize)))
.custom();
+ getActionDefinitionsBuilder(G_INSERT_VECTOR_ELT)
+ .moreElementsToNextPow2(0)
+ .fewerElementsIf(vectorElementCountIsGreaterThan(0, MaxVectorSize),
+ LegalizeMutations::changeElementCountTo(
+ 0, ElementCount::getFixed(MaxVectorSize)))
+ .custom();
+
// Illegal G_UNMERGE_VALUES instructions should be handled
// during the combine phase.
getActionDefinitionsBuilder(G_BUILD_VECTOR)
@@ -215,14 +240,13 @@ SPIRVLegalizerInfo::SPIRVLegalizerInfo(const SPIRVSubtarget &ST) {
.lowerIf(vectorElementCountIsGreaterThan(1, MaxVectorSize))
.custom();
+ // If the result is still illegal, the combiner should be able to remove it.
getActionDefinitionsBuilder(G_CONCAT_VECTORS)
- .legalIf(vectorElementCountIsLessThanOrEqualTo(0, MaxVectorSize))
- .moreElementsToNextPow2(0)
- .lowerIf(vectorElementCountIsGreaterThan(0, MaxVectorSize))
- .alwaysLegal();
+ .legalForCartesianProduct(allowedVectorTypes, allowedVectorTypes)
+ .moreElementsToNextPow2(0);
getActionDefinitionsBuilder(G_SPLAT_VECTOR)
- .legalIf(vectorElementCountIsLessThanOrEqualTo(0, MaxVectorSize))
+ .legalFor(allowedVectorTypes)
.moreElementsToNextPow2(0)
.fewerElementsIf(vectorElementCountIsGreaterThan(0, MaxVectorSize),
LegalizeMutations::changeElementSizeTo(0, MaxVectorSize))
@@ -458,6 +482,23 @@ static bool legalizeExtractVectorElt(LegalizerHelper &Helper, MachineInstr &MI,
return true;
}
+static bool legalizeInsertVectorElt(LegalizerHelper &Helper, MachineInstr &MI,
+ SPIRVGlobalRegistry *GR) {
+ MachineIRBuilder &MIRBuilder = Helper.MIRBuilder;
+ Register DstReg = MI.getOperand(0).getReg();
+ Register SrcReg = MI.getOperand(1).getReg();
+ Register ValReg = MI.getOperand(2).getReg();
+ Register IdxReg = MI.getOperand(3).getReg();
+
+ MIRBuilder
+ .buildIntrinsic(Intrinsic::spv_insertelt, ArrayRef<Register>{DstReg})
+ .addUse(SrcReg)
+ .addUse(ValReg)
+ .addUse(IdxReg);
+ MI.eraseFromParent();
+ return true;
+}
+
static Register convertPtrToInt(Register Reg, LLT ConvTy, SPIRVType *SpvType,
LegalizerHelper &Helper,
MachineRegisterInfo &MRI,
@@ -483,6 +524,8 @@ bool SPIRVLegalizerInfo::legalizeCustom(
return legalizeBitcast(Helper, MI);
case TargetOpcode::G_EXTRACT_VECTOR_ELT:
return legalizeExtractVectorElt(Helper, MI, GR);
+ case TargetOpcode::G_INSERT_VECTOR_ELT:
+ return legalizeInsertVectorElt(Helper, MI, GR);
case TargetOpcode::G_INTRINSIC:
case TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS:
return legalizeIntrinsic(Helper, MI);
@@ -512,6 +555,15 @@ bool SPIRVLegalizerInfo::legalizeCustom(
}
}
+static bool needsVectorLegalization(const LLT &Ty, const SPIRVSubtarget &ST) {
+ if (!Ty.isVector())
+ return false;
+ unsigned NumElements = Ty.getNumElements();
+ unsigned MaxVectorSize = ST.isShader() ? 4 : 16;
+ return (NumElements > 4 && !isPowerOf2_32(NumElements)) ||
+ NumElements > MaxVectorSize;
+}
+
bool SPIRVLegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
MachineInstr &MI) const {
LLVM_DEBUG(dbgs() << "legalizeIntrinsic: " << MI);
@@ -528,41 +580,38 @@ bool SPIRVLegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
LLT DstTy = MRI.getType(DstReg);
LLT SrcTy = MRI.getType(SrcReg);
- int32_t MaxVectorSize = ST.isShader() ? 4 : 16;
-
- bool DstNeedsLegalization = false;
- bool SrcNeedsLegalization = false;
-
- if (DstTy.isVector()) {
- if (DstTy.getNumElements() > 4 &&
- !isPowerOf2_32(DstTy.getNumElements())) {
- DstNeedsLegalization = true;
- }
-
- if (DstTy.getNumElements() > MaxVectorSize) {
- DstNeedsLegalization = true;
- }
- }
-
- if (SrcTy.isVector()) {
- if (SrcTy.getNumElements() > 4 &&
- !isPowerOf2_32(SrcTy.getNumElements())) {
- SrcNeedsLegalization = true;
- }
-
- if (SrcTy.getNumElements() > MaxVectorSize) {
- SrcNeedsLegalization = true;
- }
- }
-
// If an spv_bitcast needs to be legalized, we convert it to G_BITCAST to
// allow using the generic legalization rules.
- if (DstNeedsLegalization || SrcNeedsLegalization) {
+ if (needsVectorLegalization(DstTy, ST) ||
+ needsVectorLegalization(SrcTy, ST)) {
LLVM_DEBUG(dbgs() << "Replacing with a G_BITCAST\n");
MIRBuilder.buildBitcast(DstReg, SrcReg);
MI.eraseFromParent();
}
return true;
+ } else if (IntrinsicID == Intrinsic::spv_insertelt) {
+ Register DstReg = MI.getOperand(0).getReg();
+ LLT DstTy = MRI.getType(DstReg);
+
+ if (needsVectorLegalization(DstTy, ST)) {
+ Register SrcReg = MI.getOperand(2).getReg();
+ Register ValReg = MI.getOperand(3).getReg();
+ Register IdxReg = MI.getOperand(4).getReg();
+ MIRBuilder.buildInsertVectorElement(DstReg, SrcReg, ValReg, IdxReg);
+ MI.eraseFromParent();
+ }
+ return true;
+ } else if (IntrinsicID == Intrinsic::spv_extractelt) {
+ Register SrcReg = MI.getOperand(2).getReg();
+ LLT SrcTy = MRI.getType(SrcReg);
+
+ if (needsVectorLegalization(SrcTy, ST)) {
+ Register DstReg = MI.getOperand(0).getReg();
+ Register IdxReg = MI.getOperand(3).getReg();
+ MIRBuilder.buildExtractVectorElement(DstReg, SrcReg, IdxReg);
+ MI.eraseFromParent();
+ }
+ return true;
}
return true;
}
diff --git a/llvm/lib/Target/SPIRV/SPIRVPostLegalizer.cpp b/llvm/lib/Target/SPIRV/SPIRVPostLegalizer.cpp
index c90e6d8cfbfb4..d91016a38539b 100644
--- a/llvm/lib/Target/SPIRV/SPIRVPostLegalizer.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVPostLegalizer.cpp
@@ -16,6 +16,7 @@
#include "SPIRV.h"
#include "SPIRVSubtarget.h"
#include "SPIRVUtils.h"
+#include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
#include "llvm/IR/IntrinsicsSPIRV.h"
#include "llvm/Support/Debug.h"
#include <stack>
@@ -66,8 +67,9 @@ static bool deduceAndAssignTypeForGUnmerge(MachineInstr *I, MachineFunction &MF,
for (unsigned i = 0; i < I->getNumDefs() && !ScalarType; ++i) {
for (const auto &Use :
MRI.use_nodbg_instructions(I->getOperand(i).getReg())) {
- assert(Use.getOpcode() == TargetOpcode::G_BUILD_VECTOR &&
- "Expected use of G_UNMERGE_VALUES to be a G_BUILD_VECTOR");
+ if (Use.getOpcode() != TargetOpcode::G_BUILD_VECTOR)
+ continue;
+
if (auto *VecType =
GR->getSPIRVTypeForVReg(Use.getOperand(0).getReg())) {
ScalarType = GR->getScalarOrVectorComponentType(VecType);
@@ -133,10 +135,10 @@ static SPIRVType *deduceTypeFromOperandRange(MachineInstr *I,
return ResType;
}
-static SPIRVType *deduceTypeForResultRegister(MachineInstr *Use,
- Register UseRegister,
- SPIRVGlobalRegistry *GR,
- MachineIRBuilder &MIB) {
+static SPIRVType *deduceTypeFromResultRegister(MachineInstr *Use,
+ Register UseRegister,
+ SPIRVGlobalRegistry *GR,
+ MachineIRBuilder &MIB) {
for (const MachineOperand &MO : Use->defs()) {
if (!MO.isReg())
continue;
@@ -159,16 +161,43 @@ static SPIRVType *deduceTypeFromUses(Register Reg, MachineFunction &MF,
MachineRegisterInfo &MRI = MF.getRegInfo();
for (MachineInstr &Use : MRI.use_nodbg_instructions(Reg)) {
SPIRVType *ResType = nullptr;
+ LLVM_DEBUG(dbgs() << "Looking at use " << Use);
switch (Use.getOpcode()) {
case TargetOpcode::G_BUILD_VECTOR:
case TargetOpcode::G_EXTRACT_VECTOR_ELT:
case TargetOpcode::G_UNMERGE_VALUES:
- LLVM_DEBUG(dbgs() << "Looking at use " << Use << "\n");
- ResType = deduceTypeForResultRegister(&Use, Reg, GR, MIB);
+ case TargetOpcode::G_ADD:
+ case TargetOpcode::G_SUB:
+ case TargetOpcode::G_MUL:
+ case TargetOpcode::G_SDIV:
+ case TargetOpcode::G_UDIV:
+ case TargetOpcode::G_SREM:
+ case TargetOpcode::G_UREM:
+ case TargetOpcode::G_FADD:
+ case TargetOpcode::G_FSUB:
+ case TargetOpcode::G_FMUL:
+ case TargetOpcode::G_FDIV:
+ case TargetOpcode::G_FREM:
+ case TargetOpcode::G_FMA:
+ ResType = deduceTypeFromResultRegister(&Use, Reg, GR, MIB);
+ break;
+ case TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS:
+ case TargetOpcode::G_INTRINSIC: {
+ auto IntrinsicID = cast<GIntrinsic>(Use).getIntrinsicID();
+ if (IntrinsicID == Intrinsic::spv_insertelt) {
+ if (Reg == Use.getOperand(2).getReg())
+ ResType = deduceTypeFromResultRegister(&Use, Reg, GR, MIB);
+ } else if (IntrinsicID == Intrinsic::spv_extractelt) {
+ if (Reg == Use.getOperand(2).getReg())
+ ResType = deduceTypeFromResultRegister(&Use, Reg, GR, MIB);
+ }
break;
}
- if (ResType)
+ }
+ if (ResType) {
+ LLVM_DEBUG(dbgs() << "Deduced type from use " << *ResType);
return ResType;
+ }
}
return nullptr;
}
diff --git a/llvm/test/CodeGen/SPIRV/legalization/load-store-global.ll b/llvm/test/CodeGen/SPIRV/legalization/load-store-global.ll
new file mode 100644
index 0000000000000..468d3ded4c306
--- /dev/null
+++ b/llvm/test/CodeGen/SPIRV/legalization/load-store-global.ll
@@ -0,0 +1,194 @@
+; RUN: llc -O0 -verify-machineinstrs -mtriple=spirv-unknown-vulkan %s -o - | FileCheck %s
+; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-vulkan %s -o - -filetype=obj | spirv-val %}
+
+; CHECK-DAG: OpName %[[#test_int32_double_conversion:]] "test_int32_double_conversion"
+; CHECK-DAG: %[[#int:]] = OpTypeInt 32 0
+; CHECK-DAG: %[[#v4i32:]] = OpTypeVector %[[#int]] 4
+; CHECK-DAG: %[[#double:]] = OpTypeFloat 64
+; CHECK-DAG: %[[#v4f64:]] = OpTypeVector %[[#double]] 4
+; CHECK-DAG: %[[#v2i32:]] = OpTypeVector %[[#int]] 2
+; CHECK-DAG: %[[#ptr_private_v4i32:]] = OpTypePointer Private %[[#v4i32]]
+; CHECK-DAG: %[[#ptr_private_v4f64:]] = OpTypePointer Private %[[#v4f64]]
+; CHECK-DAG: %[[#global_double:]] = OpVariable %[[#ptr_private_v4f64]] Private
+; CHECK-DAG: %[[#C15:]] = OpConstant %[[#int]] 15{{$}}
+; CHECK-DAG: %[[#C14:]] = OpConstant %[[#int]] 14{{$}}
+; CHECK-DAG: %[[#C13:]] = OpConstant %[[#int]] 13{{$}}
+; CHECK-DAG: %[[#C12:]] = OpConstant %[[#int]] 12{{$}}
+; CHECK-DAG: %[[#C11:]] = OpConstant %[[#int]] 11{{$}}
+; CHECK-DAG: %[[#C10:]] = OpConstant %[[#int]] 10{{$}}
+; CHECK-DAG: %[[#C9:]] = OpConstant %[[#int]] 9{{$}}
+; CHECK-DAG: %[[#C8:]] = OpConstant %[[#int]] 8{{$}}
+; CHECK-DAG: %[[#C7:]] = OpConstant %[[#int]] 7{{$}}
+; CHECK-DAG: %[[#C6:]] = OpConstant %[[#int]] 6{{$}}
+; CHECK-DAG: %[[#C5:]] = OpConstant %[[#int]] 5{{$}}
+; CHECK-DAG: %[[#C4:]] = OpConstant %[[#int]] 4{{$}}
+; CHECK-DAG: %[[#C3:]] = OpConstant %[[#int]] 3{{$}}
+; CHECK-DAG: %[[#C2:]] = OpConstant %[[#int]] 2{{$}}
+; CHECK-DAG: %[[#C1:]] = OpConstant %[[#int]] 1{{$}}
+; CHECK-DAG: %[[#C0:]] = OpConstant %[[#int]] 0{{$}}
+
+@G_16 = internal addrspace(10) global [16 x i32] zeroinitializer
+@G_4_double = internal addrspace(10) global <4 x double> zeroinitializer
+@G_4_int = internal addrspace(10) global <4 x i32> zeroinitializer
+
+
+; This is the way matrices will be represented in HLSL. The memory type will be
+; an array, but it will be loaded as a vector.
+define spir_func void @test_load_store_global() {
+entry:
+; CHECK-DAG: %[[#PTR0:]] = OpAccessChain %[[#ptr_int:]] %[[#G16:]] %[[#C0]]
+; CHECK-DAG: %[[#VAL0:]] = OpLoad %[[#int]] %[[#PTR0]] Aligned 4
+; CHECK-DAG: %[[#PTR1:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C1]]
+; CHECK-DAG: %[[#VAL1:]] = OpLoad %[[#int]] %[[#PTR1]] Aligned 4
+; CHECK-DAG: %[[#PTR2:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C2]]
+; CHECK-DAG: %[[#VAL2:]] = OpLoad %[[#int]] %[[#PTR2]] Aligned 4
+; CHECK-DAG: %[[#PTR3:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C3]]
+; CHECK-DAG: %[[#VAL3:]] = OpLoad %[[#int]] %[[#PTR3]] Aligned 4
+; CHECK-DAG: %[[#PTR4:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C4]]
+; CHECK-DAG: %[[#VAL4:]] = OpLoad %[[#int]] %[[#PTR4]] Aligned 4
+; CHECK-DAG: %[[#PTR5:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C5]]
+; CHECK-DAG: %[[#VAL5:]] = OpLoad %[[#int]] %[[#PTR5]] Aligned 4
+; CHECK-DAG: %[[#PTR6:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C6]]
+; CHECK-DAG: %[[#VAL6:]] = OpLoad %[[#int]] %[[#PTR6]] Aligned 4
+; CHECK-DAG: %[[#PTR7:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C7]]
+; CHECK-DAG: %[[#VAL7:]] = OpLoad %[[#int]] %[[#PTR7]] Aligned 4
+; CHECK-DAG: %[[#PTR8:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C8]]
+; CHECK-DAG: %[[#VAL8:]] = OpLoad %[[#int]] %[[#PTR8]] Aligned 4
+; CHECK-DAG: %[[#PTR9:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C9]]
+; CHECK-DAG: %[[#VAL9:]] = OpLoad %[[#int]] %[[#PTR9]] Aligned 4
+; CHECK-DAG: %[[#PTR10:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C10]]
+; CHECK-DAG: %[[#VAL10:]] = OpLoad %[[#int]] %[[#PTR10]] Aligned 4
+; CHECK-DAG: %[[#PTR11:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C11]]
+; CHECK-DAG: %[[#VAL11:]] = OpLoad %[[#int]] %[[#PTR11]] Aligned 4
+; CHECK-DAG: %[[#PTR12:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C12]]
+; CHECK-DAG: %[[#VAL12:]] = OpLoad %[[#int]] %[[#PTR12]] Aligned 4
+; CHECK-DAG: %[[#PTR13:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C13]]
+; CHECK-DAG: %[[#VAL13:]] = OpLoad %[[#int]] %[[#PTR13]] Aligned 4
+; CHECK-DAG: %[[#PTR14:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C14]]
+; CHECK-DAG: %[[#VAL14:]] = OpLoad %[[#int]] %[[#PTR14]] Aligned 4
+; CHECK-DAG: %[[#PTR15:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C15]]
+; CHECK-DAG: %[[#VAL15:]] = OpLoad %[[#int]] %[[#PTR15]] Aligned 4
+; CHECK-DAG: %[[#INS0:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL0]] %[[#UNDEF:]] 0
+; CHECK-DAG: %[[#INS1:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL1]] %[[#INS0]] 1
+; CHECK-DAG: %[[#INS2:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL2]] %[[#INS1]] 2
+; CHECK-DAG: %[[#INS3:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL3]] %[[#INS2]] 3
+; CHECK-DAG: %[[#INS4:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL4]] %[[#UNDEF]] 0
+; CHECK-DAG: %[[#INS5:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL5]] %[[#INS4]] 1
+; CHECK-DAG: %[[#INS6:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL6]] %[[#INS5]] 2
+; CHECK-DAG: %[[#INS7:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL7]] %[[#INS6]] 3
+; CHECK-DAG: %[[#INS8:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL8]] %[[#UNDEF]] 0
+; CHECK-DAG: %[[#INS9:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL9]] %[[#INS8]] 1
+; CHECK-DAG: %[[#INS10:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL10]] %[[#INS9]] 2
+; CHECK-DAG: %[[#INS11:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL11]] %[[#INS10]] 3
+; CHECK-DAG: %[[#INS12:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL12]] %[[#UNDEF]] 0
+; CHECK-DAG: %[[#INS13:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL13]] %[[#INS12]] 1
+; CHECK-DAG: %[[#INS14:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL14]] %[[#INS13]] 2
+; CHECK-DAG: %[[#INS15:]] = OpCompositeInsert %[[#v4i32]] %[[#VAL15]] %[[#INS14]] 3
+ %0 = load <16 x i32>, ptr addrspace(10) @G_16, align 64
+
+; CHECK-DAG: %[[#PTR0_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C0]]
+; CHECK-DAG: %[[#VAL0_S:]] = OpCompositeExtract %[[#int]] %[[#INS3]] 0
+; CHECK-DAG: OpStore %[[#PTR0_S]] %[[#VAL0_S]] Aligned 64
+; CHECK-DAG: %[[#PTR1_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C1]]
+; CHECK-DAG: %[[#VAL1_S:]] = OpCompositeExtract %[[#int]] %[[#INS3]] 1
+; CHECK-DAG: OpStore %[[#PTR1_S]] %[[#VAL1_S]] Aligned 64
+; CHECK-DAG: %[[#PTR2_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C2]]
+; CHECK-DAG: %[[#VAL2_S:]] = OpCompositeExtract %[[#int]] %[[#INS3]] 2
+; CHECK-DAG: OpStore %[[#PTR2_S]] %[[#VAL2_S]] Aligned 64
+; CHECK-DAG: %[[#PTR3_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C3]]
+; CHECK-DAG: %[[#VAL3_S:]] = OpCompositeExtract %[[#int]] %[[#INS3]] 3
+; CHECK-DAG: OpStore %[[#PTR3_S]] %[[#VAL3_S]] Aligned 64
+; CHECK-DAG: %[[#PTR4_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C4]]
+; CHECK-DAG: %[[#VAL4_S:]] = OpCompositeExtract %[[#int]] %[[#INS7]] 0
+; CHECK-DAG: OpStore %[[#PTR4_S]] %[[#VAL4_S]] Aligned 64
+; CHECK-DAG: %[[#PTR5_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C5]]
+; CHECK-DAG: %[[#VAL5_S:]] = OpCompositeExtract %[[#int]] %[[#INS7]] 1
+; CHECK-DAG: OpStore %[[#PTR5_S]] %[[#VAL5_S]] Aligned 64
+; CHECK-DAG: %[[#PTR6_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C6]]
+; CHECK-DAG: %[[#VAL6_S:]] = OpCompositeExtract %[[#int]] %[[#INS7]] 2
+; CHECK-DAG: OpStore %[[#PTR6_S]] %[[#VAL6_S]] Aligned 64
+; CHECK-DAG: %[[#PTR7_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C7]]
+; CHECK-DAG: %[[#VAL7_S:]] = OpCompositeExtract %[[#int]] %[[#INS7]] 3
+; CHECK-DAG: OpStore %[[#PTR7_S]] %[[#VAL7_S]] Aligned 64
+; CHECK-DAG: %[[#PTR8_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C8]]
+; CHECK-DAG: %[[#VAL8_S:]] = OpCompositeExtract %[[#int]] %[[#INS11]] 0
+; CHECK-DAG: OpStore %[[#PTR8_S]] %[[#VAL8_S]] Aligned 64
+; CHECK-DAG: %[[#PTR9_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C9]]
+; CHECK-DAG: %[[#VAL9_S:]] = OpCompositeExtract %[[#int]] %[[#INS11]] 1
+; CHECK-DAG: OpStore %[[#PTR9_S]] %[[#VAL9_S]] Aligned 64
+; CHECK-DAG: %[[#PTR10_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C10]]
+; CHECK-DAG: %[[#VAL10_S:]] = OpCompositeExtract %[[#int]] %[[#INS11]] 2
+; CHECK-DAG: OpStore %[[#PTR10_S]] %[[#VAL10_S]] Aligned 64
+; CHECK-DAG: %[[#PTR11_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C11]]
+; CHECK-DAG: %[[#VAL11_S:]] = OpCompositeExtract %[[#int]] %[[#INS11]] 3
+; CHECK-DAG: OpStore %[[#PTR11_S]] %[[#VAL11_S]] Aligned 64
+; CHECK-DAG: %[[#PTR12_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C12]]
+; CHECK-DAG: %[[#VAL12_S:]] = OpCompositeExtract %[[#int]] %[[#INS15]] 0
+; CHECK-DAG: OpStore %[[#PTR12_S]] %[[#VAL12_S]] Aligned 64
+; CHECK-DAG: %[[#PTR13_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C13]]
+; CHECK-DAG: %[[#VAL13_S:]] = OpCompositeExtract %[[#int]] %[[#INS15]] 1
+; CHECK-DAG: OpStore %[[#PTR13_S]] %[[#VAL13_S]] Aligned 64
+; CHECK-DAG: %[[#PTR14_S:]] = OpAccessChain %[[#ptr_int]] %[[#G16]] %[[#C14]]
+; CHECK-DAG: %[[#VAL14_S:]] = OpCompositeExtract %[[#int]] %[[#INS15]] 2
+; CHECK-DAG: OpStore %[[#PTR14_S]] %[[#VAL14_S]] Aligned 64
+; CHECK-DAG: %[[#PTR15...
[truncated]
@farzonl, this PR should enable the SPIR-V backend to handle some loads and stores, and it should be able to handle the element-wise vector/matrix operations. It will still not be able to handle a variable index to get a particular element.
; CHECK-DAG: %[[#C1:]] = OpConstant %[[#int]] 1{{$}}
; CHECK-DAG: %[[#C0:]] = OpConstant %[[#int]] 0{{$}}

@G_16 = internal addrspace(10) global [16 x i32] zeroinitializer
Do you know what the Align actually implies in SPIR-V stores?
Looking at:
@var = internal addrspace(10) global [5 x double] zeroinitializer
%tmp = load <5 x double>, ptr addrspace(10) @var
store <5 x double> %tmp, ptr addrspace(10) @var
We get 5 OpAccessChain, with 5 loads Aligned 8, one for each double.
But at the store, we have:
%37 = OpAccessChain %_ptr_Private_double %var %uint_0
%38 = OpCompositeExtract %double %tmp1 0
OpStore %37 %38 Aligned 64
%39 = OpAccessChain %_ptr_Private_double %var %uint_1
%40 = OpCompositeExtract %double %tmp1 1
OpStore %39 %40 Aligned 8
%41 = OpAccessChain %_ptr_Private_double %var %uint_2
%42 = OpCompositeExtract %double %tmp1 2
OpStore %41 %42 Aligned 16
%43 = OpAccessChain %_ptr_Private_double %var %uint_3
%44 = OpCompositeExtract %double %tmp1 3
OpStore %43 %44 Aligned 8
%45 = OpAccessChain %_ptr_Private_double %var %uint_4
%46 = OpCompositeExtract %double %tmp2 0
OpStore %45 %46 Aligned 32
The alignment is a guarantee of a minimum alignment. The alignments on the loads could probably be improved. As long as there are no regressions, we can handle more cases in a follow up PR.
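The store alignments quoted above (64, 8, 16, 8, 32) are consistent with taking the largest power of two that divides both a base alignment and the byte offset of each element. The sketch below shows that calculation. It assumes the original vector store carries an alignment of 64 and that each double is 8 bytes, neither of which is stated in this thread; `alignmentAtOffset` is a hypothetical name, not a function from the patch.

```cpp
#include <cstdint>

// Largest power of two dividing both BaseAlign and Offset.
// BaseAlign is assumed to be a power of two; Offset == 0 keeps BaseAlign.
constexpr uint64_t alignmentAtOffset(uint64_t BaseAlign, uint64_t Offset) {
  uint64_t Bits = BaseAlign | Offset;
  return Bits & (~Bits + 1); // isolate the lowest set bit
}
```

With a base alignment of 64 and offsets 0, 8, 16, 24, and 32, this yields 64, 8, 16, 8, and 32, matching the five OpStore instructions in the question. Each value is still a valid minimum-alignment guarantee for its address.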
🐧 Linux x64 Test Results
Failed Tests:
LLVM.CodeGen/SPIRV/legalization/vector-arithmetic-6.ll
If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the [truncated]
🪟 Windows x64 Test Results
Failed Tests:
LLVM.CodeGen/SPIRV/legalization/vector-arithmetic-6.ll
If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the [truncated]