[HLSL] Implement elementwise firstbitlow builtin #116858

Merged: 24 commits, Jan 15, 2025
Changes shown from 1 commit
2 changes: 1 addition & 1 deletion llvm/lib/Target/DirectX/DXIL.td
@@ -621,7 +621,7 @@ def CountBits : DXILOp<31, unaryBits> {
def FirstbitLo : DXILOp<32, unaryBits> {
let Doc = "Returns the location of the first set bit starting from "
"the lowest order bit and working upward.";
let LLVMIntrinsic = int_dx_firstbitlow;
let intrinsics = [ IntrinSelect<int_dx_firstbitlow> ];
let arguments = [OverloadTy];
let result = Int32Ty;
let overloads =
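To make the Doc string above concrete, here is a minimal reference sketch in C++. This is an editorial sketch, not the DXIL implementation; it assumes the usual convention (matching GLSL FindILsb, and the const_neg1 handling in the tests below) that an input with no set bits yields -1.

#include <cstdint>

// Index of the lowest-order set bit, counting up from bit 0;
// returns ~0u (that is, -1) when no bit is set.
static uint32_t firstbitlow32(uint32_t V) {
  for (uint32_t Idx = 0; Idx < 32; ++Idx)
    if (V & (1u << Idx))
      return Idx;
  return ~0u;
}

// e.g. firstbitlow32(0b01000) == 3, firstbitlow32(1) == 0,
//      firstbitlow32(0) == ~0u
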
136 changes: 112 additions & 24 deletions llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
@@ -108,8 +108,8 @@ class SPIRVInstructionSelector : public InstructionSelector {
unsigned Opcode) const;

bool selectFirstBitSet64(Register ResVReg, const SPIRVType *ResType,
MachineInstr &I, unsigned BitSetOpcode,
bool SwapPrimarySide) const;
MachineInstr &I, Register SrcReg,
unsigned BitSetOpcode, bool SwapPrimarySide) const;

bool selectGlobalValue(Register ResVReg, MachineInstr &I,
const MachineInstr *Init = nullptr) const;
@@ -3171,23 +3171,116 @@ bool SPIRVInstructionSelector::selectFirstBitSet32(Register ResVReg,
.constrainAllUses(TII, TRI, RBI);
}

bool SPIRVInstructionSelector::selectFirstBitSet64(Register ResVReg,
const SPIRVType *ResType,
MachineInstr &I,
unsigned BitSetOpcode,
bool SwapPrimarySide) const {
Register OpReg = I.getOperand(2).getReg();

// 1. Split int64 into 2 pieces using a bitcast
bool SPIRVInstructionSelector::selectFirstBitSet64(
Register ResVReg, const SPIRVType *ResType, MachineInstr &I,
Register SrcReg, unsigned BitSetOpcode, bool SwapPrimarySide) const {
unsigned ComponentCount = GR.getScalarOrVectorComponentCount(ResType);
SPIRVType *BaseType = GR.retrieveScalarOrVectorIntType(ResType);
bool ZeroAsNull = STI.isOpenCLEnv();
Register ConstIntZero =
GR.getOrCreateConstInt(0, I, BaseType, TII, ZeroAsNull);
Register ConstIntOne =
GR.getOrCreateConstInt(1, I, BaseType, TII, ZeroAsNull);

// SPIRV doesn't support vectors with more than 4 components. Since the
// algorithm below converts i64 -> i32x2 and i64x4 -> i32x8, it can only
// operate on vectors with 2 or fewer components. When larger vectors are
// seen, split them, recurse, then recombine them.
if (ComponentCount > 2) {
unsigned LeftComponentCount = ComponentCount / 2;
spall (Contributor), Dec 12, 2024:

Will this code be affected if vectors of size greater than 4 are supported in HLSL in the future? This might be a question for someone besides @V-FEXrt.

V-FEXrt (Contributor, Author) replied:

I tried to write it to handle that case. It should just keep recursing and splitting the vectors in half until it's down to 2 or fewer components.

I do think we are strictly limited by SPIRV here, though. Say HLSL supported u64x8: we would still have to accept the vec8 in as a parameter and return a vec8 out, both of which would require invalid SPIRV.

V-FEXrt (Contributor, Author) replied:

Actually, one thing I was considering was to just explicitly handle the vec3 and vec4 cases and then assert for anything higher, but they are equally messy while being strictly less general.

spall (Contributor) replied:

I think/wonder if it's possible that a previous part of the code will force the vectors to be vec4 or smaller, but I'm unsure.

spall (Contributor) replied, quoting the author:

> I tried to write it to handle that case. It should just keep recursing and splitting the vectors in half until it's down to 2 or fewer components.
>
> I do think we are strictly limited by SPIRV here, though. Say HLSL supported u64x8: we would still have to accept the vec8 in as a parameter and return a vec8 out, both of which would require invalid SPIRV.

I don't think this code can validly handle vectors that are > vec4, because the splitting action will create vectors which are too large in some cases.

V-FEXrt (Contributor, Author) replied:

Can you give an example? I'm pretty sure the splitting will never create a vector too large (but the merging back together certainly can).

Example: given u64x12, the call stack becomes

selectFirstBitSet64(u64x12); // Top
selectFirstBitSet64Overflow(u64x12); // Top
  selectFirstBitSet64(u64x6); // Top.Left
  selectFirstBitSet64Overflow(u64x6);   // Top.Left
    selectFirstBitSet64(u64x3);  // Top.Left.Left
    selectFirstBitSet64Overflow(u64x3); // Top.Left.Left
      selectFirstBitSet64(u64);  // Top.Left.Left.Left
      selectFirstBitSet64(u64x2);  // Top.Left.Left.Right
    selectFirstBitSet64Overflow(u64x3); // Top.Left.Right
      selectFirstBitSet64(u64);  // Top.Left.Right.Left
      selectFirstBitSet64(u64x2);  // Top.Left.Right.Right
  selectFirstBitSet64Overflow(u64x6);   // Top.Right
    selectFirstBitSet64(u64x3);  // Top.Right.Left
    selectFirstBitSet64Overflow(u64x3); // Top.Right.Left
      selectFirstBitSet64(u64);  // Top.Right.Left.Left
      selectFirstBitSet64(u64x2);  // Top.Right.Left.Right
    selectFirstBitSet64Overflow(u64x3); // Top.Right.Right
      selectFirstBitSet64(u64);  // Top.Right.Right.Left
      selectFirstBitSet64(u64x2);  // Top.Right.Right.Right

spall (Contributor), Dec 16, 2024:

When you split a size-12 vector for the potential recursive call, you create two intermediate registers which contain vectors of size 6. I think you can see this in your call stack, actually.

V-FEXrt (Contributor, Author) replied:

Ah, yep. :/

I was hoping it was clean and only SrcReg and ResReg were "bad", but nope.

Probably just go with the assert that it's never larger than u64x4 for now.
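For reference, a minimal sketch of the guard being discussed (hypothetical form and wording; the merged code may differ):

assert(ComponentCount <= 4 &&
       "SPIR-V does not support vectors wider than 4 components");
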

unsigned RightComponentCount = ComponentCount - LeftComponentCount;
bool LeftIsVector = LeftComponentCount > 1;

// Split the SrcReg in half into 2 smaller vec registers
// (ie i64x4 -> i64x2, i64x2)
MachineIRBuilder MIRBuilder(I);
SPIRVType *OpType = GR.getOrCreateSPIRVIntegerType(64, MIRBuilder);
SPIRVType *LeftVecOpType;
SPIRVType *LeftVecResType;
if (LeftIsVector) {
LeftVecOpType =
GR.getOrCreateSPIRVVectorType(OpType, LeftComponentCount, MIRBuilder);
LeftVecResType = GR.getOrCreateSPIRVVectorType(
BaseType, LeftComponentCount, MIRBuilder);
} else {
LeftVecOpType = OpType;
LeftVecResType = BaseType;
}

SPIRVType *RightVecOpType =
GR.getOrCreateSPIRVVectorType(OpType, RightComponentCount, MIRBuilder);
SPIRVType *RightVecResType = GR.getOrCreateSPIRVVectorType(
BaseType, RightComponentCount, MIRBuilder);

Register LeftSideIn =
MRI->createVirtualRegister(GR.getRegClass(LeftVecOpType));
Register RightSideIn =
MRI->createVirtualRegister(GR.getRegClass(RightVecOpType));

bool Result;

if (LeftIsVector) {
auto MIB =
BuildMI(*I.getParent(), I, I.getDebugLoc(),
TII.get(SPIRV::OpVectorShuffle))
.addDef(LeftSideIn)
.addUse(GR.getSPIRVTypeID(LeftVecOpType))
.addUse(SrcReg)
// Per the spec, repeat the vector if only one vec is needed
.addUse(SrcReg);

for (unsigned J = 0; J < LeftComponentCount; J++) {
MIB.addImm(J);
}

Result = MIB.constrainAllUses(TII, TRI, RBI);
} else {
Result =
selectOpWithSrcs(LeftSideIn, LeftVecOpType, I, {SrcReg, ConstIntZero},
SPIRV::OpVectorExtractDynamic);
}

auto MIB = BuildMI(*I.getParent(), I, I.getDebugLoc(),
TII.get(SPIRV::OpVectorShuffle))
.addDef(RightSideIn)
.addUse(GR.getSPIRVTypeID(RightVecOpType))
.addUse(SrcReg)
// Per the spec, repeat the vector if only one vec is needed
.addUse(SrcReg);

for (unsigned J = LeftComponentCount; J < ComponentCount; J++) {
MIB.addImm(J);
}

Result = Result && MIB.constrainAllUses(TII, TRI, RBI);

// Recursively call selectFirstBitSet64 on the 2 registers
Register LeftSideOut =
MRI->createVirtualRegister(GR.getRegClass(LeftVecResType));
Register RightSideOut =
MRI->createVirtualRegister(GR.getRegClass(RightVecResType));
Result = Result &&
selectFirstBitSet64(LeftSideOut, LeftVecResType, I, LeftSideIn,
BitSetOpcode, SwapPrimarySide);
Result = Result &&
selectFirstBitSet64(RightSideOut, RightVecResType, I, RightSideIn,
BitSetOpcode, SwapPrimarySide);

// Join the two resulting registers back into the return type
// (ie i32x2, i32x2 -> i32x4)
return Result &&
selectOpWithSrcs(ResVReg, ResType, I, {LeftSideOut, RightSideOut},
SPIRV::OpCompositeConstruct);
}
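// Illustration (editorial sketch, not part of the PR; illustrative SSA
// names): for an i64x4 source, LeftComponentCount = 2, so the two
// OpVectorShuffle instructions above select component indices {0, 1} and
// {2, 3}:
//   %left  = OpVectorShuffle %v2i64 %src %src 0 1
//   %right = OpVectorShuffle %v2i64 %src %src 2 3
// This is the shape checked by firstbitlow_v4i64 in the test file below.
// For odd widths such as i64x3, the left side is a scalar, extracted with
// OpVectorExtractDynamic instead, as in firstbitlow_v3i64.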

// 1. Split int64 into 2 pieces using a bitcast
MachineIRBuilder MIRBuilder(I);
SPIRVType *PostCastType =
GR.getOrCreateSPIRVVectorType(BaseType, 2 * ComponentCount, MIRBuilder);
Register BitcastReg =
MRI->createVirtualRegister(GR.getRegClass(PostCastType));
bool Result =
selectOpWithSrcs(BitcastReg, PostCastType, I, {OpReg}, SPIRV::OpBitcast);
selectOpWithSrcs(BitcastReg, PostCastType, I, {SrcReg}, SPIRV::OpBitcast);

// 2. Find the first set bit from the primary side for all the pieces in #1
Register FBSReg = MRI->createVirtualRegister(GR.getRegClass(PostCastType));
@@ -3198,20 +3291,15 @@ bool SPIRVInstructionSelector::selectFirstBitSet64(Register ResVReg,
Register HighReg = MRI->createVirtualRegister(GR.getRegClass(ResType));
Register LowReg = MRI->createVirtualRegister(GR.getRegClass(ResType));

bool ZeroAsNull = STI.isOpenCLEnv();
bool IsScalarRes = ResType->getOpcode() != SPIRV::OpTypeVector;
if (IsScalarRes) {
// if scalar do a vector extract
Result = Result &&
selectOpWithSrcs(HighReg, ResType, I,
{FBSReg, GR.getOrCreateConstInt(0, I, ResType,
TII, ZeroAsNull)},
SPIRV::OpVectorExtractDynamic);
Result = Result &&
selectOpWithSrcs(LowReg, ResType, I,
{FBSReg, GR.getOrCreateConstInt(1, I, ResType,
TII, ZeroAsNull)},
SPIRV::OpVectorExtractDynamic);
Result =
Result && selectOpWithSrcs(HighReg, ResType, I, {FBSReg, ConstIntZero},
SPIRV::OpVectorExtractDynamic);
Result =
Result && selectOpWithSrcs(LowReg, ResType, I, {FBSReg, ConstIntOne},
SPIRV::OpVectorExtractDynamic);
} else {
// if vector do a shufflevector
auto MIB = BuildMI(*I.getParent(), I, I.getDebugLoc(),
@@ -3324,7 +3412,7 @@ bool SPIRVInstructionSelector::selectFirstBitHigh(Register ResVReg,
case 32:
return selectFirstBitSet32(ResVReg, ResType, I, OpReg, BitSetOpcode);
case 64:
return selectFirstBitSet64(ResVReg, ResType, I, BitSetOpcode,
return selectFirstBitSet64(ResVReg, ResType, I, OpReg, BitSetOpcode,
/*SwapPrimarySide=*/false);
default:
report_fatal_error(
@@ -3350,7 +3438,7 @@ bool SPIRVInstructionSelector::selectFirstBitLow(Register ResVReg,
case 32:
return selectFirstBitSet32(ResVReg, ResType, I, OpReg, BitSetOpcode);
case 64:
return selectFirstBitSet64(ResVReg, ResType, I, BitSetOpcode,
return selectFirstBitSet64(ResVReg, ResType, I, OpReg, BitSetOpcode,
/*SwapPrimarySide=*/true);
default:
report_fatal_error("spv_firstbitlow only supports 16,32,64 bits.");
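Before the tests, a compact scalar sketch of the 64-bit lowering implemented above, for the SwapPrimarySide=true (firstbitlow) case. This is illustrative C++ under the FindILsb conventions, not the selector code itself; it mirrors the OpBitcast, OpSelect, and OpIAdd sequence in the CHECK lines below.

#include <cstdint>

// 32-bit primitive; returns ~0u (-1) when V == 0, per FindILsb.
static uint32_t findLsb32(uint32_t V) {
  for (uint32_t Idx = 0; Idx < 32; ++Idx)
    if (V & (1u << Idx))
      return Idx;
  return ~0u;
}

static uint32_t firstbitlow64(uint64_t V) {
  // Step 1: split the i64 into two 32-bit words (the OpBitcast step).
  uint32_t LowWord = static_cast<uint32_t>(V);
  uint32_t HighWord = static_cast<uint32_t>(V >> 32);
  // Step 2: find the first set bit in each word.
  uint32_t LowBits = findLsb32(LowWord);
  uint32_t HighBits = findLsb32(HighWord);
  // Step 3: the primary side is the low word (SwapPrimarySide=true); fall
  // back to the high word, offset by 32, only when the low word had no set
  // bit. An all-zero input flows through this same select/add arithmetic.
  bool UseHigh = (LowBits == ~0u);
  uint32_t AnsBits = UseHigh ? HighBits : LowBits;
  uint32_t AnsOffset = UseHigh ? 32u : 0u;
  return AnsOffset + AnsBits; // the emitted OpIAdd
}

// e.g. firstbitlow64(0x100000000ull) == 32, firstbitlow64(8) == 3
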
119 changes: 117 additions & 2 deletions llvm/test/CodeGen/SPIRV/hlsl-intrinsics/firstbitlow.ll
@@ -5,6 +5,7 @@
; CHECK-DAG: OpMemoryModel Logical GLSL450
; CHECK-DAG: [[u32_t:%.+]] = OpTypeInt 32 0
; CHECK-DAG: [[u32x2_t:%.+]] = OpTypeVector [[u32_t]] 2
; CHECK-DAG: [[u32x3_t:%.+]] = OpTypeVector [[u32_t]] 3
; CHECK-DAG: [[u32x4_t:%.+]] = OpTypeVector [[u32_t]] 4
; CHECK-DAG: [[const_0:%.*]] = OpConstant [[u32_t]] 0
; CHECK-DAG: [[const_0x2:%.*]] = OpConstantComposite [[u32x2_t]] [[const_0]] [[const_0]]
@@ -15,8 +16,12 @@
; CHECK-DAG: [[const_neg1x2:%.*]] = OpConstantComposite [[u32x2_t]] [[const_neg1]] [[const_neg1]]
; CHECK-DAG: [[u16_t:%.+]] = OpTypeInt 16 0
; CHECK-DAG: [[u16x2_t:%.+]] = OpTypeVector [[u16_t]] 2
; CHECK-DAG: [[u16x3_t:%.+]] = OpTypeVector [[u16_t]] 3
; CHECK-DAG: [[u16x4_t:%.+]] = OpTypeVector [[u16_t]] 4
; CHECK-DAG: [[u64_t:%.+]] = OpTypeInt 64 0
; CHECK-DAG: [[u64x2_t:%.+]] = OpTypeVector [[u64_t]] 2
; CHECK-DAG: [[u64x3_t:%.+]] = OpTypeVector [[u64_t]] 3
; CHECK-DAG: [[u64x4_t:%.+]] = OpTypeVector [[u64_t]] 4
; CHECK-DAG: [[bool_t:%.+]] = OpTypeBool
; CHECK-DAG: [[boolx2_t:%.+]] = OpTypeVector [[bool_t]] 2

@@ -30,8 +35,8 @@ entry:
ret i32 %elt.firstbitlow
}

; CHECK-LABEL: Begin function firstbitlow_2xi32
define noundef <2 x i32> @firstbitlow_2xi32(<2 x i32> noundef %a) {
; CHECK-LABEL: Begin function firstbitlow_v2xi32
define noundef <2 x i32> @firstbitlow_v2xi32(<2 x i32> noundef %a) {
entry:
; CHECK: [[a:%.+]] = OpFunctionParameter [[u32x2_t]]
; CHECK: [[ret:%.+]] = OpExtInst [[u32x2_t]] [[glsl_450_ext]] FindILsb [[a]]
@@ -40,6 +45,26 @@ entry:
ret <2 x i32> %elt.firstbitlow
}

; CHECK-LABEL: Begin function firstbitlow_v3xi32
define noundef <3 x i32> @firstbitlow_v3xi32(<3 x i32> noundef %a) {
entry:
; CHECK: [[a:%.+]] = OpFunctionParameter [[u32x3_t]]
; CHECK: [[ret:%.+]] = OpExtInst [[u32x3_t]] [[glsl_450_ext]] FindILsb [[a]]
; CHECK: OpReturnValue [[ret]]
%elt.firstbitlow = call <3 x i32> @llvm.spv.firstbitlow.v3i32(<3 x i32> %a)
ret <3 x i32> %elt.firstbitlow
}

; CHECK-LABEL: Begin function firstbitlow_v4xi32
define noundef <4 x i32> @firstbitlow_v4xi32(<4 x i32> noundef %a) {
entry:
; CHECK: [[a:%.+]] = OpFunctionParameter [[u32x4_t]]
; CHECK: [[ret:%.+]] = OpExtInst [[u32x4_t]] [[glsl_450_ext]] FindILsb [[a]]
; CHECK: OpReturnValue [[ret]]
%elt.firstbitlow = call <4 x i32> @llvm.spv.firstbitlow.v4i32(<4 x i32> %a)
ret <4 x i32> %elt.firstbitlow
}

; CHECK-LABEL: Begin function firstbitlow_i16
define noundef i32 @firstbitlow_i16(i16 noundef %a) {
entry:
@@ -62,6 +87,28 @@ entry:
ret <2 x i32> %elt.firstbitlow
}

; CHECK-LABEL: Begin function firstbitlow_v3xi16
define noundef <3 x i32> @firstbitlow_v3xi16(<3 x i16> noundef %a) {
entry:
; CHECK: [[a16:%.+]] = OpFunctionParameter [[u16x3_t]]
; CHECK: [[a32:%.+]] = OpUConvert [[u32x3_t]] [[a16]]
; CHECK: [[ret:%.+]] = OpExtInst [[u32x3_t]] [[glsl_450_ext]] FindILsb [[a32]]
; CHECK: OpReturnValue [[ret]]
%elt.firstbitlow = call <3 x i32> @llvm.spv.firstbitlow.v3i16(<3 x i16> %a)
ret <3 x i32> %elt.firstbitlow
}

; CHECK-LABEL: Begin function firstbitlow_v4xi16
define noundef <4 x i32> @firstbitlow_v4xi16(<4 x i16> noundef %a) {
entry:
; CHECK: [[a16:%.+]] = OpFunctionParameter [[u16x4_t]]
; CHECK: [[a32:%.+]] = OpUConvert [[u32x4_t]] [[a16]]
; CHECK: [[ret:%.+]] = OpExtInst [[u32x4_t]] [[glsl_450_ext]] FindILsb [[a32]]
; CHECK: OpReturnValue [[ret]]
%elt.firstbitlow = call <4 x i32> @llvm.spv.firstbitlow.v4i16(<4 x i16> %a)
ret <4 x i32> %elt.firstbitlow
}

; CHECK-LABEL: Begin function firstbitlow_i64
define noundef i32 @firstbitlow_i64(i64 noundef %a) {
entry:
@@ -96,6 +143,74 @@ entry:
ret <2 x i32> %elt.firstbitlow
}

; CHECK-LABEL: Begin function firstbitlow_v3i64
define noundef <3 x i32> @firstbitlow_v3i64(<3 x i64> noundef %a) {
entry:
; Split the i64x3 into i64, i64x2
; CHECK: [[a:%.+]] = OpFunctionParameter [[u64x3_t]]
; CHECK: [[left:%.+]] = OpVectorExtractDynamic [[u64_t]] [[a]] [[const_0]]
; CHECK: [[right:%.+]] = OpVectorShuffle [[u64x2_t]] [[a]] [[a]] 1 2

; Do firstbitlow on i64, i64x2
; CHECK: [[left_cast:%.+]] = OpBitcast [[u32x2_t]] [[left]]
; CHECK: [[left_lsb_bits:%.+]] = OpExtInst [[u32x2_t]] [[glsl_450_ext]] FindILsb [[left_cast]]
; CHECK: [[left_high_bits:%.+]] = OpVectorExtractDynamic [[u32_t]] [[left_lsb_bits]] [[const_0]]
; CHECK: [[left_low_bits:%.+]] = OpVectorExtractDynamic [[u32_t]] [[left_lsb_bits]] [[const_1]]
; CHECK: [[left_should_use_high:%.+]] = OpIEqual [[bool_t]] [[left_low_bits]] [[const_neg1]]
; CHECK: [[left_ans_bits:%.+]] = OpSelect [[u32_t]] [[left_should_use_high]] [[left_high_bits]] [[left_low_bits]]
; CHECK: [[left_ans_offset:%.+]] = OpSelect [[u32_t]] [[left_should_use_high]] [[const_32]] [[const_0]]
; CHECK: [[left_res:%.+]] = OpIAdd [[u32_t]] [[left_ans_offset]] [[left_ans_bits]]

; CHECK: [[right_cast:%.+]] = OpBitcast [[u32x4_t]] [[right]]
; CHECK: [[right_lsb_bits:%.+]] = OpExtInst [[u32x4_t]] [[glsl_450_ext]] FindILsb [[right_cast]]
; CHECK: [[right_high_bits:%.+]] = OpVectorShuffle [[u32x2_t]] [[right_lsb_bits]] [[right_lsb_bits]] 0 2
; CHECK: [[right_low_bits:%.+]] = OpVectorShuffle [[u32x2_t]] [[right_lsb_bits]] [[right_lsb_bits]] 1 3
; CHECK: [[right_should_use_high:%.+]] = OpIEqual [[boolx2_t]] [[right_low_bits]] [[const_neg1x2]]
; CHECK: [[right_ans_bits:%.+]] = OpSelect [[u32x2_t]] [[right_should_use_high]] [[right_high_bits]] [[right_low_bits]]
; CHECK: [[right_ans_offset:%.+]] = OpSelect [[u32x2_t]] [[right_should_use_high]] [[const_32x2]] [[const_0x2]]
; CHECK: [[right_res:%.+]] = OpIAdd [[u32x2_t]] [[right_ans_offset]] [[right_ans_bits]]

; Merge the resulting i32, i32x2 into the final i32x3 and return it
; CHECK: [[ret:%.+]] = OpCompositeConstruct [[u32x3_t]] [[left_res]] [[right_res]]
V-FEXrt (Contributor, Author) commented:

I think I'm allowed to use OpCompositeConstruct this way, but it would be good if someone could verify.

Usages:

u32x3 res = OpCompositeConstruct (left: u32) (right: u32x2)
u32x4 res = OpCompositeConstruct (left: u32x2) (right: u32x2)

A contributor replied:

Yes, you can do that.

; CHECK: OpReturnValue [[ret]]
%elt.firstbitlow = call <3 x i32> @llvm.spv.firstbitlow.v3i64(<3 x i64> %a)
ret <3 x i32> %elt.firstbitlow
}

; CHECK-LABEL: Begin function firstbitlow_v4i64
define noundef <4 x i32> @firstbitlow_v4i64(<4 x i64> noundef %a) {
entry:
; Split the i64x4 into 2 i64x2
; CHECK: [[a:%.+]] = OpFunctionParameter [[u64x4_t]]
; CHECK: [[left:%.+]] = OpVectorShuffle [[u64x2_t]] [[a]] [[a]] 0 1
; CHECK: [[right:%.+]] = OpVectorShuffle [[u64x2_t]] [[a]] [[a]] 2 3

; Do firstbitlow on the 2 i64x2
; CHECK: [[left_cast:%.+]] = OpBitcast [[u32x4_t]] [[left]]
; CHECK: [[left_lsb_bits:%.+]] = OpExtInst [[u32x4_t]] [[glsl_450_ext]] FindILsb [[left_cast]]
; CHECK: [[left_high_bits:%.+]] = OpVectorShuffle [[u32x2_t]] [[left_lsb_bits]] [[left_lsb_bits]] 0 2
; CHECK: [[left_low_bits:%.+]] = OpVectorShuffle [[u32x2_t]] [[left_lsb_bits]] [[left_lsb_bits]] 1 3
; CHECK: [[left_should_use_high:%.+]] = OpIEqual [[boolx2_t]] [[left_low_bits]] [[const_neg1x2]]
; CHECK: [[left_ans_bits:%.+]] = OpSelect [[u32x2_t]] [[left_should_use_high]] [[left_high_bits]] [[left_low_bits]]
; CHECK: [[left_ans_offset:%.+]] = OpSelect [[u32x2_t]] [[left_should_use_high]] [[const_32x2]] [[const_0x2]]
; CHECK: [[left_res:%.+]] = OpIAdd [[u32x2_t]] [[left_ans_offset]] [[left_ans_bits]]

; CHECK: [[right_cast:%.+]] = OpBitcast [[u32x4_t]] [[right]]
; CHECK: [[right_lsb_bits:%.+]] = OpExtInst [[u32x4_t]] [[glsl_450_ext]] FindILsb [[right_cast]]
; CHECK: [[right_high_bits:%.+]] = OpVectorShuffle [[u32x2_t]] [[right_lsb_bits]] [[right_lsb_bits]] 0 2
; CHECK: [[right_low_bits:%.+]] = OpVectorShuffle [[u32x2_t]] [[right_lsb_bits]] [[right_lsb_bits]] 1 3
; CHECK: [[right_should_use_high:%.+]] = OpIEqual [[boolx2_t]] [[right_low_bits]] [[const_neg1x2]]
; CHECK: [[right_ans_bits:%.+]] = OpSelect [[u32x2_t]] [[right_should_use_high]] [[right_high_bits]] [[right_low_bits]]
; CHECK: [[right_ans_offset:%.+]] = OpSelect [[u32x2_t]] [[right_should_use_high]] [[const_32x2]] [[const_0x2]]
; CHECK: [[right_res:%.+]] = OpIAdd [[u32x2_t]] [[right_ans_offset]] [[right_ans_bits]]

; Merge the resulting 2 i32x2 into the final i32x4 and return it
; CHECK: [[ret:%.+]] = OpCompositeConstruct [[u32x4_t]] [[left_res]] [[right_res]]
; CHECK: OpReturnValue [[ret]]
%elt.firstbitlow = call <4 x i32> @llvm.spv.firstbitlow.v4i64(<4 x i64> %a)
ret <4 x i32> %elt.firstbitlow
}

;declare i16 @llvm.spv.firstbitlow.i16(i16)
;declare i32 @llvm.spv.firstbitlow.i32(i32)
;declare i64 @llvm.spv.firstbitlow.i64(i64)