[HLSL] [DXIL] Implement the AddUint64 HLSL function and the UAddc DXIL op #125319
`@@ -0,0 +1,71 @@` (new file)

```hlsl
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.3-library %s \
// RUN: -emit-llvm -disable-llvm-passes -o - | \
// RUN: FileCheck %s --check-prefixes=CHECK


// CHECK-LABEL: define noundef <2 x i32> @_Z20test_AddUint64_uint2Dv2_jS_(
// CHECK-SAME: <2 x i32> noundef [[A:%.*]], <2 x i32> noundef [[B:%.*]]) #[[ATTR0:[0-9]+]] {
// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <2 x i32>, align 8
// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <2 x i32>, align 8
// CHECK-NEXT: store <2 x i32> [[A]], ptr [[A_ADDR]], align 8
// CHECK-NEXT: store <2 x i32> [[B]], ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[A_LOAD:%.*]] = load <2 x i32>, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[B_LOAD:%.*]] = load <2 x i32>, ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[LowA:%.*]] = extractelement <2 x i32> [[A_LOAD]], i64 0
// CHECK-NEXT: [[HighA:%.*]] = extractelement <2 x i32> [[A_LOAD]], i64 1
// CHECK-NEXT: [[LowB:%.*]] = extractelement <2 x i32> [[B_LOAD]], i64 0
// CHECK-NEXT: [[HighB:%.*]] = extractelement <2 x i32> [[B_LOAD]], i64 1
// CHECK-NEXT: [[UAddc:%.*]] = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 [[LowA]], i32 [[LowB]])
// CHECK-NEXT: [[Carry:%.*]] = extractvalue { i32, i1 } [[UAddc]], 1
// CHECK-NEXT: [[LowSum:%.*]] = extractvalue { i32, i1 } [[UAddc]], 0
// CHECK-NEXT: [[CarryZExt:%.*]] = zext i1 [[Carry]] to i32
// CHECK-NEXT: [[HighSum:%.*]] = add i32 [[HighA]], [[HighB]]
// CHECK-NEXT: [[HighSumPlusCarry:%.*]] = add i32 [[HighSum]], [[CarryZExt]]
// CHECK-NEXT: [[HLSL_ADDUINT64_UPTO0:%.*]] = insertelement <2 x i32> poison, i32 [[LowSum]], i64 0
// CHECK-NEXT: [[HLSL_ADDUINT64:%.*]] = insertelement <2 x i32> [[HLSL_ADDUINT64_UPTO0]], i32 [[HighSumPlusCarry]], i64 1
// CHECK-NEXT: ret <2 x i32> [[HLSL_ADDUINT64]]
//
uint2 test_AddUint64_uint2(uint2 a, uint2 b) {
  return AddUint64(a, b);
}

// CHECK-LABEL: define noundef <4 x i32> @_Z20test_AddUint64_uint4Dv4_jS_(
// CHECK-SAME: <4 x i32> noundef [[A:%.*]], <4 x i32> noundef [[B:%.*]]) #[[ATTR0]] {
// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: [[A_ADDR:%.*]] = alloca <4 x i32>, align 16
// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <4 x i32>, align 16
// CHECK-NEXT: store <4 x i32> [[A]], ptr [[A_ADDR]], align 16
// CHECK-NEXT: store <4 x i32> [[B]], ptr [[B_ADDR]], align 16
// CHECK-NEXT: [[A_LOAD:%.*]] = load <4 x i32>, ptr [[A_ADDR]], align 16
// CHECK-NEXT: [[B_LOAD:%.*]] = load <4 x i32>, ptr [[B_ADDR]], align 16
// CHECK-NEXT: [[LowA:%.*]] = extractelement <4 x i32> [[A_LOAD]], i64 0
// CHECK-NEXT: [[HighA:%.*]] = extractelement <4 x i32> [[A_LOAD]], i64 1
// CHECK-NEXT: [[LowB:%.*]] = extractelement <4 x i32> [[B_LOAD]], i64 0
// CHECK-NEXT: [[HighB:%.*]] = extractelement <4 x i32> [[B_LOAD]], i64 1
// CHECK-NEXT: [[UAddc:%.*]] = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 [[LowA]], i32 [[LowB]])
// CHECK-NEXT: [[Carry:%.*]] = extractvalue { i32, i1 } [[UAddc]], 1
// CHECK-NEXT: [[LowSum:%.*]] = extractvalue { i32, i1 } [[UAddc]], 0
// CHECK-NEXT: [[CarryZExt:%.*]] = zext i1 [[Carry]] to i32
// CHECK-NEXT: [[HighSum:%.*]] = add i32 [[HighA]], [[HighB]]
// CHECK-NEXT: [[HighSumPlusCarry:%.*]] = add i32 [[HighSum]], [[CarryZExt]]
// CHECK-NEXT: [[HLSL_ADDUINT64_UPTO0:%.*]] = insertelement <4 x i32> poison, i32 [[LowSum]], i64 0
// CHECK-NEXT: [[HLSL_ADDUINT64_UPTO1:%.*]] = insertelement <4 x i32> [[HLSL_ADDUINT64_UPTO0]], i32 [[HighSumPlusCarry]], i64 1
// CHECK-NEXT: [[LowA1:%.*]] = extractelement <4 x i32> [[A_LOAD]], i64 2
// CHECK-NEXT: [[HighA1:%.*]] = extractelement <4 x i32> [[A_LOAD]], i64 3
// CHECK-NEXT: [[LowB1:%.*]] = extractelement <4 x i32> [[B_LOAD]], i64 2
// CHECK-NEXT: [[HighB1:%.*]] = extractelement <4 x i32> [[B_LOAD]], i64 3
// CHECK-NEXT: [[UAddc1:%.*]] = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 [[LowA1]], i32 [[LowB1]])
// CHECK-NEXT: [[Carry1:%.*]] = extractvalue { i32, i1 } [[UAddc1]], 1
// CHECK-NEXT: [[LowSum1:%.*]] = extractvalue { i32, i1 } [[UAddc1]], 0
// CHECK-NEXT: [[CarryZExt1:%.*]] = zext i1 [[Carry1]] to i32
// CHECK-NEXT: [[HighSum1:%.*]] = add i32 [[HighA1]], [[HighB1]]
// CHECK-NEXT: [[HighSumPlusCarry1:%.*]] = add i32 [[HighSum1]], [[CarryZExt1]]
// CHECK-NEXT: [[HLSL_ADDUINT64_UPTO2:%.*]] = insertelement <4 x i32> [[HLSL_ADDUINT64_UPTO1]], i32 [[LowSum1]], i64 2
// CHECK-NEXT: [[HLSL_ADDUINT64:%.*]] = insertelement <4 x i32> [[HLSL_ADDUINT64_UPTO2]], i32 [[HighSumPlusCarry1]], i64 3
// CHECK-NEXT: ret <4 x i32> [[HLSL_ADDUINT64]]
//
uint4 test_AddUint64_uint4(uint4 a, uint4 b) {
  return AddUint64(a, b);
}
```
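The CHECK lines spell out the lowering for each (low, high) pair:

- The low words go through `llvm.uadd.with.overflow.i32`.
- The `i1` carry is zero-extended.
- The carry is added to the sum of the high words.

Below is a minimal C++ sketch of the same arithmetic for a single `uint2`. It assumes element 0 is the low word and element 1 the high word, as in the IR. `Uint2`, `addUint64`, `toU64` and `fromU64` are illustrative names, not anything from the PR:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for HLSL's uint2: x = low 32 bits, y = high 32 bits.
struct Uint2 {
  std::uint32_t x;
  std::uint32_t y;
};

// Mirrors the CHECK lines: add the low words and detect the carry
// (uadd.with.overflow), zero-extend the carry, then HighA + HighB + carry.
Uint2 addUint64(Uint2 a, Uint2 b) {
  std::uint32_t lowSum = a.x + b.x;            // wraps modulo 2^32
  std::uint32_t carry = lowSum < a.x ? 1 : 0;  // unsigned overflow iff result < operand
  std::uint32_t highSum = a.y + b.y + carry;   // a carry out of the high word is dropped
  return {lowSum, highSum};
}

// Helpers for comparing against native 64-bit addition.
std::uint64_t toU64(Uint2 v) {
  return (static_cast<std::uint64_t>(v.y) << 32) | v.x;
}

Uint2 fromU64(std::uint64_t v) {
  return {static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32)};
}
```

Comparing against native `uint64_t` addition through the two conversion helpers is a quick way to confirm that the carry handling reproduces 64-bit wraparound.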
`@@ -0,0 +1,41 @@` (new file)

```hlsl
// RUN: %clang_cc1 -finclude-default-header -triple dxil-pc-shadermodel6.6-library %s -fnative-half-type -emit-llvm-only -disable-llvm-passes -verify

uint2 test_too_few_arg() {
  return __builtin_hlsl_adduint64();
  // expected-error@-1 {{too few arguments to function call, expected 2, have 0}}
}

uint4 test_too_many_arg(uint4 a) {
  return __builtin_hlsl_adduint64(a, a, a);
  // expected-error@-1 {{too many arguments to function call, expected 2, have 3}}
}

uint2 test_mismatched_arg_types(uint2 a, uint4 b) {
  return __builtin_hlsl_adduint64(a, b);
  // expected-error@-1 {{all arguments to '__builtin_hlsl_adduint64' must have the same type}}
}

uint2 test_bad_num_arg_elements(uint3 a, uint3 b) {
  return __builtin_hlsl_adduint64(a, b);
  // expected-error@-1 {{invalid element count of 3 in vector operand (expected an even element count in the range of 2 and 4)}}
}

uint2 test_scalar_arg_type(uint a) {
  return __builtin_hlsl_adduint64(a, a);
  // expected-error@-1 {{all arguments to AddUint64 must be vectors}}
}

uint2 test_signed_integer_args(int2 a, int2 b) {
  return __builtin_hlsl_adduint64(a, b);
  // expected-error@-1 {{passing 'int2' (aka 'vector<int, 2>') to parameter of incompatible type '__attribute__((__vector_size__(2 * sizeof(unsigned int)))) unsigned int' (vector of 2 'unsigned int' values)}}
}

struct S {
  uint2 a;
};

uint2 test_incorrect_arg_type(S a) {
  return __builtin_hlsl_adduint64(a, a);
  // expected-error@-1 {{passing 'S' to parameter of incompatible type 'unsigned int'}}
}
```
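Taken together, the diagnostics above describe the accepted operands:

- exactly two arguments of the same type;
- vectors (not scalars) of unsigned 32-bit integers;
- 2 or 4 elements.

The sketch below is a rough C++ analogue of those rules. It uses template deduction and `static_assert` as a stand-in for Clang's HLSL Sema checks; the PR implements the checks in Sema, not like this. `addUint64N` is a hypothetical name:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One template parameter N forces both operands to have the same type
// (cf. "must have the same type"). The element type is fixed to uint32_t
// (cf. the int2 and struct S errors). A scalar argument cannot deduce N
// (cf. "must be vectors").
template <std::size_t N>
std::array<std::uint32_t, N> addUint64N(const std::array<std::uint32_t, N> &a,
                                        const std::array<std::uint32_t, N> &b) {
  // cf. "expected an even element count in the range of 2 and 4"
  static_assert(N == 2 || N == 4, "AddUint64 operands must have 2 or 4 elements");
  std::array<std::uint32_t, N> result{};
  // Each (low, high) pair is an independent 64-bit addition;
  // no carry crosses from one pair into the next.
  for (std::size_t i = 0; i < N; i += 2) {
    std::uint32_t low = a[i] + b[i];
    std::uint32_t carry = low < a[i] ? 1u : 0u;
    result[i] = low;
    result[i + 1] = a[i + 1] + b[i + 1] + carry;
  }
  return result;
}
```

Because each pair is independent, a `uint4` behaves like two unrelated 64-bit additions. The `uint4` CHECK lines in the CodeGen test show the same thing, with two separate `llvm.uadd.with.overflow.i32` calls.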
`@@ -50,6 +50,7 @@ def HandleTy : DXILOpParamType;`

```
def ResBindTy : DXILOpParamType;
def ResPropsTy : DXILOpParamType;
def SplitDoubleTy : DXILOpParamType;
def BinaryWithCarryTy : DXILOpParamType;

class DXILOpClass;
```

Review comment on `def BinaryWithCarryTy : DXILOpParamType;`: This is reminding me how much I didn't like how `splitdouble` ended up. Seems like we need a more generic way to define our anonymous struct return types.
`@@ -738,6 +739,18 @@ def UMin : DXILOp<40, binary> {`

```
  let attributes = [Attributes<DXIL1_0, [ReadNone]>];
}

def UAddc : DXILOp<44, binaryWithCarryOrBorrow> {
  let Doc = "Unsigned 32-bit integer arithmetic add with carry. uaddc(a,b) = (a+b, a+b overflowed ? 1 : 0)";
  // TODO: This `let intrinsics = ...` line may be uncommented when
  // https://github.com/llvm/llvm-project/issues/113192 is fixed
  // let intrinsics = [IntrinSelect<int_uadd_with_overflow>];
  let arguments = [OverloadTy, OverloadTy];
  let result = BinaryWithCarryTy;
  let overloads = [Overloads<DXIL1_0, [Int32Ty]>];
  let stages = [Stages<DXIL1_0, [all_stages]>];
  let attributes = [Attributes<DXIL1_0, [ReadNone]>];
}

def FMad : DXILOp<46, tertiary> {
  let Doc = "Floating point arithmetic multiply/add operation. fmad(m,a,b) = m "
            "* a + b.";
```
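The `Doc` string defines `uaddc(a,b) = (a+b, a+b overflowed ? 1 : 0)`. The result type `BinaryWithCarryTy` is the `{i32, i1}` struct named `dx.types.i32c`.

The C++ sketch below expresses that contract, with a plain struct standing in for the DXIL return type. `I32WithCarry` and `uaddc` are illustrative names. It relies on the GCC/Clang builtin `__builtin_add_overflow`, which Clang typically lowers to `llvm.uadd.with.overflow.i32` for two `uint32_t` operands:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the dx.types.i32c struct: { i32 sum, i1 carry }.
struct I32WithCarry {
  std::uint32_t sum;
  bool carry;
};

// uaddc(a, b) = (a + b, a + b overflowed ? 1 : 0)
I32WithCarry uaddc(std::uint32_t a, std::uint32_t b) {
  I32WithCarry r{};
  // Stores the wrapped 32-bit sum in r.sum. Returns true when the
  // mathematical sum does not fit in 32 bits, i.e. there is a carry out.
  r.carry = __builtin_add_overflow(a, b, &r.sum);
  return r;
}
```

Returning the sum and the carry together in one struct is what lets a single op produce both values without an output pointer.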
`@@ -230,6 +230,14 @@ static StructType *getSplitDoubleType(LLVMContext &Context) {`

```cpp
  return StructType::create({Int32Ty, Int32Ty}, "dx.types.splitdouble");
}

static StructType *getBinaryWithCarryType(LLVMContext &Context) {
  if (auto *ST = StructType::getTypeByName(Context, "dx.types.i32c"))
    return ST;
  Type *Int32Ty = Type::getInt32Ty(Context);
  Type *Int1Ty = Type::getInt1Ty(Context);
  return StructType::create({Int32Ty, Int1Ty}, "dx.types.i32c");
}

static Type *getTypeFromOpParamType(OpParamType Kind, LLVMContext &Ctx,
                                    Type *OverloadTy) {
  switch (Kind) {
```

Review comment on `getBinaryWithCarryType`: This is maybe something that we could consider being generated from
`@@ -273,6 +281,8 @@ static Type *getTypeFromOpParamType(OpParamType Kind, LLVMContext &Ctx,`

```cpp
    return getResPropsType(Ctx);
  case OpParamType::SplitDoubleTy:
    return getSplitDoubleType(Ctx);
  case OpParamType::BinaryWithCarryTy:
    return getBinaryWithCarryType(Ctx);
  }
  llvm_unreachable("Invalid parameter kind");
  return nullptr;
```
`@@ -539,6 +549,10 @@ StructType *DXILOpBuilder::getSplitDoubleType(LLVMContext &Context) {`

```cpp
  return ::getSplitDoubleType(Context);
}

StructType *DXILOpBuilder::getBinaryWithCarryType(LLVMContext &Context) {
  return ::getBinaryWithCarryType(Context);
}

StructType *DXILOpBuilder::getHandleType() {
  return ::getHandleType(IRB.getContext());
}
```
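`getBinaryWithCarryType` first looks the struct up by name and calls `StructType::create` only when it is missing. That way every op returning `dx.types.i32c` shares one type. In LLVM, creating a second named struct whose name is already taken gives the new type a uniquified name rather than reusing the existing one.

The same lookup-or-create pattern is sketched below without LLVM. `NamedStruct` and `TypeRegistry` are hypothetical stand-ins for `StructType` and the `LLVMContext` type table:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, LLVM-free model of a named struct type.
struct NamedStruct {
  std::string name;
  std::vector<std::string> fields;
};

class TypeRegistry {
public:
  // Analogue of StructType::getTypeByName: nullptr if the name is unused.
  NamedStruct *lookup(const std::string &name) {
    auto it = types_.find(name);
    return it == types_.end() ? nullptr : it->second.get();
  }

  // Analogue of getBinaryWithCarryType: reuse the registered type if it
  // exists, otherwise create it once and register it under its name.
  NamedStruct *getOrCreate(const std::string &name,
                           std::vector<std::string> fields) {
    if (NamedStruct *existing = lookup(name))
      return existing;
    auto created =
        std::make_unique<NamedStruct>(NamedStruct{name, std::move(fields)});
    NamedStruct *raw = created.get();
    types_.emplace(name, std::move(created));
    return raw;
  }

private:
  std::map<std::string, std::unique_ptr<NamedStruct>> types_;
};
```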

I suppose I know why we weren't able to reuse `__builtin_addc`, but for other reviewers it would be good to add context as a PR comment here. Maybe they will have suggestions as to how we could use it.

Ah, I see that you have it in the commit notes. I still think it would be worth noting here with more context.

`__builtin_addc` could not be used to implement `AddUint64` in `hlsl_intrinsics.h` (and, by extension, `hlsl_detail.h`) because its `carryout` argument is a pointer (as documented here). Since pointers are not supported in HLSL, an error is emitted when running HLSL codegen tests with an example implementation like the following in `hlsl_intrinsics.h`.

So while HLSL does not support pointers, we do have a concept of out args. If you search for `EmitHLSLOutArgExpr`, I think you can find some uses. My thinking is that maybe we could do our own builtin like you have done, but without the pointer and with an anonymous struct returned. Then we could still piggyback off of the codegen for `__builtin_addc`, even if we don't use the builtin itself. Maybe that's more complicated than it has to be, but it could be a way to keep the codegen for the `uadd_with_overflow` intrinsic in one place.
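The alternative suggested here keeps a carry-producing builtin but drops the pointer. The carry would instead be written to an HLSL `out` parameter or returned in a struct.

HLSL `out` parameters have copy-out semantics rather than reference semantics, so the C++ reference below is only an approximation. With that caveat, here is a sketch of a pointer-free primitive and an `AddUint64`-style helper built on it. `addcOut` and `addUint64Pair` are hypothetical names, not functions from the PR or from Clang:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pointer-free add-with-carry. The carry-out is an output
// parameter (a reference here; in HLSL it would be `out uint carryOut`).
std::uint32_t addcOut(std::uint32_t a, std::uint32_t b, std::uint32_t carryIn,
                      std::uint32_t &carryOut) {
  std::uint64_t wide = static_cast<std::uint64_t>(a) + b + carryIn;
  carryOut = static_cast<std::uint32_t>(wide >> 32);  // 0 or 1 when carryIn <= 1
  return static_cast<std::uint32_t>(wide);            // low 32 bits of the sum
}

// AddUint64 on one (low, high) pair, written in terms of the primitive,
// roughly as a header-level implementation could be.
void addUint64Pair(std::uint32_t lowA, std::uint32_t highA, std::uint32_t lowB,
                   std::uint32_t highB, std::uint32_t &lowOut,
                   std::uint32_t &highOut) {
  std::uint32_t carry = 0;
  lowOut = addcOut(lowA, lowB, 0, carry);
  std::uint32_t unusedCarry = 0;  // carry out of the high word is discarded
  highOut = addcOut(highA, highB, carry, unusedCarry);
}
```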

Do you think it is something I should do for this implementation? Are there other HLSL functions that would benefit from, or reuse, the new builtin using the out args?

So the downside of what I suggested is that it would be a hybrid solution. You would be writing the algorithm in HLSL, but also massaging the codegen to do out args instead of pointers, and writing Sema checks because we would have to introduce a new builtin.
My thinking was that there would be less total codegen if we did it the way I suggested, and some of the Sema checks would be covered by language rules instead of us having to put a bunch of effort into `HLSLSema.cpp`. I don't have a strong opinion, so I won't make it a requirement here.