-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[libclc] Use __ocml_cos/sin/tan/exp*/lgamma/log*/fmax/fmin/sqrt for AMDGPU #153328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…MDGPU Motivation is to upstream corresponding implementations at: https://github.com/intel/llvm/tree/sycl/libclc/clc/lib/amdgcn/math https://github.com/intel/llvm/tree/sycl/libclc/libspirv/lib/amdgcn-amdhsa/math
Changes of this PR:
Use of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fully rejecting this as a concept. libclc shall not depend on OCML.
libclc is a fork of OCML. It's dysfunctional that it's come to this, that libclc would end up calling into the original base. If you want to use these implementations, I'd rather merge the OCML content into libclc and migrate over to using it.
At the current moment, most of the f32 and f16 operations here should just use the llvm intrinsics. The intrinsics have inline implementations that are identical to the library content (some cases in OCML are already directly calling the intrinsic, so this is a pointless level of indirection). We do need to use some kind of external call for the trig and lgamma cases.
In the longer term, we should exclusively emit the llvm intrinsics. The choice of how to implement the libm equivalents involving a call or not should be a compiler backend decision. At the current time, neither libclc or ocml are in a state usable as a compiler runtime library. The usage model is backwards, I want to migrate to a state where the actual implementations have a usage model closer to compiler-rt. What we do now for both libraries is more like a precompiled header that cannot be relied on from the compiler.
float __ocml_exp_f32(float); | ||
_CLC_OVERLOAD _CLC_DEF float __clc_exp(float x) { return __ocml_exp_f32(x); } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is identical to the llvm intrinsic
#ifdef cl_khr_fp16 | ||
#pragma OPENCL EXTENSION cl_khr_fp16 : enable | ||
half __ocml_exp_f16(half); | ||
_CLC_OVERLOAD _CLC_DEF half __clc_exp(half x) { return __ocml_exp_f16(x); } | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is identical to the llvm intrinsic
float __ocml_exp10_f32(float); | ||
_CLC_OVERLOAD _CLC_DEF float __clc_exp10(float x) { | ||
return __ocml_exp10_f32(x); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is identical to the llvm intrinsic
#ifdef cl_khr_fp16 | ||
#pragma OPENCL EXTENSION cl_khr_fp16 : enable | ||
half __ocml_exp10_f16(half); | ||
_CLC_OVERLOAD _CLC_DEF half __clc_exp10(half x) { return __ocml_exp10_f16(x); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is identical to the llvm intrinsic
float __ocml_fmax_f32(float, float); | ||
_CLC_OVERLOAD _CLC_DEF float __clc_fmax(float x, float y) { | ||
return __ocml_fmax_f32(x, y); | ||
} | ||
|
||
#ifdef cl_khr_fp64 | ||
#pragma OPENCL EXTENSION cl_khr_fp64 : enable | ||
double __ocml_fmax_f64(double, double); | ||
_CLC_OVERLOAD _CLC_DEF double __clc_fmax(double x, double y) { | ||
return __ocml_fmax_f64(x, y); | ||
} | ||
#endif | ||
|
||
#ifdef cl_khr_fp16 | ||
#pragma OPENCL EXTENSION cl_khr_fp16 : enable | ||
half __ocml_fmax_f16(half, half); | ||
_CLC_OVERLOAD _CLC_DEF half __clc_fmax(half x, half y) { | ||
return __ocml_fmax_f16(x, y); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should directly use llvm intrinsics
#ifdef cl_khr_fp64 | ||
#pragma OPENCL EXTENSION cl_khr_fp64 : enable | ||
double __ocml_sqrt_f64(double); | ||
_CLC_OVERLOAD _CLC_DEF double __clc_sqrt(double x) { | ||
return __ocml_sqrt_f64(x); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just use the llvm intrinsic
float __ocml_exp2_f32(float); | ||
_CLC_OVERLOAD _CLC_DEF float __clc_exp2(float x) { return __ocml_exp2_f32(x); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is identical to the llvm intrinsic
#ifdef cl_khr_fp16 | ||
#pragma OPENCL EXTENSION cl_khr_fp16 : enable | ||
half __ocml_exp2_f16(half); | ||
_CLC_OVERLOAD _CLC_DEF half __clc_exp2(half x) { return __ocml_exp2_f16(x); } | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is identical to the llvm intrinsic
thanks @arsenm. I see a lot of llvm-diff change due to use of OCML when doing refactoring in the downstream, so I put up this to consult with you. I'll close this PR.
Just to confirm, libclc will emit llvm intrinsic and this is the state that we want for the future libclc, right? |
Also delete double/half type implementations, which are not allowed per spec. llvm-diff shows lots of changes in libspirv-amdgcn--amdhsa.bc and libspirv-nvptx64--nvidiacl.bc, due to use of __ocml_* and __nv_* built-ins in clc and libspirv in intel/llvm repo. Based on review comment in llvm/llvm-project#153328, libclc shouldn't use __ocml_*. So bitcode change in this PR is expected.
Tag @frasercrmck |
…#19779) Also delete double/half type implementations, which are not allowed per spec. llvm-diff shows lots of changes in libspirv-amdgcn--amdhsa.bc and libspirv-nvptx64--nvidiacl.bc, due to use of __ocml_* and __nv_* built-ins in clc and libspirv in intel/llvm repo. Based on review comment in llvm/llvm-project#153328, libclc shouldn't use __ocml_*. So bitcode change in this PR is expected.
Kind of yes and kind of no. libclc should have some code splitting and not be responsible for the lowest level implementation pieces. I'm going to give a talk about this as the upcoming GPU workshop at the conference |
That's great. Unfortunately I can't to attend the workshop. Will it be recorded and uploaded e.g. to youtube? I'd like to watch the talk. |
No, but I can post the slides after |
thanks, I look forward to it. |
Motivation is to upstream corresponding implementations at: https://github.com/intel/llvm/tree/sycl/libclc/clc/lib/amdgcn/math https://github.com/intel/llvm/tree/sycl/libclc/libspirv/lib/amdgcn-amdhsa/math