Skip to content

Conversation

wenju-he
Copy link
Contributor

@wenju-he wenju-he requested a review from frasercrmck August 13, 2025 01:27
@llvmbot llvmbot added the libclc libclc OpenCL library label Aug 13, 2025
@wenju-he wenju-he requested a review from arsenm August 13, 2025 01:27
@wenju-he
Copy link
Contributor Author

Changes of this PR:

Use of _ocml_* functions scalarizes vector implementation since generic implementations are vectorized for vector input. So generic implementation has an advantage.
My understanding is fmax/fmin changes in this PR are not necessary due to the scalarization. However, I do see rocm-6.3.3/lib/llvm/lib/clang/18/lib/amdgcn/bitcode/opencl.bc fmax/fmin is implemented with scalar __ocml_fmax/fmin.
@ArsenArsen please review if the changes in this PR is necessary. If not, we can remove downstream implementations. The downstream implementations cause llvm-diff change for amdgcn--amdhsa target when refactoring libspirv to use _clc* functions.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fully rejecting this as a concept. libclc shall not depend on OCML.

libclc is a fork of OCML. It's dysfunctional that it's come to this, that libclc would end up calling into the original base. If you want to use these implementations, I'd rather merge the OCML content into libclc and migrate over to using it.

At the current moment, most of the f32 and f16 operations here should just use the llvm intrinsics. The intrinsics have inline implementations that are identical to the library content (some cases in OCML are already directly calling the intrinsic, so this is a pointless level of indirection). We do need to use some kind of external call for the trig and lgamma cases.

In the longer term, we should exclusively emit the llvm intrinsics. The choice of how to implement the libm equivalents involving a call or not should be a compiler backend decision. At the current time, neither libclc or ocml are in a state usable as a compiler runtime library. The usage model is backwards, I want to migrate to a state where the actual implementations have a usage model closer to compiler-rt. What we do now for both libraries is more like a precompiled header that cannot be relied on from the compiler.

Comment on lines +12 to +14
float __ocml_exp_f32(float);
_CLC_OVERLOAD _CLC_DEF float __clc_exp(float x) { return __ocml_exp_f32(x); }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is identical to the llvm intrinsic

Comment on lines +21 to +25
#ifdef cl_khr_fp16
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
half __ocml_exp_f16(half);
_CLC_OVERLOAD _CLC_DEF half __clc_exp(half x) { return __ocml_exp_f16(x); }
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is identical to the llvm intrinsic

Comment on lines +12 to +15
float __ocml_exp10_f32(float);
_CLC_OVERLOAD _CLC_DEF float __clc_exp10(float x) {
return __ocml_exp10_f32(x);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is identical to the llvm intrinsic

#ifdef cl_khr_fp16
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
half __ocml_exp10_f16(half);
_CLC_OVERLOAD _CLC_DEF half __clc_exp10(half x) { return __ocml_exp10_f16(x); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is identical to the llvm intrinsic

Comment on lines +12 to +31
float __ocml_fmax_f32(float, float);
_CLC_OVERLOAD _CLC_DEF float __clc_fmax(float x, float y) {
return __ocml_fmax_f32(x, y);
}

#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
double __ocml_fmax_f64(double, double);
_CLC_OVERLOAD _CLC_DEF double __clc_fmax(double x, double y) {
return __ocml_fmax_f64(x, y);
}
#endif

#ifdef cl_khr_fp16
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
half __ocml_fmax_f16(half, half);
_CLC_OVERLOAD _CLC_DEF half __clc_fmax(half x, half y) {
return __ocml_fmax_f16(x, y);
}
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should directly use llvm intrinsics

Comment on lines +12 to +18
#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
double __ocml_sqrt_f64(double);
_CLC_OVERLOAD _CLC_DEF double __clc_sqrt(double x) {
return __ocml_sqrt_f64(x);
}
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just use the llvm intrinsic

Comment on lines +12 to +13
float __ocml_exp2_f32(float);
_CLC_OVERLOAD _CLC_DEF float __clc_exp2(float x) { return __ocml_exp2_f32(x); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is identical to the llvm intrinsic

Comment on lines +23 to +27
#ifdef cl_khr_fp16
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
half __ocml_exp2_f16(half);
_CLC_OVERLOAD _CLC_DEF half __clc_exp2(half x) { return __ocml_exp2_f16(x); }
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is identical to the llvm intrinsic

@wenju-he
Copy link
Contributor Author

wenju-he commented Aug 13, 2025

I'm fully rejecting this as a concept. libclc shall not depend on OCML.

thanks @arsenm. I see a lot of llvm-diff change due to use of OCML when doing refactoring in the downstream, so I put up this to consult with you. I'll close this PR.

In the longer term, we should exclusively emit the llvm intrinsics. The choice of how to implement the libm equivalents involving a call or not should be a compiler backend decision. At the current time, neither libclc or ocml are in a state usable as a compiler runtime library. The usage model is backwards, I want to migrate to a state where the actual implementations have a usage model closer to compiler-rt. What we do now for both libraries is more like a precompiled header that cannot be relied on from the compiler.

Just to confirm, libclc will emit llvm intrinsic and this is the state that we want for the future libclc, right?

@wenju-he wenju-he closed this Aug 13, 2025
@wenju-he wenju-he deleted the libclc-amdgcn-math-ocml branch August 13, 2025 02:01
wenju-he added a commit to wenju-he/llvm that referenced this pull request Aug 13, 2025
Also delete double/half type implementations, which are not allowed per spec.

llvm-diff shows lots of changes in libspirv-amdgcn--amdhsa.bc and
libspirv-nvptx64--nvidiacl.bc, due to use of __ocml_* and __nv_*
built-ins in clc and libspirv in intel/llvm repo.
Based on review comment in llvm/llvm-project#153328,
libclc shouldn't use __ocml_*. So bitcode change in this PR is expected.
@wenju-he
Copy link
Contributor Author

If you want to use these implementations, I'd rather merge the OCML content into libclc and migrate over to using it.

Tag @frasercrmck
The amdgcn implementation probably needs improvement to use llvm elementwise builtin for e.g. half/float exp*

steffenlarsen pushed a commit to intel/llvm that referenced this pull request Aug 14, 2025
…#19779)

Also delete double/half type implementations, which are not allowed per
spec.

llvm-diff shows lots of changes in libspirv-amdgcn--amdhsa.bc and
libspirv-nvptx64--nvidiacl.bc, due to use of __ocml_* and __nv_*
built-ins in clc and libspirv in intel/llvm repo.
Based on review comment in
llvm/llvm-project#153328, libclc shouldn't use
__ocml_*. So bitcode change in this PR is expected.
@arsenm
Copy link
Contributor

arsenm commented Oct 11, 2025

Just to confirm, libclc will emit llvm intrinsic and this is the state that we want for the future libclc, right?

Kind of yes and kind of no. libclc should have some code splitting and not be responsible for the lowest level implementation pieces. I'm going to give a talk about this as the upcoming GPU workshop at the conference

@wenju-he
Copy link
Contributor Author

Just to confirm, libclc will emit llvm intrinsic and this is the state that we want for the future libclc, right?

Kind of yes and kind of no. libclc should have some code splitting and not be responsible for the lowest level implementation pieces. I'm going to give a talk about this as the upcoming GPU workshop at the conference

That's great. Unfortunately I can't to attend the workshop. Will it be recorded and uploaded e.g. to youtube? I'd like to watch the talk.

@arsenm
Copy link
Contributor

arsenm commented Oct 11, 2025

That's great. Unfortunately I can't to attend the workshop. Will it be recorded and uploaded e.g. to youtube? I'd like to watch the talk.

No, but I can post the slides after

@wenju-he
Copy link
Contributor Author

That's great. Unfortunately I can't to attend the workshop. Will it be recorded and uploaded e.g. to youtube? I'd like to watch the talk.

No, but I can post the slides after

thanks, I look forward to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

libclc libclc OpenCL library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants