Skip to content

Conversation

wenju-he
Copy link
Contributor

@wenju-he wenju-he commented Oct 9, 2025

Current implementation has two issues:

  • unconditionally soft flushes denormal.
  • can't pass OpenCL CTS test "test_bruteforce remquo" on intel gpu.

This PR upstreams remquo implementation from Intel Math Functions (IMF) Device Library. It supports denormal and can pass OpenCL CTS test.

Current implementation has two issues:
- unconditionally soft flushes denormal.
- can't pass OpenCL CTS test "test_bruteforce remquo" on intel gpu.

This PR upstreams remquo implementation from Intel Math Functions (IMF)
Device Library. It supports denormal and can pass OpenCL CTS test.
@wenju-he wenju-he requested a review from Copilot October 9, 2025 12:26
@llvmbot llvmbot added the libclc libclc OpenCL library label Oct 9, 2025
@wenju-he wenju-he requested review from arsenm and frasercrmck October 9, 2025 12:26
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces the current float remquo implementation with an Intel Math Functions (IMF) version to address denormal handling issues and pass OpenCL CTS tests.

  • Replaces existing algorithm with IMF-based implementation that properly handles denormal values
  • Adds comprehensive special case handling for NaN, infinity, and zero inputs
  • Introduces optimized path selection based on input ranges to improve performance

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
libclc/clc/lib/generic/math/clc_remquo.inc Complete rewrite of remquo implementation with new algorithm and special case handling
libclc/clc/lib/generic/math/clc_remquo.cl Added include for clc_rint.h dependency

exp_x = (int)((tmp_x & (0x7FF00000L)) >> 23) - 127;
exp_y = (int)((tmp_y & (0x7FF00000L)) >> 23) - 127;
// Test for NaNs, Infs, and Zeros
if ((exp_x == (0x00000080L)) || (exp_y == (0x00000080L)) ||
Copy link

Copilot AI Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check for infinity/NaN uses 0x00000080L (128) but should be 128 (0x80) since the bias was already subtracted. For single-precision floats, the special exponent value after bias subtraction should be 128 (255 - 127).

Copilot uses AI. Check for mistakes.

Comment on lines 54 to 57
result = x * 1.7f;
// y is NaN
else if ((signif_y != (0x00000000L)) && (exp_y == (0x00000080L)))
result = y * 1.7f;
Copy link

Copilot AI Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic number 1.7f used for NaN propagation is unclear. Consider using a named constant or adding a comment explaining why this specific value is used for NaN handling.

Copilot uses AI. Check for mistakes.

@arsenm
Copy link
Contributor

arsenm commented Oct 9, 2025

Can you post the IR comparison between the old and new

@wenju-he
Copy link
Contributor Author

Can you post the IR comparison between the old and new

The diff is large, so I post them as attachment:
amdgcn--amdhsa.bc._Z6remquoffPU3AS1i.NEW.txt
amdgcn--amdhsa.bc._Z6remquoffPU3AS1i.OLD.txt

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new codegen looks a lot worse. I think you should just fix the bugs in the existing implementation.

The canonicalize / flush handling is plainly broken. I think you should follow along with these patches:

Drop the second canonicalize:
ROCm@9a7bc19

Defer the flush of denormal:
ROCm@e9198f7

@wenju-he
Copy link
Contributor Author

This new codegen looks a lot worse. I think you should just fix the bugs in the existing implementation.

The canonicalize / flush handling is plainly broken. I think you should follow along with these patches:

Drop the second canonicalize: ROCm@9a7bc19

Defer the flush of denormal: ROCm@e9198f7

I have tried to port ocml implementation to replace intel gpu implementation, but the ported code can't pass OpenCL CTS.
The ocml implementation __ocml_remquo_f32 should satisfies ulp requirement in OpenCL spec, right?

@arsenm
Copy link
Contributor

arsenm commented Oct 10, 2025

This new codegen looks a lot worse. I think you should just fix the bugs in the existing implementation.
The canonicalize / flush handling is plainly broken. I think you should follow along with these patches:
Drop the second canonicalize: ROCm@9a7bc19
Defer the flush of denormal: ROCm@e9198f7

I have tried to port ocml implementation to replace intel gpu implementation, but the ported code can't pass OpenCL CTS. The ocml implementation __ocml_remquo_f32 should satisfies ulp requirement in OpenCL spec, right?

Yes, it passes CL conformance

@arsenm
Copy link
Contributor

arsenm commented Oct 10, 2025

What values are failing?

@wenju-he
Copy link
Contributor Author

This new codegen looks a lot worse. I think you should just fix the bugs in the existing implementation.
The canonicalize / flush handling is plainly broken. I think you should follow along with these patches:
Drop the second canonicalize: ROCm@9a7bc19
Defer the flush of denormal: ROCm@e9198f7

I have tried to port ocml implementation to replace intel gpu implementation, but the ported code can't pass OpenCL CTS. The ocml implementation __ocml_remquo_f32 should satisfies ulp requirement in OpenCL spec, right?

Yes, it passes CL conformance

ok, thanks, let me re-examine my porting and then get back to you.

@wenju-he wenju-he closed this Oct 10, 2025
@wenju-he wenju-he deleted the libclc-remquo-float branch October 10, 2025 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

libclc libclc OpenCL library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants