[libclc] Replace float remquo with Intel IMF version #162643
The current implementation has two issues:
- It unconditionally soft-flushes denormals.
- It can't pass the OpenCL CTS test "test_bruteforce remquo" on Intel GPUs.

This PR upstreams the remquo implementation from the Intel Math Functions (IMF) Device Library. It supports denormals and passes the OpenCL CTS test.
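To make the first issue concrete: a positive subnormal binary32 value is an integer multiple of 2^-149 whose bit pattern is that integer, so remquo on two such inputs can be computed exactly with integer arithmetic. The host-side C sketch below does that and pairs it with a soft flush of the kind the description refers to. Both helper names are hypothetical, and the exact-remquo helper only handles positive, nonzero subnormal inputs.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical soft flush, as a flushing implementation might apply it on
   entry: a subnormal input becomes a zero of the same sign. */
static float soft_flush(float f) {
    if (fpclassify(f) == FP_SUBNORMAL)
        return signbit(f) ? -0.0f : 0.0f;
    return f;
}

/* Hypothetical exact remquo for two POSITIVE, NONZERO SUBNORMAL inputs only.
   Such values are integer multiples of 2^-149 whose bit pattern is that
   integer, so the quotient can be rounded to nearest-even and the remainder
   formed with plain integer arithmetic. */
static float remquo_pos_subnormal(float x, float y, int *quo) {
    uint32_t a, b;
    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);
    uint32_t n = a / b;
    uint32_t rem = a % b;
    /* Round the quotient to nearest, ties to even. */
    if (2u * rem > b || (2u * rem == b && (n & 1u)))
        n++;
    *quo = (int)n;
    /* |a - n*b| <= b/2 < 2^23, so the conversion and the scaling are exact. */
    return (float)((int64_t)a - (int64_t)n * (int64_t)b) * 0x1p-149f;
}
```

For x = 3 * 2^-149 and y = 2 * 2^-149 the exact result is -2^-149 with quotient 2, whereas after a soft flush on entry both inputs are +0 and a flushing implementation is left with the invalid case remquo(0, 0), which returns NaN.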
Pull Request Overview
This PR replaces the current float remquo implementation with an Intel Math Functions (IMF) version to address denormal handling issues and pass OpenCL CTS tests.
- Replaces existing algorithm with IMF-based implementation that properly handles denormal values
- Adds comprehensive special case handling for NaN, infinity, and zero inputs
- Introduces optimized path selection based on input ranges to improve performance
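The special-case handling listed above has to implement the remquo rules from C99 Annex F, which follow the IEEE 754 remainder operation: a NaN operand propagates, an infinite x or a zero y is invalid and yields NaN, a finite x with an infinite y returns x, and a zero x with a nonzero y returns that zero with its sign. A minimal C sketch of such a screen, written from those rules rather than from the IMF source, with a hypothetical function name:

```c
#include <math.h>

/* Hypothetical special-case screen for remquof(x, y), following C99 Annex F.
   Returns 1 and writes *result when (x, y) is a special case, else 0.
   quo is unspecified for these cases; 0 is an arbitrary choice here. */
static int remquo_special(float x, float y, float *result, int *quo) {
    *quo = 0;
    if (isnan(x) || isnan(y)) {      /* NaN operand: propagate a NaN */
        *result = x + y;
        return 1;
    }
    if (isinf(x) || y == 0.0f) {     /* invalid operation: NaN */
        *result = NAN;
        return 1;
    }
    if (isinf(y)) {                  /* finite x, infinite y: return x */
        *result = x;
        return 1;
    }
    if (x == 0.0f) {                 /* +-0 with nonzero finite y: keep the signed zero */
        *result = x;
        return 1;
    }
    return 0;                        /* finite, nonzero x and y: main path */
}
```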
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| libclc/clc/lib/generic/math/clc_remquo.inc | Complete rewrite of the remquo implementation with a new algorithm and special-case handling |
| libclc/clc/lib/generic/math/clc_remquo.cl | Added include for the clc_rint.h dependency |
    exp_x = (int)((tmp_x & (0x7FF00000L)) >> 23) - 127;
    exp_y = (int)((tmp_y & (0x7FF00000L)) >> 23) - 127;
    // Test for NaNs, Infs, and Zeros
    if ((exp_x == (0x00000080L)) || (exp_y == (0x00000080L)) ||
Copilot AI, Oct 9, 2025
The check for infinity/NaN uses 0x00000080L (128) but should be 128 (0x80) since the bias was already subtracted. For single-precision floats, the special exponent value after bias subtraction should be 128 (255 - 127).
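For reference, 0x00000080L already equals 128, i.e. 255 - 127, so the quoted check does compare against the all-ones exponent field that binary32 uses for Inf and NaN. The mask 0x7FF00000L is wider than the binary32 exponent mask 0x7F800000, but the extra bits 20-22 are discarded by the >> 23, so the extracted exponent is the same. A small host-side C sketch, with hypothetical helper names and assuming tmp_x holds the raw bits of x:

```c
#include <math.h>   /* INFINITY and NAN for the checks below */
#include <stdint.h>
#include <string.h>

/* Unbiased binary32 exponent, computed like the quoted diff:
   mask the raw bits, shift right by 23, subtract the bias 127. */
static int unbiased_exp(float f, uint32_t mask) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (int)((bits & mask) >> 23) - 127;
}

/* An all-ones exponent field (255) marks Inf or NaN; 255 - 127 == 128 == 0x80. */
static int is_inf_or_nan_by_exp(float f) {
    return unbiased_exp(f, 0x7F800000u) == 0x80;
}
```

Zeros and subnormals both come out as -127 with either mask, which is why the diff tests the significand separately to tell them apart.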
      result = x * 1.7f;
    // y is NaN
    else if ((signif_y != (0x00000000L)) && (exp_y == (0x00000080L)))
      result = y * 1.7f;
Copilot AI, Oct 9, 2025
The magic number 1.7f used for NaN propagation is unclear. Consider using a named constant or adding a comment explaining why this specific value is used for NaN handling.
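The thread does not say why 1.7f was chosen. What the expression relies on is that any arithmetic operation with a NaN operand returns a NaN, and IEEE 754 requires operation results to be quiet NaNs, so routing x or y through a multiply quiets a signaling NaN input instead of returning it unchanged. One plausible reason for a constant other than 1.0f (an assumption) is that compilers may fold x * 1.0f to plain x, which would skip that quieting. A hypothetical C sketch:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring `result = x * 1.7f`: the multiply returns a
   NaN whenever its operand is a NaN, and that result is a quiet NaN. */
static float propagate_nan(float nan_operand) {
    return nan_operand * 1.7f;
}

/* On x86 and Arm (as IEEE 754-2008 recommends), bit 22, the top significand
   bit, is the quiet bit of a binary32 NaN. */
static int is_quiet_nan(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return isnan(f) && (bits & 0x00400000u) != 0;
}
```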
Can you post the IR comparison between the old and new implementations?
The diff is large, so I've posted them as attachments.
Co-authored-by: Copilot
This new codegen looks a lot worse. I think you should just fix the bugs in the existing implementation.
The canonicalize / flush handling is plainly broken. I think you should follow along with these patches:
- Drop the second canonicalize: ROCm@9a7bc19
- Defer the flush of denormals: ROCm@e9198f7
I have tried porting the ocml implementation to replace the Intel GPU implementation, but the ported code doesn't pass the OpenCL CTS.
Yes, it passes CL conformance.
What values are failing?
OK, thanks. Let me re-examine my port and get back to you.
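For answering "What values are failing?", a host-side harness typically compares each candidate result with a reference result. Because the IEEE remainder is always exact, the returned value can be compared bit for bit (which also catches a +0 versus -0 mismatch), with any NaN matching any NaN. The quotient only has to carry the sign of x/y and a magnitude congruent modulo 2^n to the integral quotient; C requires n >= 3, and the number of bits the OpenCL CTS checks isn't stated in this thread, so it is a parameter here. A minimal sketch with hypothetical names:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: remquo's remainder is exact, so compare bit patterns
   (this catches a +0 / -0 mismatch); any NaN matches any other NaN. */
static int result_matches(float ref, float got) {
    if (isnan(ref) || isnan(got))
        return isnan(ref) && isnan(got);
    uint32_t a, b;
    memcpy(&a, &ref, sizeof a);
    memcpy(&b, &got, sizeof b);
    return a == b;
}

/* Hypothetical check: quo must carry the sign of x/y and a magnitude
   congruent modulo 2^nbits to the integral quotient. long long avoids
   overflow when negating INT_MIN. */
static int quo_matches(int ref, int got, int nbits) {
    long long r = ref, g = got;
    long long m = 1LL << nbits;
    long long rm = (r < 0 ? -r : r) % m;
    long long gm = (g < 0 ? -g : g) % m;
    if (rm != gm)
        return 0;
    /* When the low bits are all zero, either sign (or 0) is acceptable. */
    return rm == 0 || ((r < 0) == (g < 0));
}
```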