[libclc] Replace float remquo with Intel IMF version #162643

wenju-he · 2025-10-09T12:26:27Z

Current implementation has two issues:

unconditionally soft flushes denormal.
can't pass OpenCL CTS test "test_bruteforce remquo" on intel gpu.

This PR upstreams remquo implementation from Intel Math Functions (IMF) Device Library. It supports denormal and can pass OpenCL CTS test.

Current implementation has two issues: - unconditionally soft flushes denormal. - can't pass OpenCL CTS test "test_bruteforce remquo" on intel gpu. This PR upstreams remquo implementation from Intel Math Functions (IMF) Device Library. It supports denormal and can pass OpenCL CTS test.

Copilot

Pull Request Overview

This PR replaces the current float remquo implementation with an Intel Math Functions (IMF) version to address denormal handling issues and pass OpenCL CTS tests.

Replaces existing algorithm with IMF-based implementation that properly handles denormal values
Adds comprehensive special case handling for NaN, infinity, and zero inputs
Introduces optimized path selection based on input ranges to improve performance

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
`libclc/clc/lib/generic/math/clc_remquo.inc`	Complete rewrite of remquo implementation with new algorithm and special case handling
`libclc/clc/lib/generic/math/clc_remquo.cl`	Added include for clc_rint.h dependency

libclc/clc/lib/generic/math/clc_remquo.inc

Copilot · 2025-10-09T12:29:00Z

libclc/clc/lib/generic/math/clc_remquo.inc

+  exp_x = (int)((tmp_x & (0x7FF00000L)) >> 23) - 127;
+  exp_y = (int)((tmp_y & (0x7FF00000L)) >> 23) - 127;
+  // Test for NaNs, Infs, and Zeros
+  if ((exp_x == (0x00000080L)) || (exp_y == (0x00000080L)) ||


The check for infinity/NaN uses 0x00000080L (128) but should be 128 (0x80) since the bias was already subtracted. For single-precision floats, the special exponent value after bias subtraction should be 128 (255 - 127).

libclc/clc/lib/generic/math/clc_remquo.inc

Copilot · 2025-10-09T12:29:01Z

libclc/clc/lib/generic/math/clc_remquo.inc

+      result = x * 1.7f;
+    // y is NaN
+    else if ((signif_y != (0x00000000L)) && (exp_y == (0x00000080L)))
+      result = y * 1.7f;


The magic number 1.7f used for NaN propagation is unclear. Consider using a named constant or adding a comment explaining why this specific value is used for NaN handling.

arsenm · 2025-10-09T16:48:39Z

Can you post the IR comparison between the old and new

wenju-he · 2025-10-10T00:16:48Z

Can you post the IR comparison between the old and new

The diff is large, so I post them as attachment:
amdgcn--amdhsa.bc._Z6remquoffPU3AS1i.NEW.txt
amdgcn--amdhsa.bc._Z6remquoffPU3AS1i.OLD.txt

Co-authored-by: Copilot <[email protected]>

arsenm

This new codegen looks a lot worse. I think you should just fix the bugs in the existing implementation.

The canonicalize / flush handling is plainly broken. I think you should follow along with these patches:

Drop the second canonicalize:
ROCm@9a7bc19

Defer the flush of denormal:
ROCm@e9198f7

wenju-he · 2025-10-10T07:30:33Z

This new codegen looks a lot worse. I think you should just fix the bugs in the existing implementation.

The canonicalize / flush handling is plainly broken. I think you should follow along with these patches:

Drop the second canonicalize: ROCm@9a7bc19

Defer the flush of denormal: ROCm@e9198f7

I have tried to port ocml implementation to replace intel gpu implementation, but the ported code can't pass OpenCL CTS.
The ocml implementation __ocml_remquo_f32 should satisfies ulp requirement in OpenCL spec, right?

arsenm · 2025-10-10T07:32:24Z

This new codegen looks a lot worse. I think you should just fix the bugs in the existing implementation.
The canonicalize / flush handling is plainly broken. I think you should follow along with these patches:
Drop the second canonicalize: ROCm@9a7bc19
Defer the flush of denormal: ROCm@e9198f7

I have tried to port ocml implementation to replace intel gpu implementation, but the ported code can't pass OpenCL CTS. The ocml implementation __ocml_remquo_f32 should satisfies ulp requirement in OpenCL spec, right?

Yes, it passes CL conformance

arsenm · 2025-10-10T07:34:36Z

What values are failing?

wenju-he · 2025-10-10T07:36:56Z

This new codegen looks a lot worse. I think you should just fix the bugs in the existing implementation.
The canonicalize / flush handling is plainly broken. I think you should follow along with these patches:
Drop the second canonicalize: ROCm@9a7bc19
Defer the flush of denormal: ROCm@e9198f7

I have tried to port ocml implementation to replace intel gpu implementation, but the ported code can't pass OpenCL CTS. The ocml implementation __ocml_remquo_f32 should satisfies ulp requirement in OpenCL spec, right?

Yes, it passes CL conformance

ok, thanks, let me re-examine my porting and then get back to you.

wenju-he requested a review from Copilot October 9, 2025 12:26

llvmbot added the libclc libclc OpenCL library label Oct 9, 2025

wenju-he requested review from arsenm and frasercrmck October 9, 2025 12:26

Copilot AI reviewed Oct 9, 2025

View reviewed changes

wenju-he and others added 4 commits October 10, 2025 08:17

Apply suggestion from @Copilot

a9f0402

Co-authored-by: Copilot <[email protected]>

Update libclc/clc/lib/generic/math/clc_remquo.inc

30c5e3c

Co-authored-by: Copilot <[email protected]>

0x007FFFFFL -> MANTBITS_SP32, remove L suffix

30373c2

share internal_remquo for difference address space implementations

39cb467

arsenm reviewed Oct 10, 2025

View reviewed changes

wenju-he closed this Oct 10, 2025

wenju-he deleted the libclc-remquo-float branch October 10, 2025 07:37

[libclc] Replace float remquo with Intel IMF version #162643

[libclc] Replace float remquo with Intel IMF version #162643

Uh oh!

Conversation

wenju-he commented Oct 9, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Copilot AI Oct 9, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Oct 9, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm commented Oct 9, 2025

Uh oh!

wenju-he commented Oct 10, 2025

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

wenju-he commented Oct 10, 2025

Uh oh!

arsenm commented Oct 10, 2025

Uh oh!

arsenm commented Oct 10, 2025

Uh oh!

wenju-he commented Oct 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants