Skip to content

unify float comparison helpers with isclose#8131

Open
NitishNaineni wants to merge 8 commits intoNVIDIA:mainfrom
NitishNaineni:unify-float-comparison
Open

unify float comparison helpers with isclose#8131
NitishNaineni wants to merge 8 commits intoNVIDIA:mainfrom
NitishNaineni:unify-float-comparison

Conversation

@NitishNaineni
Copy link

Description

closes #7662

replaced 6 different float comparison implementations across cub/thrust/c2h with one isclose function using the PEP-0485 symmetric formula: |a - b| <= max(a_tol, r_tol * max(|a|, |b|))

what changed:

  • added c2h/include/c2h/isclose.h with the canonical isclose function
  • rewrote REQUIRE_APPROX_EQ* macros to use isclose
  • replaced CompareResults float/double in test_util.h
  • replaced WithinRel calls in check_results.cuh and thread_reduce tests
  • replaced require_almost_equal in thrust complex tests
  • replaced hand-rolled inline comparison in segmented scan tests

default tolerance is 1000 * numeric_limits<T>::epsilon() which is derived from the type's precision instead of hardcoded magic numbers. callers can still pass custom tolerances when needed.

left per-algorithm wilkinson tolerance tuning and libcudacxx/cudax test unification for follow-ups.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

replaced 6 different float comparison implementations (fptest_close, is_about, almost_equal, CompareResults, WithinRel, REQUIRE_APPROX_EQ macros) with one canonical isclose function using the PEP-0485 symmetric formula. closes NVIDIA#7662
@NitishNaineni NitishNaineni requested review from a team as code owners March 22, 2026 05:50
@NitishNaineni NitishNaineni requested a review from jrhemstad March 22, 2026 05:50
@github-project-automation github-project-automation bot moved this to Todo in CCCL Mar 22, 2026
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Mar 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Mar 22, 2026
@bernhardmgruber
Copy link
Contributor

Thanks for this contribution!

@oleksandr-pavlyk and @fbusato can you take a look at this?

@@ -0,0 +1,45 @@
// SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
// SPDX-License-Identifier: BSD-3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// SPDX-License-Identifier: BSD-3
// SPDX-License-Identifier: BSD-3-Clause

I see that some headers in c2h use BSD-3, but the documented identifier is BSD-3-Clause.

}
else
{
return a == b;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If isclose is used on a complex type, it would use a == b branch, perhaps unexpectedly to the user. Perhaps it is worth adding static_assert(std::is_integral_v<T>, "Non-integral type using exact comparison");, or perhaps a static assertion that type T does not expose real and imag methods.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have added an assert

updated documented identifier to BSD-3-Clause

Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>
{
if constexpr (std::is_floating_point_v<T>)
{
return isclose(a, b, T(1000) * std::numeric_limits<T>::epsilon(), T(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use powers of 2 here. T( 1 << 10 ) would mean discrepancy of 10 binary ULPs is tolerated.

10 binary units is about half of 23 total explicit mantissa units for single precision floating point numbers, but is all most of 11 total explicit half-precision explicit mantissa units.

Given that std::is_floating_point_v<std::float16_t> is true, perhaps we should use T(1 << 8) as default multiplier instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, switched to T(1 << 8)

auto test_imag = test_results[i].imag();
REQUIRE_THAT(expected_real, Catch::Matchers::WithinRel(test_real));
REQUIRE_THAT(expected_imag, Catch::Matchers::WithinRel(test_imag));
INFO("index " << i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of INFO added which would litter the output on failure in comparisons of large arrays.

I think it would be useful to save values of isclose checks, and place INFO("index" << i); in the branch executed only if isclose check is not met. REQUIRE would reuse the result of testing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made INFO conditional on failed check

std::accumulate(h_in_float.begin(), h_in_float.begin() + num_items, operator_identity, std_reduce_op);
run_thread_reduce_kernel(num_items, d_in, d_out, reduce_op);
verify_results(reference_result, float{c2h::host_vector<value_t>(d_out)[0]});
float test_result = float{c2h::host_vector<value_t>(d_out)[0]};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
float test_result = float{c2h::host_vector<value_t>(d_out)[0]};
float test_result{c2h::host_vector<value_t>(d_out)[0]};

Comment on lines +56 to +57
CHECK(isclose(static_cast<double>(a.real()), static_cast<double>(b.real())));
CHECK(isclose(static_cast<double>(a.imag()), static_cast<double>(b.imag())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cast to double? The cast results in relative tolerance specific to double precision to be used when comparing complex values with less precise real/imaginary types.

Is anything wrong with checking isclose(a.rea(), b.real()) and isclose(a.imag(), b.imag())?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the issue is that a common type is needed, perhaps use std::common_type<T1, T2>: https://en.cppreference.com/w/cpp/types/common_type.html

::cuda::std::enable_if_t<!is_complex<T1> && !is_complex<T2>> require_almost_equal(const T1& a, const T2& b)
{
CHECK(a == Catch::Approx(b).margin(DEFAULT_ABSOLUTE_TOL).epsilon(DEFAULT_RELATIVE_TOL));
CHECK(isclose(static_cast<double>(a), static_cast<double>(b)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here.

@oleksandr-pavlyk
Copy link
Contributor

/ok to test 97f7b85

Comment on lines +1 to +2
// SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
// SPDX-License-Identifier: BSD-3-Clause
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: Any net-new source file should be under Apache-2.0 WITH LLVM-exception

Suggested change
// SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
// SPDX-License-Identifier: BSD-3-Clause
// SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

@github-actions
Copy link
Contributor

😬 CI Workflow Results

🟥 Finished in 1h 28m: Pass: 79%/287 | Total: 7d 14h | Max: 1h 27m | Hits: 80%/170175

See results here.

@fbusato fbusato self-requested a review March 23, 2026 16:38
@NitishNaineni
Copy link
Author

reverted the thrust complex test changes for now. isclose lives in c2h which is only accessible from cub and c/parallel tests. thrust, libcudacxx, and cudax tests can't reach it. is there a shared location where a test utility like this could live so all test frameworks can use it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Harmonize methods used for floating-point approximate equality testing across CCCL test suite

3 participants