Port back ordered block id feature to rocPRIM rocm 7.0.x#2257

Closed
stanleytsang-amd wants to merge 6 commits into release/rocm-rel-7.0 from users/stanleytsang-amd/rocprim-gfx942-ordered-block-id-7.0

Conversation

@stanleytsang-amd stanleytsang-amd commented Oct 23, 2025

DO NOT MERGE UNTIL PM APPROVAL GIVEN.

Motivation

See #1981 for the full explanation. This feature/fix needs to be backported to ROCm 7.0.x.

Technical Details

This backports the 7.1 ordered block id changes to the 7.0 codebase. I had to pull in some other (slightly) unrelated 7.1 code changes to make the backport work.

I also had to cherry-pick 6c24aff to get the Windows job to pass.

Test Plan

Running CI.

Test Result

Submission Checklist

stanleytsang-amd and others added 4 commits October 22, 2025 19:18
Currently, on Windows, GTest cannot print 128-bit ints. We have a check
in `test_utils::protected_assert_eq` that avoids calling `ASSERT_EQ` on
128-bit int values directly, since this will cause the values to be
printed in the event of an error.

This check was relying on the `is_int128` alias, which was being set to
`false_type` when `ROCPRIM_HAS_INT128_SUPPORT` was `false`. As a result,
when 128-bit types were passed in, our check could not detect them and
would fail to stop the printing.

In
[rocprim/types.hpp](https://github.com/ROCm/rocm-libraries/blob/develop/projects/rocprim/rocprim/include/rocprim/types.hpp#L69),
the types `rocprim::int128_t` and `rocprim::uint128_t` are now defined
regardless of how `ROCPRIM_HAS_INT128_SUPPORT` is set. This means we no
longer need to guard against usage of these types in our test code (we
only need to use `ROCPRIM_HAS_INT128_SUPPORT` in cases where we're doing
some operation that explicitly won't work on 128-bit ints).

This change removes the code that sets the `is_int128` alias to
`false_type` when `ROCPRIM_HAS_INT128_SUPPORT` is not set. Doing this is
enough to fix the check in `test_utils::protected_assert_eq`.
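As a rough illustration of the idea above, the trait can be defined unconditionally so that 128-bit types are always detected. This is a minimal sketch, not the actual rocPRIM test code: the `sketch` namespace and its type aliases are hypothetical stand-ins for `rocprim::int128_t` and `rocprim::uint128_t`, and it assumes a compiler that provides the `__int128` extension (GCC or Clang).

```cpp
#include <type_traits>

namespace sketch
{
// Hypothetical stand-ins for rocprim::int128_t / rocprim::uint128_t.
// Assumes the GCC/Clang __int128 extension is available.
using int128_t  = __int128;
using uint128_t = unsigned __int128;

// Defined unconditionally: no feature macro can collapse this alias to
// false_type for a genuine 128-bit type.
template<class T>
using is_int128 = std::integral_constant<bool,
                                         std::is_same<T, int128_t>::value
                                             || std::is_same<T, uint128_t>::value>;
} // namespace sketch

// Compile-time checks that the trait behaves as described.
static_assert(sketch::is_int128<sketch::int128_t>::value, "signed 128-bit is detected");
static_assert(!sketch::is_int128<long long>::value, "64-bit is not flagged");
```

With a trait shaped like this, a check that wants to avoid printing 128-bit values no longer depends on how `ROCPRIM_HAS_INT128_SUPPORT` is set.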

Reverse UseGTestAssert condition on Windows

Recently, we added a check to see if GTest's ASSERT_EQ assertion
should be used within the assert_eq function. The code in the if/else
blocks that acts on the result of this check was inverted (the "else"
code block should have been the "if" block, and vice versa).
This change fixes the issue by swapping the code blocks.

This change should be merged after #449.
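The intended (non-inverted) branching can be sketched as follows. This is an assumption-laden illustration rather than the real `assert_eq` implementation: `assert_eq_sketch`, `check_eq`, and `opaque128` are hypothetical names, a returned string stands in for GTest's failure output, and the boolean template parameter plays the role of the `UseGTestAssert` check. The printing path is taken only when the flag is true, so a type that cannot be streamed never reaches it.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

namespace sketch
{
// Hypothetical 128-bit-like type with equality but no operator<<,
// mimicking a value that the test framework cannot print.
struct opaque128
{
    unsigned long long hi;
    unsigned long long lo;
};

inline bool operator==(const opaque128& a, const opaque128& b)
{
    return a.hi == b.hi && a.lo == b.lo;
}

// "if" branch: the printing assertion is allowed, so values are streamed.
template<class T>
std::string check_eq(const T& a, const T& b, std::true_type /*use printing assert*/)
{
    if(a == b)
        return "ok";
    std::ostringstream os;
    os << "mismatch: " << a << " vs " << b;
    return os.str();
}

// "else" branch: compare only, never print the values.
template<class T>
std::string check_eq(const T& a, const T& b, std::false_type /*no printing*/)
{
    return a == b ? "ok" : "mismatch (values not printed)";
}

// Dispatches on the flag; swapping the two branches above would reproduce
// the inverted behaviour described in the commit message.
template<bool UsePrintingAssert, class T>
std::string assert_eq_sketch(const T& a, const T& b)
{
    return check_eq(a, b, std::integral_constant<bool, UsePrintingAssert>{});
}
} // namespace sketch
```

For example, `sketch::assert_eq_sketch<true>(1, 2)` takes the printing branch, while `sketch::assert_eq_sketch<false>(sketch::opaque128{0, 1}, sketch::opaque128{0, 2})` compiles and reports a mismatch without ever trying to stream the values.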

@NB4444 NB4444 left a comment

The backport looks fine. To summarize, it backports the new config launch system, the ordered block id code, and some small fixes.

As we add support for more GPU architectures, the size of the fat binary
is growing. For some builds, it's large enough that we are running into
linker errors because symbols are out of range.

This change adds a CMake option called `BUILD_OFFLOAD_COMPRESS`
(defaults to `ON`), which enables/disables the `--offload-compress`
compiler option. This option causes the compiler to compress the binary
after it's generated. When the binary is run, it is decompressed in one
shot on startup. The performance penalty for this decompression seems to
be minimal. The reduction in the size of the fat binary is significant for
some targets (up to about 80% in some cases).

It also removes these obsolete default build targets: gfx803, gfx900.

Build rocPRIM for all default architectures. Ensure there are no linker
errors.

No linker errors locally - will wait for CI completion to confirm there
as well.

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
@stanleytsang-amd stanleytsang-amd requested a review from a team as a code owner October 31, 2025 19:54
@jharryma jharryma deleted the users/stanleytsang-amd/rocprim-gfx942-ordered-block-id-7.0 branch January 15, 2026 23:43

4 participants