Cherry-Pick StreamK Changes to rocm 7.0 #1753

Merged
vamovsik merged 4 commits into ROCm:release/rocm-rel-7.0 from aliry95amd:users/aliry95amd/cherry_pick_SK_changes_rocm7
Sep 26, 2025


aliry95amd commented Sep 23, 2025

Motivation

Some StreamK features and improvements are needed on the ROCm 7.0 release branch.

Technical Details

This PR avoids multiple potential overflows in StreamK math.

Test Plan

Tested locally on GFX950 and in CI.

Test Result

[----------] Global test environment tear-down
[==========] 19997 tests from 12 test suites ran. (1601396 ms total)
[ PASSED ] 19997 tests.
hipBLASLt version: 100000
hipBLASLt git version: 20250912-42-17-gb1537e7cb6-dirty
command line: ./hipblaslt-test

Submission Checklist

This PR unpacks skGridAndTiles to two different SGPRs.
This PR prevents overflow in divisions related to StreamK.

ScalarU32 division in rocisa is only accurate for values less than 2^24
because it converts U32 to F32. StreamK sometimes needs to divide values
larger than 2^24, so this division algorithm cannot be used there. The
alternative approach adds two SGPRs to the kernel args and removes one,
for a net increase of one SGPR. As a result, some kernels could not be
built with this extra SGPR, and those failing kernels were minimally modified.
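
The 2^24 limit follows from F32 having a 24-bit significand: integers above 2^24 are rounded when converted, before any division takes place. The sketch below illustrates this in Python. It is a simplified model, not the rocisa division algorithm; `to_f32` and `f32_div_u32` are hypothetical helper names introduced only for this example.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_div_u32(n: int, d: int) -> int:
    """Illustrative float-based unsigned division (NOT the rocisa algorithm).

    Both operands are converted to F32 first, so any operand that needs
    more than 24 significant bits is rounded before the division happens.
    """
    q = to_f32(to_f32(float(n)) / to_f32(float(d)))
    return int(q)  # truncate toward zero


if __name__ == "__main__":
    small = 1000
    large = 2**24 + 1  # first positive integer that F32 cannot represent exactly
    print(f32_div_u32(small, 7), small // 7)
    print(f32_div_u32(large, 1), large // 1)
```

With these inputs, the small quotient matches integer division, while the quotient for 2^24 + 1 is off by one because the dividend was already rounded to 2^24 during conversion.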

Tested in CI and locally.

Performance and validation results have been shared.

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
@vamovsik vamovsik merged commit 0a25de4 into ROCm:release/rocm-rel-7.0 Sep 26, 2025
4 of 5 checks passed
assistant-librarian bot pushed a commit to ROCm/hipBLASLt that referenced this pull request Sep 26, 2025
Cherry-Pick StreamK Changes to rocm 7.0
