Cherry-Pick StreamK Changes to rocm 7.0 #1753
Merged
vamovsik merged 4 commits into ROCm:release/rocm-rel-7.0 on Sep 26, 2025
This PR unpacks skGridAndTiles to two different SGPRs.
This PR prevents overflow in the divisions used by StreamK. The ScalarU32 division in rocisa converts U32 to F32, so it is only accurate for values below 2^24. StreamK sometimes needs to divide values larger than 2^24, and this division algorithm cannot be used in those cases. The alternative approach adds two SGPRs to the kernel arguments and removes one, for a net increase of one SGPR. As a result, some kernels failed to build with the extra SGPR, and those kernels were minimally modified.

Tested in CI and locally. Performance and validation results have been shared.

- [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
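The 2^24 limit in the description comes from the 24-bit significand of single-precision floats: above 2^24, not every integer is representable in F32. The sketch below models a float-based unsigned division in Python to show where exactness is lost. It is only an illustration, not the rocisa instruction sequence; `to_f32` and `div_u32_via_f32` are hypothetical helper names introduced here.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def div_u32_via_f32(n: int, d: int) -> int:
    """Hypothetical model of an unsigned division carried out in F32.

    Assumption: operands are converted to F32, divided, and the quotient
    is truncated. The real rocisa sequence may differ.
    """
    nf = to_f32(float(n))   # U32 -> F32 conversion (may round)
    df = to_f32(float(d))
    q = to_f32(nf / df)     # quotient rounded to F32
    return int(q)           # truncate toward zero


# Below 2^24 every integer is exactly representable, so the conversion is lossless.
small = 2**24 - 1
print(div_u32_via_f32(small, 1), small // 1)

# Just above 2^24 the conversion already rounds the dividend.
large = 2**24 + 1
print(div_u32_via_f32(large, 1), large // 1)
```

The second pair of printed values differs because 2^24 + 1 is rounded to 2^24 when converted to F32, so any quotient derived from it can be off even before the division itself.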
AlexBrownAMD approved these changes on Sep 25, 2025
assistant-librarian bot pushed a commit to ROCm/hipBLASLt that referenced this pull request on Sep 26, 2025
Cherry-Pick StreamK Changes to rocm 7.0

## Motivation
Some StreamK features/improvements are needed.

## Technical Details
This PR avoids multiple potential overflows in StreamK math.

## Test Plan
Locally on GFX950 and CI

## Test Result
[----------] Global test environment tear-down
[==========] 19997 tests from 12 test suites ran. (1601396 ms total)
[ PASSED ] 19997 tests.
hipBLASLt version: 100000
hipBLASLt git version: 20250912-42-17-gb1537e7cb6-dirty
command line: ./hipblaslt-test

## Submission Checklist
- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.