WIP: TF32 Performance and Refactor #2274
Open
carsonbrownlee wants to merge 20 commits into ROCm:develop from
Conversation
- tf32 support for different vector widths and depths
- tf32 lds debug
- adding attribute check to issueLatency calls
- tf32 fix lds
- tf32 fix for hipblaslt build
- tf32 lds fix for broken wave dims
- support for tf32 16x16x32 mfma
- tf32 16x16x32 debug
- adding absolute and relative error output to hipblaslt-bench client
- tf32 16x16x32 debug: fixing broken fp32 runs from instOffset
- tf32 MI 16x16x32 working with 256b reads
- fixing broken tf32 16x16x16 kernels
- tf32 emulation optimization
- tf32 fix for lds
- tf32 performance optimizations; k32 MI working with 256 tile sizes
- tf32 yaml for high compute intensity
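For context on the emulation work above: TF32 keeps fp32's 8-bit exponent but only a 10-bit mantissa, so emulating a TF32 MFMA on fp32 inputs starts by rounding each value to TF32 precision. A minimal sketch of that rounding, assuming round-to-nearest-even on the 13 dropped mantissa bits (the function name is illustrative, not taken from this PR):

```python
import struct

def round_to_tf32(x: float) -> float:
    """Round an fp32 value to TF32 precision (8-bit exponent,
    10-bit mantissa) with round-to-nearest-even, returning it
    as an ordinary fp32-representable float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even over the 13 mantissa bits being dropped:
    # add half the dropped range, biased by the surviving LSB.
    lsb = (bits >> 13) & 1
    bits = (bits + 0x0FFF + lsb) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For example, at magnitude 1.0 the TF32 step is 2^-10, so 1 + 2^-12 rounds back down to 1.0 while 1 + 3·2^-12 rounds up to 1 + 2^-10.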
TF32 origami libs with 16x16x32 and optional lsu
Temporarily disable warning message on missing instruction latency for tf32 test build
ammallya pushed a commit that referenced this pull request on Nov 21, 2025
## Motivation

Recently a customer requested support for the sigmoid activation function in hipblaslt.

## Technical Details

Tensilelite already supports sigmoid and has the GPU code module named "Sigmoid" implemented in Activation.py. To enable this feature in hipblaslt, we had to update the enum types for hipblaslt_activation_type and the epilogue types in the hipblaslt and rocblaslt abstractions. We also added the gflops count in flops.hpp and updated the utility functions.

## Test Plan

New smoke tests and Matmul tests were added, and existing tests were extended to cover the sigmoid activation function. Unit tests were added for the newly added enum values for the sigmoid activation_type and epilogue.

## Test Result

All of the above-mentioned tests passed when running hipblaslt-test.

Co-authored-by: Madhusoodhanan Prabha <amadhuso@ctr2-alola-login-01.amd.com>
Co-authored-by: Madhusoodhanan Prabha <amadhuso@ctr2-alola-ctrl-01.amd.com>
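As a reference for what the new epilogue computes, here is a minimal NumPy sketch of a GEMM followed by a sigmoid activation. The function name is illustrative only; the actual hipblaslt path goes through its matmul API with the new epilogue enum:

```python
import numpy as np

def gemm_with_sigmoid_epilogue(a, b, c=None, alpha=1.0, beta=0.0):
    """Reference model: D = sigmoid(alpha * (A @ B) + beta * C).
    A hedged sketch of the epilogue's math, not hipblaslt's code."""
    acc = alpha * (a @ b)
    if c is not None:
        acc = acc + beta * c
    # Sigmoid applied elementwise as the epilogue step.
    return 1.0 / (1.0 + np.exp(-acc))
```

With zero inputs every accumulator entry is 0, so the epilogue maps the whole output to sigmoid(0) = 0.5.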
refactor: tf32 codegen into F32XEmulation module
add: tf32 support for different vector widths and depths
add: lds support for both inputs in TF32 MFMA
fix: adding attribute check to issueLatency and other calls that assume type
add: support for tf32 16x16x32 mfma and 256x256 tile sizes
add: absolute and relative error output in the hipblaslt-bench client
add: tuned tf32 yamls for high compute intensity
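The absolute/relative error output listed above can be modeled as follows. This is a sketch of the metrics, not the bench client's actual code; the `eps` floor guarding division by zero is an assumption:

```python
import numpy as np

def abs_rel_error(result, reference, eps=1e-12):
    """Max absolute and max relative error between a GPU result
    and a higher-precision reference, as a bench client might report."""
    diff = np.abs(np.asarray(result, dtype=np.float64)
                  - np.asarray(reference, dtype=np.float64))
    abs_err = float(diff.max())
    # eps floor avoids dividing by zero where the reference is 0.
    rel_err = float((diff / np.maximum(np.abs(reference), eps)).max())
    return abs_err, rel_err
```

Relative error is the more useful signal for TF32 emulation, since the reduced 10-bit mantissa bounds the per-element relative rounding error regardless of magnitude.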