
WIP: TF32 Performance and Refactor #2274

Open
carsonbrownlee wants to merge 20 commits into ROCm:develop from carsonbrownlee:cbrownle/tf32_performance

Conversation

@carsonbrownlee
Contributor

refactor: tf32 codegen into F32XEmulation module

add: tf32 support for different vector widths and DepthU

add: lds support for both inputs in TF32 MFMA

fix: add attribute check to issueLatency and other calls that assume the type attribute is present

add: support for tf32 16x16x32 mfma and 256x256 tile sizes

add: absolute and relative error output to the hipblaslt-bench client

add: tuned tf32 yamls for high compute intensity
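The refactor above moves tf32 codegen into an F32XEmulation module. TF32 keeps fp32's 8 exponent bits but only 10 mantissa bits, so one basic building block is rounding fp32 inputs to TF32 precision. A minimal host-side sketch of that rounding step (illustrative only, not the actual codegen; the function name is hypothetical, and inf/NaN handling is omitted):

```python
import struct

def fp32_to_tf32(x: float) -> float:
    """Round an fp32 value to TF32 precision (8 exponent bits, 10
    mantissa bits) via round-to-nearest-even on the bit pattern.
    Sketch only: inf/NaN inputs are not handled specially."""
    v = struct.unpack("<I", struct.pack("<f", x))[0]
    v += 0xFFF + ((v >> 13) & 1)  # round-to-nearest-even into bit 13
    v &= 0xFFFFE000               # clear the 13 dropped mantissa bits
    return struct.unpack("<f", struct.pack("<I", v & 0xFFFFFFFF))[0]
```

For example, 1 + 2^-11 sits exactly halfway between two TF32-representable neighbors and rounds to the even one, 1.0, while 1 + 2^-10 is exactly representable and survives the rounding.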

tf32 support for different vector widths and DepthU

tf32 lds debug

adding attribute check to issueLatency calls

tf32 fix lds

tf32 fix for hipblaslt build

tf32 lds fix for broken wave dims

support for tf32 16x16x32 mfma

tf32 16x16x32 debug

adding absolute and relative error output to hipblaslt-bench client

tf32 16x16x32 debug

fixing broken fp32 runs from instOffset

tf32 MI 16x16x32 working with 256b reads

fixing broken tf32 16x16x16 kernels

tf32 emulation optimization

tf32 fix for lds

tf32 performance optimizations. k32 MI working with 256 tile sizes

tf32 yaml for high compute intensity
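Several commits above add absolute and relative error output to the hipblaslt-bench client. A hypothetical host-side sketch of such metrics against a reference result (the helper name and epsilon floor are illustrative, not the actual client code):

```python
def error_metrics(ref, out):
    """Max absolute and max relative error between a reference result
    and a computed result. Illustrative helper, not hipblaslt code."""
    abs_err = max(abs(r - o) for r, o in zip(ref, out))
    # Floor the denominator to avoid dividing by zero on tiny references.
    rel_err = max(abs(r - o) / max(abs(r), 1e-38) for r, o in zip(ref, out))
    return abs_err, rel_err
```

On a reference of [1.0, 2.0] and an output of [1.0, 2.5], this reports an absolute error of 0.5 and a relative error of 0.25.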
@b-shi b-shi force-pushed the cbrownle/tf32_performance branch from 32d04c1 to 2d7b251 on July 9, 2025 18:46
@carsonbrownlee carsonbrownlee force-pushed the cbrownle/tf32_performance branch from 4f80d52 to 5dee74a on July 13, 2025 20:32
@carsonbrownlee carsonbrownlee force-pushed the cbrownle/tf32_performance branch from 5dee74a to 6c6919b on July 14, 2025 08:39
AlexBrownAMD and others added 4 commits July 14, 2025 12:38
TF32 origami libs with 16x16x32 and optional lsu
Temporarily disable warning message on missing instruction latency for tf32 test build
@carsonbrownlee carsonbrownlee force-pushed the cbrownle/tf32_performance branch from c3ba315 to ede0beb on July 17, 2025 06:28
ammallya pushed a commit that referenced this pull request Nov 21, 2025
## Motivation

Recently a customer requested to support sigmoid activation function in
hipblaslt.

## Technical Details

Tensilelite already supports sigmoid and has the GPU code module named
"Sigmoid" implemented in Activation.py. To enable this feature in
hipblaslt, we updated the enum types for hipblaslt_activation_type and
the epilogue types in the hipblaslt and rocblaslt abstractions.

We also added the gflops count in flops.hpp and updated the utility functions.
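The activation being wired through these enums is the standard logistic function, 1 / (1 + e^-x). A numerically stable scalar reference (illustrative only, not Tensilelite's GPU module):

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid, 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow of exp(-x) for large negative x
    return e / (1.0 + e)
```

The two-branch form matters for the epilogue's edge cases: evaluating exp(-x) directly for very negative x would overflow, while the branch above underflows gracefully to 0.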

## Test Plan

New smoke tests and Matmul tests were added, and existing tests were
extended to cover the sigmoid activation function. Unit tests were
added for the newly added enum values for sigmoid activation_type and
epilogue.

## Test Result

All the above-mentioned tests passed when running hipblaslt-test.

---------

Co-authored-by: Madhusoodhanan Prabha <amadhuso@ctr2-alola-login-01.amd.com>
Co-authored-by: Madhusoodhanan Prabha <amadhuso@ctr2-alola-ctrl-01.amd.com>

3 participants