rocBLAS support #31

@dhjoo98

In an attempt to use the Omniprobe AddressLogger for PyTorch ML workloads, I first tried to instrument rocBLAS kernels, the go-to for GEMM ops.

Because the rocBLAS APIs launch precompiled HIP kernels internally, simply adding the instrumentation plugins to a program that calls rocBLAS APIs (e.g. rocblas_sgemm()) didn't work and returned 833 code: 2.

I then moved on to building rocBLAS from source (https://rocm.docs.amd.com/projects/rocBLAS/en/docs-6.4.0/install/Linux_Install_Guide.html),
passing the plugin shared library to the compiler via -fpass-plugin:
./install.sh -d -a auto --cmake-arg='-DCMAKE_CXX_FLAGS="-fpass-plugin=/opt/logduration/lib/libAMDGCNSubmitAddressMessages-rocm.so -fgpu-rdc"'

I think this does inject the necessary functions, as the compile log shows messages such as:
Injecting Mem Trace Function Into AMDGPU Kernel: __amd_crk__ZL19rocblas_trsv_deviceILi64ELi16ELb0ELb1ELb0ELb0E19rocblas_complex_numIfEPKS1_PKS3_PKPS1_EviT7_lllT6_T8_lllPiiPv
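
To see how many kernels were actually instrumented during the build, the compile log can be scanned for these messages. This is a minimal sketch, assuming the build output was saved to a text file; the function name `injected_kernels` is hypothetical and not part of Omniprobe, and the marker string is copied from the log line above.

```python
import re

# Message prefix copied from the compile log line quoted above.
INJECT_MARKER = "Injecting Mem Trace Function Into AMDGPU Kernel:"


def injected_kernels(log_text):
    """Return kernel names reported as instrumented, in first-seen order, without duplicates."""
    pattern = re.compile(re.escape(INJECT_MARKER) + r"\s*(\S+)")
    names = []
    for match in pattern.finditer(log_text):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


# Example use (the log file name is an assumption):
#   with open("build.log", errors="replace") as fh:
#       kernels = injected_kernels(fh.read())
#   print(len(kernels), "instrumented kernels")
```

An empty result would suggest the pass plugin never ran on the device code, while a non-empty one only confirms injection at compile time, not that the instrumented code objects are the ones loaded at run time.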

Then I set LOGDUR_KERNEL_CACHE to the build subdirectory that contains the rocBLAS kernel objects (.hsaco).
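
A quick sanity check on that directory is to confirm that the .hsaco files in it really contain instrumented kernels. The sketch below is a rough heuristic, not an Omniprobe tool: it assumes the instrumented clones keep the `__amd_crk_` prefix seen in the compile log and that the code objects are uncompressed, so the symbol name appears as plain bytes. The function name `scan_kernel_cache` is hypothetical.

```python
from pathlib import Path

# Assumption: instrumented kernel clones carry this prefix, as in the compile log.
CLONE_PREFIX = b"__amd_crk_"


def scan_kernel_cache(cache_dir):
    """Map each .hsaco file under cache_dir to True if the clone prefix occurs in its bytes."""
    results = {}
    for path in sorted(Path(cache_dir).rglob("*.hsaco")):
        results[str(path)] = CLONE_PREFIX in path.read_bytes()
    return results


# Example use (reads the directory from the environment variable named above):
#   import os
#   for name, has_clone in scan_kernel_cache(os.environ["LOGDUR_KERNEL_CACHE"]).items():
#       print("instrumented" if has_clone else "plain       ", name)
```

If no file is reported as instrumented, the cache is probably pointing at code objects that were built without the plugin.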

Now Omniprobe does progress beyond

HANDLER: libLogMessages64.so
handlerManager: OpenedlibLogMessages64.so

without 833 code:2,

but ends with

>>>>>>>> HSA intercept registered.
signal_runner is shutting down
Comms Runner shutting down
Cache Watcher shutting down
hsaInterceptor: Application elapsed usecs: 9751595us

with no proper tracing output.

So far I've tried both AddressLogger and MemoryAnalyzer. I also tried both the hipBLASLt and Tensile backends for rocBLAS, but both give the same result.
