Description
While trying to use the Omniprobe AddressLogger on PyTorch ML workloads, I first tried to instrument rocBLAS kernels, the go-to for GEMM ops.
Because rocBLAS APIs launch precompiled HIP kernels internally, simply adding the instrumentation plugins to a program that calls rocBLAS APIs (e.g. rocblas_sgemm()) didn't work and returned 833 code: 2.
I then moved on to building rocBLAS from source (https://rocm.docs.amd.com/projects/rocBLAS/en/docs-6.4.0/install/Linux_Install_Guide.html),
passing the plugin shared lib to the compiler through the CMake CXX flags:
./install.sh -d -a auto --cmake-arg='-DCMAKE_CXX_FLAGS="-fpass-plugin=/opt/logduration/lib/libAMDGCNSubmitAddressMessages-rocm.so -fgpu-rdc"'
I think this does inject the necessary functions, as the compile log shows messages such as:
Injecting Mem Trace Function Into AMDGPU Kernel: __amd_crk__ZL19rocblas_trsv_deviceILi64ELi16ELb0ELb1ELb0ELb0E19rocblas_complex_numIfEPKS1_PKS3_PKPS1_EviT7_lllT6_T8_lllPiiPv
Then I set LOGDUR_KERNEL_CACHE to the build subdirectory that contains the rocBLAS kernel objects (.hsaco).
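For anyone trying to reproduce this, a quick sanity check on the cache directory might look like the sketch below. It walks the directory that LOGDUR_KERNEL_CACHE points at and reports which .hsaco files contain the `__amd_crk_` prefix seen in the compile log above. This rests on an assumption I have not verified: that the instrumented clone names are stored as plain, uncompressed strings in the code objects. The helper name `find_instrumented` is made up for illustration and is not part of Omniprobe.

```python
import os

# Prefix seen on instrumented kernel clones in the compile log.
# Assumption: it appears as plain bytes in the .hsaco symbol names.
MARKER = b"__amd_crk_"


def find_instrumented(cache_dir, marker=MARKER):
    """Map every .hsaco path under cache_dir to True if it contains marker."""
    results = {}
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            if name.endswith(".hsaco"):
                path = os.path.join(root, name)
                with open(path, "rb") as fh:
                    results[path] = marker in fh.read()
    return results


if __name__ == "__main__":
    # Falls back to the current directory if the variable is unset.
    cache_dir = os.environ.get("LOGDUR_KERNEL_CACHE", ".")
    for path, hit in sorted(find_instrumented(cache_dir).items()):
        print(("instrumented  " if hit else "plain         ") + path)
```

If every file comes back as "plain", the cache directory probably does not hold the instrumented objects; if some are "instrumented", the clones exist on disk and the problem is more likely on the lookup or dispatch side.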
Now omniprobe does progress beyond
HANDLER: libLogMessages64.so
handlerManager: OpenedlibLogMessages64.so
without 833 code:2,
but ends with
>>>>>>>> HSA intercept registered.
signal_runner is shutting down
Comms Runner shutting down
Cache Watcher shutting down
hsaInterceptor: Application elapsed usecs: 9751595us
without producing a proper trace.
So far I've tried both AddressLogger and MemoryAnalyzer. I also tried both the hipBLASLt and Tensile backends for rocBLAS, but both give the same result.