[AMD] refactor proton to use rocprofiler-sdk and deprecate roctracer #8894
I am not familiar enough with the CI to know the best way to incorporate this into these tests.
Right now lazy loading isn't supported by rocprofiler-sdk. Since we need to initialize it before HIP does in order to intercept kernels properly, I am using `ROCP_TOOL_LIBRARIES` to get around that. PyTorch will initialize HIP if we don't do it first. As I understand it, `rocprofiler-sdk` looks for this environment variable and then calls `rocprofiler_configure` (note to self: audit later). Most importantly, rocprofiler-sdk supports attaching and detaching at any point, so you can minimize overhead by not profiling all the time.
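To illustrate the ordering constraint above, here is a minimal sketch of launching a workload with `ROCP_TOOL_LIBRARIES` already set in the child's environment, so the variable is visible before anything in that process initializes HIP. The helper names and the tool library path are hypothetical placeholders, not names taken from this PR, and the sketch simply overwrites any existing value of the variable.

```python
import os
import subprocess
import sys

# Hypothetical placeholder: substitute the real path of the tool library.
HYPOTHETICAL_TOOL_LIB = "/path/to/tool_library.so"


def env_with_tool_library(tool_lib, base_env=None):
    """Return a copy of the environment with ROCP_TOOL_LIBRARIES set to tool_lib.

    The variable has to be in place before the target process starts,
    because the tool must be registered before HIP is initialized.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROCP_TOOL_LIBRARIES"] = tool_lib  # overwrites any previous value
    return env


def run_with_tool(cmd, tool_lib=HYPOTHETICAL_TOOL_LIB):
    """Run cmd in a child process whose environment names the tool library."""
    completed = subprocess.run(cmd, env=env_with_tool_library(tool_lib), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Example: run_with_tool([sys.executable, "train.py"])
    sys.exit(run_with_tool(sys.argv[1:]))
```

Setting the variable from inside an already running Python process (for example after `import torch`) would likely be too late, which is why the sketch configures the environment of a fresh child process instead.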
proton-cli is the way to use the new library, as a consequence of 2. Lazy loading may be added in the future.
We use `rocprofiler-register` to make our lives easier. Each library (HIP, ROCr) calls into the register and provides an interception table. That's the mechanism widely accepted (as a standard) in the ROCm stack.
PyTorch seems to be using `/opt/rocm/lib/libroctx64.so`, but rocprofiler-sdk intercepts `/opt/rocm/lib/librocprofiler-sdk-roctx.so`. A few of the nvtx tests were failing.
For this PR I think the ergonomics of the libraries we're loading will need to be looked at to make sure they're ideal.
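One way to check which roctx library a process has actually loaded is to inspect its memory map. The sketch below is a Linux-only diagnostic that assumes the standard `/proc/self/maps` layout; the function names are illustrative and not part of proton or rocprofiler-sdk.

```python
def loaded_roctx_libraries(maps_text):
    """Return sorted, de-duplicated paths of mapped files whose name contains 'roctx'.

    maps_text is the content of a /proc/<pid>/maps file, whose lines look like:
        address perms offset dev inode pathname
    Anonymous mappings have no pathname and are skipped.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if "roctx" in path.rsplit("/", 1)[-1]:
            paths.add(path)
    return sorted(paths)


def loaded_roctx_libraries_in_this_process():
    """Inspect the current process (Linux only)."""
    with open("/proc/self/maps") as f:
        return loaded_roctx_libraries(f.read())
```

Calling `loaded_roctx_libraries_in_this_process()` after importing PyTorch and issuing a range annotation would show whether `libroctx64.so`, `librocprofiler-sdk-roctx.so`, or both are mapped.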
NOTE: Stochastic sampling will be added in a future PR. The work is not yet complete, but it seems to work.