Pull Request Description
Issue:
On NVIDIA GH200s you can find CTC events. These events can be added successfully, but they cannot be profiled:
[tburgess@hopper1 bin]$ ./papi_command_line cuda:::CTC.TriageCompute.ctc__rx_bytes
This utility lets you add events from the command line interface to see if they work.
Successfully added: cuda:::CTC.TriageCompute.ctc__rx_bytes
Error! PAPI_start
To profile CTC events successfully you must use the CUPTI Range Profiling API. Currently the cuda component supports only the Legacy APIs and the Perfworks Metrics API.
Because of this, PR #458 added a note to the CTC metrics' descriptions, which reads:
NOTE: The NVIDIA Perfworks API that the cuda component utilizes does not support profiling CTC metrics.
This PR removes the CTC events from the available Cuda native events on a GH200, for two reasons:
These events cannot be profiled with our current API support, and the note we left in their descriptions is fairly hidden.
The Cuda component test refactor will select the first available event on a device if the user does not provide one. On the GH200 this first event is a CTC event, which would cause the tests to fail.
Support for the CUPTI Range Profiling API will be added in the future.
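The two reasons above come down to a name-based rule: hide events whose names mark them as CTC metrics, so that "first available event" never lands on one. The following is a minimal sketch of that idea in Python, not the cuda component's actual C implementation. The helper names `is_ctc_event` and `first_profilable_event` are hypothetical, the prefix rule is an assumption based on the event name shown in the transcript above, and every event name except the first is made up for illustration.

```python
def is_ctc_event(event_name: str) -> bool:
    """Return True if a cuda native event name looks like a CTC metric.

    Assumption: CTC events are identified purely by name, i.e. the part
    after the "cuda:::" component prefix starts with "ctc" in any case.
    """
    prefix = "cuda:::"
    if not event_name.startswith(prefix):
        return False
    metric = event_name[len(prefix):]
    return metric.lower().startswith("ctc")


def first_profilable_event(event_names):
    """Return the first event that is not a CTC event, or None if there is none."""
    for name in event_names:
        if not is_ctc_event(name):
            return name
    return None


# Hypothetical event list; only the first name comes from the PR description.
events = [
    "cuda:::CTC.TriageCompute.ctc__rx_bytes",
    "cuda:::example__metric_a",
    "cuda:::example__metric_b",
]
print(first_profilable_event(events))
```

With the CTC entry filtered out, a test that falls back to the first available event would pick an event the current API support can profile.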
Testing
Testing was done on Hopper1 at Oregon, which has the following setup:
CPU: ARM Neoverse V2
GPU: 1 * GH200
OS: RHEL 9.4
Cuda Toolkit 12.8.1
papi_component_avail: ✅
papi_native_avail: ✅ (does not output CTC events)
papi_command_line: ✅ (does not add CTC events)
cuda component tests: ✅
Author Checklist
Description: Why this PR exists. Reference all relevant information, including background, issues, test failures, etc.
Commits:
Commits are self contained and only do one thing.
Commits have a header of the form: module: short description
Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution.
Tests: The PR needs to pass all the tests.
I have tested these changes on the NVIDIA Hopper architecture using CUDA 13.0.0. When running the utilities (papi_avail, papi_native_avail, papi_component_avail, papi_hardware_avail, and papi_mem_info), the only difference between this feature branch and master is the number of available native events reported by papi_component_avail (64820 in master vs 64550 in this feature branch). The difference of 270 is exactly the number of events whose names start with "cuda:::[ctc|CTC]*" in the output of papi_native_avail from the master branch. These (and only these) events are present in the output of papi_native_avail from master but absent from the output on this feature branch.
The cuda component tests behave as expected.
Note that I did not test these changes on the NVIDIA Blackwell architecture because it does not contain CTC events.
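The comparison described in this review can be sketched as a set difference over event names. This is a hedged illustration, not the script the reviewer used. It assumes the event names have already been extracted from the papi_native_avail output of each branch (that parsing step is not shown), it reads the reviewer's pattern as "name starts with cuda:::ctc, case-insensitive", and the function name `compare_event_lists` and all event names except the first are hypothetical.

```python
import re

# Assumed reading of the reviewer's pattern: names beginning with
# "cuda:::ctc" or "cuda:::CTC".
CTC_PATTERN = re.compile(r"^cuda:::ctc", re.IGNORECASE)


def compare_event_lists(master_events, branch_events):
    """Check that the branch differs from master only by missing CTC events."""
    master, branch = set(master_events), set(branch_events)
    removed = master - branch   # events only in master
    added = branch - master     # events only in the feature branch
    non_ctc_removed = {e for e in removed if not CTC_PATTERN.match(e)}
    ctc_remaining = {e for e in branch if CTC_PATTERN.match(e)}
    return {
        "removed_count": len(removed),
        "only_ctc_removed": not non_ctc_removed and not added,
        "ctc_remaining": len(ctc_remaining),
    }


# Hypothetical inputs; only the first name comes from the PR description.
master_events = [
    "cuda:::CTC.TriageCompute.ctc__rx_bytes",
    "cuda:::ctc__example_metric",
    "cuda:::example__metric_a",
    "cuda:::example__metric_b",
]
branch_events = [
    "cuda:::example__metric_a",
    "cuda:::example__metric_b",
]
print(compare_event_lists(master_events, branch_events))
```

On real data, `removed_count` would be expected to equal the drop in the papi_component_avail event count, with `only_ctc_removed` true and `ctc_remaining` zero.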
@Treece-Burgess Can you please rebase this PR so that we can merge it?