cuda perfworks api: Update internal string handling #513

Merged
Treece-Burgess merged 1 commit into icl-utk-edu:master from Treece-Burgess:11-20-2025-cuda-increase-string-length on Jan 8, 2026

Treece-Burgess (Contributor) commented Nov 20, 2025

Pull Request Description

Issue:
Currently, when building the master branch with ./configure --prefix=$PWD/test-install --with-components="cuda" --with-debug=yes on a machine with an NVIDIA GH200 (Hopper1 at Oregon), running papi_native_avail produces the following output:

[tburgess@hopper1 bin]$ ./papi_native_avail > ntv.out
PAPI Error: Error Code -6,Internal error, please send mail to the developers
PAPI Error: Error Code -6,Internal error, please send mail to the developers
PAPI Error: Error Code -6,Internal error, please send mail to the developers
PAPI Error: Error Code -6,Internal error, please send mail to the developers
PAPI Error: Error Code -6,Internal error, please send mail to the developers
PAPI Error: Error Code -6,Internal error, please send mail to the developers

This is because a few CUPTI metric names are 128 characters or longer, while internally we only copy 128 characters (PAPI_MAX_STR_LEN). As a result, the last few characters of those names are chopped off and the subsequent CUPTI call fails.

This PR resolves this behavior.

Testing

Testing was done on Hopper1 at Oregon with the setup:

  • CPU: ARM Neoverse V2
  • GPU: 1 * GH200
  • OS: RHEL 9.4
  • CUDA Toolkit: 12.8.1

The PAPI utilities papi_component_avail, papi_native_avail, and papi_command_line all ran successfully, and all of the CUDA component tests passed.

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc.
  • Commits
    Commits are self contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

Treece-Burgess added the labels component-cuda (PRs and Issues related to the cuda component), type-bug (Issues discussing bugs or PRs fixing bugs), and status-ready-for-review (PR is ready to be reviewed) on Nov 20, 2025
dbarry9 (Contributor) commented Dec 11, 2025

I am reviewing this PR.

dbarry9 (Contributor) commented Dec 12, 2025

I am still reviewing this PR. I encounter the following error message when running papi_native_avail:

PAPI Error: Error Code -6,Internal error, please send mail to the developers

on an NVIDIA GeForce RTX 5080 (Blackwell architecture), using CUDA 12.9.

dbarry9 (Contributor) commented Jan 6, 2026

I did two rounds of testing these changes on the NVIDIA Hopper architecture using CUDA 13.0.0. For the first round of testing, I enabled the debug messages. This confirmed that the changes in this PR indeed resolve the reported error messages.

For the second round of testing, I disabled the debug messages to make it easier to compare the output of the PAPI utilities (papi_avail, papi_native_avail, papi_component_avail, papi_hardware_avail, and papi_mem_info) and the cuda component tests between this feature branch and master. The PAPI utilities produce the same output, with the exception of the following:

diff papi_native_avail__from_PR.txt papi_native_avail__from_master.txt

15506,15512d15505
< | cuda:::TPC.TriageCompute.tpc__l1tex_sram_bytes_mem_untagged_data_shared_allocated_compute_realtime.peak_sustained_active.per_second    |
< |            . Units=(bytes/gpc_cycles/seconds). Numpass=1.                    |
< |     :stat=avg                                                                |
< |            Mandatory stat qualifier [avg, max, min, sum]                     |
< |     :device=0                                                                |
< |            Mandatory device qualifier [0]                                    |
< --------------------------------------------------------------------------------
15520,15526d15512
< | cuda:::TPC.TriageCompute.tpc__l1tex_sram_bytes_mem_untagged_data_shared_allocated_compute_realtime.peak_sustained_elapsed.per_second    |
< |            . Units=(bytes/gpc_cycles/seconds). Numpass=1.                    |
< |     :stat=avg                                                                |
< |            Mandatory stat qualifier [avg, max, min, sum]                     |
< |     :device=0                                                                |
< |            Mandatory device qualifier [0]                                    |
< --------------------------------------------------------------------------------
15548,15554d15533
< | cuda:::TPC.TriageCompute.tpc__l1tex_sram_bytes_mem_untagged_data_shared_allocated_compute_realtime.peak_sustained_region.per_second    |
< |            . Units=(bytes/gpc_cycles/seconds). Numpass=1.                    |
< |     :stat=avg                                                                |
< |            Mandatory stat qualifier [avg, max, min, sum]                     |
< |     :device=0                                                                |
< |            Mandatory device qualifier [0]                                    |
< --------------------------------------------------------------------------------
15772,15778d15750
< | cuda:::TPC.TriageCompute.tpc__l1tex_sram_lines_mem_untagged_data_shared_allocated_compute_realtime.peak_sustained_active.per_second    |
< |            . Units=(gpc_cycles/l1tex_lines/seconds). Numpass=1.              |
< |     :stat=avg                                                                |
< |            Mandatory stat qualifier [avg, max, min, sum]                     |
< |     :device=0                                                                |
< |            Mandatory device qualifier [0]                                    |
< --------------------------------------------------------------------------------
15786,15792d15757
< | cuda:::TPC.TriageCompute.tpc__l1tex_sram_lines_mem_untagged_data_shared_allocated_compute_realtime.peak_sustained_elapsed.per_second    |
< |            . Units=(gpc_cycles/l1tex_lines/seconds). Numpass=1.              |
< |     :stat=avg                                                                |
< |            Mandatory stat qualifier [avg, max, min, sum]                     |
< |     :device=0                                                                |
< |            Mandatory device qualifier [0]                                    |
< --------------------------------------------------------------------------------
15814,15820d15778
< | cuda:::TPC.TriageCompute.tpc__l1tex_sram_lines_mem_untagged_data_shared_allocated_compute_realtime.peak_sustained_region.per_second    |
< |            . Units=(gpc_cycles/l1tex_lines/seconds). Numpass=1.              |
< |     :stat=avg                                                                |
< |            Mandatory stat qualifier [avg, max, min, sum]                     |
< |     :device=0                                                                |
< |            Mandatory device qualifier [0]                                    |
< --------------------------------------------------------------------------------
527886c527844
< Total events reported: 64793
---
> Total events reported: 64787

This makes sense: on master, these six events (64793 - 64787 = 6) were excluded from the event table because of the string truncation errors that this PR addresses. Additionally, the component tests behave as expected.

dbarry9 (Contributor) left a comment

@Treece-Burgess Can you please rebase this PR so that we can merge it? I have attached to this message what I believe src/components/cuda/cupti_profiler.c should be to resolve the merge conflict.
cupti_profiler.c

Treece-Burgess force-pushed the 11-20-2025-cuda-increase-string-length branch from 6d72bc0 to c006d15 on January 8, 2026 18:16
Treece-Burgess merged commit e990fca into icl-utk-edu:master on Jan 8, 2026
6 of 8 checks passed