nvml: Update getDeviceFieldValue to check value of nvmlReturn and use correct valueType #564

Open
Treece-Burgess wants to merge 1 commit into icl-utk-edu:master from Treece-Burgess:02-16-2026-nvml-use-correct-value-type
@Treece-Burgess commented Feb 16, 2026

Pull Request Description

This PR updates the function getDeviceFieldValue to:

  1. Check the nvmlReturn member before reading value; value is undefined when nvmlReturn != NVML_SUCCESS.
  2. Interpret value according to valueType, which can be any of the following:
NVML_VALUE_TYPE_DOUBLE = 0
NVML_VALUE_TYPE_UNSIGNED_INT = 1
NVML_VALUE_TYPE_UNSIGNED_LONG = 2
NVML_VALUE_TYPE_UNSIGNED_LONG_LONG = 3
NVML_VALUE_TYPE_SIGNED_LONG_LONG = 4
NVML_VALUE_TYPE_SIGNED_INT = 5
NVML_VALUE_TYPE_UNSIGNED_SHORT = 6
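
The two steps above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `demo_*` types and `demo_field_to_ll` are hypothetical stand-ins so the example compiles without nvml.h. The union member names are assumed to follow NVML's nvmlValue_t (dVal, uiVal, ulVal, ullVal, sllVal, siVal, usVal), and status 0 plays the role of NVML_SUCCESS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for nvmlValueType_t, nvmlValue_t and
 * nvmlFieldValue_t; real code would use the NVML types directly. */
typedef enum {
    DEMO_VALUE_TYPE_DOUBLE = 0,
    DEMO_VALUE_TYPE_UNSIGNED_INT = 1,
    DEMO_VALUE_TYPE_UNSIGNED_LONG = 2,
    DEMO_VALUE_TYPE_UNSIGNED_LONG_LONG = 3,
    DEMO_VALUE_TYPE_SIGNED_LONG_LONG = 4,
    DEMO_VALUE_TYPE_SIGNED_INT = 5,
    DEMO_VALUE_TYPE_UNSIGNED_SHORT = 6
} demo_value_type_t;

typedef union {
    double dVal;
    unsigned int uiVal;
    unsigned long ulVal;
    unsigned long long ullVal;
    signed long long sllVal;
    signed int siVal;
    unsigned short usVal;
} demo_value_t;

typedef struct {
    int nvmlReturn;              /* 0 stands in for NVML_SUCCESS */
    demo_value_type_t valueType; /* selects the valid union member */
    demo_value_t value;          /* undefined unless nvmlReturn == 0 */
} demo_field_value_t;

/* Returns 0 and stores the converted value in *out on success,
 * -1 if the field query failed or the value type is unknown. */
static int demo_field_to_ll(const demo_field_value_t *fv, long long *out)
{
    assert(fv != NULL && out != NULL);

    /* Step 1: never touch value when the per-field status is a failure. */
    if (fv->nvmlReturn != 0)
        return -1;

    /* Step 2: read only the union member that valueType selects. */
    switch (fv->valueType) {
    case DEMO_VALUE_TYPE_DOUBLE:             *out = (long long)fv->value.dVal;   break;
    case DEMO_VALUE_TYPE_UNSIGNED_INT:       *out = (long long)fv->value.uiVal;  break;
    case DEMO_VALUE_TYPE_UNSIGNED_LONG:      *out = (long long)fv->value.ulVal;  break;
    case DEMO_VALUE_TYPE_UNSIGNED_LONG_LONG: *out = (long long)fv->value.ullVal; break;
    case DEMO_VALUE_TYPE_SIGNED_LONG_LONG:   *out = fv->value.sllVal;            break;
    case DEMO_VALUE_TYPE_SIGNED_INT:         *out = (long long)fv->value.siVal;  break;
    case DEMO_VALUE_TYPE_UNSIGNED_SHORT:     *out = (long long)fv->value.usVal;  break;
    default:
        return -1;
    }
    return 0;
}
```

Returning an error for an unknown valueType, rather than falling back to one member, is a design choice of this sketch. It keeps a future value type from being silently misread.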

Because valueType was not checked, nvml:::NVIDIA_A100-PCIE-40GB:device_0:gpu_inst_power occasionally reported extremely large values:

[ICL:methane bin]$ ./papi_command_line nvml:::NVIDIA_A100-PCIE-40GB:device_0:gpu_inst_power

This utility lets you add events from the command line interface to see if they work.

Successfully added: nvml:::NVIDIA_A100-PCIE-40GB:device_0:gpu_inst_power

nvml:::NVIDIA_A100-PCIE-40GB:device_0:gpu_inst_power : 	4991201436214789335 mW
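
As a sanity check on the value above, it can be split into its 32-bit halves. The helpers below are hypothetical and only illustrate the arithmetic. The PR does not say which union member was read, so treat the interpretation as an assumption.

```c
#include <stdint.h>

/* Hypothetical helpers: extract the low and high 32 bits of a 64-bit value. */
static uint32_t low32(uint64_t v)  { return (uint32_t)(v & 0xFFFFFFFFu); }
static uint32_t high32(uint64_t v) { return (uint32_t)(v >> 32); }
```

For 4991201436214789335, low32 gives 34007 and high32 gives 0x45444F43. The low half is a plausible power reading in mW, close to the 33925 mW reported after the fix. That is consistent with a 32-bit value being read through a 64-bit member while the upper bytes held unrelated data.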

Testing

Setup

Testing was done on Methane at ICL with:

  • OS: RHEL 9.6
  • CPU: Intel Xeon Gold 6140
  • GPU: 1 × A100
  • CUDA Toolkit: 12.9

Results

  • PAPI build: ✅
  • PAPI utilities*: ✅
# Sanity checking nvml:::NVIDIA_A100-PCIE-40GB:device_0:gpu_inst_power via 
# papi_command_line did not show an extremely large value after 10 runs:

[ICL:methane bin]$ ./papi_command_line nvml:::NVIDIA_A100-PCIE-40GB:device_0:gpu_inst_power

This utility lets you add events from the command line interface to see if they work.

Successfully added: nvml:::NVIDIA_A100-PCIE-40GB:device_0:gpu_inst_power

nvml:::NVIDIA_A100-PCIE-40GB:device_0:gpu_inst_power : 	33925 mW

  • nvml HelloWorld.cu test: ✅

* - papi_component_avail, papi_native_avail, and papi_command_line

Author Checklist

  • Description
    Why this PR exists. Reference all relevant information, including background, issues, test failures, etc.
  • Commits
    Commits are self-contained and only do one thing
    Commits have a header of the form: module: short description
    Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
  • Tests
    The PR needs to pass all the tests

@Treece-Burgess added the type-bug, component-nvml, and status-ready-for-review labels on Feb 16, 2026