@yalmusaf
Contributor

Motivation

RDC error messages are vague. In addition, once a field group has been created, field IDs cannot be added to it later.

Technical Details

Changes were made to the error codes returned in multiple files.

JIRA ID

https://ontrack-internal.amd.com/browse/SWDEV-380626

Test Plan

N/A

Test Result

N/A

Submission Checklist

N/A

[Screenshot: 2026-01-21 140118]

Copilot AI review requested due to automatic review settings January 29, 2026 20:13
@yalmusaf yalmusaf requested review from a team as code owners January 29, 2026 20:13
Contributor

Copilot AI left a comment

Pull request overview

This PR improves error reporting in the RDC (ROCm Data Center) library by introducing more specific error codes for group and field group operations, and removes deprecated driver reload functionality from the amdsmi CLI.

Changes:

  • Replaced generic RDC_ST_NOT_FOUND errors with specific RDC_ST_GROUP_NOT_FOUND and RDC_ST_FLDGROUP_NOT_FOUND error codes
  • Added new error status codes to RDC header, C++ implementation, and Python bindings
  • Removed deprecated --reload-driver option from amdsmi CLI and documented the removal
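
As a rough illustration of the error-code change summarized above, here is a minimal Python sketch. It is not the RDC implementation: the numeric values, the `GroupRegistry` class, and the `status_string` helper are hypothetical stand-ins, and only the three status names come from this PR.

```python
from enum import IntEnum


class RdcStatus(IntEnum):
    """Status names from the PR; the numeric values are placeholders, not those in rdc.h."""
    RDC_ST_OK = 0
    RDC_ST_NOT_FOUND = 1
    RDC_ST_GROUP_NOT_FOUND = 2
    RDC_ST_FLDGROUP_NOT_FOUND = 3


# Hypothetical message table, standing in for the string lookup in the bootstrap layer.
_MESSAGES = {
    RdcStatus.RDC_ST_OK: "Success",
    RdcStatus.RDC_ST_NOT_FOUND: "Not found",
    RdcStatus.RDC_ST_GROUP_NOT_FOUND: "GPU group not found",
    RdcStatus.RDC_ST_FLDGROUP_NOT_FOUND: "Field group not found",
}


def status_string(status):
    """Return a readable message for a status code, or a fallback for unknown codes."""
    return _MESSAGES.get(status, "Unknown error")


class GroupRegistry:
    """Hypothetical in-memory store of GPU groups and field groups."""

    def __init__(self):
        self._gpu_groups = {}
        self._field_groups = {}

    def add_gpu_group(self, group_id, name):
        self._gpu_groups[group_id] = name

    def get_gpu_group(self, group_id):
        # A missing GPU group reports the specific code rather than RDC_ST_NOT_FOUND.
        if group_id not in self._gpu_groups:
            return RdcStatus.RDC_ST_GROUP_NOT_FOUND, None
        return RdcStatus.RDC_ST_OK, self._gpu_groups[group_id]

    def get_field_group(self, group_id):
        # A missing field group gets its own code, so callers can tell the two cases apart.
        if group_id not in self._field_groups:
            return RdcStatus.RDC_ST_FLDGROUP_NOT_FOUND, None
        return RdcStatus.RDC_ST_OK, self._field_groups[group_id]
```

With distinct codes, a caller can report which kind of lookup failed without inspecting the call site: `status_string(registry.get_field_group(7)[0])` yields a field-group message rather than a generic "Not found".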

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| projects/rdc/include/rdc/rdc.h | Defines new error status codes for group and field group not found scenarios |
| projects/rdc/rdc_libs/rdc/src/RdcGroupSettingsImpl.cc | Updates error returns to use specific group/field group error codes |
| projects/rdc/rdc_libs/bootstrap/src/RdcBootStrap.cc | Adds string representations for new error codes |
| projects/rdc/python_binding/rdc_bootstrap.py | Adds Python enum values for new error codes |
| projects/amdsmi/amdsmi_cli/amdsmi_parser.py | Removes deprecated driver reload CLI option |
| projects/amdsmi/CHANGELOG.md | Documents removal of driver reload option |

reset_power_cap_help = "Reset the PPT0 and PPT1 power capacity limit to max capable"
reset_gpu_clean_local_data_help = "Clean up local data in LDS/GPRs on a per partition basis"
reset_gpu_driver_help = "Triggers a chain that resets all GPU's"

Copilot AI Jan 29, 2026

Empty lines should not have trailing whitespace. Remove the whitespace from this line.

# Add Baremetal and Virtual OS reset arguments
reset_exclusive_group.add_argument('-l', '--clean-local-data', action='store_true', required=False, help=reset_gpu_clean_local_data_help)
reset_exclusive_group.add_argument('-r', '--reload-driver', action='store_true', required=False, help=reset_gpu_driver_help)

Copilot AI Jan 29, 2026

Empty lines should not have trailing whitespace. Remove the whitespace from this line.
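
One way to catch this class of issue before review is a small check for whitespace-only lines. The following is a minimal sketch; the function name `whitespace_only_lines` is hypothetical and is not part of amdsmi or any linter.

```python
def whitespace_only_lines(text):
    """Return 1-based numbers of lines that are non-empty but contain only whitespace."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        # A truly empty line is fine; a line made up solely of spaces or tabs is flagged.
        if line and not line.strip()
    ]
```

Running it over the contents of a changed file gives the line numbers to clean up. Truly empty lines are not reported.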

Signed-off-by: yalmusaf_amdeng <[email protected]>