
[SWDEV-545128] Implement some changes in AMD-SMI to aggregate metrics from primary and secondary KFD processes #2554

Merged
JeniferC99 merged 4 commits into develop from users/koushikbillakanti-amd/amdsmi_changes_for_kfd on Feb 18, 2026

@koushikbillakanti-amd
Contributor

Motivation

This PR adds support for parsing per-GPU KFD process files that use the _NNNN suffix naming convention (e.g., vram_46775, stats_46775/).
The ROCm kernel driver exposes per-GPU process statistics in /sys/class/kfd/kfd/proc/PID/ with GPU-specific file suffixes.
Without this change, AMD-SMI cannot read process memory usage and statistics on systems using this KFD file structure.
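The naming convention above can be made concrete with a short C++17 sketch. It assumes only the layout stated in this PR (`/sys/class/kfd/kfd/proc/PID/` containing `vram_NNNN` files and `stats_NNNN/` subdirectories). The helpers `KfdVramPath` and `KfdStatsFile` are hypothetical names for illustration; they are not part of the AMD-SMI or ROCm SMI API.

```cpp
#include <cstdint>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Root of the per-process KFD tree described in this PR.
const fs::path kKfdProcRoot = "/sys/class/kfd/kfd/proc";

// Hypothetical helper: path of the per-GPU VRAM file for one process,
// following the "<name>_<GPU ID>" convention (e.g. vram_46775).
fs::path KfdVramPath(int pid, uint32_t gpu_id) {
  return kKfdProcRoot / std::to_string(pid) / ("vram_" + std::to_string(gpu_id));
}

// Hypothetical helper: path of one counter (e.g. "cu_occupancy") inside the
// per-GPU stats_<GPU ID>/ subdirectory of a process.
fs::path KfdStatsFile(int pid, uint32_t gpu_id, const std::string& counter) {
  return kKfdProcRoot / std::to_string(pid) /
         ("stats_" + std::to_string(gpu_id)) / counter;
}
```

For a hypothetical PID 1234 and the KFD_ID 46775 used in the test results, `KfdVramPath(1234, 46775)` yields `/sys/class/kfd/kfd/proc/1234/vram_46775`.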

Technical Details

Modified src/rocm_smi_kfd.cc and rocm_smi_kfd.cc to parse files with _NNNN GPU ID suffixes.
Added logic to read cu_occupancy and evicted_ms from stats_NNNN/ subdirectories under each process.
The implementation maintains backward compatibility with older KFD structures while supporting the new per-GPU naming convention.
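One way a per-process directory in this layout could be walked, with its metrics grouped by GPU ID and summed, is sketched below in C++17 using `std::filesystem`. It assumes each `vram_NNNN` file and each counter in `stats_NNNN/` holds a single unsigned integer. `KfdGpuUsage`, `ReadUintFile`, `SplitSuffix`, `ReadKfdProcess`, and `TotalVram` are hypothetical names; this is not the code merged into `rocm_smi_kfd.cc`.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <map>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical per-GPU record; field names mirror the sysfs entries.
struct KfdGpuUsage {
  uint64_t vram_bytes = 0;
  std::optional<uint64_t> cu_occupancy;
  std::optional<uint64_t> evicted_ms;
};

// Reads one unsigned integer from a sysfs-style file; nullopt if the file
// is missing or does not start with a number.
std::optional<uint64_t> ReadUintFile(const fs::path& file) {
  std::ifstream in(file);
  uint64_t value = 0;
  if (in >> value) return value;
  return std::nullopt;
}

// Returns true and stores the GPU ID when `name` is "<prefix>_<digits>".
bool SplitSuffix(const std::string& name, const std::string& prefix, uint32_t* id) {
  const std::string head = prefix + "_";
  if (name.size() <= head.size() || name.compare(0, head.size(), head) != 0) {
    return false;
  }
  const std::string digits = name.substr(head.size());
  if (digits.size() > 10 ||
      digits.find_first_not_of("0123456789") != std::string::npos) {
    return false;  // suffix must be a short, all-digit GPU ID
  }
  const unsigned long long value = std::stoull(digits);
  if (value > UINT32_MAX) return false;
  *id = static_cast<uint32_t>(value);
  return true;
}

// Walks one /sys/class/kfd/kfd/proc/<PID>/ directory and groups the
// vram_NNNN and stats_NNNN/ entries by GPU ID. Other entries are ignored.
std::map<uint32_t, KfdGpuUsage> ReadKfdProcess(const fs::path& pid_dir) {
  std::map<uint32_t, KfdGpuUsage> per_gpu;
  std::error_code ec;  // an unreadable directory yields an empty result
  for (const auto& entry : fs::directory_iterator(pid_dir, ec)) {
    const std::string name = entry.path().filename().string();
    uint32_t gpu_id = 0;
    if (entry.is_regular_file() && SplitSuffix(name, "vram", &gpu_id)) {
      per_gpu[gpu_id].vram_bytes += ReadUintFile(entry.path()).value_or(0);
    } else if (entry.is_directory() && SplitSuffix(name, "stats", &gpu_id)) {
      per_gpu[gpu_id].cu_occupancy = ReadUintFile(entry.path() / "cu_occupancy");
      per_gpu[gpu_id].evicted_ms = ReadUintFile(entry.path() / "evicted_ms");
    }
  }
  return per_gpu;
}

// Sums VRAM across every GPU entry the process exposes.
uint64_t TotalVram(const std::map<uint32_t, KfdGpuUsage>& per_gpu) {
  uint64_t total = 0;
  for (const auto& kv : per_gpu) total += kv.second.vram_bytes;
  return total;
}
```

In this sketch, entries that do not match a `<name>_<digits>` pattern are skipped, and missing counter files are left unset rather than treated as errors. That is one simple way to tolerate directories that lack some of these files.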

JIRA ID

Resolves [SWDEV-545128]

Test Plan

Built AMD-SMI from source and ran the amd-smi process command with active GPU workloads.
Verified KFD directory structure parsing by inspecting /sys/class/kfd/kfd/proc/PID/ contents.
Tested both text and JSON output formats to confirm correct data retrieval.

Test Result

Process detection works correctly: PID, process name, and memory usage are displayed.
Per-GPU stats (cu_occupancy=0, evicted_ms=0) were read successfully from the stats_46775/ directory.
All tests passed on an MI300A GPU with KFD_ID 46775.

Submission Checklist

koushikbillakanti-amd force-pushed the users/koushikbillakanti-amd/amdsmi_changes_for_kfd branch 3 times, most recently from bc74e16 to 6acf4a1, on January 20, 2026 19:58
@koushikbillakanti-amd
Contributor Author

/AzurePipelines run rocm-ci-caller

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

adam360x force-pushed the users/koushikbillakanti-amd/amdsmi_changes_for_kfd branch from 6acf4a1 to 2bafe43 on January 21, 2026 15:51
adam360x requested a review from a team as a code owner on January 21, 2026 15:51
@koushikbillakanti-amd
Contributor Author

/AzurePipelines run rocm-ci-caller

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

adam360x force-pushed the users/koushikbillakanti-amd/amdsmi_changes_for_kfd branch from 2bafe43 to 15882e2 on January 23, 2026 10:20
Koushik Billakanti and others added 2 commits on January 27, 2026 14:18
adam360x force-pushed the users/koushikbillakanti-amd/amdsmi_changes_for_kfd branch from 15882e2 to 00d5d1f on January 27, 2026 20:18
Contributor

oliveiradan left a comment


@koushikbillakanti-amd, please take a look at my comments and let me know if you have any questions.

Contributor

oliveiradan left a comment


@koushikbillakanti-amd, some notes are in.

JeniferC99 merged commit d95d4fe into develop on Feb 18, 2026
33 of 35 checks passed
JeniferC99 deleted the users/koushikbillakanti-amd/amdsmi_changes_for_kfd branch on February 18, 2026 21:56
amd-josnarlo pushed a commit that referenced this pull request on Feb 23, 2026
…from primary and secondary KFD processes (#2554)

* [SWDEV-545128] Implement some changes in AMD-SMI to aggregate metrics from primary and secondary KFD processes

* Update rocm_smi_kfd.cc

* Resolving Daniel's review comments

* Latest updates on daniel's comments

---------

Co-authored-by: Koushik Billakanti <kbillaka@amd.com>


4 participants