[SWDEV-550669] amdsmi: soft-depend on libdrm #2987
Pull request overview
This PR removes the hard dependency on libDRM in the AMDSMI library, allowing it to function when libDRM is unavailable by falling back to alternative data sources (ROCm-SMI, KFD, sysfs). The changes standardize device path construction using DRM render minors and improve robustness when libDRM cannot be loaded.
Changes:
- Standardized sysfs path construction to use `renderD<N>` format based on DRM render minor instead of the libDRM device path
- Added fallback logic for BDF, ASIC info, VRAM info, and VBIOS info retrieval when libDRM is unavailable
- Optimized device identifier lookup by removing unnecessary iteration and adding bounds checking
- Fixed CLI formatting alignment issues when libDRM is unavailable
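The render-minor based path construction described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `render_node_sysfs_path` is hypothetical, and it assumes the standard Linux layout in which a render node `renderD<N>` appears under `/sys/class/drm/`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not an actual AMD SMI function): builds the sysfs
// device directory for a GPU from its DRM render minor, e.g. 128.
// Only the render minor is needed, so no libdrm call is involved.
std::string render_node_sysfs_path(std::uint32_t render_minor) {
    return "/sys/class/drm/renderD" + std::to_string(render_minor) + "/device";
}
```

Because the path depends only on an integer the library already tracks, the same lookup works whether or not libdrm could be loaded.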
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| projects/amdsmi/src/amd_smi/amd_smi_utils.cc | Updated sysfs path construction to use renderD<N> format across multiple functions |
| projects/amdsmi/src/amd_smi/amd_smi_gpu_device.cc | Added fallback logic to retrieve BDF info from ROCm-SMI when libDRM is unavailable |
| projects/amdsmi/src/amd_smi/amd_smi.cc | Enhanced ASIC info, VRAM info, and VBIOS info functions with libDRM fallback handling and code refactoring |
| projects/amdsmi/rocm_smi/src/rocm_smi_device.cc | Optimized device identifier retrieval by replacing iteration with direct indexing |
| projects/amdsmi/amdsmi_cli/amdsmi_logger.py | Fixed formatting alignment in CLI default output |
| projects/amdsmi/CHANGELOG.md | Added release notes for version 7.12.0 |
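As an illustration of "replacing iteration with direct indexing" together with a bounds check, a sketch along these lines may help. The `DeviceIds` struct and `device_ids_at` function are hypothetical stand-ins; the real ROCm-SMI data structures and the body of `rsmi_dev_device_identifiers_get()` are not shown in this PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record type; the real ROCm-SMI structures differ.
struct DeviceIds {
    std::uint64_t bdf_id;
    std::uint32_t render_minor;
};

// Look up a device by index in O(1) instead of walking the whole list.
// Returns std::nullopt when the index is out of range, so callers never
// read past the end of the container.
std::optional<DeviceIds> device_ids_at(const std::vector<DeviceIds>& devices,
                                       std::size_t index) {
    if (index >= devices.size()) {
        return std::nullopt;
    }
    return devices[index];
}
```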
Make AMD SMI work reliably when libdrm is missing by switching internal sysfs lookups to DRM render minors (renderD<N>) and improving fallbacks.
- Standardize sysfs path construction around renderD<N> instead of libdrm paths
- Optimize rsmi_dev_device_identifiers_get() (direct indexing + bounds check)
- Fix GPU BDF retrieval without libdrm by falling back to ROCm-SMI/KFD
- Fix board manufacturer string normalization for AMD vendor ID (0x1002)
- Improve ASIC info behavior/caching: return available fields even without libdrm

This improves correctness in minimal environments while keeping behavior intact when libdrm is available.
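The fallback behavior in this commit message (try libdrm first, then ROCm-SMI/KFD, and do not treat a missing source as a hard failure) can be sketched as a generic "first available source" pattern. This is an assumption-laden illustration: `Source` and `first_available` are hypothetical names, and the actual AMD SMI code paths may be organized differently.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// A source either yields a value or reports that it has none.
using Source = std::function<std::optional<std::string>()>;

// Try each source in priority order and return the first value found.
// A source that is unavailable (for example, libdrm could not be loaded)
// is skipped rather than treated as a hard failure.
std::optional<std::string> first_available(const std::vector<Source>& sources) {
    for (const auto& source : sources) {
        if (!source) {
            continue;  // empty std::function: nothing to call
        }
        if (auto value = source()) {
            return value;
        }
    }
    return std::nullopt;  // no source could provide the field
}
```

A caller would list the libdrm-backed source first and the ROCm-SMI, KFD, or sysfs sources after it, so behavior stays unchanged when libdrm is present.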
Signed-off-by: Charis Poag <[email protected]>
Motivation
This PR improves AMD SMI behavior in environments where libdrm is not present or cannot be loaded. The primary goal is to keep core queries working (and keep CLI output stable) by using consistent DRM render-minor based sysfs paths and adding safe fallbacks, rather than treating missing libdrm as a hard failure.
Technical Details
JIRA ID
SWDEV-550669
Test Plan
Before changes: Get Golden Values (with libDRM)
- amd-smi (default command)
- amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras
- amd-smi metric --clock --power

Before changes: Display issues in a container (without libDRM)
- amd-smi (default command)
- amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras
- amd-smi metric --clock --power

After changes: Functional sanity (with `libdrm` available)
- amd-smi (default command)
- amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras
- amd-smi metric --clock --power

After changes: Functional sanity (without `libdrm`)
- Run where `libdrm` is not installed (or where loading it is prevented).
- amd-smi (default command)
- amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras
- amd-smi metric --clock --power

Test Result
Note: Device is in NPS1, DPX (to confirm partition nodes get partial data - when available). We will be primarily looking at gpu 0 and 1 (for simplicity).
- With `libdrm` available
- Without `libdrm`

Submission Checklist