@charis-poag-amd charis-poag-amd commented Jan 30, 2026

Motivation

This PR improves AMD SMI behavior in environments where libdrm is not present or cannot be loaded. The primary goal is to keep core queries working (and keep CLI output stable) by using consistent DRM render-minor based sysfs paths and adding safe fallbacks, rather than treating missing libdrm as a hard failure.

Technical Details

  • Standardizes internal SYSFS path construction to use DRM render minors in the renderD form for more consistent device mapping and sysfs access.
  • Improves device-identifier retrieval in the ROCm-SMI device layer by removing unnecessary iteration and adding bounds checking for invalid device_id inputs.
  • Fixes GPU BDF retrieval when libdrm is unavailable by falling back to ROCm-SMI/KFD-based paths and validating indices before dereferencing.
  • Fixes manufacturer name handling so AMD’s vendor ID (0x1002) is normalized to the expected “Advanced Micro Devices Inc. [AMD/ATI]” string in board info output.
  • Updates ASIC info retrieval to return partial information (and succeed) when libdrm isn’t installed, instead of failing as not supported.
  • Ensures amd-smi default output remains aligned when falling back to OS Kernel Version (when amdgpu Version can’t be queried via libdrm).
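
The render-minor path construction and the vendor ID normalization described above can be sketched as follows. This is an illustrative Python sketch, not the C++ code from this PR: the function names `render_sysfs_path` and `normalize_manufacturer` are hypothetical, and the `/sys/class/drm/renderD<N>/device` layout is assumed from the standard Linux DRM sysfs layout.

```python
from pathlib import Path

AMD_VENDOR_ID = 0x1002
AMD_VENDOR_NAME = "Advanced Micro Devices Inc. [AMD/ATI]"


def render_sysfs_path(render_minor: int, root: str = "/sys/class/drm") -> Path:
    """Build the sysfs device directory for a DRM render minor (hypothetical helper)."""
    if render_minor < 0:
        raise ValueError(f"invalid render minor: {render_minor}")
    return Path(root) / f"renderD{render_minor}" / "device"


def normalize_manufacturer(raw: str) -> str:
    """Map a bare AMD vendor ID such as '0x1002' to the vendor name.

    Any other string (including real manufacturer names) is returned unchanged.
    """
    text = raw.strip()
    try:
        value = int(text, 16)
    except ValueError:
        return text  # not a hex number, so treat it as a name already
    return AMD_VENDOR_NAME if value == AMD_VENDOR_ID else text


print(render_sysfs_path(128).as_posix())
print(normalize_manufacturer("0x1002"))
```

Building the path from the render minor means the lookup does not depend on a device path reported by libdrm, which is the property the first bullet relies on.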

JIRA ID

SWDEV-550669

Test Plan

  • Before changes: Get Golden Values (with libDRM)

    • Run:
      • amd-smi (default command)
      • amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras
      • amd-smi metric --clock --power
    • Verify: N/A. These values are the baseline for comparing results with and without libDRM.
  • Before changes: Display issues in a container (without libDRM)

    • Run:
      • amd-smi (default command)
      • amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras
      • amd-smi metric --clock --power
    • Verify: N/A. Use these values to track which issues are fixed once libDRM becomes a soft dependency.
  • After changes: Functional sanity (with libdrm available)

    • Run:
      • amd-smi (default command)
      • amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras
      • amd-smi metric --clock --power
    • Verify expected values and alignment. Compare against the "Golden Values" and confirm there are no changes (except for alignment fixes).
  • After changes: Functional sanity (without libdrm)

    • Test in an environment/container where libdrm is not installed (or where loading it is prevented).
    • Run:
      • amd-smi (default command)
      • amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras
      • amd-smi metric --clock --power
    • Verify expected values and alignment. Compare against the "Golden Values" and confirm that the minimal partial data is reported as expected.
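
The comparison against the "Golden Values" can be automated with a small script. The sketch below is not part of this PR; it assumes the indented `KEY: value` text layout shown in the test results, and the helper names `flatten` and `diff_outputs` are hypothetical.

```python
import re

# Matches lines such as "        MARKET_NAME: AMD Instinct MI350X" or "    ASIC:".
# Shell prompts and loader warnings do not match and are skipped.
LINE = re.compile(r"^(\s*)([A-Z0-9_]+):\s*(.*)$")


def flatten(output: str) -> dict:
    """Turn indented amd-smi text output into {(GPU n, SECTION, KEY): value}."""
    flat = {}
    stack = []  # (indent, label) for the current chain of parent keys
    for line in output.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        indent = len(match.group(1))
        key, value = match.group(2), match.group(3).strip()
        while stack and stack[-1][0] >= indent:
            stack.pop()
        label = f"GPU {value}" if key == "GPU" else key
        stack.append((indent, label))
        if value and key != "GPU":
            flat[tuple(lbl for _, lbl in stack)] = value
    return flat


def diff_outputs(golden: str, candidate: str) -> dict:
    """Return {path: (golden_value, candidate_value)} for every differing field."""
    g, c = flatten(golden), flatten(candidate)
    return {k: (g.get(k), c.get(k)) for k in sorted(set(g) | set(c)) if g.get(k) != c.get(k)}


golden = "GPU: 0\n    ASIC:\n        FLAGS: 16\n    BUS:\n        BDF: 0000:05:00.0\n"
candidate = "Fail to open libdrm_amdgpu.so.1\nGPU: 0\n    ASIC:\n        FLAGS: 0\n    BUS:\n        BDF: 0000:05:00.0\n"
for path, (old, new) in diff_outputs(golden, candidate).items():
    print(" / ".join(path), ":", old, "->", new)
```

Live readings such as clock frequencies vary between runs, so differences in those fields are expected and need to be judged by hand.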

Test Result

Note: The device is in NPS1, DPX mode, to confirm that partition nodes get partial data when it is available. For simplicity, the results below focus on GPU 0 and GPU 1.

  • Before changes: Get Golden Values (with libDRM)
$ amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras -g 0 1
GPU: 0
    ASIC:
        MARKET_NAME: AMD Instinct MI350X
        VENDOR_ID: 0x1002
        VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
        SUBVENDOR_ID: 0x1002
        DEVICE_ID: 0x75a0
        SUBSYSTEM_ID: 0x75a0
        REV_ID: 0x00
        ASIC_SERIAL: <redacted>
        OAM_ID: 6
        NUM_COMPUTE_UNITS: 128
        TARGET_GRAPHICS_VERSION: gfx950
        FLAGS: 16
    BUS:
        BDF: 0000:05:00.0
        MAX_PCIE_WIDTH: 16
        MAX_PCIE_SPEED: 32 GT/s
        PCIE_LEVELS: N/A
        PCIE_INTERFACE_VERSION: Gen 5
        SLOT_TYPE: OAM
    IFWI:
        NAME: AMD MI350X
        BUILD_DATE: 2025/06/19 21:03
        PART_NUMBER: 113-M350-01-1K0-920A
        VERSION: 00154951
    DRIVER:
        NAME: amdgpu
        VERSION: 6.18.3
        OS_KERNEL_VERSION: 5.15.0-164-generic
    BOARD:
        MODEL_NUMBER: 102-G36212-0C
        PRODUCT_SERIAL: <redacted>
        FRU_ID: 113-AMDG362120C01-100-300000082
        PRODUCT_NAME: AMD Instinct MI350 OAM
        MANUFACTURER_NAME: AMD
    RAS:
        EEPROM_VERSION: 0x10000
        BAD_PAGE_THRESHOLD: N/A
        BAD_PAGE_THRESHOLD_EXCEEDED: N/A
        PARITY_SCHEMA: DISABLED
        SINGLE_BIT_SCHEMA: DISABLED
        DOUBLE_BIT_SCHEMA: DISABLED
        POISON_SCHEMA: ENABLED
        ECC_BLOCK_STATE:
            UMC: ENABLED
            SDMA: ENABLED
            GFX: ENABLED
            MMHUB: ENABLED
            ATHUB: ENABLED
            PCIE_BIF: ENABLED
            HDP: ENABLED
            XGMI_WAFL: ENABLED
            DF: ENABLED
            SMN: ENABLED
            SEM: DISABLED
            MP0: ENABLED
            MP1: ENABLED
            FUSE: DISABLED
            MCA: ENABLED
            VCN: DISABLED
            JPEG: DISABLED
            IH: ENABLED
            MPIO: ENABLED
    VRAM:
        TYPE: HBM3E
        VENDOR: SAMSUNG
        SIZE: 258032 MB
        BIT_WIDTH: 8192
        MAX_BANDWIDTH: 6810 GB/s

GPU: 1
    ASIC:
        MARKET_NAME: AMD Instinct MI350X
        VENDOR_ID: 0x1002
        VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
        SUBVENDOR_ID: N/A
        DEVICE_ID: 0x75a0
        SUBSYSTEM_ID: N/A
        REV_ID: 0x00
        ASIC_SERIAL: <redacted>
        OAM_ID: N/A
        NUM_COMPUTE_UNITS: 128
        TARGET_GRAPHICS_VERSION: gfx950
        FLAGS: 16
    BUS:
        BDF: 0000:05:00.1
        MAX_PCIE_WIDTH: N/A
        MAX_PCIE_SPEED: N/A
        PCIE_LEVELS: N/A
        PCIE_INTERFACE_VERSION: N/A
        SLOT_TYPE: N/A
    IFWI:
        NAME: AMD MI350X
        BUILD_DATE: 2025/06/19 21:03
        PART_NUMBER: 113-M350-01-1K0-920A
        VERSION: 023.040.001.008.000001
    DRIVER:
        NAME: amdgpu
        VERSION: 6.18.3
        OS_KERNEL_VERSION: 5.15.0-164-generic
    BOARD:
        MODEL_NUMBER: N/A
        PRODUCT_SERIAL: N/A
        FRU_ID: N/A
        PRODUCT_NAME: N/A
        MANUFACTURER_NAME: Advanced Micro Devices, Inc. [AMD/ATI]
    RAS:
        EEPROM_VERSION: N/A
        BAD_PAGE_THRESHOLD: N/A
        BAD_PAGE_THRESHOLD_EXCEEDED: N/A
        PARITY_SCHEMA: N/A
        SINGLE_BIT_SCHEMA: N/A
        DOUBLE_BIT_SCHEMA: N/A
        POISON_SCHEMA: N/A
        ECC_BLOCK_STATE: N/A
    VRAM:
        TYPE: HBM3E
        VENDOR: UNKNOWN
        SIZE: 129016 MB
        BIT_WIDTH: 8192
        MAX_BANDWIDTH: N/A
$ amd-smi metric --power --clock -g 0 1
GPU: 0
    POWER:
        SOCKET_POWER: 225 W
        GFX_VOLTAGE: N/A
        SOC_VOLTAGE: N/A
        MEM_VOLTAGE: N/A
        THROTTLE_STATUS: N/A
        POWER_MANAGEMENT: ENABLED
    CLOCK:
        GFX_0:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_1:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_2:
            CLK: 144 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_3:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_4:
            CLK: 168 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_5:
            CLK: 167 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_6:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_7:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        MEM_0:
            CLK: 1900 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: DISABLED
        VCLK_0:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        VCLK_1:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        VCLK_2:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        VCLK_3:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        DCLK_0:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        DCLK_1:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        DCLK_2:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        DCLK_3:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        FCLK_0:
            CLK: 1500 MHz
            MIN_CLK: 1250 MHz
            MAX_CLK: 1500 MHz
            DEEP_SLEEP: DISABLED
        SOCCLK_0:
            CLK: 40 MHz
            MIN_CLK: 1200 MHz
            MAX_CLK: 1200 MHz
            DEEP_SLEEP: ENABLED

GPU: 1
    POWER:
        SOCKET_POWER: N/A
        GFX_VOLTAGE: N/A
        SOC_VOLTAGE: N/A
        MEM_VOLTAGE: N/A
        THROTTLE_STATUS: N/A
        POWER_MANAGEMENT: N/A
    CLOCK: N/A
  • Before changes: Display issues in a container (without libDRM)
$ amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras -g 0 1
Fail to open libdrm_amdgpu.so.1: libdrm_amdgpu.so.1: cannot open shared object file: No such file or directory
GPU: 0
    ASIC:
        MARKET_NAME: N/A
        VENDOR_ID: N/A
        VENDOR_NAME: N/A
        SUBVENDOR_ID: N/A
        DEVICE_ID: N/A
        SUBSYSTEM_ID: N/A
        REV_ID: N/A
        ASIC_SERIAL: N/A
        OAM_ID: N/A
        NUM_COMPUTE_UNITS: N/A
        TARGET_GRAPHICS_VERSION: N/A
    BUS:
        BDF: 695f7570672f:31:05.7
        MAX_PCIE_WIDTH: N/A
        MAX_PCIE_SPEED: N/A
        PCIE_LEVELS: N/A
        PCIE_INTERFACE_VERSION: N/A
        SLOT_TYPE: N/A
    IFWI: N/A
    DRIVER:
        NAME: N/A
        VERSION: N/A
        OS_KERNEL_VERSION: 5.15.0-164-generic
    BOARD:
        MODEL_NUMBER: N/A
        PRODUCT_SERIAL: 692517010989
        FRU_ID: N/A
        PRODUCT_NAME: AMD Instinct MI350 OAM
        MANUFACTURER_NAME: 0x1002
    RAS:
        EEPROM_VERSION: 0x10000
        BAD_PAGE_THRESHOLD: N/A
        BAD_PAGE_THRESHOLD_EXCEEDED: N/A
        PARITY_SCHEMA: DISABLED
        SINGLE_BIT_SCHEMA: DISABLED
        DOUBLE_BIT_SCHEMA: DISABLED
        POISON_SCHEMA: ENABLED
        ECC_BLOCK_STATE: N/A
    VRAM:
        TYPE: N/A
        VENDOR: N/A
        SIZE: N/A
        BIT_WIDTH: N/A
        MAX_BANDWIDTH: N/A

GPU: 1
    ASIC:
        MARKET_NAME: N/A
        VENDOR_ID: N/A
        VENDOR_NAME: N/A
        SUBVENDOR_ID: N/A
        DEVICE_ID: N/A
        SUBSYSTEM_ID: N/A
        REV_ID: N/A
        ASIC_SERIAL: N/A
        OAM_ID: N/A
        NUM_COMPUTE_UNITS: N/A
        TARGET_GRAPHICS_VERSION: N/A
    BUS:
        BDF: 0000:00:00.0
        MAX_PCIE_WIDTH: N/A
        MAX_PCIE_SPEED: N/A
        PCIE_LEVELS: N/A
        PCIE_INTERFACE_VERSION: N/A
        SLOT_TYPE: N/A
    IFWI: N/A
    DRIVER:
        NAME: N/A
        VERSION: N/A
        OS_KERNEL_VERSION: 5.15.0-164-generic
    BOARD:
        MODEL_NUMBER: N/A
        PRODUCT_SERIAL: N/A
        FRU_ID: N/A
        PRODUCT_NAME: N/A
        MANUFACTURER_NAME: 0x1002
    RAS:
        EEPROM_VERSION: N/A
        BAD_PAGE_THRESHOLD: N/A
        BAD_PAGE_THRESHOLD_EXCEEDED: N/A
        PARITY_SCHEMA: N/A
        SINGLE_BIT_SCHEMA: N/A
        DOUBLE_BIT_SCHEMA: N/A
        POISON_SCHEMA: N/A
        ECC_BLOCK_STATE: N/A
    VRAM:
        TYPE: N/A
        VENDOR: N/A
        SIZE: N/A
        BIT_WIDTH: N/A
        MAX_BANDWIDTH: N/A
$ amd-smi metric --power --clock -g 0 1
Fail to open libdrm_amdgpu.so.1: libdrm_amdgpu.so.1: cannot open shared object file: No such file or directory
GPU: 0
    POWER:
        SOCKET_POWER: 225 W
        GFX_VOLTAGE: N/A
        SOC_VOLTAGE: N/A
        MEM_VOLTAGE: N/A
        THROTTLE_STATUS: N/A
        POWER_MANAGEMENT: N/A
    CLOCK:
        GFX_0:
            CLK: 145 MHz
            CLK_LOCKED: DISABLED
        GFX_1:
            CLK: 145 MHz
            CLK_LOCKED: DISABLED
        GFX_2:
            CLK: 144 MHz
            CLK_LOCKED: DISABLED
        GFX_3:
            CLK: 145 MHz
            CLK_LOCKED: DISABLED
        GFX_4:
            CLK: 146 MHz
            CLK_LOCKED: DISABLED
        GFX_5:
            CLK: 147 MHz
            CLK_LOCKED: DISABLED
        GFX_6:
            CLK: 145 MHz
            CLK_LOCKED: DISABLED
        GFX_7:
            CLK: 145 MHz
            CLK_LOCKED: DISABLED
        MEM_0:
            CLK: 1900 MHz
        VCLK_0:
            CLK: 59 MHz
        VCLK_1:
            CLK: 59 MHz
        VCLK_2:
            CLK: 59 MHz
        VCLK_3:
            CLK: 59 MHz
        DCLK_0:
            CLK: 48 MHz
        DCLK_1:
            CLK: 48 MHz
        DCLK_2:
            CLK: 48 MHz
        DCLK_3:
            CLK: 48 MHz
        FCLK_0:
            CLK: 1500 MHz
        SOCCLK_0:
            CLK: 38 MHz

GPU: 1
    POWER:
        SOCKET_POWER: N/A
        GFX_VOLTAGE: N/A
        SOC_VOLTAGE: N/A
        MEM_VOLTAGE: N/A
        THROTTLE_STATUS: N/A
        POWER_MANAGEMENT: N/A
    CLOCK: N/A
  • After changes: Functional sanity (with libdrm available)
$ amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras -g 0 1
GPU: 0
    ASIC:
        MARKET_NAME: AMD Instinct MI350X
        VENDOR_ID: 0x1002
        VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
        SUBVENDOR_ID: 0x1002
        DEVICE_ID: 0x75a0
        SUBSYSTEM_ID: 0x75a0
        REV_ID: 0x00
        ASIC_SERIAL: <redacted>
        OAM_ID: 6
        NUM_COMPUTE_UNITS: 128
        TARGET_GRAPHICS_VERSION: gfx950
        FLAGS: 16
    BUS:
        BDF: 0000:05:00.0
        MAX_PCIE_WIDTH: 16
        MAX_PCIE_SPEED: 32 GT/s
        PCIE_LEVELS: N/A
        PCIE_INTERFACE_VERSION: Gen 5
        SLOT_TYPE: OAM
    IFWI:
        NAME: AMD MI350X
        BUILD_DATE: 2025/06/19 21:03
        PART_NUMBER: 113-M350-01-1K0-920A
        VERSION: 00154951
    DRIVER:
        NAME: amdgpu
        VERSION: 6.18.3
        OS_KERNEL_VERSION: 5.15.0-164-generic
    BOARD:
        MODEL_NUMBER: 102-G36212-0C
        PRODUCT_SERIAL: <redacted>
        FRU_ID: 113-AMDG362120C01-100-300000082
        PRODUCT_NAME: AMD Instinct MI350 OAM
        MANUFACTURER_NAME: AMD
    RAS:
        EEPROM_VERSION: 0x10000
        BAD_PAGE_THRESHOLD: N/A
        BAD_PAGE_THRESHOLD_EXCEEDED: N/A
        PARITY_SCHEMA: DISABLED
        SINGLE_BIT_SCHEMA: DISABLED
        DOUBLE_BIT_SCHEMA: DISABLED
        POISON_SCHEMA: ENABLED
        ECC_BLOCK_STATE:
            UMC: ENABLED
            SDMA: ENABLED
            GFX: ENABLED
            MMHUB: ENABLED
            ATHUB: ENABLED
            PCIE_BIF: ENABLED
            HDP: ENABLED
            XGMI_WAFL: ENABLED
            DF: ENABLED
            SMN: ENABLED
            SEM: DISABLED
            MP0: ENABLED
            MP1: ENABLED
            FUSE: DISABLED
            MCA: ENABLED
            VCN: DISABLED
            JPEG: DISABLED
            IH: ENABLED
            MPIO: ENABLED
    VRAM:
        TYPE: HBM3E
        VENDOR: SAMSUNG
        SIZE: 258032 MB
        BIT_WIDTH: 8192
        MAX_BANDWIDTH: 6810 GB/s

GPU: 1
    ASIC:
        MARKET_NAME: AMD Instinct MI350X
        VENDOR_ID: 0x1002
        VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
        SUBVENDOR_ID: N/A
        DEVICE_ID: 0x75a0
        SUBSYSTEM_ID: N/A
        REV_ID: 0x00
        ASIC_SERIAL: <redacted>
        OAM_ID: N/A
        NUM_COMPUTE_UNITS: 128
        TARGET_GRAPHICS_VERSION: gfx950
        FLAGS: 16
    BUS:
        BDF: 0000:05:00.1
        MAX_PCIE_WIDTH: N/A
        MAX_PCIE_SPEED: N/A
        PCIE_LEVELS: N/A
        PCIE_INTERFACE_VERSION: N/A
        SLOT_TYPE: N/A
    IFWI:
        NAME: AMD MI350X
        BUILD_DATE: 2025/06/19 21:03
        PART_NUMBER: 113-M350-01-1K0-920A
        VERSION: 023.040.001.008.000001
    DRIVER:
        NAME: amdgpu
        VERSION: 6.18.3
        OS_KERNEL_VERSION: 5.15.0-164-generic
    BOARD:
        MODEL_NUMBER: N/A
        PRODUCT_SERIAL: N/A
        FRU_ID: N/A
        PRODUCT_NAME: N/A
        MANUFACTURER_NAME: Advanced Micro Devices, Inc. [AMD/ATI]
    RAS:
        EEPROM_VERSION: N/A
        BAD_PAGE_THRESHOLD: N/A
        BAD_PAGE_THRESHOLD_EXCEEDED: N/A
        PARITY_SCHEMA: N/A
        SINGLE_BIT_SCHEMA: N/A
        DOUBLE_BIT_SCHEMA: N/A
        POISON_SCHEMA: N/A
        ECC_BLOCK_STATE: N/A
    VRAM:
        TYPE: HBM3E
        VENDOR: UNKNOWN
        SIZE: 129016 MB
        BIT_WIDTH: 8192
        MAX_BANDWIDTH: N/A
$ amd-smi metric --power --clock -g 0 1
GPU: 0
    POWER:
        SOCKET_POWER: 225 W
        GFX_VOLTAGE: N/A
        SOC_VOLTAGE: N/A
        MEM_VOLTAGE: N/A
        THROTTLE_STATUS: N/A
        POWER_MANAGEMENT: ENABLED
    CLOCK:
        GFX_0:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_1:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_2:
            CLK: 144 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_3:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_4:
            CLK: 159 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_5:
            CLK: 160 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_6:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_7:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        MEM_0:
            CLK: 1900 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: DISABLED
        VCLK_0:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        VCLK_1:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        VCLK_2:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        VCLK_3:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        DCLK_0:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        DCLK_1:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        DCLK_2:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        DCLK_3:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        FCLK_0:
            CLK: 1500 MHz
            MIN_CLK: 1250 MHz
            MAX_CLK: 1500 MHz
            DEEP_SLEEP: DISABLED
        SOCCLK_0:
            CLK: 40 MHz
            MIN_CLK: 1200 MHz
            MAX_CLK: 1200 MHz
            DEEP_SLEEP: ENABLED

GPU: 1
    POWER:
        SOCKET_POWER: N/A
        GFX_VOLTAGE: N/A
        SOC_VOLTAGE: N/A
        MEM_VOLTAGE: N/A
        THROTTLE_STATUS: N/A
        POWER_MANAGEMENT: N/A
    CLOCK: N/A
  • After changes: Functional sanity (without libdrm)
$ amd-smi static --bus --ifwi --asic --board --vbios --driver --vram --ras -g 0 1
Fail to open libdrm_amdgpu.so.1: libdrm_amdgpu.so.1: cannot open shared object file: No such file or directory
GPU: 0
    ASIC:
        MARKET_NAME: AMD Instinct MI350 OAM
        VENDOR_ID: 0x1002
        VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
        SUBVENDOR_ID: 0x1002
        DEVICE_ID: 0x75a0
        SUBSYSTEM_ID: 0x75a0
        REV_ID: 0x00
        ASIC_SERIAL: <redacted>
        OAM_ID: 6
        NUM_COMPUTE_UNITS: 128
        TARGET_GRAPHICS_VERSION: gfx950
        FLAGS: 0
    BUS:
        BDF: 0000:05:00.0
        MAX_PCIE_WIDTH: 16
        MAX_PCIE_SPEED: 32 GT/s
        PCIE_LEVELS: N/A
        PCIE_INTERFACE_VERSION: Gen 5
        SLOT_TYPE: OAM
    IFWI:
        NAME: N/A
        BUILD_DATE: N/A
        PART_NUMBER: 113-M350-01-1K0-920A
        VERSION: 00154951
    DRIVER:
        NAME: N/A
        VERSION: N/A
        OS_KERNEL_VERSION: 5.15.0-164-generic
    BOARD:
        MODEL_NUMBER: 102-G36212-0C
        PRODUCT_SERIAL: <redacted>
        FRU_ID: 113-AMDG362120C01-100-300000082
        PRODUCT_NAME: AMD Instinct MI350 OAM
        MANUFACTURER_NAME: AMD
    RAS:
        EEPROM_VERSION: 0x10000
        BAD_PAGE_THRESHOLD: N/A
        BAD_PAGE_THRESHOLD_EXCEEDED: N/A
        PARITY_SCHEMA: DISABLED
        SINGLE_BIT_SCHEMA: DISABLED
        DOUBLE_BIT_SCHEMA: DISABLED
        POISON_SCHEMA: ENABLED
        ECC_BLOCK_STATE:
            UMC: ENABLED
            SDMA: ENABLED
            GFX: ENABLED
            MMHUB: ENABLED
            ATHUB: ENABLED
            PCIE_BIF: ENABLED
            HDP: ENABLED
            XGMI_WAFL: ENABLED
            DF: ENABLED
            SMN: ENABLED
            SEM: DISABLED
            MP0: ENABLED
            MP1: ENABLED
            FUSE: DISABLED
            MCA: ENABLED
            VCN: DISABLED
            JPEG: DISABLED
            IH: ENABLED
            MPIO: ENABLED
    VRAM:
        TYPE: UNKNOWN
        VENDOR: SAMSUNG
        SIZE: 258032 MB
        BIT_WIDTH: N/A
        MAX_BANDWIDTH: 6810 GB/s

GPU: 1
    ASIC:
        MARKET_NAME: N/A
        VENDOR_ID: 0x1002
        VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
        SUBVENDOR_ID: N/A
        DEVICE_ID: 0x75a0
        SUBSYSTEM_ID: N/A
        REV_ID: 0x00
        ASIC_SERIAL: <redacted>
        OAM_ID: N/A
        NUM_COMPUTE_UNITS: 128
        TARGET_GRAPHICS_VERSION: gfx950
        FLAGS: 0
    BUS:
        BDF: 0000:05:00.1
        MAX_PCIE_WIDTH: N/A
        MAX_PCIE_SPEED: N/A
        PCIE_LEVELS: N/A
        PCIE_INTERFACE_VERSION: N/A
        SLOT_TYPE: N/A
    IFWI: N/A
    DRIVER:
        NAME: N/A
        VERSION: N/A
        OS_KERNEL_VERSION: 5.15.0-164-generic
    BOARD:
        MODEL_NUMBER: N/A
        PRODUCT_SERIAL: N/A
        FRU_ID: N/A
        PRODUCT_NAME: N/A
        MANUFACTURER_NAME: Advanced Micro Devices Inc. [AMD/ATI]
    RAS:
        EEPROM_VERSION: N/A
        BAD_PAGE_THRESHOLD: N/A
        BAD_PAGE_THRESHOLD_EXCEEDED: N/A
        PARITY_SCHEMA: N/A
        SINGLE_BIT_SCHEMA: N/A
        DOUBLE_BIT_SCHEMA: N/A
        POISON_SCHEMA: N/A
        ECC_BLOCK_STATE: N/A
    VRAM:
        TYPE: UNKNOWN
        VENDOR: UNKNOWN
        SIZE: 129016 MB
        BIT_WIDTH: N/A
        MAX_BANDWIDTH: N/A
$ amd-smi metric --power --clock -g 0 1
Fail to open libdrm_amdgpu.so.1: libdrm_amdgpu.so.1: cannot open shared object file: No such file or directory
GPU: 0
    POWER:
        SOCKET_POWER: 225 W
        GFX_VOLTAGE: N/A
        SOC_VOLTAGE: N/A
        MEM_VOLTAGE: N/A
        THROTTLE_STATUS: N/A
        POWER_MANAGEMENT: ENABLED
    CLOCK:
        GFX_0:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_1:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_2:
            CLK: 144 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_3:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_4:
            CLK: 172 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_5:
            CLK: 171 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_6:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        GFX_7:
            CLK: 145 MHz
            MIN_CLK: 500 MHz
            MAX_CLK: 2200 MHz
            CLK_LOCKED: DISABLED
            DEEP_SLEEP: ENABLED
        MEM_0:
            CLK: 1900 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: DISABLED
        VCLK_0:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        VCLK_1:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        VCLK_2:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        VCLK_3:
            CLK: 59 MHz
            MIN_CLK: 1900 MHz
            MAX_CLK: 1900 MHz
            DEEP_SLEEP: ENABLED
        DCLK_0:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        DCLK_1:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        DCLK_2:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        DCLK_3:
            CLK: 48 MHz
            MIN_CLK: 1520 MHz
            MAX_CLK: 1520 MHz
            DEEP_SLEEP: ENABLED
        FCLK_0:
            CLK: 1500 MHz
            MIN_CLK: 1250 MHz
            MAX_CLK: 1500 MHz
            DEEP_SLEEP: DISABLED
        SOCCLK_0:
            CLK: 38 MHz
            MIN_CLK: 1200 MHz
            MAX_CLK: 1200 MHz
            DEEP_SLEEP: ENABLED

GPU: 1
    POWER:
        SOCKET_POWER: N/A
        GFX_VOLTAGE: N/A
        SOC_VOLTAGE: N/A
        MEM_VOLTAGE: N/A
        THROTTLE_STATUS: N/A
        POWER_MANAGEMENT: N/A
    CLOCK: N/A

Submission Checklist

Copilot AI review requested due to automatic review settings January 30, 2026 17:19
@charis-poag-amd charis-poag-amd requested a review from a team as a code owner January 30, 2026 17:19
Copilot AI left a comment

Pull request overview

This PR removes the hard dependency on libDRM in the AMDSMI library, allowing it to function when libDRM is unavailable by falling back to alternative data sources (ROCm-SMI, KFD, sysfs). The changes standardize device path construction using DRM render minors and improve robustness when libDRM cannot be loaded.
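
The fallback behavior summarized here, where a missing source yields partial data instead of a hard failure, can be illustrated with a short sketch. It is written in Python for brevity and is not the library's C++ implementation; `collect_fields` and the two provider functions are hypothetical stand-ins for the libDRM and sysfs/KFD sources.

```python
from typing import Callable, Dict, Optional, Sequence

Provider = Callable[[], Dict[str, Optional[str]]]


def collect_fields(fields: Sequence[str], providers: Sequence[Provider]) -> Dict[str, str]:
    """Fill each field from the first provider that supplies it.

    A provider that raises OSError (for example, a library that cannot be
    loaded) is skipped rather than aborting the whole query.
    """
    result: Dict[str, Optional[str]] = {name: None for name in fields}
    for provider in providers:
        try:
            data = provider()
        except OSError:
            continue  # source unavailable, try the next one
        for name in fields:
            if result[name] is None and data.get(name) is not None:
                result[name] = data[name]
    # Fields no source could supply are reported as "N/A", as in the CLI output.
    return {name: (value if value is not None else "N/A") for name, value in result.items()}


def from_libdrm() -> Dict[str, Optional[str]]:
    # Hypothetical: simulates libdrm_amdgpu.so.1 being absent.
    raise OSError("libdrm_amdgpu.so.1: cannot open shared object file")


def from_sysfs() -> Dict[str, Optional[str]]:
    # Hypothetical: values a sysfs/KFD-based source might still provide.
    return {"VENDOR_ID": "0x1002", "DEVICE_ID": "0x75a0"}


print(collect_fields(["MARKET_NAME", "VENDOR_ID", "DEVICE_ID"], [from_libdrm, from_sysfs]))
```

Ordering the providers from preferred to fallback keeps the behavior unchanged when the preferred source is available.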

Changes:

  • Standardized sysfs path construction to use renderD<N> format based on DRM render minor instead of the libDRM device path
  • Added fallback logic for BDF, ASIC info, VRAM info, and VBIOS info retrieval when libDRM is unavailable
  • Optimized device identifier lookup by removing unnecessary iteration and adding bounds checking
  • Fixed CLI formatting alignment issues when libDRM is unavailable
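
The "direct indexing plus bounds checking" change to the device identifier lookup follows a common pattern, sketched below in Python. The real change is in C++ (`rocm_smi_device.cc`); the `device_at` helper and the sample list are hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def device_at(devices: Sequence[T], device_id: int) -> T:
    """Return devices[device_id] after validating the index.

    The explicit check rejects negative and too-large IDs up front instead
    of scanning the list or reading out of range.
    """
    if not 0 <= device_id < len(devices):
        raise IndexError(f"device_id {device_id} out of range (0..{len(devices) - 1})")
    return devices[device_id]


devices = ["0000:05:00.0", "0000:05:00.1"]
print(device_at(devices, 1))
```

In Python the check also prevents a negative ID from silently wrapping around to the end of the list.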

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| projects/amdsmi/src/amd_smi/amd_smi_utils.cc | Updated sysfs path construction to use renderD<N> format across multiple functions |
| projects/amdsmi/src/amd_smi/amd_smi_gpu_device.cc | Added fallback logic to retrieve BDF info from ROCm-SMI when libDRM is unavailable |
| projects/amdsmi/src/amd_smi/amd_smi.cc | Enhanced ASIC info, VRAM info, and VBIOS info functions with libDRM fallback handling and code refactoring |
| projects/amdsmi/rocm_smi/src/rocm_smi_device.cc | Optimized device identifier retrieval by replacing iteration with direct indexing |
| projects/amdsmi/amdsmi_cli/amdsmi_logger.py | Fixed formatting alignment in CLI default output |
| projects/amdsmi/CHANGELOG.md | Added release notes for version 7.12.0 |

@charis-poag-amd charis-poag-amd force-pushed the users/charis-poag-amd/SWDEV-550669_libDRMNotInstalledFixes branch from 7af2b00 to 1289c1a Compare January 30, 2026 17:38
Make AMD SMI work reliably when libdrm is missing by switching internal sysfs
lookups to DRM render minors (renderD<N>) and improving fallbacks.

- Standardize SYSFS path construction around renderD<N> instead of libdrm paths
- Optimize rsmi_dev_device_identifiers_get() (direct indexing + bounds check)
- Fix GPU BDF retrieval without libdrm by falling back to ROCm-SMI/KFD
- Fix board manufacturer string normalization for AMD vendor ID (0x1002)
- Improve ASIC info behavior/caching: return available fields even without libdrm

This improves correctness in minimal environments while keeping behavior intact
when libdrm is available.
@charis-poag-amd charis-poag-amd force-pushed the users/charis-poag-amd/SWDEV-550669_libDRMNotInstalledFixes branch from 1289c1a to 303e5a6 Compare January 30, 2026 20:10