
[Issue]: Exception caught: map::at: key not found when iGPU is not disabled #208

@systems-assistant

Description


[Migrated from original issue] ROCm/rocm_smi_lib#220

Original issue author: @AngryLoki

Problem Description

Hi,

There is a small but annoying issue in rocm-smi when it attempts to output stats for the iGPU:

Exception caught: map::at:  key not found
========================================== ROCm System Management Interface ==========================================
==================================================== Concise Info ====================================================
Device  Node  IDs              Temp    Power    Partitions          SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Avg)    (Mem, Compute, ID)
======================================================================================================================
0       1     0x744c,   57491  36.0°C  85.0W    N/A, N/A, 0         422Mhz  96Mhz    0%   auto  291.0W  0%     14%
1       2     0x164e,   16626  45.0°C  34.233W  N/A, N/A, 0         N/A     2400Mhz  0%   auto  N/A     3%     0%
======================================================================================================================
================================================ End of ROCm SMI Log =================================================

I.e., whatever options are passed, it prints "Exception caught: map::at:  key not found". Here is why:

  1. setVoltSensorLabelMap is populated for exactly one sensor: vddgfx.

  2. The library enumerates the hwmon in#_label files:

for f in /sys/class/drm/renderD*/device/hwmon/hwmon*/in*_label; do echo "$f:"; cat "$f"; done

/sys/class/drm/renderD128/device/hwmon/hwmon4/in0_label:
vddgfx

/sys/class/drm/renderD129/device/hwmon/hwmon5/in0_label:
vddgfx
/sys/class/drm/renderD129/device/hwmon/hwmon5/in1_label:
vddnb

Here renderD128 is the gfx1100 and renderD129 is the gfx1036 (iGPU). Note that the iGPU exposes a second sensor, in1, labelled vddnb: the north bridge voltage.

  3. For the iGPU, get_supported_sensors tries to process both sensors, 0 and 1:
    https://github.com/ROCm/rocm_smi_lib/blob/ff7561607ef47829940f87b140421fdb4934a0a0/src/rocm_smi_monitor.cc#L633-L634

  4. getVoltSensorEnum throws the map::at: key not found exception because index_volt_type_map_ contains no vddnb (1) voltage type.

A simple fix is to register the vddnb voltage type. Please see the attached pull request.

Operating System

Gentoo

CPU

GPU

gfx1036 + gfx1100

ROCm Version

ROCm 6.4.1

ROCm Component

rocm_smi_lib

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
