
Commit 02941d9

Dong Jun Woun authored and committed
amd_smi: AMD GPU System Management Interface via AMD SMI library.
1 parent 7338010 commit 02941d9

21 files changed, 9343 additions and 0 deletions
src/components/amd_smi/README.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# AMD_SMI Component

The **AMD_SMI** (AMD System Management Interface) component exposes hardware
management counters (and selected controls) for AMD GPUs, such as power usage,
temperatures, clocks, PCIe link metrics, VRAM information, and RAS/ECC status,
by querying the AMD SMI library at runtime (ROCm ≥ 6.3.4).

- [Environment Variables](#environment-variables)
- [Enabling the AMD_SMI Component](#enabling-the-amd_smi-component)
- [File-by-file Summary](#file-by-file-summary)

---

## Environment Variables

For AMD_SMI, PAPI requires the environment variable `PAPI_AMDSMI_ROOT` to be set
so that the AMD SMI shared library and headers can be found. This variable is
required at both **compile** and **run** time.

There is a single case to consider (AMD SMI is available on ROCm ≥ 6.0):

1. **For ROCm versions 6.0 and newer:**
   Set `PAPI_AMDSMI_ROOT` to the top-level ROCm directory. For example:

   ```bash
   export PAPI_AMDSMI_ROOT=/opt/rocm-6.4.0
   # or
   export PAPI_AMDSMI_ROOT=/opt/rocm
   ```

The directory specified by `PAPI_AMDSMI_ROOT` **must contain** the following
subdirectories:

- `$PAPI_AMDSMI_ROOT/lib` (which should include the dynamic library `libamd_smi.so`)
- `$PAPI_AMDSMI_ROOT/include/amd_smi` (AMD SMI headers)

If the library is not found or is not functional at runtime, the component will
appear as "disabled" in `papi_component_avail`, with a message describing the
problem (e.g., library not found).
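
Before building, it can help to confirm that a candidate directory has the layout described above. The following is a minimal C sketch under stated assumptions, not part of PAPI: the helper name `amdsmi_root_looks_valid` is hypothetical, and it only tests that the library file and header directory named in this section are readable. It does not load the library and says nothing about whether the component will end up enabled. A caller would typically pass `getenv("PAPI_AMDSMI_ROOT")` as the argument.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not a PAPI function): returns 1 if `root` contains
 * lib/libamd_smi.so and include/amd_smi, and 0 otherwise. */
static int amdsmi_root_looks_valid(const char *root)
{
    char path[4096];
    int n;

    /* An unset or empty root can never satisfy the layout. */
    if (root == NULL || root[0] == '\0')
        return 0;

    /* The shared library must exist and be readable. */
    n = snprintf(path, sizeof(path), "%s/lib/libamd_smi.so", root);
    if (n < 0 || (size_t)n >= sizeof(path) || access(path, R_OK) != 0)
        return 0;

    /* The header directory must exist and be searchable. */
    n = snprintf(path, sizeof(path), "%s/include/amd_smi", root);
    if (n < 0 || (size_t)n >= sizeof(path) || access(path, R_OK | X_OK) != 0)
        return 0;

    return 1;
}
```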

---

## Enabling the AMD_SMI Component

To enable reading (and where supported, writing) of AMD_SMI counters, build
PAPI with this component enabled. For example:

```bash
./configure --with-components="amd_smi"
make
```

You can verify availability with the utilities in `papi/src/utils/`:

```bash
papi_component_avail              # shows enabled/disabled components
papi_native_avail -i amd_smi      # lists native events for this component
```

---

## File-by-file Summary

- **`linux-amd-smi.c`**
  Declares the `papi_vector_t` for this component; initializes on first use; hands off work to `amds_*` for device/event management; implements PAPI hooks (`init_component`, `update_control_state`, `start`, `read`, `stop`, `reset`, `shutdown`, and native-event queries).

- **`amds.c`**
  Dynamically loads `libamd_smi.so`, resolves AMD SMI symbols, discovers sockets/devices, and **builds the native event table**. Defines helpers to add simple and counter-based events. Manages global teardown (destroy event table, close library).

- **`amds_accessors.c`**
  Implements the **accessors** that read/write individual metrics (e.g., temperatures, fans, PCIe, energy, power caps, RAS/ECC, clocks, VRAM, link topology, XGMI/PCIe metrics, firmware/board info, etc.). Each accessor maps an event's `(variant, subvariant)` to the right SMI call and returns the value.

- **`amds_ctx.c`**
  Provides the **per-eventset context**:
  - `amds_ctx_open/close`: acquire/release devices, run per-event open/close hooks.
  - `amds_ctx_start/stop`: start/stop counters where needed.
  - `amds_ctx_read/write/reset`: read current values, optionally write supported controls (e.g., power cap), zero the software view.

- **`amds_evtapi.c`**
  Implements native-event enumeration for PAPI (`enum`, `code_to_name`, `name_to_code`, `code_to_descr`) using the in-memory event table and a small hash map for fast lookups.

- **`amds_priv.h`**
  Internal definitions: `native_event_t` (name/descr/device/mode/value + open/close/start/stop/access callbacks), global getters, and the AMD SMI function-pointer declarations (via `amds_funcs.h`).

- **`amds_funcs.h`**
  Centralized macro list of **AMD SMI APIs** used by the component; generates function-pointer declarations/definitions so `amds.c` can `dlsym()` them at runtime. Conditional entries handle newer SMI features.

- **`htable.h`**
  Minimal chained hash table for **name→event** mapping; used by `amds_evtapi.c` to resolve native event names quickly.

- **`amds.h`**
  Component-internal API shared across the source files: init/shutdown, native-event queries, context ops, and error-string retrieval.

- **`Rules.amd_smi`**
  Build integration for PAPI's make system; compiles this component and sets include/library paths for AMD SMI.
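
The macro-list technique described for `amds_funcs.h` is commonly called an X-macro. The sketch below illustrates the general idea only; it is not the component's code. All names (`DEMO_API_LIST`, `demo_init`, `demo_shutdown`, `demo_lookup`, `demo_load_all`) are hypothetical, and a small table lookup stands in for `dlsym()` so the example does not depend on a real shared library.

```c
#include <stddef.h>
#include <string.h>

/* Generic function-pointer type used to pass resolved symbols around. */
typedef void (*generic_fn)(void);

/* Hypothetical API list: each entry is (return type, name, argument list). */
#define DEMO_API_LIST(X)                    \
    X(int, demo_init,     (unsigned flags)) \
    X(int, demo_shutdown, (void))

/* Expansion 1: declare one function pointer per entry, named <name>_p. */
#define DECLARE_PTR(ret, name, args) static ret (*name##_p) args;
DEMO_API_LIST(DECLARE_PTR)
#undef DECLARE_PTR

/* Stand-ins for functions that would live in the shared library. */
static int fake_init(unsigned flags) { return (int)flags; }
static int fake_shutdown(void) { return 0; }

/* Stand-in for dlsym(handle, name): maps a symbol name to an address. */
static generic_fn demo_lookup(const char *name)
{
    if (strcmp(name, "demo_init") == 0)     return (generic_fn)fake_init;
    if (strcmp(name, "demo_shutdown") == 0) return (generic_fn)fake_shutdown;
    return NULL;
}

/* Expansion 2: resolve every entry by its stringized name.
 * Returns 0 on success, -1 as soon as one symbol is missing. */
static int demo_load_all(void)
{
#define RESOLVE(ret, name, args)                   \
    name##_p = (ret (*) args) demo_lookup(#name);  \
    if (name##_p == NULL) return -1;
    DEMO_API_LIST(RESOLVE)
#undef RESOLVE
    return 0;
}
```

After `demo_load_all()` returns 0, calls go through the generated pointers, for example `demo_init_p(7)`. Adding an API then means adding one line to the list, since the declaration and the lookup are both generated from it.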
src/components/amd_smi/Rules.amd_smi

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# Set default if the root environment variable is not already set.
# Note PAPI_AMDSMI_ROOT is an environment variable that must be set
# at both compile time and run time; see the README file.
# The optional override PAPI_AMD_SMI_LIB is described below.

PAPI_AMDSMI_ROOT ?= /opt/rocm

# There is one library used by the AMD_SMI component: libamd_smi.so
# By default, the software tries to find this in system paths, including
# those listed in the environment variable LD_LIBRARY_PATH. If not found
# there, it looks in $(PAPI_AMDSMI_ROOT)/lib/libamd_smi.so

# However, this can be overridden by exporting PAPI_AMD_SMI_LIB as
# something else. It would still need to be a full path and library name.
# If it is exported, it must work or the component will be disabled. e.g.
# export PAPI_AMD_SMI_LIB=$(PAPI_AMDSMI_ROOT)/lib/libamd_smi.so
# This allows users to overcome non-standard ROCm installs or to select a
# specific version of the libamd_smi.so library.

# PAPI_AMDSMI_ROOT is used at both compile time and run time.

# There are many ways to cause this path to be known. Spack is a package
# manager used on supercomputers, Linux and macOS. If Spack is aware of ROCm,
# it encodes the paths to the necessary libraries.

# The environment variable LD_LIBRARY_PATH encodes a list of paths to
# search for libraries, separated by colons (:). New paths can be
# added to LD_LIBRARY_PATH.
#
# Warning: LD_LIBRARY_PATH often contains directories that apply to other
# installed packages you may be using. Always extend the existing value of
# LD_LIBRARY_PATH rather than overwriting it; for example:

# >export LD_LIBRARY_PATH=someNewLibraryDirectory:$LD_LIBRARY_PATH
# which prepends the new directory, so it is searched before the existing
# entries. Alternatively, you can append it:
# >export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:someNewLibraryDirectory
# which searches the existing directories first, then your new directory.

# You can check on the value of LD_LIBRARY_PATH with
# >echo $LD_LIBRARY_PATH

# There may be other package managers or utilities, for example on a system
# with modules; the command 'module load rocm' may modify LD_LIBRARY_PATH.

# A Linux system will also search for libraries by default in the directories
# listed by /etc/ld.so.conf, and /usr/lib64, /lib64, /usr/lib, /lib.

# Note: If you change the exports, PAPI should be rebuilt from scratch; see
# note below.

# Note: AMD_SMI is typically provided with the ROCm libraries, but in PAPI
# ROCm and AMD_SMI are treated as separate components, and must be given
# separately on the configure option --with-components. e.g.

# From within the papi/src/ directory:
# make clobber
# ./configure --with-components="amd_smi"
# make

# An alternative, for both rocm and amd_smi components:
# ./configure --with-components="rocm amd_smi"

# OPERATION, per library:
# 1) If an override is not empty, we will use it explicitly and fail if it
# does not work. This means disabling the component; a reason for disabling
# is shown using the papi utility, papi/src/utils/papi_component_avail

# 2) We will attempt to open the library using the normal system library search
# paths; if Spack is present and configured correctly it should deliver the
# proper library. A failure here will be silent; we will proceed to (3).

# 3) If that fails, we will try to find the library in the standard installed
# location described above, $(PAPI_AMDSMI_ROOT)/lib. If this fails, we disable
# the component; the reason for disabling is shown using the papi utility,
# papi/src/utils/papi_component_avail.

COMPSRCS += components/amd_smi/amds.c \
            components/amd_smi/linux-amd-smi.c \
            components/amd_smi/amds_accessors.c \
            components/amd_smi/amds_evtapi.c \
            components/amd_smi/amds_ctx.c
COMPOBJS += amds.o \
            linux-amd-smi.o \
            amds_accessors.o \
            amds_evtapi.o \
            amds_ctx.o

# CFLAGS specifies compile flags; need include files here, and macro defines.
# Where to find amd_smi.h varied in early ROCM releases. If it changes again,
# for backward compatibility add *more* -I paths, do not just replace this one.

CFLAGS += -I$(PAPI_AMDSMI_ROOT)/include/amd_smi
CFLAGS += -I$(PAPI_AMDSMI_ROOT)/include
CFLAGS += -g
LDFLAGS += $(LDL) -g

linux-amd-smi.o: components/amd_smi/linux-amd-smi.c $(HEADERS)
	$(CC) $(LIBCFLAGS) $(OPTFLAGS) -c components/amd_smi/linux-amd-smi.c -o linux-amd-smi.o

amds.o: components/amd_smi/amds.c $(HEADERS)
	$(CC) $(LIBCFLAGS) $(OPTFLAGS) -c components/amd_smi/amds.c -o amds.o

amds_accessors.o: components/amd_smi/amds_accessors.c $(HEADERS)
	$(CC) $(LIBCFLAGS) $(OPTFLAGS) -c components/amd_smi/amds_accessors.c -o amds_accessors.o

amds_evtapi.o: components/amd_smi/amds_evtapi.c $(HEADERS)
	$(CC) $(LIBCFLAGS) $(OPTFLAGS) -c components/amd_smi/amds_evtapi.c -o amds_evtapi.o

amds_ctx.o: components/amd_smi/amds_ctx.c $(HEADERS)
	$(CC) $(LIBCFLAGS) $(OPTFLAGS) -c components/amd_smi/amds_ctx.c -o amds_ctx.o
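
The three OPERATION steps in the `Rules.amd_smi` comments can be sketched as an ordered candidate list. This is an illustration under stated assumptions, not the component's actual loader: the function name `amdsmi_lib_candidates` and the buffer sizes are hypothetical, the library name `libamd_smi.so` is taken from the README, and the sketch only computes which names would be tried. A real loader would pass each candidate to `dlopen()` in turn and disable the component if none loads.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_CANDIDATES 2
#define MAX_PATH_LEN   4096

/* Hypothetical helper: fills `out` with the library names to try, in order,
 * and returns how many were written.
 *   override: value of PAPI_AMD_SMI_LIB, or NULL/empty if not exported
 *   root:     value of PAPI_AMDSMI_ROOT, or NULL/empty if not set */
static int amdsmi_lib_candidates(const char *override, const char *root,
                                 char out[MAX_CANDIDATES][MAX_PATH_LEN])
{
    int n = 0;

    /* Step 1: an explicit override is the only candidate; no fallback. */
    if (override != NULL && override[0] != '\0') {
        snprintf(out[n++], MAX_PATH_LEN, "%s", override);
        return n;
    }

    /* Step 2: a bare name lets the dynamic loader use its normal search
     * (LD_LIBRARY_PATH, the ld.so configuration, default directories). */
    snprintf(out[n++], MAX_PATH_LEN, "%s", "libamd_smi.so");

    /* Step 3: fall back to the lib directory under the ROCm root. */
    if (root != NULL && root[0] != '\0')
        snprintf(out[n++], MAX_PATH_LEN, "%s/lib/libamd_smi.so", root);

    return n;
}
```

With no override and a root of `/opt/rocm`, the candidates are the bare name `libamd_smi.so` followed by `/opt/rocm/lib/libamd_smi.so`. With an override, the list contains only the override, which matches the rule that an exported override must work or the component is disabled.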
