Commit 8c9f531

Merge pull request #479 from Treece-Burgess/07-25-25-events-and-metrics-api-support
cuda: Event and Metric API Support
2 parents c4227a2 + 1a8d4f9 commit 8c9f531

14 files changed

+2441
-282
lines changed

src/components/cuda/README.md

Lines changed: 65 additions & 53 deletions
@@ -1,83 +1,93 @@
1-
# CUDA Component
1+
# Cuda Component
22

3-
The CUDA component exposes counters and controls for NVIDIA GPUs.
3+
The `cuda` component exposes counters and controls for NVIDIA GPUs.
44

55
* [Enabling the CUDA Component](#enabling-the-cuda-component)
6-
* [Environment Variables](#environment-variables)
76
* [Known Limitations](#known-limitations)
87
* [FAQ](#faq)
98
***
10-
## Enabling the CUDA Component
119

12-
To enable reading or writing of CUDA counters the user needs to link against a
13-
PAPI library that was configured with the CUDA component enabled. As an
14-
example the following command:
15-
16-
./configure --with-components="cuda"
17-
18-
is sufficient to enable the component.
19-
20-
Typically, the utility `papi_components_avail` (available in
21-
`papi/src/utils/papi_components_avail`) will display the components available
22-
to the user, and whether they are disabled, and when they are disabled why.
10+
## Enabling the `cuda` Component
2311

24-
## Environment Variables
25-
26-
For CUDA, PAPI requires one environment variable: `PAPI_CUDA_ROOT`. This is
27-
required for both compiling and runtime.
12+
To enable reading or writing of CUDA counters the user needs to link against a
13+
PAPI library that was configured with the `cuda` component. As an example:
14+
```
15+
./configure --with-components="cuda"
16+
```
2817

29-
Typically in Linux one would export this (examples are shown below) variable but
30-
some systems have software to manage environment variables (such as `module` or
31-
`spack`), so consult with your sysadmin if you have such management software. Eg:
18+
For the component to be active, PAPI requires one environment variable: `PAPI_CUDA_ROOT`. It **must** point to the root of the CUDA Toolkit you want to use, both when compiling PAPI and at runtime. For example:
3219

33-
export PAPI_CUDA_ROOT=/path/to/installed/cuda
20+
```
21+
PAPI_CUDA_ROOT=/packages/cuda/#.#.#
22+
```
3423

35-
Within PAPI_CUDA_ROOT, we expect the following standard directories for building:
24+
Within `PAPI_CUDA_ROOT`, we expect the following standard directories for building:
3625

37-
PAPI_CUDA_ROOT/include
38-
PAPI_CUDA_ROOT/extras/CUPTI/include
26+
```
27+
PAPI_CUDA_ROOT/include
28+
PAPI_CUDA_ROOT/extras/CUPTI/include
29+
```
3930

4031
and for runtime:
4132

42-
PAPI_CUDA_ROOT/lib64
43-
PAPI_CUDA_ROOT/extras/CUPTI/lib64
33+
```
34+
PAPI_CUDA_ROOT/lib64
35+
PAPI_CUDA_ROOT/extras/CUPTI/lib64
36+
```
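
The directory layout above can be sketched in C as a small path-joining helper. This is only an illustration: `join_cuda_path`, `cuda_build_dirs`, and `cuda_runtime_dirs` are hypothetical names that do not exist in PAPI, and the component may assemble these paths differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of PAPI): joins a toolkit root and a
 * subdirectory into `out`. Returns 0 on success, or -1 if the result
 * would not fit in `out_size` bytes. */
static int join_cuda_path(const char *root, const char *subdir,
                          char *out, size_t out_size)
{
    assert(root != NULL && subdir != NULL && out != NULL);
    int n = snprintf(out, out_size, "%s/%s", root, subdir);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Subdirectories the README expects under PAPI_CUDA_ROOT for building. */
static const char *const cuda_build_dirs[] = {
    "include", "extras/CUPTI/include"
};

/* Subdirectories the README expects under PAPI_CUDA_ROOT at runtime. */
static const char *const cuda_runtime_dirs[] = {
    "lib64", "extras/CUPTI/lib64"
};
```

A caller would typically pass `getenv("PAPI_CUDA_ROOT")` (from `<stdlib.h>`) as `root`, after checking that it is not `NULL`, and then test each joined path for existence.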
4437

45-
As of this writing (07/2021) Nvidia has overhauled performance reporting;
46-
divided now into "Legacy CUpti" and "CUpti_11", the new approach. Legacy
47-
Cupti works on devices up to Compute Capability 7.0; while only CUpti_11
48-
works on devices with Compute Capability >=7.0. Both work on CC==7.0.
38+
To verify the `cuda` component was configured with your PAPI build and is active,
39+
run `papi_component_avail` (available in `utils/papi_component_avail`). This
40+
utility lists the components configured in your PAPI build and whether each is active or disabled. If a component is disabled, a message explaining why
41+
appears directly below it.
4942

50-
This component automatically distinguishes between the two; but it cannot
51-
handle a "mix", one device that can only work with Legacy and another that
52-
can only work with CUpti_11.
43+
At the time of writing, the `cuda` component supports the following three APIs:
5344

54-
For the CUDA component to be operational, both versions require
55-
the following dynamic libraries be found at runtime:
45+
| API | Supported Compute Capabilities | Example GPU |
46+
| ------------- | :-------------: | :-------------: |
47+
| Event API | CC <= 7.0 | P100 |
48+
| Metric API | CC <= 7.0 | P100 |
49+
| Perfworks API | CC >= 7.0 | A100 |
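
The table can be read as a simple mapping from compute capability to the set of usable APIs. The sketch below encodes only the thresholds shown in the table; the flag names and `apis_for_compute_capability` are hypothetical and are not PAPI identifiers.

```c
#include <assert.h>

/* Bit flags for this sketch only; they are not PAPI's internal values. */
enum {
    SKETCH_API_EVENT_METRIC = 1 << 0, /* Event API and Metric API */
    SKETCH_API_PERFWORKS    = 1 << 1  /* Perfworks API */
};

/* Returns the set of APIs the table lists for a given compute
 * capability, e.g. major = 7, minor = 0 for CC 7.0. */
static int apis_for_compute_capability(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    int cc = major * 10 + minor; /* CC 7.0 becomes 70 */
    int apis = 0;

    if (cc <= 70)
        apis |= SKETCH_API_EVENT_METRIC; /* table: CC <= 7.0 */
    if (cc >= 70)
        apis |= SKETCH_API_PERFWORKS;    /* table: CC >= 7.0 */
    return apis;
}
```

Under this reading, only CC 7.0 returns both flags, which matches the single point where the two rows of the table overlap.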
5650

57-
libcuda.so
58-
libcudart.so
59-
libcupti.so
51+
For the `cuda` component to be operational, the following dynamic libraries must be found at runtime for both the Event/Metric APIs and the Perfworks API:
6052

61-
CUpti\_11 also requires:
53+
```
54+
libcuda.so
55+
libcudart.so
56+
libcupti.so
57+
```
6258

63-
libnvperf_host.so
59+
For the Perfworks API, the dynamic library `libnvperf_host.so` must also be found.
6460

6561
If those libraries cannot be found or some of those are stub libraries in the
6662
standard `PAPI_CUDA_ROOT` subdirectories, you must add the correct paths,
6763
e.g. `/usr/lib64` or `/usr/lib` to `LD_LIBRARY_PATH`, separated by colons `:`.
6864
This can be set using export; e.g.
6965

70-
export LD_LIBRARY_PATH=$PAPI_CUDA_ROOT/lib64:$LD_LIBRARY_PATH
66+
```
67+
export LD_LIBRARY_PATH=$PAPI_CUDA_ROOT/lib64:$LD_LIBRARY_PATH
68+
```
7169

72-
## Known Limitations
73-
* In CUpti\_11, the number of possible events is vastly expanded; e.g. from
74-
some hundreds of events per device to over 110,000 events per device. this can
75-
make the utility `papi/src/utils/papi_native_avail` run for several minutes;
76-
as much as 2 minutes per GPU. If the output is redirected to a file, this
77-
may appear to "hang up". Give it time.
70+
## Partially Disabled Cuda Component
71+
As previously mentioned, the `cuda` component supports three primary APIs to expose counters and controls for NVIDIA GPUs.
7872

79-
* Currently the CUDA component profiling only works with GPUs with compute capability > 7.0 using the NVIDIA Perfworks libraries.
73+
The Event/Metric APIs overlap with the Perfworks API only at CC 7.0 (V100). On machines whose NVIDIA GPUs have mixed compute capabilities (e.g. a P100 at CC 6.0 and an A100 at CC 8.0), you must therefore choose which CCs counters and controls will be exposed for.
8074

75+
To allow this choice, the `cuda` component supports being ***Partially Disabled***, which means:
76+
77+
* If exposing counters and controls for CCs <= 7.0 (e.g. P100 and V100), then support for exposing counters and controls for CCs > 7.0 will be disabled
78+
* If exposing counters and controls for CCs >= 7.0 (e.g. V100 and A100), then support for exposing counters and controls for CCs < 7.0 will be disabled
79+
80+
By default, on mixed compute capability machines, counters and controls for CCs >= 7.0 will be exposed. However, the choice of which CCs counters and controls are exposed for can be changed at runtime via the environment variable `PAPI_CUDA_API`. Simply
81+
set `PAPI_CUDA_API` equal to `LEGACY`, e.g.:
82+
83+
```
84+
export PAPI_CUDA_API=LEGACY
85+
```
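
As a rough illustration of the behavior described above, the following sketch picks an API from the value of `PAPI_CUDA_API` and decides which devices stay enabled. It is not the component's actual code: `choose_api` and `device_enabled` are hypothetical names, the numeric values simply mirror the `API_PERFWORKS` and `API_LEGACY` macros in `cupti_config.h`, and matching only the exact string `LEGACY` is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values mirror API_PERFWORKS / API_LEGACY in cupti_config.h. */
#define SKETCH_API_PERFWORKS 1
#define SKETCH_API_LEGACY    2

/* Picks the API from the value of PAPI_CUDA_API (NULL when unset).
 * Assumption: only the exact string "LEGACY" switches away from the
 * default, which is the Perfworks API. */
static int choose_api(const char *papi_cuda_api)
{
    if (papi_cuda_api != NULL && strcmp(papi_cuda_api, "LEGACY") == 0)
        return SKETCH_API_LEGACY;
    return SKETCH_API_PERFWORKS;
}

/* Returns 1 if a device with compute capability major.minor stays
 * enabled under `api`, or 0 if it falls in the disabled part. */
static int device_enabled(int api, int major, int minor)
{
    assert(api == SKETCH_API_PERFWORKS || api == SKETCH_API_LEGACY);
    int cc = major * 10 + minor; /* CC 7.0 becomes 70 */
    return (api == SKETCH_API_LEGACY) ? (cc <= 70) : (cc >= 70);
}
```

A caller would pass `getenv("PAPI_CUDA_API")` (from `<stdlib.h>`) to `choose_api`. On a machine with a P100 and an A100, the default then leaves only the A100 enabled, while `LEGACY` leaves only the P100 enabled.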
86+
87+
Important note: on machines that only have GPUs with CC = 7.0, the `cuda` component is never partially disabled. Counters and controls are exposed via the Perfworks Metrics API; to expose them via the Legacy APIs instead, set the `PAPI_CUDA_API` environment variable as described above.
88+
89+
## Known Limitations
90+
* Exposing counters on machines that have NVIDIA GPUs with CCs >= 7.0 is done via the Perfworks API. This API vastly expands the number of possible counters, from roughly a few hundred to over 140,000 per GPU. As a result, the PAPI utility `utils/papi_native_avail` may take a few minutes to run (as much as 2 minutes per GPU). If the output from `utils/papi_native_avail` is redirected to a file, it may appear to have "hung"; give it time and it will complete.
8191
***
8292

8393
## FAQ
@@ -99,9 +109,11 @@ subdirectories mentioned above, or `PAPI_CUDA_ROOT` does not exist at runtime, t
99109
usually `/usr/lib64`, `/lib64`, `/usr/lib` and `/lib`.
100110

101111
The system will also search the directories listed in `LD_LIBRARY_PATH`,
102-
separated by colons `:`. This can be set using export; e.g.
112+
separated by colons `:`. This can be set using export; e.g.
103113

104-
export LD_LIBRARY_PATH=/WhereLib1CanBeFound:/WhereLib2CanBeFound:$LD_LIBRARY_PATH
114+
```
115+
export LD_LIBRARY_PATH=/WhereLib1CanBeFound:/WhereLib2CanBeFound:$LD_LIBRARY_PATH
116+
```
105117

106118
* If CUDA libraries are installed on your system, such that the OS can find `nvcc`, the header files, and the shared libraries, then `PAPI_CUDA_ROOT` and `LD_LIBRARY_PATH` may not be necessary.
107119

@@ -121,7 +133,7 @@ However, it is possible to load each of these libraries from custom paths by set
121133
- `PAPI_CUDA_PERFWORKS` to point to `libnvperf_host.so`
122134

123135
## Compute capability 7.0 with CUDA toolkit version 11.0
124-
NVIDIA GPUs with compute capability 7.0 support profiling on both PerfWorks API and the older Events & Metrics API.
136+
NVIDIA GPUs with compute capability 7.0 support profiling on both PerfWorks API and the older Event/Metric APIs.
125137

126138
If a CUDA toolkit version > 11.0 is used, PAPI uses the newer PerfWorks API; with toolkit version 11.0, PAPI uses the events API by default.
127139

src/components/cuda/Rules.cuda

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -14,9 +14,9 @@ COMPSRCS += components/cuda/linux-cuda.c \
1414
components/cuda/cupti_utils.c \
1515
components/cuda/papi_cupti_common.c \
1616
components/cuda/cupti_profiler.c \
17-
components/cuda/cupti_events.c \
17+
components/cuda/cupti_event_and_metric.c \
1818

19-
COMPOBJS += linux-cuda.o cupti_dispatch.o cupti_utils.o papi_cupti_common.o cupti_profiler.o cupti_events.o
19+
COMPOBJS += linux-cuda.o cupti_dispatch.o cupti_utils.o papi_cupti_common.o cupti_profiler.o cupti_event_and_metric.o
2020

2121
# CFLAGS specifies compile flags; need include files here, and macro defines.
2222
CFLAGS += -I$(PAPI_CUDA_ROOT)/include -I$(PAPI_CUDA_ROOT)/extras/CUPTI/include -g $(CUDA_MACS)
@@ -37,5 +37,5 @@ papi_cupti_common.o: components/cuda/papi_cupti_common.c
3737
cupti_profiler.o: components/cuda/cupti_profiler.c
3838
$(CC) $(LIBCFLAGS) $(OPTFLAGS) -c components/cuda/cupti_profiler.c -o cupti_profiler.o
3939

40-
cupti_events.o: components/cuda/cupti_events.c
41-
$(CC) $(LIBCFLAGS) $(OPTFLAGS) -c components/cuda/cupti_events.c -o cupti_events.o
40+
cupti_event_and_metric.o: components/cuda/cupti_event_and_metric.c
41+
$(CC) $(LIBCFLAGS) $(OPTFLAGS) -c components/cuda/cupti_event_and_metric.c -o cupti_event_and_metric.o

src/components/cuda/cupti_config.h

Lines changed: 6 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1,30 +1,23 @@
11
/**
22
* @file cupti_config.h
3-
*
4-
* @author Treece Burgess tburgess@icl.utk.edu (updated in 2024, redesigned to add device qualifier support.)
5-
* @author Anustuv Pal anustuv@icl.utk.edu
63
*/
74

85
#ifndef __LCUDA_CONFIG_H__
96
#define __LCUDA_CONFIG_H__
107

118
#include <cupti.h>
129

13-
/* used to assign the EventSet state */
10+
// Used to assign the EventSet state
1411
#define CUDA_EVENTS_STOPPED (0x0)
1512
#define CUDA_EVENTS_RUNNING (0x2)
1613

14+
#define API_PERFWORKS 1
1715
#define CUPTI_PROFILER_API_MIN_SUPPORTED_VERSION (13)
1816

19-
#if (CUPTI_API_VERSION >= CUPTI_PROFILER_API_MIN_SUPPORTED_VERSION)
20-
# define API_PERFWORKS 1
21-
#endif
17+
#define API_LEGACY 2
18+
#define CUPTI_EVENT_AND_METRIC_MAX_SUPPORTED_VERSION (13000)
2219

23-
// The Events API has been deprecated in Cuda Toolkit 12.8 and will be removed in a future
24-
// CUDA release (https://docs.nvidia.com/cupti/api/group__CUPTI__EVENT__API.html).
25-
// TODO: When the Events API has been removed #define CUPTI_EVENTS_API_MAX_SUPPORTED_VERSION
26-
// and set it to the last version that is supported. Use this macro as a runtime check in
27-
// `cuptic_determine_runtime_api`.
28-
#define API_EVENTS 2
20+
#define PAPI_CUDA_MPX_COUNTERS 512
21+
#define PAPI_CUDA_MAX_COUNTERS 30
2922

3023
#endif /* __LCUDA_CONFIG_H__ */
