src/components/cuda/README.md
65 additions & 53 deletions
@@ -1,83 +1,93 @@
1
-
# CUDA Component
1
+
# Cuda Component
2
2
3
-
The CUDA component exposes counters and controls for NVIDIA GPUs.
3
+
The `cuda` component exposes counters and controls for NVIDIA GPUs.
4
4
5
5
* [Enabling the CUDA Component](#enabling-the-cuda-component)
6
-
*[Environment Variables](#environment-variables)
7
6
* [Known Limitations](#known-limitations)
8
7
* [FAQ](#faq)
9
8
***
10
-
## Enabling the CUDA Component
11
9
12
-
To enable reading or writing of CUDA counters the user needs to link against a
13
-
PAPI library that was configured with the CUDA component enabled. As an
14
-
example the following command:
15
-
16
-
./configure --with-components="cuda"
17
-
18
-
is sufficient to enable the component.
19
-
20
-
Typically, the utility `papi_components_avail` (available in
21
-
`papi/src/utils/papi_components_avail`) will display the components available
22
-
to the user, and whether they are disabled, and when they are disabled why.
10
+
## Enabling the `cuda` Component
23
11
24
-
## Environment Variables
25
-
26
-
For CUDA, PAPI requires one environment variable: `PAPI_CUDA_ROOT`. This is
27
-
required for both compiling and runtime.
12
+
To enable reading or writing of CUDA counters the user needs to link against a
13
+
PAPI library that was configured with the `cuda` component. As an example:
14
+
```
15
+
./configure --with-components="cuda"
16
+
```
28
17
29
-
Typically in Linux one would export this (examples are shown below) variable but
30
-
some systems have software to manage environment variables (such as `module` or
31
-
`spack`), so consult with your sysadmin if you have such management software. Eg:
18
+
For the component to be active, PAPI requires one environment variable to be set: `PAPI_CUDA_ROOT`. It **must** be set to the root of the CUDA Toolkit you intend to use, both when compiling and at runtime. As an example:
32
19
33
-
export PAPI_CUDA_ROOT=/path/to/installed/cuda
20
+
```
21
+
PAPI_CUDA_ROOT=/packages/cuda/#.#.#
22
+
```
34
23
35
-
Within PAPI_CUDA_ROOT, we expect the following standard directories for building:
24
+
Within `PAPI_CUDA_ROOT`, we expect the following standard directories for building:
36
25
37
-
PAPI_CUDA_ROOT/include
38
-
PAPI_CUDA_ROOT/extras/CUPTI/include
26
+
```
27
+
PAPI_CUDA_ROOT/include
28
+
PAPI_CUDA_ROOT/extras/CUPTI/include
29
+
```
39
30
40
31
and for runtime:
41
32
42
-
PAPI_CUDA_ROOT/lib64
43
-
PAPI_CUDA_ROOT/extras/CUPTI/lib64
33
+
```
34
+
PAPI_CUDA_ROOT/lib64
35
+
PAPI_CUDA_ROOT/extras/CUPTI/lib64
36
+
```
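As a quick sanity check of the layout described above, the following is a minimal sketch, not something PAPI ships. The helper name `missing_cuda_dirs` is hypothetical; it only tests whether the four standard subdirectories exist under a given root and prints any that are missing.

```shell
# Hypothetical helper (not part of PAPI): print each of the standard
# build/runtime subdirectories that is missing under the given CUDA root.
# Usage: missing_cuda_dirs /path/to/cuda
missing_cuda_dirs() (
    root="$1"
    for sub in include extras/CUPTI/include lib64 extras/CUPTI/lib64; do
        if [ ! -d "$root/$sub" ]; then
            echo "$root/$sub"
        fi
    done
)

# Example: check the root currently set in PAPI_CUDA_ROOT, if any.
if [ -n "${PAPI_CUDA_ROOT:-}" ]; then
    missing_cuda_dirs "$PAPI_CUDA_ROOT"
fi
```

No output means all four directories were found; each printed path is one that does not exist.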
44
37
45
-
As of this writing (07/2021) Nvidia has overhauled performance reporting;
46
-
divided now into "Legacy CUpti" and "CUpti_11", the new approach. Legacy
47
-
Cupti works on devices up to Compute Capability 7.0; while only CUpti_11
48
-
works on devices with Compute Capability >=7.0. Both work on CC==7.0.
38
+
To verify the `cuda` component was configured with your PAPI build and is active,
39
+
run `papi_component_avail` (available in `utils/papi_component_avail`). This
40
+
utility will display the components configured in your PAPI build and whether they are active or disabled. If a component is disabled, a message explaining why
41
+
appears directly below it.
49
42
50
-
This component automatically distinguishes between the two; but it cannot
51
-
handle a "mix", one device that can only work with Legacy and another that
52
-
can only work with CUpti_11.
43
+
At the time of writing, the `cuda` component supports the following three APIs:
53
44
54
-
For the CUDA component to be operational, both versions require
55
-
the following dynamic libraries be found at runtime:
45
+
| API | Supported Compute Capabilities | Example GPU |
For the `cuda` component to be operational, the following dynamic libraries must be found at runtime for both the Event/Metric APIs and the Perfworks API:
60
52
61
-
CUpti\_11 also requires:
53
+
```
54
+
libcuda.so
55
+
libcudart.so
56
+
libcupti.so
57
+
```
62
58
63
-
libnvperf_host.so
59
+
For the Perfworks API, the dynamic library `libnvperf_host.so` must also be found.
64
60
65
61
If those libraries cannot be found or some of those are stub libraries in the
66
62
standard `PAPI_CUDA_ROOT` subdirectories, you must add the correct paths,
67
63
e.g. `/usr/lib64` or `/usr/lib` to `LD_LIBRARY_PATH`, separated by colons `:`.
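One way to add such a path without duplicating entries is sketched below. The helper name `prepend_ld_path` is hypothetical, and the candidate directories in the loop are only the two examples named above; adjust both for your system.

```shell
# Hypothetical helper (not part of PAPI): prepend a directory to
# LD_LIBRARY_PATH unless it is already one of the colon-separated entries.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already listed, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Add the first candidate directory that actually holds libcuda.so.
# The candidate list is an assumption taken from the examples above.
for dir in /usr/lib64 /usr/lib; do
    if [ -e "$dir/libcuda.so" ]; then
        prepend_ld_path "$dir"
        break
    fi
done
```

Prepending rather than appending means the added directory is searched before any stub libraries that may already be on the path.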
* In CUpti\_11, the number of possible events is vastly expanded; e.g. from
74
-
some hundreds of events per device to over 110,000 events per device. this can
75
-
make the utility `papi/src/utils/papi_native_avail` run for several minutes;
76
-
as much as 2 minutes per GPU. If the output is redirected to a file, this
77
-
may appear to "hang up". Give it time.
70
+
## Partially Disabled Cuda Component
71
+
As previously mentioned, the `cuda` component supports three primary APIs to expose counters and controls for NVIDIA GPUs.
78
72
79
-
* Currently the CUDA component profiling only works with GPUs with compute capability > 7.0 using the NVIDIA Perfworks libraries.
73
+
The Event/Metric APIs overlap with the Perfworks API only at CC 7.0 (V100). On machines with NVIDIA GPUs of mixed compute capabilities (e.g. P100 at CC 6.0 and A100 at CC 8.0), a choice must therefore be made about which CCs the counters and controls will be exposed for.
80
74
75
+
To allow this choice, the `cuda` component supports being ***Partially Disabled***, which means:
76
+
77
+
* If exposing counters and controls for CCs <= 7.0 (e.g. P100 and V100), then support for exposing counters and controls for CCs > 7.0 will be disabled
78
+
* If exposing counters and controls for CCs >= 7.0 (e.g. V100 and A100), then support for exposing counters and controls for CCs < 7.0 will be disabled
79
+
80
+
By default on mixed compute capability machines, counters and controls for CCs >= 7.0 will be exposed. However, the choice of which CCs the counters and controls are exposed for can be changed at runtime via the environment variable `PAPI_CUDA_API`. To expose CCs <= 7.0 instead,
81
+
set `PAPI_CUDA_API` equal to `LEGACY`, e.g.:
82
+
83
+
```
84
+
export PAPI_CUDA_API=LEGACY
85
+
```
86
+
87
+
Important note: on machines that only have GPUs with CC = 7.0, the `cuda` component is not partially disabled. Counters and controls are exposed via the Perfworks Metrics API; to expose them via the Legacy APIs instead, set the aforementioned environment variable `PAPI_CUDA_API`.
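The selection rules in this section can be summarized with a small sketch. This is only an illustration of the behavior described above, not PAPI's implementation. The function name `exposed_ccs` is hypothetical, and two details are assumptions: machines with no GPU at CC >= 7.0 fall back to the Legacy APIs, and requesting `LEGACY` when every GPU is above CC 7.0 exposes nothing.

```shell
# Hypothetical helper (not part of PAPI): given the compute capabilities of
# the GPUs in a machine, print the ones whose counters would be exposed.
# Usage: exposed_ccs 6.0 8.0
exposed_ccs() (
    mode="legacy"
    if [ "${PAPI_CUDA_API:-}" != "LEGACY" ]; then
        # Default: use Perfworks if any GPU has CC >= 7.0.
        for cc in "$@"; do
            n=$(( ${cc%%.*} * 10 + ${cc#*.} ))   # "7.0" -> 70
            if [ "$n" -ge 70 ]; then
                mode="perfworks"
            fi
        done
    fi

    out=""
    for cc in "$@"; do
        n=$(( ${cc%%.*} * 10 + ${cc#*.} ))
        if [ "$mode" = "legacy" ] && [ "$n" -le 70 ]; then
            out="$out $cc"    # Event/Metric APIs cover CC <= 7.0
        elif [ "$mode" = "perfworks" ] && [ "$n" -ge 70 ]; then
            out="$out $cc"    # Perfworks API covers CC >= 7.0
        fi
    done
    echo "${out# }"
)
```

For a machine with a P100 and an A100, `exposed_ccs 6.0 8.0` prints `8.0`, while `PAPI_CUDA_API=LEGACY exposed_ccs 6.0 8.0` prints `6.0`.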
88
+
89
+
## Known Limitations
90
+
* Exposing counters on machines that have NVIDIA GPUs with CCs >= 7.0 is done via the Perfworks API. This API vastly expands the number of possible counters from roughly a few hundred to over 140,000 per GPU. As a result, the PAPI utility `utils/papi_native_avail` may take a few minutes to run (as much as 2 minutes per GPU). If the output from `utils/papi_native_avail` is redirected to a file, it may appear to have "hung"; give it time and it will complete.
81
91
***
82
92
83
93
## FAQ
@@ -99,9 +109,11 @@ subdirectories mentioned above, or `PAPI_CUDA_ROOT` does not exist at runtime, t
99
109
usually `/usr/lib64`, `/lib64`, `/usr/lib` and `/lib`.
100
110
101
111
The system will also search the directories listed in `LD_LIBRARY_PATH`,
102
-
separated by colons `:`. This can be set using export; e.g.
112
+
separated by colons `:`. This can be set using export; e.g.
* If CUDA libraries are installed on your system, such that the OS can find `nvcc`, the header files, and the shared libraries, then `PAPI_CUDA_ROOT` and `LD_LIBRARY_PATH` may not be necessary.
107
119
@@ -121,7 +133,7 @@ However, it is possible to load each of these libraries from custom paths by set
121
133
-`PAPI_CUDA_PERFWORKS` to point to `libnvperf_host.so`
122
134
123
135
## Compute capability 7.0 with CUDA toolkit version 11.0
124
-
NVIDIA GPUs with compute capability 7.0 support profiling on both PerfWorks API and the older Events & Metrics API.
136
+
NVIDIA GPUs with compute capability 7.0 support profiling on both PerfWorks API and the older Event/Metric APIs.
125
137
126
138
If a CUDA toolkit version > 11.0 is used, PAPI uses the newer API; with toolkit version 11.0, PAPI uses the events API by default.