Commit 2d4f2c4

Add profiling guide (rapidsai#20292)
Adds a profiling guide to the C++ developer documentation, with the most commonly desired flags for `nsys profile` commands.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Paul Mattione (https://github.com/pmattione-nvidia)

URL: rapidsai#20292
1 parent b260f08 commit 2d4f2c4

3 files changed (+48, -0 lines)

cpp/doxygen/Doxyfile

Lines changed: 1 addition & 0 deletions
@@ -861,6 +861,7 @@ INPUT = main_page.md \
 developer_guide/BENCHMARKING.md \
 developer_guide/DOCUMENTATION.md \
 developer_guide/DEVELOPER_GUIDE.md \
+developer_guide/PROFILING.md \
 developer_guide/TESTING.md \
 ../include \
 ../include/cudf_test/column_wrapper.hpp \

cpp/doxygen/developer_guide/DEVELOPER_GUIDE.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ to these additional files for further documentation of libcudf best practices.
 * [Documentation Guide](DOCUMENTATION.md) for guidelines on documenting libcudf code.
 * [Testing Guide](TESTING.md) for guidelines on writing unit tests.
 * [Benchmarking Guide](BENCHMARKING.md) for guidelines on writing unit benchmarks.
+* [Profiling Guide](PROFILING.md) for guidelines on profiling libcudf code.

 # Overview

cpp/doxygen/developer_guide/PROFILING.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
# Profiling libcudf

Profiling is essential for understanding performance characteristics and identifying bottlenecks in libcudf. This guide covers GPU profiling using NVIDIA Nsight Systems.

## NVIDIA Nsight Systems

[NVIDIA Nsight Systems](https://developer.nvidia.com/nsight-systems) is a system-wide performance analysis tool that provides detailed timeline views of CPU and GPU activity.
It's the recommended tool for profiling CUDA applications and understanding kernel execution, memory transfers, and API calls.

### Installation

Nsight Systems is included with the CUDA Toolkit, or can be downloaded from https://developer.nvidia.com/nsight-systems. The command-line tool is `nsys`. Verify installation:

```bash
nsys --version
```
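
The same check can be scripted, for example before launching a batch of profiling runs. The sketch below is not part of the committed guide. It assumes `nsys --version` prints a line containing a dotted version number, and the helper names `parse_nsys_version` and `installed_nsys_version` are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nsys_version(output):
    """Return the first dotted version number in `output`, or None.

    Assumption: `nsys --version` prints something like
    "NVIDIA Nsight Systems version <major>.<minor>.<patch>...".
    """
    match = re.search(r"\d+(?:\.\d+)+", output)
    return match.group(0) if match else None


def installed_nsys_version():
    """Return the version of the `nsys` found on PATH, or None if absent."""
    if shutil.which("nsys") is None:
        return None
    result = subprocess.run(
        ["nsys", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nsys_version(result.stdout)
```

Calling `installed_nsys_version()` returns `None` when `nsys` is not on `PATH`, which is a cheap way to fail early with a clear message.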

### Recommended Profile Command

When profiling cuDF workloads, use the following flags:

```bash
nsys profile --trace=nvtx,cuda,osrt --cuda-memory-usage=true --gpu-metrics-devices=0 --nvtx-domain-exclude=CCCL python script.py
```

**Options explained:**
- `--trace=nvtx,cuda,osrt`: Trace NVTX ranges, CUDA API calls, and OS runtime libraries
- `--cuda-memory-usage=true`: Track CUDA memory allocation and usage
- `--gpu-metrics-devices=0`: Collect GPU metrics from device 0
- `--nvtx-domain-exclude=CCCL`: Exclude verbose CCCL (CUDA C++ Core Libraries) NVTX ranges
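
When the command is launched from a script rather than typed by hand, it can help to assemble the flags in one place. This is an illustrative sketch only, not part of the committed guide. The function `build_nsys_command` and its parameter names are hypothetical, and it emits only the flags listed above.

```python
import shlex


def build_nsys_command(app_args, traces=("nvtx", "cuda", "osrt"),
                       metrics_device=0, exclude_domain="CCCL"):
    """Assemble the `nsys profile` argument list from the options above."""
    return [
        "nsys", "profile",
        f"--trace={','.join(traces)}",               # which APIs to trace
        "--cuda-memory-usage=true",                  # track CUDA allocations
        f"--gpu-metrics-devices={metrics_device}",   # GPU to sample metrics from
        f"--nvtx-domain-exclude={exclude_domain}",   # drop verbose NVTX domain
        *app_args,                                   # the application to profile
    ]


command = build_nsys_command(["python", "script.py"])
print(shlex.join(command))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues when the profiled application takes its own arguments.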

### Profiling Specific GPUs

When working with multi-GPU systems, you may want to profile a specific GPU.
To profile GPUs other than device 0, use both `--gpu-metrics-devices=N` and `--env-var CUDA_VISIBLE_DEVICES=N` to ensure the application and profiler target the same device.

For example, modify the flags like this for profiling GPU 4:

```bash
nsys profile --trace=nvtx,cuda,osrt --cuda-memory-usage=true --gpu-metrics-devices=4 --env-var CUDA_VISIBLE_DEVICES=4 python script.py
```
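
One point that is easy to trip over: CUDA renumbers the devices it exposes, so with `CUDA_VISIBLE_DEVICES=4` the application sees a single GPU enumerated as device 0, while the `--gpu-metrics-devices=4` flag in the command above still names the GPU by its system index. The sketch below is not part of the committed guide. The helper `physical_device` is hypothetical, it handles only integer entries (not GPU UUIDs), and it assumes both settings use the same device ordering.

```python
def physical_device(logical_index, visible_devices):
    """Map a device index seen by the application to the system GPU index.

    `visible_devices` is the value of CUDA_VISIBLE_DEVICES, e.g. "4" or "2,4".
    Assumption: every entry is an integer index; UUID entries are not handled.
    """
    physical = [int(entry) for entry in visible_devices.split(",") if entry.strip()]
    return physical[logical_index]


# With CUDA_VISIBLE_DEVICES=4, device 0 inside the script is system GPU 4.
print(physical_device(0, "4"))
```

This is why the script itself needs no changes when it is moved to another GPU: it keeps using device 0, and only the two command-line values change.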

### Analyzing Results

After profiling, open the `.nsys-rep` file in the Nsight Systems GUI to analyze CPU and GPU activity over time.
The interface shows individual kernel launches and durations, memory allocations and transfers, and metrics like memory bandwidth utilization.
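
For a quick look without the GUI, `nsys stats` prints summary tables for a report on the command line. The sketch below is not part of the committed guide. It picks the most recent report in a directory and builds that command, assuming reports carry the `.nsys-rep` extension mentioned above. The helper names `latest_report` and `stats_command` are hypothetical.

```python
from pathlib import Path


def latest_report(directory):
    """Return the most recently modified .nsys-rep file in `directory`, or None."""
    reports = sorted(
        Path(directory).glob("*.nsys-rep"), key=lambda p: p.stat().st_mtime
    )
    return reports[-1] if reports else None


def stats_command(report):
    """Build an `nsys stats` invocation for the given report file."""
    return ["nsys", "stats", str(report)]
```

A typical use is `stats_command(latest_report("."))` right after a profiling run, with the result passed to `subprocess.run`.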
