| 1 | +# Profiling libcudf |
| 2 | + |
| 3 | +Profiling is essential for understanding performance characteristics and identifying bottlenecks in libcudf. This guide covers GPU profiling using NVIDIA Nsight Systems. |
| 4 | + |
| 5 | +## NVIDIA Nsight Systems |
| 6 | + |
| 7 | +[NVIDIA Nsight Systems](https://developer.nvidia.com/nsight-systems) is a system-wide performance analysis tool that provides detailed timeline views of CPU and GPU activity. |
| 8 | +It's the recommended tool for profiling CUDA applications and understanding kernel execution, memory transfers, and API calls. |
| 9 | + |
| 10 | +### Installation |
| 11 | + |
| 12 | +Nsight Systems is included with the CUDA Toolkit and is also available as a standalone download from https://developer.nvidia.com/nsight-systems. The command-line tool is `nsys`. To verify the installation, run:
| 13 | + |
| 14 | +```bash |
| 15 | +nsys --version |
| 16 | +``` |
| 17 | + |
| 18 | +### Recommended Profile Command |
| 19 | + |
| 20 | +When profiling cuDF workloads, use the following flags: |
| 21 | + |
| 22 | +```bash |
| 23 | +nsys profile --trace=nvtx,cuda,osrt --cuda-memory-usage=true --gpu-metrics-devices=0 --nvtx-domain-exclude=CCCL python script.py |
| 24 | +``` |
| 25 | + |
| 26 | +**Options explained:** |
| 27 | +- `--trace=nvtx,cuda,osrt`: Trace NVTX ranges, CUDA API calls, and OS runtime libraries |
| 28 | +- `--cuda-memory-usage=true`: Track CUDA memory allocation and usage |
| 29 | +- `--gpu-metrics-devices=0`: Collect GPU metrics from device 0 |
| 30 | +- `--nvtx-domain-exclude=CCCL`: Exclude verbose CCCL (CUDA C++ Core Libraries) NVTX ranges |
| 31 | + |
| 32 | +### Profiling Specific GPUs |
| 33 | + |
| 34 | +On multi-GPU systems, you may want to profile a specific GPU.
| 35 | +To profile a GPU other than device 0, pass both `--gpu-metrics-devices=N` and `--env-var CUDA_VISIBLE_DEVICES=N` so that the application and the profiler target the same device.
| 36 | + |
| 37 | +For example, to profile GPU 4, change the flags as follows:
| 38 | + |
| 39 | +```bash |
| 40 | +nsys profile --trace=nvtx,cuda,osrt --cuda-memory-usage=true --gpu-metrics-devices=4 --env-var CUDA_VISIBLE_DEVICES=4 python script.py |
| 41 | +``` |
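Because the same device index has to appear in two flags, it can help to parameterize it. The following is a minimal sketch, not part of libcudf: `nsys_gpu_cmd` is a hypothetical helper name, and it only prints the command line, using the same flags as the GPU 4 example above, so it can be inspected before running.

```bash
# Hypothetical helper (not part of libcudf or nsys): print the nsys command
# line for profiling a single GPU, keeping both device flags in sync.
# Usage: nsys_gpu_cmd GPU_INDEX COMMAND [ARGS...]
nsys_gpu_cmd() {
  local gpu="$1"
  shift
  echo nsys profile \
    --trace=nvtx,cuda,osrt \
    --cuda-memory-usage=true \
    --gpu-metrics-devices="${gpu}" \
    --env-var CUDA_VISIBLE_DEVICES="${gpu}" \
    "$@"
}

# Print the command for GPU 4.
nsys_gpu_cmd 4 python script.py
```

Removing the `echo` would make the function launch the profile directly instead of printing it.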
| 42 | + |
| 43 | +### Analyzing Results |
| 44 | + |
| 45 | +After profiling, open the generated `.nsys-rep` report in the Nsight Systems GUI to analyze CPU and GPU activity over time.
| 46 | +The timeline shows individual kernel launches and their durations, memory allocations and transfers, and GPU metrics such as memory bandwidth utilization.