Commit 16c00d2

Lab11: Added simple gpu prof exerc.
1 parent d8959d8 commit 16c00d2

1 file changed

+70
-2
lines changed

docs/src/lecture_11/lab.md

Lines changed: 70 additions & 2 deletions
@@ -82,8 +82,7 @@ Let's now explore what the we can do with this array programming paradigm on few
8282
<header class="admonition-header">Exercise</header>
8383
<div class="admonition-body">
8484
```
85-
*we can use test images*
86-
Load a sufficiently large image to the GPU such as this [one]()*TODO LINK* and manipulate it in the following ways:
85+
Load a sufficiently large image onto the GPU, such as the one provided in the lab (anything larger than 1 Mpx should be enough), and manipulate it in the following ways:
8786
- create a negative
8887
- halve the pixel brightness
8988
- find the brightest pixels
@@ -505,7 +504,76 @@ Gray.(Array(cgray_img_moved))
505504
```
506505

507506
### Profiling
507+
The CUDA framework offers a wide variety of developer tooling for debugging and profiling our own kernels. In this section we will focus on profiling with the Nsight Systems software, which you can download after registering [here](https://developer.nvidia.com/nsight-systems). It contains both the `nsys` profiler and the `nsys-ui` GUI application for viewing the results. First, we have to launch `julia` through the `nsys` application.
508+
- on Windows with PowerShell (available on the lab computers)
509+
```ps
510+
& "C:\Program Files\NVIDIA Corporation\Nsight Systems 2021.2.4\target-windows-x64\nsys.exe" launch --trace=cuda,nvtx H:/Downloads/julia-1.6.3/bin/julia.exe --color=yes --project=$((Get-Item .).FullName)
511+
```
512+
- on Linux
513+
```bash
514+
/full/path/to/nsys launch --trace=cuda,nvtx /home/honza/Apps/julia-1.6.5/bin/julia --color=yes --project=.
515+
```
516+
Once `julia` starts, we additionally have to tell `CUDA.jl` where `nsys.exe` is located. This is needed on the lab computers, where we cannot modify the environment path.
517+
```julia
518+
ENV["JULIA_CUDA_NSYS"] = "C:\\Program Files\\NVIDIA Corporation\\Nsight Systems 2021.2.4\\target-windows-x64\\nsys.exe"
519+
```
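A mistyped path is easy to miss here, so it can help to check that the file exists before setting the variable. The following is only a minimal sketch: `set_nsys_path!` is a hypothetical helper written for this lab, not part of `CUDA.jl`, and the path is the lab-computer path from above, which you may need to adjust.

```julia
# Hypothetical helper (not part of CUDA.jl): set JULIA_CUDA_NSYS only if
# `path` points to an existing file. Returns true if the variable was set.
function set_nsys_path!(path::AbstractString; env = ENV)
    if isfile(path)
        env["JULIA_CUDA_NSYS"] = path
        return true
    end
    @warn "nsys executable not found, JULIA_CUDA_NSYS left unchanged" path
    return false
end

# Path taken from the lab computers; adjust it to your installation.
nsys_path = raw"C:\Program Files\NVIDIA Corporation\Nsight Systems 2021.2.4\target-windows-x64\nsys.exe"
set_nsys_path!(nsys_path)
```

If the call returns `false`, the variable is left untouched and a warning shows the path that was tried.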
520+
Now we should be ready to start profiling our kernels.
521+
522+
```@raw html
523+
<div class="admonition is-category-exercise">
524+
<header class="admonition-header">Exercise</header>
525+
<div class="admonition-body">
526+
```
527+
Choose a function/kernel from the previous exercises and profile it. Use the `CUDA.@profile` macro in the following pattern to launch profiling of a block of code with `CUDA.jl`:
528+
```julia
529+
CUDA.@profile CUDA.@sync begin
530+
    NVTX.@range "something" begin
531+
        # run some kernel
532+
    end
533+
534+
    NVTX.@range "something" begin
535+
        # run some kernel
536+
    end
537+
end
538+
```
539+
where `NVTX.@range "something"` is also part of `CUDA.jl` and marks a section of the execution so that it is easier to find in the profile later. Inspect the result in `Nsight Systems`.
540+
541+
!!! note "Profiling overhead"
542+
    We recommend running the code twice, as shown above, because the first execution under the profiler almost always takes longer, even when the kernel itself has already been compiled.
543+
544+
```@raw html
545+
</div></div>
546+
<details class = "solution-body">
547+
<summary class = "solution-header">Solution:</summary><p>
548+
```
549+
To show multiple kernels running, let's demonstrate profiling on the first image processing exercise:
550+
```julia
551+
CUDA.@profile CUDA.@sync begin
552+
    NVTX.@range "copy H2D" begin
553+
        rgb_img = FileIO.load("image.jpg");
554+
        gray_img = Float32.(Gray.(rgb_img));
555+
        cgray_img = CuArray(gray_img);
556+
    end
557+
558+
    NVTX.@range "negative" begin
559+
        negative(cgray_img);
560+
    end
561+
    NVTX.@range "darken" begin
562+
        darken(cgray_img);
563+
    end
564+
    NVTX.@range "fourier" begin
565+
        fourier(cgray_img);
566+
    end
567+
    NVTX.@range "brightest" begin
568+
        brightest(cgray_img);
569+
    end
570+
end
571+
```
572+
Running this code should create a report in the current directory with a name of the form `report-**.***`, which we can examine in `Nsight Systems`.
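To check from the REPL that a report was actually written, we can list the matching files. This is only a sketch: `find_reports` is a hypothetical helper, and it assumes nothing beyond the file name starting with `report`, since the exact name and extension depend on the installed Nsight Systems version.

```julia
# Hypothetical helper: names of files in `dir` that start with "report".
find_reports(dir::AbstractString = pwd()) =
    filter(name -> startswith(name, "report"), readdir(dir))

# List candidate report files in the current working directory.
find_reports()
```

An empty result usually means that `julia` was not started through `nsys`, or that the report was written to a different directory.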
508573

574+
```@raw html
575+
</p></details>
576+
```
509577

510578
### Matrix multiplication
511579
