Load a sufficiently large image to the GPU, such as the one provided in the lab (anything above 1 Mpx should be enough), and manipulate it in the following ways:

The CUDA framework offers a wide variety of developer tools for debugging and profiling our own kernels. In this section we will focus on profiling with the Nsight Systems software, which you can download after registering [here](https://developer.nvidia.com/nsight-systems). It contains both the `nsys` profiler and the `nsys-ui` GUI application for viewing the results. First we have to run `julia` through the `nsys` profiler.

- on Windows with PowerShell (available on the lab computers)

Choose a function/kernel from the previous exercises to profile. Use the `CUDA.@profile` macro with the following pattern to profile a block of code with `CUDA.jl`:
```julia
using CUDA

CUDA.@profile CUDA.@sync begin
    NVTX.@range "something" begin
        # run some kernel
    end

    NVTX.@range "something" begin
        # run some kernel
    end
end
```
where `NVTX.@range "something"` is part of `CUDA.jl` as well and marks a named piece of the execution, so that it is easier to find in the timeline later. Inspect the result in Nsight Systems.

!!! note "Profiling overhead"
    It is recommended to run the profiled code twice, as in the pattern above, because the first execution under the profiler almost always takes longer, even after the kernel itself has been compiled.
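As a concrete instance of the pattern above, the sketch below profiles a simple broadcast on a random 4096×4096 `Float32` array (about 16.8 million pixels, comparable to a large image). The function `invert` is a hypothetical stand-in for one of your own kernels, and the import `using CUDA: NVTX` is an assumption based on the statement above that `NVTX` is reachable through `CUDA.jl`. The explicit warm-up call before `CUDA.@profile` is a choice of this sketch rather than part of the lab: it keeps kernel compilation out of the profiled region, while the two NVTX ranges let you compare the first and second profiled runs in the timeline.

```julia
using CUDA
using CUDA: NVTX   # assumption: the NVTX module is reachable through CUDA.jl

# Hypothetical stand-in kernel: invert a grayscale image with values in [0, 1]
invert(img) = 1f0 .- img

img = CUDA.rand(Float32, 4096, 4096)   # random "image" of ~16.8 Mpx on the GPU

invert(img)   # warm-up: compiles the broadcast kernel outside the profiled region

CUDA.@profile CUDA.@sync begin
    NVTX.@range "invert (1st)" begin
        invert(img)
    end
    NVTX.@range "invert (2nd)" begin
        invert(img)
    end
end

out = invert(img)   # result kept for a quick correctness check on the CPU
```

Nothing is assigned inside the ranges here, so the example does not depend on whether `NVTX.@range` introduces a new scope.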
```@raw html
</div></div>
<details class = "solution-body">
<summary class = "solution-header">Solution:</summary><p>
```
To show several kernels running in one timeline, let's profile the first image-processing exercise:
```julia
using CUDA, FileIO, Images   # `Gray` comes from Images.jl; `load` from FileIO.jl

# `negative`, `darken`, `fourier` and `brightest` are the solutions from the first exercise
CUDA.@profile CUDA.@sync begin
    # Return the GPU array from the range so that it stays in scope for the ranges below
    cgray_img = NVTX.@range "copy H2D" begin
        rgb_img = FileIO.load("image.jpg")
        gray_img = Float32.(Gray.(rgb_img))
        CuArray(gray_img)
    end

    NVTX.@range "negative" begin
        negative(cgray_img)
    end
    NVTX.@range "darken" begin
        darken(cgray_img)
    end
    NVTX.@range "fourier" begin
        fourier(cgray_img)
    end
    NVTX.@range "brightest" begin
        brightest(cgray_img)
    end
end
```
Running this code should create a report in the current directory named `report-**.***`, which we can then open and examine in Nsight Systems.