File: .github/pull_request_template.md (+1 -1; 1 addition & 1 deletion)
@@ -54,5 +54,5 @@ To make sure the code is performing as expected on GPU devices, I have:
 - [ ] Ran the code on MI200+ GPUs and ensure the new features performed as expected (the GPU results match the CPU results)
 - [ ] Enclosed the new feature via `nvtx` ranges so that they can be identified in profiles
 - [ ] Ran a Nsight Systems profile using `./mfc.sh run XXXX --gpu -t simulation --nsys`, and have attached the output file (`.nsys-rep`) and plain text results to this PR
-- [ ] Ran an Omniperf profile using `./mfc.sh run XXXX --gpu -t simulation --omniperf`, and have attached the output file and plain text results to this PR.
+- [ ] Ran a Rocprof Systems profile using `./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace`, and have attached the output file and plain text results to this PR.
 - [ ] Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature
File: docs/documentation/running.md (+3 -3; 3 additions & 3 deletions)
@@ -98,13 +98,13 @@ Learn more about NVIDIA Nsight Compute [here](https://docs.nvidia.com/nsight-com
 #### AMD GPUs
-- Rocprof (ROC): `./mfc.sh run ... -t simulation --roc --hip-trace [rocprof flags]` allows one to visualize MFC's system-wide performance with [Perfetto UI](https://ui.perfetto.dev/).
+- Rocprof Systems (RSYS): `./mfc.sh run ... -t simulation --rsys --hip-trace [rocprof flags]` allows one to visualize MFC's system-wide performance with [Perfetto UI](https://ui.perfetto.dev/).
 When used, `--roc` will run the simulation and generate files in the case directory for all targets.
 `results.json` can then be imported in [Perfetto's UI](https://ui.perfetto.dev/).
 Learn more about AMD Rocprof [here](https://rocm.docs.amd.com/projects/rocprofiler/en/docs-5.5.1/rocprof.html)
 It is best to run case files with few timesteps to keep the report file sizes manageable.
-- Omniperf (OMNI): `./mfc.sh run ... -t simulation --omni [omniperf flags]` allows one to conduct kernel-level profiling with [AMD's Omniperf](https://rocm.docs.amd.com/projects/omniperf/en/latest/index.html).
-When used, `--omni` will output profiling information for all subroutines, including rooflines, cache usage, register usage, and more, after the simulation is run.
+- Rocprof Compute (RCU): `./mfc.sh run ... -t simulation --rcu -n <name> [rocprof-compute flags]` allows one to conduct kernel-level profiling with [ROCm Compute Profiler](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/what-is-rocprof-compute.html).
+When used, `--rcu` will output profiling information for all subroutines, including rooflines, cache usage, register usage, and more, after the simulation is run.
 Adding this argument will moderately slow down the simulation and run the MFC executable several times.
 For this reason, it should only be used with case files with few timesteps.
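
The two renamed AMD profiling invocations can be summarized in a small sketch. This is only an illustration and not part of MFC: the helper `build_profile_command`, the `AMD_PROFILER_FLAGS` table, and the `case.py` path are hypothetical. The flags themselves (`--rsys --hip-trace`, `--rcu -n <name>`) are copied from the added lines of this diff, and the sketch assumes `-n` is always supplied with `--rcu`, as the usage line suggests.

```python
import shlex

# Flags taken from the added documentation lines in this diff.
# The table and the function below are hypothetical helpers, not MFC code.
AMD_PROFILER_FLAGS = {
    "rsys": ["--rsys", "--hip-trace"],  # Rocprof Systems: system-wide trace
    "rcu": ["--rcu"],                   # Rocprof Compute: kernel-level profile
}


def build_profile_command(case_file, profiler, name=None, extra_flags=()):
    """Return the argument list for an `./mfc.sh run` profiling invocation."""
    if profiler not in AMD_PROFILER_FLAGS:
        raise ValueError(f"unknown profiler: {profiler!r}")
    cmd = ["./mfc.sh", "run", case_file, "-t", "simulation"]
    cmd += AMD_PROFILER_FLAGS[profiler]
    if profiler == "rcu":
        # Assumption: the documented usage always pairs --rcu with -n <name>.
        if name is None:
            raise ValueError("--rcu expects a name via -n")
        cmd += ["-n", name]
    # Any additional profiler flags are passed through unchanged.
    cmd += list(extra_flags)
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_profile_command("case.py", "rsys")))
    print(shlex.join(build_profile_command("case.py", "rcu", name="my_profile")))
```

Returning an argument list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting concerns. As the documentation notes, either command is best paired with a case file that runs only a few timesteps.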