intensity and performance analysis for individual kernels.

Run ``rocprof-compute analyze -h`` for more details.

.. _cli-walkthrough:

There are three high-level GPU analysis views:

* System Speed-of-Light: Key GPU performance metrics to show overall GPU performance and utilization.
* Memory chart: Shows memory transactions and throughput at each level of the cache hierarchy.
* Empirical hierarchical roofline: A roofline model that compares achieved throughput with attainable peak hardware limits, specifically peak compute throughput and memory bandwidth (on L1, LDS, L2, and HBM). Combined with kernel filtering, it provides per-kernel arithmetic intensity analysis and performance breakdowns.
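The roofline comparison described above reduces to a simple bound: for a kernel with arithmetic intensity AI (FLOPs per byte moved at a given memory level), the attainable throughput at that level is ``min(peak_compute, AI * bandwidth)``. The following Python sketch illustrates only that arithmetic, under stated assumptions; it is not the profiler's own implementation, the helper names are hypothetical, and the peak and byte-count values are placeholders rather than measured data. Evaluating the bound separately for each memory level (here HBM and L2) is what makes the roofline hierarchical.

```python
from dataclasses import dataclass


@dataclass
class MemoryLevel:
    """One level of the memory hierarchy (hypothetical structure)."""
    name: str
    bandwidth_gbs: float  # peak bandwidth in GB/s


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved at one memory level."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def attainable_gflops(ai: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof.

    FLOP/byte * GB/s gives GFLOP/s, so both roofs share the same unit.
    """
    return min(peak_gflops, ai * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity at which the memory roof meets the compute roof."""
    return peak_gflops / bandwidth_gbs


def classify(ai: float, peak_gflops: float, bandwidth_gbs: float) -> str:
    """Below the ridge point the memory roof is the tighter limit."""
    if ai < ridge_point(peak_gflops, bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    # Placeholder values for illustration only: real peaks come from the
    # empirical roofline benchmarks and real counts from profiling a kernel.
    peak_gflops = 40_000.0
    levels = [MemoryLevel("HBM", 1_600.0), MemoryLevel("L2", 5_000.0)]
    flops = 2.0e9
    bytes_per_level = {"HBM": 4.0e9, "L2": 1.0e9}
    for level in levels:
        ai = arithmetic_intensity(flops, bytes_per_level[level.name])
        bound = attainable_gflops(ai, peak_gflops, level.bandwidth_gbs)
        verdict = classify(ai, peak_gflops, level.bandwidth_gbs)
        print(f"{level.name}: AI={ai:.2f} FLOP/byte, "
              f"attainable={bound:.0f} GFLOP/s, {verdict}")
```

Comparing a kernel's arithmetic intensity with the ridge point (peak compute divided by bandwidth) indicates which roof limits it at that memory level.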

**System Speed-of-Light:**

.. note::

   * The visualized memory chart and Roofline chart are supported only in single-run analysis. In multi-run comparison mode, both fall back to the basic table view.
   * The visualized memory chart requires a terminal at least 234 columns wide to display the whole chart properly.
   * The visualized Roofline chart is sized to the terminal only when it is first generated. If the chart is unclear, resize the terminal and regenerate it. Roofline analysis also provides detailed, structured table output that includes the measured empirical peak values for comparison.
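If you script your analysis runs, a quick pre-check of the terminal width can help avoid a truncated memory chart. This is a minimal Python sketch using the standard library's ``shutil.get_terminal_size``; the helper ``memory_chart_fits`` is hypothetical, and the 234-column threshold is taken from the note above.

```python
import shutil
from typing import Optional

# Minimum width stated in the documentation for the visualized memory chart.
MIN_MEMORY_CHART_COLUMNS = 234


def memory_chart_fits(columns: Optional[int] = None) -> bool:
    """Hypothetical helper: True if `columns` is wide enough for the chart.

    When `columns` is None, query the current terminal; shutil honors the
    COLUMNS environment variable and falls back to 80x24 if the size
    cannot be determined.
    """
    if columns is None:
        columns = shutil.get_terminal_size(fallback=(80, 24)).columns
    return columns >= MIN_MEMORY_CHART_COLUMNS


if __name__ == "__main__":
    if not memory_chart_fits():
        print(f"Terminal is narrower than {MIN_MEMORY_CHART_COLUMNS} columns; "
              "widen it before running rocprof-compute analyze.")
```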

.. _cli-list-metrics:

You should see your filtered kernels indicated by an asterisk in the **Top Stats** table.

.. _per-kernel-roofline:

Per-kernel roofline analysis
When analyzing specific kernels, the roofline analysis provides detailed metrics for each filtered kernel: