Commit 8099fd3

Architecture data support and diagrams added (#814)
* Architecture data support and diagrams added
* Architecture image added
* CDNA4 Image updated
* Review feedback incorporated
* CDNA 4 partition mode added
* Fei review feedback incorporated
1 parent ccb34e8 commit 8099fd3

5 files changed

+144
-11
lines changed

docs/conceptual/performance-model.rst

Lines changed: 144 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -1,7 +1,7 @@
11
.. meta::
22
:description: ROCm Compute Profiler performance model
33
:keywords: Omniperf, ROCm Compute Profiler, ROCm, performance, model, profiler, tool, Instinct,
4-
accelerator, AMD
4+
accelerator, AMD, CDNA
55

66
*****************
77
Performance model
@@ -10,12 +10,148 @@ Performance model
1010
ROCm Compute Profiler makes available an extensive list of metrics to better understand
1111
achieved application performance on AMD Instinct™ MI-series accelerators
1212
including Graphics Core Next™ (GCN) GPUs like the AMD Instinct MI50, CDNA™
13-
accelerators like the MI100, and CDNA2 accelerators such as the MI250X, MI250,
14-
and MI210.
13+
accelerators like the MI100, CDNA2 accelerators such as the AMD Instinct MI250X, MI250,
14+
and MI210, CDNA3 accelerators such as the AMD Instinct MI300A, MI300X, and MI325X, and CDNA4 accelerators such as the AMD Instinct MI350X and MI355X.
15+
16+
The following tables summarize key details and data type support for each architecture:
17+
18+
- ✅: Supported
19+
- ❌: Unsupported
20+
21+
**Architecture details**
22+
23+
.. table::
24+
:widths: 30 30 30 30 30
25+
26+
+-----------------+-----------+---------------------------------+-------------------------------------+-------------------------+
27+
|Architecture |CDNA |CDNA 2 |CDNA 3 |CDNA 4 |
28+
+=================+===========+=================================+=====================================+=========================+
29+
|Chip packaging   |Single Die |Two graphics Compute Dies (GCDs) |One logical processor with a dozen   |Similar to CDNA3,        |
30+
|                 |           |in a single package.             |chiplets, configurable with partition|Multi-Die chiplet, but   |
31+
| | | |modes. |with two I/O Dies (IODs) |
32+
+-----------------+-----------+---------------------------------+-------------------------------------+-------------------------+
33+
|Supported series |MI100 |MI200 |MI300A |MI350X |
34+
| | +---------------------------------+-------------------------------------+-------------------------+
35+
| | |MI210 |MI300X |MI355X |
36+
| | +---------------------------------+-------------------------------------+-------------------------+
37+
| | |MI250 |MI325X | |
38+
+-----------------+-----------+---------------------------------+-------------------------------------+-------------------------+
39+
|Spatial partition|❌ |❌ |Compute partition mode and |Compute partition mode |
40+
|mode | | |Memory partition mode |and Memory partition mode|
41+
+-----------------+-----------+---------------------------------+-------------------------------------+-------------------------+
42+
43+
**Data type support**
44+
45+
.. list-table::
46+
:header-rows: 1
47+
48+
*
49+
- Architecture
50+
- FP32
51+
- FP64
52+
- FP16
53+
- INT32 ADD/LOGIC/MAD
54+
- INT8 DOT
55+
- INT4 DOT
56+
- FP32 GEMM
57+
- FP64 GEMM
58+
- FP16 GEMM
59+
- BF16 GEMM
60+
- INT8 GEMM
61+
- Packed FP32
62+
- TF32 GEMM
63+
- FP8/BF8
64+
*
65+
- CDNA
66+
- ✅
67+
- ✅
68+
- ✅
69+
- ✅
70+
- ✅
71+
- ✅
72+
- ✅
73+
- ❌
74+
- ❌
75+
- ❌
76+
- ❌
77+
- ❌
78+
- ❌
79+
- ❌
80+
*
81+
- CDNA2
82+
- ✅
83+
- ✅
84+
- ✅
85+
- ✅
86+
- ✅
87+
- ✅
88+
- ✅
89+
- ✅
90+
- ✅
91+
- ✅
92+
- ✅
93+
- ✅
94+
- ❌
95+
- ❌
96+
*
97+
- CDNA3
98+
- ✅
99+
- ✅
100+
- ✅
101+
- ✅
102+
- ✅
103+
- ✅
104+
- ✅
105+
- ✅
106+
- ✅
107+
- ✅
108+
- ✅
109+
- ✅
110+
- ✅
111+
- ✅
112+
*
113+
- CDNA4
114+
- ✅
115+
- ✅
116+
- ✅
117+
- ✅
118+
- ✅
119+
- ✅
120+
- ✅
121+
- ✅
122+
- ✅
123+
- ✅
124+
- ✅
125+
- ✅
126+
- ❌
127+
- ✅
15128

16129
To best use profiling data, it's important to understand the role of various
17-
hardware blocks of AMD Instinct accelerators. This section describes each
18-
hardware block on the accelerator as interacted with by a software developer to
130+
hardware blocks of AMD Instinct accelerators. Refer to the following top-level GPU architecture diagrams to understand the hardware blocks of each architecture.
131+
132+
.. tab-set::
133+
134+
.. tab-item:: CDNA
135+
136+
.. image:: ../data/conceptual/CDNA.png
137+
:alt: CDNA top-level architecture diagram with a zoomed view of a compute unit
138+
139+
.. tab-item:: CDNA2
140+
141+
.. image:: ../data/conceptual/CDNA2.png
142+
:alt: CDNA2 top-level architecture diagram with a zoomed view of a compute unit
143+
144+
.. tab-item:: CDNA3
145+
146+
.. image:: ../data/conceptual/CDNA3.png
147+
:alt: CDNA3 top-level architecture diagram with a zoomed view of the Accelerator Complex Dies (XCDs)
148+
149+
.. tab-item:: CDNA4
150+
151+
.. image:: ../data/conceptual/CDNA4.png
152+
:alt: CDNA4 top-level architecture diagram
153+
154+
This section describes each hardware block on the accelerator as interacted with by a software developer to
19155
give a deeper understanding of the metrics reported by profiling data. Refer to
20156
:doc:`/tutorial/profiling-by-example` for more practical examples and details on how
21157
to use ROCm Compute Profiler to optimize your code.
@@ -24,15 +160,12 @@ to use ROCm Compute Profiler to optimize your code.
24160

25161
.. note::
26162

27-
In this chapter, **MI2XX** refers to any of the CDNA2 architecture-based AMD
163+
In this documentation, **MI2XX** refers to any of the CDNA2 architecture-based MI200 series accelerators such as AMD
28164
Instinct MI250X, MI250, and MI210 accelerators interchangeably in cases
29-
where the exact product at hand is not relevant.
165+
where the exact product at hand is not relevant. For product details, see `AMD Instinct GPUs <https://www.amd.com/en/products/accelerators/instinct.html>`_.
30166

31167
For a comparison of AMD Instinct accelerator specifications, refer to
32-
:doc:`Hardware specifications <rocm:reference/gpu-arch-specs>`. For product
33-
details, see the :prod-page:`MI250X <mi200/mi250x>`,
34-
:prod-page:`MI250 <mi200/mi250>`, and :prod-page:`MI210 <mi200/mi210>`
35-
product pages.
168+
:doc:`Hardware specifications <rocm:reference/gpu-arch-specs>`.
36169

37170
In this chapter, the AMD Instinct performance model used by ROCm Compute Profiler is divided into a handful of
38171
key hardware blocks, each detailed in the following sections:

docs/data/conceptual/CDNA.png

261 KB

docs/data/conceptual/CDNA2.png

747 KB

docs/data/conceptual/CDNA3.png

299 KB

docs/data/conceptual/CDNA4.png

122 KB