Commit ce883e7

[ROCm] Fix WGP counts for Radeon cards (iree-org#20291)
These are meant to be used in distribution/scheduling heuristics like workgroup reordering and should use the level of granularity used for dispatching workgroups. This also aligns with the SIMDs-per-WGP constant which was (correctly) set to 4 already.

Signed-off-by: Jakub Kuderski <[email protected]>
1 parent 8a1409e commit ce883e7
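
The consistency argument in the commit message can be checked with a little arithmetic. The following is a minimal sketch, not code from the commit: `kSimdsPerWgp` and `totalSimds` are hypothetical names chosen for illustration, and only the value 4 and the RX 9070 XT counts (64 before, 32 after) are taken from the commit itself.

```cpp
#include <cassert>

// Hypothetical name; the commit message states that the SIMDs-per-WGP
// constant is 4.
constexpr int kSimdsPerWgp = 4;

// Total number of SIMD units implied by a given WGP count.
constexpr int totalSimds(int wgpCount) { return wgpCount * kSimdsPerWgp; }

// RX 9070 XT after this commit: wgp_count = 64 / 2 = 32.
static_assert(totalSimds(64 / 2) == 128, "corrected WGP count");

// Before this commit the field held 64 (the CU count), which combined with
// the per-WGP constant implies twice as many SIMDs.
static_assert(totalSimds(64) == 256, "CU count stored in the WGP field");
```

With the old values, any heuristic multiplying the chip's count by the per-WGP constant would overestimate the available SIMDs by a factor of two.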

2 files changed: +23 -12 lines changed


compiler/plugins/target/ROCM/test/target_device_features.mlir

Lines changed: 2 additions & 2 deletions
@@ -59,8 +59,8 @@
 // GFX1201-SAME: mma = [<WMMAR4_F32_16x16x16_F16>, <WMMAR4_F16_16x16x16_F16>, <WMMAR4_F32_16x16x16_BF16>, <WMMAR4_BF16_16x16x16_BF16>, <WMMAR4_F32_16x16x16_F8E5M2>, <WMMAR4_F32_16x16x16_F8E5M2_F8E4M3FN>, <WMMAR4_F32_16x16x16_F8E4M3FN>, <WMMAR4_F32_16x16x16_F8E4M3FN_F8E5M2>, <WMMAR4_I32_16x16x16_I8>]
 // GFX1201-SAME: subgroup_size_choices = [32, 64]
 
-// RX9070XT: chip = <wgp_count = 64, sku = "rx9070xt">>
-// RX9070: chip = <wgp_count = 56, sku = "rx9070">>
+// RX9070XT: chip = <wgp_count = 32, sku = "rx9070xt">>
+// RX9070: chip = <wgp_count = 28, sku = "rx9070">>
 
 stream.executable public @reduce_dispatch {
   stream.executable.export @reduce_dispatch workgroups(%arg0: index) -> (index, index, index) {

compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/KnownTargets.cpp

Lines changed: 21 additions & 10 deletions
@@ -319,6 +319,8 @@ std::optional<TargetDetails> getAMDGPUTargetDetails(StringRef target) {
   const WgpDetails *rdna2Wgp = getRDNA2WgpDetails();
   const WgpDetails *rdna1Wgp = getRDNA1WgpDetails();
 
+  // --- CDNA --- //
+
   // "AMD Instinct MI300 Series Product Offerings" in Page 23 of
   // https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf
   static const ChipDetails mi300xChip = {304, "mi300x"};
@@ -336,20 +338,29 @@ std::optional<TargetDetails> getAMDGPUTargetDetails(StringRef target) {
   // https://www.amd.com/content/dam/amd/en/documents/instinct-business-docs/white-papers/amd-cdna-white-paper.pdf
   static const ChipDetails mi100Chip = {120, "mi100"};
 
+  // --- RDNA --- //
+
+  // With RDNA, two Compute Units form a Workgroup Processor (WGP).
+  // A kernel can be dispatched in either the CU mode or the WGP mode (where
+  // some resources like LDS are accessible to both CUs). The default with HIP
+  // is the WGP mode. For the purpose of distribution heuristics, we divide the
+  // number of CUs reported in the hardware specs by two to get the number of
+  // WGPs.
+
   // AMD RDNA4 architecture:
   // https://www.amd.com/en/newsroom/press-releases/2025-2-28-amd-unveils-next-generation-amd-rdna-4-architectu.html.
-  static const ChipDetails rx9070xtChip = {64, "rx9070xt"};
-  static const ChipDetails rx9070Chip = {56, "rx9070"};
+  static const ChipDetails rx9070xtChip = {64 / 2, "rx9070xt"};
+  static const ChipDetails rx9070Chip = {56 / 2, "rx9070"};
 
   // AMD RDNA3.
-  static const ChipDetails rx7900xtxChip = {96, "rx7900xtx"};
-  static const ChipDetails rx7900xtChip = {84, "rx7900xt"};
-  static const ChipDetails rx7800xtChip = {60, "rx7800xt"};
-  static const ChipDetails rx7700xtChip = {54, "rx7700xt"};
-  static const ChipDetails v710Chip = {54, "v710"};
-  static const ChipDetails w7900Chip = {96, "w7900"};
-  static const ChipDetails w7800Chip = {70, "w7800"};
-  static const ChipDetails w7700Chip = {48, "w7700"};
+  static const ChipDetails rx7900xtxChip = {96 / 2, "rx7900xtx"};
+  static const ChipDetails rx7900xtChip = {84 / 2, "rx7900xt"};
+  static const ChipDetails rx7800xtChip = {60 / 2, "rx7800xt"};
+  static const ChipDetails rx7700xtChip = {54 / 2, "rx7700xt"};
+  static const ChipDetails v710Chip = {54 / 2, "v710"};
+  static const ChipDetails w7900Chip = {96 / 2, "w7900"};
+  static const ChipDetails w7800Chip = {70 / 2, "w7800"};
+  static const ChipDetails w7700Chip = {48 / 2, "w7700"};
 
   // See https://llvm.org/docs/AMDGPUUsage.html#processors for gfxN to
   // cdnaN/rdnaN mapping.
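
To make the effect of the halved counts concrete, here is a small hedged sketch of how a distribution heuristic might consume them. Both helpers are hypothetical and written only for this illustration: `wgpCountFromCuCount` mirrors the `N / 2` expressions in the diff, and `dispatchRounds` is a deliberately simplified stand-in (one workgroup per WGP at a time), not the heuristic IREE actually uses.

```cpp
#include <cassert>

// Hypothetical helper mirroring the `N / 2` expressions in the diff:
// on RDNA, two Compute Units form one Workgroup Processor (WGP).
constexpr int wgpCountFromCuCount(int cuCount) { return cuCount / 2; }

// Illustrative only, not IREE's heuristic: the number of rounds needed to
// run `numWorkgroups` if each WGP executes one workgroup at a time.
constexpr int dispatchRounds(int numWorkgroups, int wgpCount) {
  return (numWorkgroups + wgpCount - 1) / wgpCount;  // ceiling division
}

// CU counts from the diff and the resulting WGP counts.
static_assert(wgpCountFromCuCount(64) == 32, "rx9070xt");
static_assert(wgpCountFromCuCount(56) == 28, "rx9070");
static_assert(wgpCountFromCuCount(70) == 35, "w7800");

// 128 workgroups on an RX 9070 XT: 4 rounds at WGP granularity, whereas
// the old CU-based count of 64 would have suggested only 2.
static_assert(dispatchRounds(128, wgpCountFromCuCount(64)) == 4, "WGP count");
static_assert(dispatchRounds(128, 64) == 2, "old CU-based count");
```

Under this simplified model, the old CU-based values would have made such a heuristic assume twice as many independent dispatch slots as the WGP mode provides.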
