[AMDGPU] Make cluster a target feature #162040

shiltian · 2025-10-06T04:36:50Z

This replaces the original arch check.

shiltian · 2025-10-06T04:37:12Z

[AMDGPU] Make cluster a target feature #162040 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

llvmbot · 2025-10-06T04:37:22Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-clang

Author: Shilei Tian (shiltian)

Changes

This replaces the original arch check.

Full diff: https://github.com/llvm/llvm-project/pull/162040.diff

4 Files Affected:

(modified) clang/test/CodeGenOpenCL/amdgpu-features.cl (+2-2)
(modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+7)
(modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+3-1)
(modified) llvm/lib/TargetParser/TargetParser.cpp (+1)

diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index af1ef64764cf4..c0c22bcecf9ae 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -109,8 +109,8 @@
 // GFX1153: "target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+atomic-fmin-fmax-global-f32,+ci-insts,+dl-insts,+dot10-insts,+dot12-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
 // GFX1200: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-fmin-fmax-global-f32,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot10-insts,+dot11-insts,+dot12-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+fp8-conversion-insts,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
 // GFX1201: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-fmin-fmax-global-f32,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot10-insts,+dot11-insts,+dot12-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+fp8-conversion-insts,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
-// GFX1250: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-fmin-fmax-global-f32,+atomic-fmin-fmax-global-f64,+atomic-global-pk-add-bf16-inst,+bf16-cvt-insts,+bf16-pk-insts,+bf16-trans-insts,+bitop3-insts,+ci-insts,+dl-insts,+dot7-insts,+dot8-insts,+dpp,+fp8-conversion-insts,+fp8e5m3-insts,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx1250-insts,+gfx8-insts,+gfx9-insts,+permlane16-swap,+prng-inst,+setprio-inc-wg-inst,+tanh-insts,+tensor-cvt-lut-insts,+transpose-load-f4f6-insts,+vmem-pref-insts,+wavefrontsize32"
-// GFX1251: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-fmin-fmax-global-f32,+atomic-fmin-fmax-global-f64,+atomic-global-pk-add-bf16-inst,+bf16-cvt-insts,+bf16-pk-insts,+bf16-trans-insts,+bitop3-insts,+ci-insts,+dl-insts,+dot7-insts,+dot8-insts,+dpp,+fp8-conversion-insts,+fp8e5m3-insts,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx1250-insts,+gfx8-insts,+gfx9-insts,+permlane16-swap,+prng-inst,+setprio-inc-wg-inst,+tanh-insts,+tensor-cvt-lut-insts,+transpose-load-f4f6-insts,+vmem-pref-insts,+wavefrontsize32"
+// GFX1250: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-fmin-fmax-global-f32,+atomic-fmin-fmax-global-f64,+atomic-global-pk-add-bf16-inst,+bf16-cvt-insts,+bf16-pk-insts,+bf16-trans-insts,+bitop3-insts,+ci-insts,+cluster,+dl-insts,+dot7-insts,+dot8-insts,+dpp,+fp8-conversion-insts,+fp8e5m3-insts,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx1250-insts,+gfx8-insts,+gfx9-insts,+permlane16-swap,+prng-inst,+setprio-inc-wg-inst,+tanh-insts,+tensor-cvt-lut-insts,+transpose-load-f4f6-insts,+vmem-pref-insts,+wavefrontsize32"
+// GFX1251: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-fmin-fmax-global-f32,+atomic-fmin-fmax-global-f64,+atomic-global-pk-add-bf16-inst,+bf16-cvt-insts,+bf16-pk-insts,+bf16-trans-insts,+bitop3-insts,+ci-insts,+cluster,+dl-insts,+dot7-insts,+dot8-insts,+dpp,+fp8-conversion-insts,+fp8e5m3-insts,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx12-insts,+gfx1250-insts,+gfx8-insts,+gfx9-insts,+permlane16-swap,+prng-inst,+setprio-inc-wg-inst,+tanh-insts,+tensor-cvt-lut-insts,+transpose-load-f4f6-insts,+vmem-pref-insts,+wavefrontsize32"
 
 // GFX1103-W64: "target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+atomic-fmin-fmax-global-f32,+ci-insts,+dl-insts,+dot10-insts,+dot12-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize64"
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 9446144d30e9b..ddb238120dd02 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1462,6 +1462,12 @@ def Feature45BitNumRecordsBufferResource : SubtargetFeature< "45-bit-num-records
   "The buffer resource (V#) supports 45-bit num_records"
 >;
 
+def FeatureCluster : SubtargetFeature< "cluster",
+  "HasCluster",
+  "true",
+  "Has cluster support"
+>;
+
 // Dummy feature used to disable assembler instructions.
 def FeatureDisable : SubtargetFeature<"",
   "FeatureDisable","true",
@@ -2128,6 +2134,7 @@ def FeatureISAVersion12_50 : FeatureSet<
    Feature45BitNumRecordsBufferResource,
    FeatureSupportsXNACK,
    FeatureXNACK,
+   FeatureCluster,
 ]>;
 
 def FeatureISAVersion12_51 : FeatureSet<
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index a54d6651c25c1..879bf5aac079f 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -288,6 +288,8 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
 
   bool Has45BitNumRecordsBufferResource = false;
 
+  bool HasCluster = false;
+
   // Dummy feature to use for assembler in tablegen.
   bool FeatureDisable = false;
 
@@ -1837,7 +1839,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   }
 
   /// \returns true if the subtarget supports clusters of workgroups.
-  bool hasClusters() const { return GFX1250Insts; }
+  bool hasClusters() const { return HasCluster; }
 
   /// \returns true if the subtarget requires a wait for xcnt before atomic
   /// flat/global stores & rmw.
diff --git a/llvm/lib/TargetParser/TargetParser.cpp b/llvm/lib/TargetParser/TargetParser.cpp
index 34b09b14b0138..b90669029eea4 100644
--- a/llvm/lib/TargetParser/TargetParser.cpp
+++ b/llvm/lib/TargetParser/TargetParser.cpp
@@ -444,6 +444,7 @@ static void fillAMDGCNFeatureMap(StringRef GPU, const Triple &T,
     Features["atomic-fmin-fmax-global-f32"] = true;
     Features["atomic-fmin-fmax-global-f64"] = true;
     Features["wavefrontsize32"] = true;
+    Features["cluster"] = true;
     break;
   case GK_GFX1201:
   case GK_GFX1200:

jayfoad · 2025-10-06T09:47:23Z

llvm/lib/Target/AMDGPU/GCNSubtarget.h


  /// \returns true if the subtarget supports clusters of workgroups.
-  bool hasClusters() const { return GFX1250Insts; }
+  bool hasClusters() const { return HasCluster; }


Please call the target feature "HasClusters". The inconsistency is really annoying.

These should have generated names from tablegen, we just didn't define the macro for it and ended up reinventing all of these getters

This replaces the original arch check.

rupprecht · 2025-10-06T19:44:30Z

Running LLVM tests downstream w/ sanitizers, I'm seeing llvm/lib/Target/AMDGPU/GCNSubtarget.h:1842:37: runtime error: load of value 171, which is not a valid value for type 'bool', e.g. for tests like lld/test/ELF/lto/r600.ll. I'm not sure why I can't find similar failures on sanitizer buildbots, though I don't see why it wouldn't affect those. Is there something needed to make default initialization happen for this feature?

jyknight · 2025-10-06T22:53:38Z

The error arises because AMDGPUAttributor.cpp calls TM.getSubtarget<GCNSubtarget>(*F);, and despite appearances and its documentation, getSubtarget does an unchecked static_cast to the specified type. This code can apparently be invoked with an R600Subtarget, and casting an R600Subtarget to a GCNSubtarget is completely utterly broken.

The code in question was added to AMDGPUAttributor.cpp 3 weeks ago in #158076 -- so I guess it worked before only by complete accident...

jyknight · 2025-10-06T22:59:21Z

(And the debug assert that the correct TargetSubtargetInfo subclass is returned was removed in 2010, in f2f73bf).

jyknight · 2025-10-07T00:53:31Z

Proposed fix sent in #162207.

…n R600. (#162207) These passes call `getSubtarget<GCNSubtarget>`, which doesn't work on R600 targets, as that uses an `R600Subtarget` type, instead. Unfortunately, `TargetMachine::getSubtarget<ST>` does an unchecked static_cast to `ST&`, which makes it easy for this error to go undetected. The modifications here were verified by running check-llvm with an assert added to getSubtarget. However, that asssert requires that RTTI is enabled, which LLVM doesn't use, so I've reverted the assert before sending this fix upstream. These errors have been present for some time, but were detected after #162040 caused an uninitialized memory read to be reported by asan/msan.

[AMDGPU] Make cluster a target feature

ec37c37

This replaces the original arch check.

llvmbot added clang Clang issues not falling into any other category backend:AMDGPU labels Oct 6, 2025

shiltian requested review from Pierre-vh, arsenm and rampitec October 6, 2025 04:37

arsenm approved these changes Oct 6, 2025

View reviewed changes

shiltian enabled auto-merge (squash) October 6, 2025 05:00

shiltian merged commit bea0225 into main Oct 6, 2025
12 checks passed

shiltian deleted the users/shiltian/cluster-as-target-feature branch October 6, 2025 05:05

jayfoad reviewed Oct 6, 2025

View reviewed changes

aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 6, 2025

[AMDGPU] Make cluster a target feature (llvm#162040)

d8c2ddc

This replaces the original arch check.

jyknight mentioned this pull request Oct 7, 2025

AMDGPU: skip AMDGPUAttributor and AMDGPUImageIntrinsicOptimizerPass on R600. #162207

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU] Make cluster a target feature #162040

[AMDGPU] Make cluster a target feature #162040

Uh oh!

shiltian commented Oct 6, 2025

Uh oh!

shiltian commented Oct 6, 2025

Uh oh!

llvmbot commented Oct 6, 2025 •

edited

Loading

Uh oh!

Uh oh!

jayfoad Oct 6, 2025

Uh oh!

arsenm Oct 6, 2025

Uh oh!

shiltian Oct 6, 2025

Uh oh!

rupprecht commented Oct 6, 2025

Uh oh!

jyknight commented Oct 6, 2025

Uh oh!

jyknight commented Oct 6, 2025

Uh oh!

jyknight commented Oct 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

[AMDGPU] Make cluster a target feature #162040

[AMDGPU] Make cluster a target feature #162040

Uh oh!

Conversation

shiltian commented Oct 6, 2025

Uh oh!

shiltian commented Oct 6, 2025

Uh oh!

llvmbot commented Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

jayfoad Oct 6, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm Oct 6, 2025

Choose a reason for hiding this comment

Uh oh!

shiltian Oct 6, 2025

Choose a reason for hiding this comment

Uh oh!

rupprecht commented Oct 6, 2025

Uh oh!

jyknight commented Oct 6, 2025

Uh oh!

jyknight commented Oct 6, 2025

Uh oh!

jyknight commented Oct 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

llvmbot commented Oct 6, 2025 •

edited

Loading