
Conversation

jyknight
Member

@jyknight jyknight commented Oct 7, 2025

These passes call getSubtarget<GCNSubtarget>, which doesn't work on R600 targets, as those use an R600Subtarget type instead. Unfortunately, TargetMachine::getSubtarget<ST> does an unchecked static_cast to ST&, which makes it easy for this error to go undetected.

The modifications here were verified by running check-llvm with an assert added to getSubtarget. However, that assert requires RTTI to be enabled, which LLVM doesn't use, so I've reverted the assert before sending this fix upstream.

These errors have been present for some time, but were detected after #162040 caused an uninitialized memory read to be reported by asan/msan.

@llvmbot
Member

llvmbot commented Oct 7, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: James Y Knight (jyknight)

Changes

Unfortunately, TargetMachine::getSubtarget<ST> does an unchecked static_cast to ST&, which makes it easy to get wrong.

The modifications here were created by running check-llvm with an assert added to getSubtarget. However, that assert requires RTTI to be enabled, which LLVM doesn't use, so I've reverted the assert before sending this fix upstream.

These errors have been present for some time, but were detected after #162040 caused an uninitialized memory read to be reported by asan/msan.


Full diff: https://github.com/llvm/llvm-project/pull/162207.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+16-10)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUImageIntrinsicOptimizer.cpp (+1-1)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index cb49936871e74..f6186f1fae777 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -162,12 +162,16 @@ class AMDGPUInformationCache : public InformationCache {
 
   /// Check if the subtarget has aperture regs.
   bool hasApertureRegs(Function &F) {
+    if (!TM.getTargetTriple().isAMDGCN())
+      return false;
     const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
     return ST.hasApertureRegs();
   }
 
   /// Check if the subtarget supports GetDoorbellID.
   bool supportsGetDoorbellID(Function &F) {
+    if (!TM.getTargetTriple().isAMDGCN())
+      return false;
     const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
     return ST.supportsGetDoorbellID();
   }
@@ -182,18 +186,18 @@ class AMDGPUInformationCache : public InformationCache {
 
   std::pair<unsigned, unsigned>
   getDefaultFlatWorkGroupSize(const Function &F) const {
-    const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+    const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
     return ST.getDefaultFlatWorkGroupSize(F.getCallingConv());
   }
 
   std::pair<unsigned, unsigned>
   getMaximumFlatWorkGroupRange(const Function &F) {
-    const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+    const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
     return {ST.getMinFlatWorkGroupSize(), ST.getMaxFlatWorkGroupSize()};
   }
 
   SmallVector<unsigned> getMaxNumWorkGroups(const Function &F) {
-    const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+    const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
     return ST.getMaxNumWorkGroups(F);
   }
 
@@ -206,7 +210,7 @@ class AMDGPUInformationCache : public InformationCache {
   std::pair<unsigned, unsigned>
   getWavesPerEU(const Function &F,
                 std::pair<unsigned, unsigned> FlatWorkGroupSize) {
-    const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+    const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
     return ST.getWavesPerEU(FlatWorkGroupSize, getLDSSize(F), F);
   }
 
@@ -217,7 +221,7 @@ class AMDGPUInformationCache : public InformationCache {
     if (!Val)
       return std::nullopt;
     if (!Val->second) {
-      const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+      const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
       Val->second = ST.getMaxWavesPerEU();
     }
     return std::make_pair(Val->first, *(Val->second));
@@ -227,13 +231,13 @@ class AMDGPUInformationCache : public InformationCache {
   getEffectiveWavesPerEU(const Function &F,
                          std::pair<unsigned, unsigned> WavesPerEU,
                          std::pair<unsigned, unsigned> FlatWorkGroupSize) {
-    const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+    const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
     return ST.getEffectiveWavesPerEU(WavesPerEU, FlatWorkGroupSize,
                                      getLDSSize(F));
   }
 
   unsigned getMaxWavesPerEU(const Function &F) {
-    const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+    const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
     return ST.getMaxWavesPerEU();
   }
 
@@ -1511,9 +1515,11 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
       A.getOrCreateAAFor<AAAMDWavesPerEU>(IRPosition::function(*F));
     }
 
-    const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(*F);
-    if (!F->isDeclaration() && ST.hasClusters())
-      A.getOrCreateAAFor<AAAMDGPUClusterDims>(IRPosition::function(*F));
+    if (TM.getTargetTriple().isAMDGCN()) {
+      const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(*F);
+      if (!F->isDeclaration() && ST.hasClusters())
+        A.getOrCreateAAFor<AAAMDGPUClusterDims>(IRPosition::function(*F));
+    }
 
     for (auto &I : instructions(F)) {
       Value *Ptr = nullptr;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUImageIntrinsicOptimizer.cpp b/llvm/lib/Target/AMDGPU/AMDGPUImageIntrinsicOptimizer.cpp
index 639089c75a33e..155c7dad904c5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUImageIntrinsicOptimizer.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUImageIntrinsicOptimizer.cpp
@@ -281,7 +281,7 @@ bool optimizeSection(ArrayRef<SmallVector<IntrinsicInst *, 4>> MergeableInsts) {
 }
 
 static bool imageIntrinsicOptimizerImpl(Function &F, const TargetMachine *TM) {
-  if (!TM)
+  if (!TM || !TM->getTargetTriple().isAMDGCN())
     return false;
 
   // This optimization only applies to GFX11 and beyond.

Contributor

@arsenm arsenm left a comment


Instead we should just not add the pass for r600

@jyknight
Member Author

jyknight commented Oct 7, 2025

AMDGPUImageIntrinsicOptimizer: OK, will change that one.

But are you saying AMDGPUAttributor also shouldn't run on R600? I can change it if you like, but naively the pass seems like it's doing things that do apply to R600 too, and has some code explicitly for r600 intrinsics.

@shiltian
Contributor

shiltian commented Oct 7, 2025

I'm not sure how much longer we still want to support r600.

@jyknight
Member Author

jyknight commented Oct 7, 2025

I'm not sure how much longer we still want to support r600.

I have no opinion on that. But surely it should not be invoking UB in the test suite while it's still included, right?

@shiltian
Contributor

shiltian commented Oct 7, 2025

That's right. However, I'm not sure if we instead want to disable AMDGPUAttributor for r600.

@jyknight
Member Author

jyknight commented Oct 7, 2025

Ah, got it -- another vote for skipping the pass on r600 instead of making it work -- done!

if (!isLTOPreLink(Phase)) {
  AMDGPUAttributorOptions Opts;
  MPM.addPass(AMDGPUAttributorPass(*this, Opts, Phase));
if (getTargetTriple().isAMDGCN()) {
Contributor


I'd put this check into the callback, closely wrapping the attributor pass, since we might want to add more passes at this EP.

Member Author


Done.

@jyknight jyknight changed the title AMDGPU: don't call getSubtarget<GCNSubtarget> on R600 targets. AMDGPU: skip AMDGPUAttributor and AMDGPUImageIntrinsicOptimizerPass on R600 targets. Oct 7, 2025
@jyknight jyknight changed the title AMDGPU: skip AMDGPUAttributor and AMDGPUImageIntrinsicOptimizerPass on R600 targets. AMDGPU: skip AMDGPUAttributor and AMDGPUImageIntrinsicOptimizerPass on R600. Oct 7, 2025
@jyknight jyknight enabled auto-merge (squash) October 7, 2025 16:12
@jyknight jyknight merged commit 2f4275b into llvm:main Oct 7, 2025
9 checks passed
@jyknight jyknight deleted the fix-r600 branch October 7, 2025 16:18