AMDGPU: skip AMDGPUAttributor and AMDGPUImageIntrinsicOptimizerPass on R600. #162207
Conversation
These passes call `getSubtarget<GCNSubtarget>`, which doesn't work on R600 targets, as those use an `R600Subtarget` type instead. Unfortunately, `TargetMachine::getSubtarget<ST>` does an unchecked static_cast to `ST&`, which makes it easy for this error to go undetected. The modifications here were verified by running check-llvm with an assert added to getSubtarget. However, that assert requires RTTI, which LLVM doesn't use, so I've reverted the assert before sending this fix upstream. These errors have been present for some time, but were detected after #162040 caused an uninitialized memory read to be reported by asan/msan.
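To make the failure mode concrete, here is a small self-contained illustration using hypothetical stand-in types (`SubtargetBase`, `GCNLike`, and `R600Like` are inventions for this sketch, not the LLVM classes); the shape of `getSubtarget` mirrors the unchecked downcast described above:

```cpp
#include <cassert>

// Hypothetical stand-ins for the subtarget hierarchy; not the LLVM classes.
struct SubtargetBase { virtual ~SubtargetBase() = default; };
struct GCNLike : SubtargetBase { int MaxWavesPerEU = 10; };
struct R600Like : SubtargetBase {};

// Mirrors the shape of TargetMachine::getSubtarget<ST>: an unchecked downcast.
template <typename ST> const ST &getSubtarget(const SubtargetBase &STI) {
  // With RTTI, a mismatch could be caught, e.g.:
  //   assert(dynamic_cast<const ST *>(&STI) && "wrong subtarget type requested");
  // but LLVM builds without RTTI, so nothing checks the cast.
  return *static_cast<const ST *>(&STI);
}

int main() {
  R600Like R600ST;
  // Compiles without complaint; actually reading through the result touches
  // memory that was never a GCNLike, which is the kind of invalid read that
  // asan/msan flags.
  const GCNLike &Wrong = getSubtarget<GCNLike>(R600ST);
  (void)Wrong; // e.g. Wrong.MaxWavesPerEU would be an uninitialized/invalid read
  return 0;
}
```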
@llvm/pr-subscribers-backend-amdgpu
Author: James Y Knight (jyknight)
Full diff: https://github.com/llvm/llvm-project/pull/162207.diff
2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index cb49936871e74..f6186f1fae777 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -162,12 +162,16 @@ class AMDGPUInformationCache : public InformationCache {
/// Check if the subtarget has aperture regs.
bool hasApertureRegs(Function &F) {
+ if (!TM.getTargetTriple().isAMDGCN())
+ return false;
const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
return ST.hasApertureRegs();
}
/// Check if the subtarget supports GetDoorbellID.
bool supportsGetDoorbellID(Function &F) {
+ if (!TM.getTargetTriple().isAMDGCN())
+ return false;
const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
return ST.supportsGetDoorbellID();
}
@@ -182,18 +186,18 @@ class AMDGPUInformationCache : public InformationCache {
std::pair<unsigned, unsigned>
getDefaultFlatWorkGroupSize(const Function &F) const {
- const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+ const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
return ST.getDefaultFlatWorkGroupSize(F.getCallingConv());
}
std::pair<unsigned, unsigned>
getMaximumFlatWorkGroupRange(const Function &F) {
- const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+ const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
return {ST.getMinFlatWorkGroupSize(), ST.getMaxFlatWorkGroupSize()};
}
SmallVector<unsigned> getMaxNumWorkGroups(const Function &F) {
- const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+ const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
return ST.getMaxNumWorkGroups(F);
}
@@ -206,7 +210,7 @@ class AMDGPUInformationCache : public InformationCache {
std::pair<unsigned, unsigned>
getWavesPerEU(const Function &F,
std::pair<unsigned, unsigned> FlatWorkGroupSize) {
- const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+ const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
return ST.getWavesPerEU(FlatWorkGroupSize, getLDSSize(F), F);
}
@@ -217,7 +221,7 @@ class AMDGPUInformationCache : public InformationCache {
if (!Val)
return std::nullopt;
if (!Val->second) {
- const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+ const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
Val->second = ST.getMaxWavesPerEU();
}
return std::make_pair(Val->first, *(Val->second));
@@ -227,13 +231,13 @@ class AMDGPUInformationCache : public InformationCache {
getEffectiveWavesPerEU(const Function &F,
std::pair<unsigned, unsigned> WavesPerEU,
std::pair<unsigned, unsigned> FlatWorkGroupSize) {
- const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+ const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
return ST.getEffectiveWavesPerEU(WavesPerEU, FlatWorkGroupSize,
getLDSSize(F));
}
unsigned getMaxWavesPerEU(const Function &F) {
- const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+ const AMDGPUSubtarget &ST = AMDGPUSubtarget::get(TM, F);
return ST.getMaxWavesPerEU();
}
@@ -1511,9 +1515,11 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
A.getOrCreateAAFor<AAAMDWavesPerEU>(IRPosition::function(*F));
}
- const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(*F);
- if (!F->isDeclaration() && ST.hasClusters())
- A.getOrCreateAAFor<AAAMDGPUClusterDims>(IRPosition::function(*F));
+ if (TM.getTargetTriple().isAMDGCN()) {
+ const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(*F);
+ if (!F->isDeclaration() && ST.hasClusters())
+ A.getOrCreateAAFor<AAAMDGPUClusterDims>(IRPosition::function(*F));
+ }
for (auto &I : instructions(F)) {
Value *Ptr = nullptr;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUImageIntrinsicOptimizer.cpp b/llvm/lib/Target/AMDGPU/AMDGPUImageIntrinsicOptimizer.cpp
index 639089c75a33e..155c7dad904c5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUImageIntrinsicOptimizer.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUImageIntrinsicOptimizer.cpp
@@ -281,7 +281,7 @@ bool optimizeSection(ArrayRef<SmallVector<IntrinsicInst *, 4>> MergeableInsts) {
}
static bool imageIntrinsicOptimizerImpl(Function &F, const TargetMachine *TM) {
- if (!TM)
+ if (!TM || !TM->getTargetTriple().isAMDGCN())
return false;
// This optimization only applies to GFX11 and beyond.
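A note on the pattern in the diff above: the getters switch from `TM.getSubtarget<GCNSubtarget>(F)` to `AMDGPUSubtarget::get(TM, F)`, which is safe on both targets because it dispatches on the triple and returns the common base class. A rough paraphrase follows (the exact upstream implementation in AMDGPUSubtarget.cpp may differ in detail):

```cpp
// Paraphrased sketch, not copied verbatim from upstream: pick the concrete
// subtarget type from the triple, then return the shared AMDGPUSubtarget base
// that both GCNSubtarget and R600Subtarget derive from.
const AMDGPUSubtarget &AMDGPUSubtarget::get(const TargetMachine &TM,
                                            const Function &F) {
  if (TM.getTargetTriple().isAMDGCN())
    return TM.getSubtarget<GCNSubtarget>(F);
  return TM.getSubtarget<R600Subtarget>(F);
}
```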
Instead we should just not add the pass for r600
AMDGPUImageIntrinsicOptimizer: OK, will change that one. But are you saying AMDGPUAttributor also shouldn't run on R600? I can change it if you like, but naively the pass seems like it's doing things that do apply to R600 too, and has some code explicitly for r600 intrinsics.
I'm not sure how much longer we still want to support r600.
I have no opinion on that. But surely it should not be invoking UB in the test suite while it's still included, right?
That's right. However, I'm not sure if we instead want to disable AMDGPUAttributor for r600.
Ah, got it -- another vote for skipping the pass on r600 instead of making it work -- done!
if (!isLTOPreLink(Phase)) {
  AMDGPUAttributorOptions Opts;
  MPM.addPass(AMDGPUAttributorPass(*this, Opts, Phase));
  if (getTargetTriple().isAMDGCN()) {
I'd put this check into the callback and closely wrap the attributor pass, since we might want to add more passes into this EP.
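A minimal sketch of the shape being suggested, with the enclosing extension-point callback elided and using the names from the snippet above (whether this matches the final code in AMDGPUTargetMachine.cpp exactly is an assumption):

```cpp
// Inside the extension-point callback (enclosing lambda elided). The
// isAMDGCN() check closely wraps only the attributor pass, so further passes
// added to this EP later can still run for R600.
if (getTargetTriple().isAMDGCN() && !isLTOPreLink(Phase)) {
  AMDGPUAttributorOptions Opts;
  MPM.addPass(AMDGPUAttributorPass(*this, Opts, Phase));
}
// ...additional passes for this EP would go here, unconditionally.
```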
Done.