[GVN] MemorySSA for GVN: eliminate redundant loads via MemorySSA #152859
base: main
Conversation
Introduce the main algorithm performing redundant load elimination via MemorySSA in GVN. The entry point is `findReachingValuesForLoad`, which, given a possibly redundant load `L`, attempts to provide as output a set of reaching memory values (`ReachingMemVal`), i.e., the values (defs or equivalent reads) that can reach `L` along at least one path on which the memory location is not modified in the meantime (if non-local, PRE will establish whether the load may be eliminated). Specifically, a reaching value may be of one of the following descriptor kinds:

* Def: found an instruction that produces exactly the bits the load would read. For example, a must-alias store (which defines the load's memory location), or a must-alias read (which reads exactly the same memory location, found, e.g., after a phi-translation fixup);
* Clobber: found a write that clobbers a superset of the bits the load would read. For example, a memset call over a memory region overlapping the location being read (whose value may be forwarded to the load), or a generic call clobbering the load that cannot be reused;
* Other: a precise instruction could not be found, but we know the block where the dependency exists (e.g., a memory location already live at function entry).

We start off by looking at the users of the defining memory access of the given load within the same block, searching for local dependencies. We may need to iteratively follow the use-def chain of the defining memory access to find the actual clobbering access, while staying within the scope of the load. If an actual local def/clobber (or liveOnEntry) is found, we are done and return it. Otherwise, we move on to visiting the predecessors of the load's block, considering the incoming value from the MemoryPhi as a potential clobbering access, and search for dependencies within the predecessor block itself. If the new potential clobbering access is a MemoryPhi, we continue visiting the transitive closure of the predecessors, recording the new potential clobbering access for each. In this phase, as we are exploring non-local definitions, the load itself may be considered a must-alias def as well when it is examined from a backedge path.
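For reference, the reaching-value descriptor these kinds refer to is condensed below from the GVN.h changes in this patch (the full hunk appears in the diff later on this page):

```cpp
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
#include <cstdint>

using namespace llvm;

// Condensed from the GVN.h hunk of this PR: the kind of dependency found and
// the descriptor that findReachingValuesForLoad reports for each block.
enum class DepKind {
  Other = 0, // Unknown value.
  Def,       // Exactly overlapping locations.
  Clobber,   // Reaching value superset of needed bits.
};

struct ReachingMemVal {
  DepKind Kind;
  BasicBlock *Block;  // Block the dependency was found in.
  const Value *Addr;  // (Possibly phi-translated) address of the location.
  Instruction *Inst;  // Defining/clobbering instruction, if any.
  int32_t Offset;     // Offset into a clobbering write, or -1 if unknown.

  static ReachingMemVal getDef(const Value *Addr, Instruction *Inst) {
    return {DepKind::Def, Inst->getParent(), Addr, Inst, -1};
  }
  static ReachingMemVal getClobber(const Value *Addr, Instruction *Inst,
                                   int32_t Offset = -1) {
    return {DepKind::Clobber, Inst->getParent(), Addr, Inst, Offset};
  }
  static ReachingMemVal getUnknown(BasicBlock *BB, const Value *Addr,
                                   Instruction *Inst = nullptr) {
    return {DepKind::Other, BB, Addr, Inst, -1};
  }
};
```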
b0bd023 to bb3e1cc
You can test this locally with the following command:

git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef([^a-zA-Z0-9_-]|$)|UndefValue::get)' 'HEAD~1' HEAD llvm/include/llvm/Transforms/Scalar/GVN.h llvm/lib/Transforms/Scalar/GVN.cpp llvm/test/Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll llvm/test/Transforms/GVN/PRE/load-metadata.ll llvm/test/Transforms/GVN/PRE/lpre-call-wrap.ll llvm/test/Transforms/GVN/PRE/phi-translate-2.ll llvm/test/Transforms/GVN/PRE/phi-translate-add.ll llvm/test/Transforms/GVN/PRE/phi-translate.ll llvm/test/Transforms/GVN/PRE/pre-after-rle.ll llvm/test/Transforms/GVN/PRE/pre-aliasning-path.ll llvm/test/Transforms/GVN/PRE/pre-load-dbg.ll llvm/test/Transforms/GVN/PRE/pre-load-guards.ll llvm/test/Transforms/GVN/PRE/pre-load-implicit-cf-updates.ll llvm/test/Transforms/GVN/PRE/pre-load.ll llvm/test/Transforms/GVN/PRE/pre-single-pred.ll llvm/test/Transforms/GVN/PRE/preserve-tbaa.ll llvm/test/Transforms/GVN/PRE/rle-addrspace-cast.ll llvm/test/Transforms/GVN/PRE/rle-semidominated.ll llvm/test/Transforms/GVN/PRE/rle.ll llvm/test/Transforms/GVN/nonescaping.ll llvm/test/Transforms/GVN/pr14166.ll llvm/test/Transforms/GVN/readattrs.ll llvm/test/Transforms/GVN/setjmp.ll llvm/test/Transforms/GVN/tbaa.ll llvm/test/Transforms/GVN/vscale.ll

The following files introduce new uses of undef:
Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields `undef`. In tests, avoid using `undef`. For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.
ModRefInfo MR = AA.getModRefInfo(ClobberI, Loc);
// If may modify the location, analyze deeper, to exclude accesses to
// non-escaping local allocations.
if (MR == ModRefInfo::NoModRef || MR == ModRefInfo::Ref)
Dropped callCapturesBefore() here (which, among other things, was previously miscompiling the setjmp.ll test) in favour of using BatchAA with EarliestEscapeAnalysis, based on feedback from @nikic.
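For context, a minimal sketch of the BatchAA-plus-EarliestEscapeAnalysis pattern (the same wiring other passes such as DSE use); the helper name and surrounding plumbing are invented for illustration and need not match what the patch does:

```cpp
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/CaptureTracking.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Sketch: answer a mod/ref query through a BatchAAResults hooked up to
// EarliestEscapeAnalysis, so that a call cannot be treated as modifying a
// local allocation that has not escaped at the relevant program point.
static bool mayClobberLocation(Instruction *ClobberI, const MemoryLocation &Loc,
                               AAResults &AA, DominatorTree &DT,
                               LoopInfo &LI) {
  EarliestEscapeAnalysis EA(DT, &LI); // Capture info keyed on escape points.
  BatchAAResults BatchAA(AA, &EA);    // Caches alias queries for this batch.
  ModRefInfo MR = BatchAA.getModRefInfo(ClobberI, Loc);
  return isModSet(MR);
}
```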
auto *MSSA =
    isMemorySSAEnabled() ? AM.getCachedResult<MemorySSAAnalysis>(F) : nullptr;
We should avoid retrieving the cached result (and thus constructing a MemorySSAUpdater) if MemorySSA is not enabled. This was missed in the initial patch and noticed while running tests; as of now, it may otherwise happen that both MD and MSSAU exist.
auto *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(MA);
HasLocalDep = isa<MemoryDef>(Clobber) && Clobber->getBlock() == SuccBB;
This is a recent addition. Unless I am missing something, it should suffice to check whether the identical load has a local dominating clobber to decide that it must not be hoisted.
for (MemoryAccess *MA : ClobbersList) {
  unsigned Scanned = 0;
  for (User *U : MA->users()) {
    if (++Scanned >= ScanUsersLimit)
The latest version includes the threshold, but it is unused (due to how we now search for clobbering accesses). It has been conservatively left in, but it might as well be dropped.
// TODO: Should drop bitcast?
if (isa<BitCastInst>(I) ||
Should be possible to drop the bitcast check here while looking for an equivalent dominating !invariant.group load with the same pointer.
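A hypothetical sketch of what the walk could reduce to once the bitcast case is gone (with opaque pointers, only all-zero GEPs remain as pointer-equivalent users). This mirrors the patch's findInvariantGroupValue with that one branch removed; it is not part of the patch itself:

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"

using namespace llvm;

// Hypothetical simplification: find a dominating !invariant.group access with
// an equivalent pointer, following only all-zero GEPs (no bitcast handling).
static Instruction *findInvariantGroupValueNoBitcast(LoadInst *L,
                                                     DominatorTree &DT) {
  Value *Ptr = L->getPointerOperand()->stripPointerCasts();
  // Walking the use list of a global is not allowed in a function pass.
  if (isa<Constant>(Ptr))
    return nullptr;

  SmallVector<Value *, 8> Worklist{Ptr};
  Instruction *MostDominating = L;
  while (!Worklist.empty()) {
    Value *V = Worklist.pop_back_val();
    for (User *U : V->users()) {
      auto *I = dyn_cast<Instruction>(U);
      if (!I || I == L || !DT.dominates(I, MostDominating))
        continue;
      // Only all-zero GEPs are treated as pointer-equivalent values.
      if (auto *GEP = dyn_cast<GetElementPtrInst>(I);
          GEP && GEP->hasAllZeroIndices()) {
        Worklist.push_back(GEP);
        continue;
      }
      // A non-volatile !invariant.group access through the same pointer
      // guarantees the pointed-to value has not changed.
      if (I->hasMetadata(LLVMContext::MD_invariant_group) &&
          V == getLoadStorePointerOperand(I) && !I->isVolatile())
        MostDominating = I;
    }
  }
  return MostDominating != L ? MostDominating : nullptr;
}
```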
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-analysis

Author: Antonio Frighetto (antoniofrighetto)

Changes

Introduce the main algorithm performing redundant load elimination via MemorySSA in GVN; the entry point is `findReachingValuesForLoad` (see the PR description above for a summary and the `ReachingMemVal` descriptor kinds). A detailed description of the algorithm from the original patch follows.

The algorithm consists of three parts.

First, we do a quick check for a local dependency, i.e., a memory read or write within the same basic block as the initial load instruction. Doing this first allows us to avoid having to disambiguate between the parts of the initial basic block before and after the original load instruction (when it is entered from a backedge).

Second, we find the clobbering set of blocks by doing a walk of the reverse CFG (iterative DFS with a worklist), stopping (i.e., not visiting the predecessors further) at blocks/instructions which definitely modify the memory location. With each basic block we maintain associated information, captured by `DependencyBlockInfo` in the patch: the (possibly phi-translated) memory address, the initial and current clobbering memory accesses, an optional reaching memory value, and a couple of bookkeeping flags.

The worklist is seeded with the predecessors of the starting basic block (the one that contains the initial load instruction). For each of these blocks, the memory address is the pointer operand of the load, possibly phi-translated, and the initial clobbering memory access is the defining access of the load, again possibly phi-translated. The final clobbering memory access is at first the same as the initial one. These blocks are added to the set.

On each iteration we fetch and examine the next basic block, handling a few distinct cases, and there are a few subtle moments when transitioning to predecessors. At this phase of the algorithm, the set we are building also serves as the visited set, so as to prevent infinite loops.

Once this first phase completes, we have obtained the needed set of basic blocks as well as the memory-writing instructions that could modify our location. For each block, the initial clobbering memory access is the one we enter the block with, and there is a use-def chain to the final clobbering memory access in which every access is non-aliasing, except for the final one.

Third, we still have to determine whether there are memory reads that act as definitions (see above). Once again we do an iterative DFS on the reverse CFG. On each iteration we look for memory uses in the current block. For this we first construct a list of the memory clobbers that could be a defining memory access for our memory location. The list begins with the initial clobbering memory access for the current block and follows the clobbers along the use-def chain up to the final clobbering memory access. If the final clobbering memory access belongs to the same block and is not a MemoryPhi, that finishes building the list: we cannot have an aliasing use in the block that is not a use of the final clobbering memory access. Otherwise, we look to extend the list in a couple of ways; in both cases, the transition occurs if and only if the new block is already among the collected set of clobbering blocks.

Once we have the list of clobbers, we look in turn at the uses of each clobber in the list that are in the current block, looking for memory reads which could provide (a superset of) the bits of the memory location. Among those reads, we choose the one closest to the end of the block. A subsequent (dominating) clobber in the list is processed only if no usable memory read has been found so far.

If an interesting memory use was not found, we look for a clobber in the current block. This was already determined in the phase which collected the set of clobbering blocks. If there is a clobber, we include it in the list of results; otherwise the block is transparent, and we can add it to the worklist.

Original patch: https://reviews.llvm.org/D116825, incrementally moved towards https://reviews.llvm.org/D130489 (the primary change involves moving from a linear scan of all the memory accesses of each basic block via getBlockAccesses() to building a list of potential clobbers and following their use-def chain).

Minor additions wrt the latest original version encompass style, improved naming, further comments, code hoisting / rearrangement opportunities, API updates, discontinued API removal, and a partial tests update (the complete tests update will come prior to turning on MSSA).
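To make the MemorySSA interplay concrete, here is a schematic sketch of the first-phase local check and of the step into a predecessor through a MemoryPhi. This is not the patch's actual code: the helper names are invented, and the real implementation also handles must-alias reads, invariant loads, phi-translation, and scan limits.

```cpp
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Phase 1 (sketch): look for a local dependency of `Load` by following the
// use-def chain of its defining access while staying inside the load's block.
static Instruction *findLocalClobberSketch(LoadInst *Load, MemorySSA &MSSA,
                                           BatchAAResults &AA) {
  const MemoryLocation Loc = MemoryLocation::get(Load);
  MemoryAccess *MA = MSSA.getMemoryAccess(Load)->getDefiningAccess();

  while (auto *Def = dyn_cast<MemoryDef>(MA)) {
    // liveOnEntry or an access in another block: no local dependency.
    if (MSSA.isLiveOnEntryDef(Def) || Def->getBlock() != Load->getParent())
      return nullptr;
    // A write that may modify the location is the local def/clobber.
    Instruction *DefInst = Def->getMemoryInst();
    if (isModSet(AA.getModRefInfo(DefInst, Loc)))
      return DefInst;
    MA = Def->getDefiningAccess();
  }
  return nullptr; // Hit a MemoryPhi: the dependency is non-local.
}

// Phase 2 (sketch): when moving from a block to one of its predecessors, the
// potential clobbering access on the predecessor side is the MemoryPhi's
// incoming value for that edge (or the same access if there is no MemoryPhi).
static MemoryAccess *incomingClobberForPred(MemoryAccess *MA,
                                            BasicBlock *Pred) {
  if (auto *Phi = dyn_cast<MemoryPhi>(MA))
    return Phi->getIncomingValueForBlock(Pred);
  return MA;
}
```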
Patch is 346.82 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152859.diff

25 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Scalar/GVN.h b/llvm/include/llvm/Transforms/Scalar/GVN.h
index bc0f108ac8260..ef2521220b0a2 100644
--- a/llvm/include/llvm/Transforms/Scalar/GVN.h
+++ b/llvm/include/llvm/Transforms/Scalar/GVN.h
@@ -19,6 +19,7 @@
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallVector.h"
+#include "llvm/Analysis/PHITransAddr.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/PassManager.h"
@@ -37,8 +38,10 @@ class AAResults;
class AssumeInst;
class AssumptionCache;
class BasicBlock;
+class BatchAAResults;
class BranchInst;
class CallInst;
+class EarliestEscapeAnalysis;
class ExtractValueInst;
class Function;
class FunctionPass;
@@ -255,6 +258,7 @@ class GVNPass : public PassInfoMixin<GVNPass> {
OptimizationRemarkEmitter *ORE = nullptr;
ImplicitControlFlowTracking *ICF = nullptr;
LoopInfo *LI = nullptr;
+ AAResults *AA = nullptr;
MemorySSAUpdater *MSSAU = nullptr;
ValueTable VN;
@@ -344,21 +348,91 @@ class GVNPass : public PassInfoMixin<GVNPass> {
// List of critical edges to be split between iterations.
SmallVector<std::pair<Instruction *, unsigned>, 4> ToSplit;
+ enum class DepKind {
+ Other = 0, // Unknown value.
+ Def, // Exactly overlapping locations.
+ Clobber, // Reaching value superset of needed bits.
+ };
+
+ // Describe a memory location value, such that there exists a path to a point
+ // in the program, along which that memory location is not modified.
+ struct ReachingMemVal {
+ DepKind Kind;
+ BasicBlock *Block;
+ const Value *Addr;
+ Instruction *Inst;
+ int32_t Offset;
+
+ static ReachingMemVal getUnknown(BasicBlock *BB, const Value *Addr,
+ Instruction *Inst = nullptr) {
+ return {DepKind::Other, BB, Addr, Inst, -1};
+ }
+
+ static ReachingMemVal getDef(const Value *Addr, Instruction *Inst) {
+ return {DepKind::Def, Inst->getParent(), Addr, Inst, -1};
+ }
+
+ static ReachingMemVal getClobber(const Value *Addr, Instruction *Inst,
+ int32_t Offset = -1) {
+ return {DepKind::Clobber, Inst->getParent(), Addr, Inst, Offset};
+ }
+ };
+
+ struct DependencyBlockInfo {
+ DependencyBlockInfo() = delete;
+ DependencyBlockInfo(const PHITransAddr &Addr, MemoryAccess *ClobberMA)
+ : Addr(Addr), InitialClobberMA(ClobberMA), ClobberMA(ClobberMA),
+ ForceUnknown(false), Visited(false) {}
+ PHITransAddr Addr;
+ MemoryAccess *InitialClobberMA;
+ MemoryAccess *ClobberMA;
+ std::optional<ReachingMemVal> MemVal;
+ bool ForceUnknown : 1;
+ bool Visited : 1;
+ };
+
+ using DependencyBlockSet = DenseMap<BasicBlock *, DependencyBlockInfo>;
+
+ std::optional<GVNPass::ReachingMemVal> scanMemoryAccessesUsers(
+ const MemoryLocation &Loc, bool IsInvariantLoad, BasicBlock *BB,
+ const SmallVectorImpl<MemoryAccess *> &ClobbersList, MemorySSA &MSSA,
+ BatchAAResults &AA, LoadInst *L = nullptr);
+
+ std::optional<GVNPass::ReachingMemVal>
+ accessMayModifyLocation(MemoryAccess *ClobberMA, const MemoryLocation &Loc,
+ bool IsInvariantLoad, BasicBlock *BB, MemorySSA &MSSA,
+ BatchAAResults &AA);
+
+ bool collectPredecessors(BasicBlock *BB, const PHITransAddr &Addr,
+ MemoryAccess *ClobberMA, DependencyBlockSet &Blocks,
+ SmallVectorImpl<BasicBlock *> &Worklist);
+
+ void collectClobberList(SmallVectorImpl<MemoryAccess *> &Clobbers,
+ BasicBlock *BB, const DependencyBlockInfo &StartInfo,
+ const DependencyBlockSet &Blocks, MemorySSA &MSSA);
+
+ bool findReachingValuesForLoad(LoadInst *Inst,
+ SmallVectorImpl<ReachingMemVal> &Values,
+ MemorySSA &MSSA, AAResults &AA);
+
// Helper functions of redundant load elimination.
bool processLoad(LoadInst *L);
bool processMaskedLoad(IntrinsicInst *I);
bool processNonLocalLoad(LoadInst *L);
+ bool processNonLocalLoad(LoadInst *L, SmallVectorImpl<ReachingMemVal> &Deps);
bool processAssumeIntrinsic(AssumeInst *II);
/// Given a local dependency (Def or Clobber) determine if a value is
/// available for the load.
std::optional<gvn::AvailableValue>
- AnalyzeLoadAvailability(LoadInst *Load, MemDepResult DepInfo, Value *Address);
+ AnalyzeLoadAvailability(LoadInst *Load, const ReachingMemVal &Dep,
+ Value *Address);
/// Given a list of non-local dependencies, determine if a value is
/// available for the load in each specified block. If it is, add it to
/// ValuesPerBlock. If not, add it to UnavailableBlocks.
- void AnalyzeLoadAvailability(LoadInst *Load, LoadDepVect &Deps,
+ void AnalyzeLoadAvailability(LoadInst *Load,
+ SmallVectorImpl<ReachingMemVal> &Deps,
AvailValInBlkVect &ValuesPerBlock,
UnavailBlkVect &UnavailableBlocks);
diff --git a/llvm/lib/Transforms/Scalar/GVN.cpp b/llvm/lib/Transforms/Scalar/GVN.cpp
index 72e1131a54a86..9eb1cf52d5013 100644
--- a/llvm/lib/Transforms/Scalar/GVN.cpp
+++ b/llvm/lib/Transforms/Scalar/GVN.cpp
@@ -116,6 +116,11 @@ static cl::opt<bool> GVNEnableMemDep("enable-gvn-memdep", cl::init(true));
static cl::opt<bool> GVNEnableMemorySSA("enable-gvn-memoryssa",
cl::init(false));
+static cl::opt<unsigned> ScanUsersLimit(
+ "gvn-scan-users-limit", cl::Hidden, cl::init(100),
+ cl::desc("The number of memory accesses to scan in a block in reaching "
+ "memory values analysis (default = 100)"));
+
static cl::opt<uint32_t> MaxNumDeps(
"gvn-max-num-deps", cl::Hidden, cl::init(100),
cl::desc("Max number of dependences to attempt Load PRE (default = 100)"));
@@ -887,7 +892,8 @@ PreservedAnalyses GVNPass::run(Function &F, FunctionAnalysisManager &AM) {
auto *MemDep =
isMemDepEnabled() ? &AM.getResult<MemoryDependenceAnalysis>(F) : nullptr;
auto &LI = AM.getResult<LoopAnalysis>(F);
- auto *MSSA = AM.getCachedResult<MemorySSAAnalysis>(F);
+ auto *MSSA =
+ isMemorySSAEnabled() ? AM.getCachedResult<MemorySSAAnalysis>(F) : nullptr;
if (isMemorySSAEnabled() && !MSSA) {
assert(!MemDep &&
"On-demand computation of MemSSA implies that MemDep is disabled!");
@@ -1280,7 +1286,7 @@ static const Instruction *findMayClobberedPtrAccess(LoadInst *Load,
/// Try to locate the three instruction involved in a missed
/// load-elimination case that is due to an intervening store.
-static void reportMayClobberedLoad(LoadInst *Load, MemDepResult DepInfo,
+static void reportMayClobberedLoad(LoadInst *Load, Instruction *DepInst,
const DominatorTree *DT,
OptimizationRemarkEmitter *ORE) {
using namespace ore;
@@ -1293,7 +1299,7 @@ static void reportMayClobberedLoad(LoadInst *Load, MemDepResult DepInfo,
if (OtherAccess)
R << " in favor of " << NV("OtherAccess", OtherAccess);
- R << " because it is clobbered by " << NV("ClobberedBy", DepInfo.getInst());
+ R << " because it is clobbered by " << NV("ClobberedBy", DepInst);
ORE->emit(R);
}
@@ -1321,15 +1327,16 @@ static Value *findDominatingValue(const MemoryLocation &Loc, Type *LoadTy,
}
std::optional<AvailableValue>
-GVNPass::AnalyzeLoadAvailability(LoadInst *Load, MemDepResult DepInfo,
+GVNPass::AnalyzeLoadAvailability(LoadInst *Load, const ReachingMemVal &Dep,
Value *Address) {
assert(Load->isUnordered() && "rules below are incorrect for ordered access");
- assert(DepInfo.isLocal() && "expected a local dependence");
+ assert((Dep.Kind == DepKind::Def || Dep.Kind == DepKind::Clobber) &&
+ "expected a local dependence");
- Instruction *DepInst = DepInfo.getInst();
+ Instruction *DepInst = Dep.Inst;
const DataLayout &DL = Load->getDataLayout();
- if (DepInfo.isClobber()) {
+ if (Dep.Kind == DepKind::Clobber) {
// If the dependence is to a store that writes to a superset of the bits
// read by the load, we can extract the bits we need for the load from the
// stored value.
@@ -1354,17 +1361,23 @@ GVNPass::AnalyzeLoadAvailability(LoadInst *Load, MemDepResult DepInfo,
if (DepLoad != Load && Address &&
Load->isAtomic() <= DepLoad->isAtomic()) {
Type *LoadType = Load->getType();
- int Offset = -1;
-
- // If MD reported clobber, check it was nested.
- if (DepInfo.isClobber() &&
- canCoerceMustAliasedValueToLoad(DepLoad, LoadType,
- DepLoad->getFunction())) {
- const auto ClobberOff = MD->getClobberOffset(DepLoad);
- // GVN has no deal with a negative offset.
- Offset = (ClobberOff == std::nullopt || *ClobberOff < 0)
- ? -1
- : *ClobberOff;
+ int Offset = Dep.Offset;
+
+ if (MD && !MSSAU) {
+ // If MD reported clobber, check it was nested.
+ if (canCoerceMustAliasedValueToLoad(DepLoad, LoadType,
+ DepLoad->getFunction())) {
+ const auto ClobberOff = MD->getClobberOffset(DepLoad);
+ // GVN has no deal with a negative offset.
+ Offset = (ClobberOff == std::nullopt || *ClobberOff < 0)
+ ? -1
+ : *ClobberOff;
+ }
+ } else {
+ if (!canCoerceMustAliasedValueToLoad(DepLoad, LoadType,
+ DepLoad->getFunction()) ||
+ Offset < 0)
+ Offset = -1;
}
if (Offset == -1)
Offset =
@@ -1391,11 +1404,11 @@ GVNPass::AnalyzeLoadAvailability(LoadInst *Load, MemDepResult DepInfo,
dbgs() << "GVN: load "; Load->printAsOperand(dbgs());
dbgs() << " is clobbered by " << *DepInst << '\n';);
if (ORE->allowExtraAnalysis(DEBUG_TYPE))
- reportMayClobberedLoad(Load, DepInfo, DT, ORE);
+ reportMayClobberedLoad(Load, DepInst, DT, ORE);
return std::nullopt;
}
- assert(DepInfo.isDef() && "follows from above");
+ assert(Dep.Kind == DepKind::Def && "follows from above");
// Loading the alloca -> undef.
// Loading immediately after lifetime begin -> undef.
@@ -1463,7 +1476,8 @@ GVNPass::AnalyzeLoadAvailability(LoadInst *Load, MemDepResult DepInfo,
return std::nullopt;
}
-void GVNPass::AnalyzeLoadAvailability(LoadInst *Load, LoadDepVect &Deps,
+void GVNPass::AnalyzeLoadAvailability(LoadInst *Load,
+ SmallVectorImpl<ReachingMemVal> &Deps,
AvailValInBlkVect &ValuesPerBlock,
UnavailBlkVect &UnavailableBlocks) {
// Filter out useless results (non-locals, etc). Keep track of the blocks
@@ -1471,8 +1485,7 @@ void GVNPass::AnalyzeLoadAvailability(LoadInst *Load, LoadDepVect &Deps,
// dependencies that produce an unknown value for the load (such as a call
// that could potentially clobber the load).
for (const auto &Dep : Deps) {
- BasicBlock *DepBB = Dep.getBB();
- MemDepResult DepInfo = Dep.getResult();
+ BasicBlock *DepBB = Dep.Block;
if (DeadBlocks.count(DepBB)) {
// Dead dependent mem-op disguise as a load evaluating the same value
@@ -1481,7 +1494,7 @@ void GVNPass::AnalyzeLoadAvailability(LoadInst *Load, LoadDepVect &Deps,
continue;
}
- if (!DepInfo.isLocal()) {
+ if (Dep.Kind == DepKind::Other) {
UnavailableBlocks.push_back(DepBB);
continue;
}
@@ -1489,7 +1502,8 @@ void GVNPass::AnalyzeLoadAvailability(LoadInst *Load, LoadDepVect &Deps,
// The address being loaded in this non-local block may not be the same as
// the pointer operand of the load if PHI translation occurs. Make sure
// to consider the right address.
- if (auto AV = AnalyzeLoadAvailability(Load, DepInfo, Dep.getAddress())) {
+ if (auto AV =
+ AnalyzeLoadAvailability(Load, Dep, const_cast<Value *>(Dep.Addr))) {
// subtlety: because we know this was a non-local dependency, we know
// it's safe to materialize anywhere between the instruction within
// DepInfo and the end of it's block.
@@ -1545,12 +1559,24 @@ LoadInst *GVNPass::findLoadToHoistIntoPred(BasicBlock *Pred, BasicBlock *LoadBB,
if (!Inst.isIdenticalTo(Load))
continue;
- MemDepResult Dep = MD->getDependency(&Inst);
+ bool HasLocalDep = true;
+ if (MD && !MSSAU) {
+ MemDepResult Dep = MD->getDependency(&Inst);
+ HasLocalDep = !Dep.isNonLocal();
+ } else {
+ auto *MSSA = MSSAU->getMemorySSA();
+ // Do not hoist if the identical load has ordering constraint.
+ if (auto *MA = MSSA->getMemoryAccess(&Inst); MA && isa<MemoryUse>(MA)) {
+ auto *Clobber = MSSA->getWalker()->getClobberingMemoryAccess(MA);
+ HasLocalDep = isa<MemoryDef>(Clobber) && Clobber->getBlock() == SuccBB;
+ }
+ }
+
// If an identical load doesn't depends on any local instructions, it can
// be safely moved to PredBB.
// Also check for the implicit control flow instructions. See the comments
// in PerformLoadPRE for details.
- if (Dep.isNonLocal() && !ICF->isDominatedByICFIFromSameBlock(&Inst))
+ if (!HasLocalDep && !ICF->isDominatedByICFIFromSameBlock(&Inst))
return cast<LoadInst>(&Inst);
// Otherwise there is something in the same BB clobbers the memory, we can't
@@ -1607,7 +1633,8 @@ void GVNPass::eliminatePartiallyRedundantLoad(
// Add the newly created load.
ValuesPerBlock.push_back(
AvailableValueInBlock::get(UnavailableBlock, NewLoad));
- MD->invalidateCachedPointerInfo(LoadPtr);
+ if (MD)
+ MD->invalidateCachedPointerInfo(LoadPtr);
LLVM_DEBUG(dbgs() << "GVN INSERTED " << *NewLoad << '\n');
// For PredBB in CriticalEdgePredAndLoad we need to replace the uses of old
@@ -1637,7 +1664,7 @@ void GVNPass::eliminatePartiallyRedundantLoad(
V->takeName(Load);
if (Instruction *I = dyn_cast<Instruction>(V))
I->setDebugLoc(Load->getDebugLoc());
- if (V->getType()->isPtrOrPtrVectorTy())
+ if (MD && V->getType()->isPtrOrPtrVectorTy())
MD->invalidateCachedPointerInfo(V);
ORE->emit([&]() {
return OptimizationRemark(DEBUG_TYPE, "LoadPRE", Load)
@@ -1992,13 +2019,11 @@ static void reportLoadElim(LoadInst *Load, Value *AvailableValue,
/// non-local by performing PHI construction.
bool GVNPass::processNonLocalLoad(LoadInst *Load) {
// Non-local speculations are not allowed under asan.
- if (Load->getParent()->getParent()->hasFnAttribute(
- Attribute::SanitizeAddress) ||
- Load->getParent()->getParent()->hasFnAttribute(
- Attribute::SanitizeHWAddress))
+ if (Load->getFunction()->hasFnAttribute(Attribute::SanitizeAddress) ||
+ Load->getFunction()->hasFnAttribute(Attribute::SanitizeHWAddress))
return false;
- // Step 1: Find the non-local dependencies of the load.
+ // Find the non-local dependencies of the load.
LoadDepVect Deps;
MD->getNonLocalPointerDependency(Load, Deps);
@@ -2009,10 +2034,27 @@ bool GVNPass::processNonLocalLoad(LoadInst *Load) {
if (NumDeps > MaxNumDeps)
return false;
+ SmallVector<ReachingMemVal, 64> MemVals;
+ for (const NonLocalDepResult &Dep : Deps) {
+ Value *Address = Dep.getAddress();
+ BasicBlock *BB = Dep.getBB();
+ Instruction *Inst = Dep.getResult().getInst();
+ if (Dep.getResult().isClobber())
+ MemVals.emplace_back(ReachingMemVal::getClobber(Address, Inst));
+ else if (Dep.getResult().isDef())
+ MemVals.emplace_back(ReachingMemVal::getDef(Address, Inst));
+ else
+ MemVals.emplace_back(ReachingMemVal::getUnknown(BB, Address, Inst));
+ }
+
+ return processNonLocalLoad(Load, MemVals);
+}
+
+bool GVNPass::processNonLocalLoad(LoadInst *Load,
+ SmallVectorImpl<ReachingMemVal> &Deps) {
// If we had a phi translation failure, we'll have a single entry which is a
// clobber in the current block. Reject this early.
- if (NumDeps == 1 &&
- !Deps[0].getResult().isDef() && !Deps[0].getResult().isClobber()) {
+ if (Deps.size() == 1 && Deps[0].Kind == DepKind::Other) {
LLVM_DEBUG(dbgs() << "GVN: non-local load "; Load->printAsOperand(dbgs());
dbgs() << " has unknown dependencies\n";);
return false;
@@ -2027,7 +2069,7 @@ bool GVNPass::processNonLocalLoad(LoadInst *Load) {
Changed |= performScalarPRE(I);
}
- // Step 2: Analyze the availability of the load.
+ // Step 1: Analyze the availability of the load.
AvailValInBlkVect ValuesPerBlock;
UnavailBlkVect UnavailableBlocks;
AnalyzeLoadAvailability(Load, Deps, ValuesPerBlock, UnavailableBlocks);
@@ -2037,7 +2079,7 @@ bool GVNPass::processNonLocalLoad(LoadInst *Load) {
if (ValuesPerBlock.empty())
return Changed;
- // Step 3: Eliminate fully redundancy.
+ // Step 2: Eliminate fully redundancy.
//
// If all of the instructions we depend on produce a known value for this
// load, then it is fully redundant and we can use PHI insertion to compute
@@ -2059,7 +2101,7 @@ bool GVNPass::processNonLocalLoad(LoadInst *Load) {
// to propagate Load's DebugLoc because Load may not post-dominate I.
if (Load->getDebugLoc() && Load->getParent() == I->getParent())
I->setDebugLoc(Load->getDebugLoc());
- if (V->getType()->isPtrOrPtrVectorTy())
+ if (MD && V->getType()->isPtrOrPtrVectorTy())
MD->invalidateCachedPointerInfo(V);
++NumGVNLoad;
reportLoadElim(Load, V, ORE);
@@ -2067,7 +2109,7 @@ bool GVNPass::processNonLocalLoad(LoadInst *Load) {
return true;
}
- // Step 4: Eliminate partial redundancy.
+ // Step 3: Eliminate partial redundancy.
if (!isPREEnabled() || !isLoadPREEnabled())
return Changed;
if (!isLoadInLoopPREEnabled() && LI->getLoopFor(Load->getParent()))
@@ -2146,10 +2188,530 @@ static void patchAndReplaceAllUsesWith(Instruction *I, Value *Repl) {
I->replaceAllUsesWith(Repl);
}
+/// If a load has !invariant.group, try to find the most-dominating instruction
+/// with the same metadata and equivalent pointer (modulo bitcasts and zero
+/// GEPs). If one is found that dominates the load, its value can be reused.
+static Instruction *findInvariantGroupValue(LoadInst *L, DominatorTree &DT) {
+ Value *PointerOperand = L->getPointerOperand()->stripPointerCasts();
+
+ // It's not safe to walk the use list of a global value because function
+ // passes aren't allowed to look outside their functions.
+ // FIXME: this could be fixed by filtering instructions from outside of
+ // current function.
+ if (isa<Constant>(PointerOperand))
+ return nullptr;
+
+ // Queue to process all pointers that are equivalent to load operand.
+ SmallVector<Value *, 8> PointerUsesQueue;
+ PointerUsesQueue.push_back(PointerOperand);
+
+ Instruction *MostDominatingInstruction = L;
+
+ // FIXME: This loop is potentially O(n^2) due to repeated dominates checks.
+ while (!PointerUsesQueue.empty()) {
+ Value *Ptr = PointerUsesQueue.pop_back_val();
+ assert(Ptr && !isa<GlobalValue>(Ptr) &&
+ "Null or GlobalValue should not be inserted");
+
+ for (User *U : Ptr->users()) {
+ auto *I = dyn_cast<Instruction>(U);
+ if (!I || I == L || !DT.dominates(I, MostDominatingInstruction))
+ continue;
+
+ // Add bitcasts and zero GEPs to queue.
+ // TODO: Should drop bitcast?
+ if (isa<BitCastInst>(I) ||
+ (isa<GetElementPtrInst>(I) &&
+ cast<GetElementPtrInst>(I)->hasAllZeroIndices())) {
+ PointerUsesQueue.push_back(I);
+ continue;
+ }
+
+ // If we hit a load/store with an invariant.group metadata and the same
+ // pointer operand, we can assume that value pointed to by the pointer
+ // operand didn't change.
+ if (I->hasMetadata(LLVMContext::MD_invariant_group) &&
+ Ptr == getLoadStorePointerOperand(I) && !I->isVolatile())
+ MostDominatingInstruction = I;
+ }
+ }
+
+ return MostDominatingInstruction != L ? MostDominatingInstruction : nullptr;
+}
+
+// Return the me...
[truncated]