Skip to content
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
127 changes: 96 additions & 31 deletions llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -441,42 +441,96 @@ using IsExpiredFn = function_ref<bool(const MachineInstr &, int WaitStates)>;
using GetNumWaitStatesFn = function_ref<unsigned int(const MachineInstr &)>;

// Search for a hazard in a block and its predecessors.
// StateT must implement getHashValue() and isEqual().
template <typename StateT>
static bool
hasHazard(StateT State,
hasHazard(StateT InitialState,
function_ref<HazardFnResult(StateT &, const MachineInstr &)> IsHazard,
function_ref<void(StateT &, const MachineInstr &)> UpdateState,
Comment on lines 447 to 448
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something to address in the current patch, but I think this IsHazard/UpdateState interface is confusing because IsHazard also updates the state. So technically I think there is no need for the separate UpdateState function, it could just become part of what IsHazard does before returning NoHazardFound.

const MachineBasicBlock *MBB,
MachineBasicBlock::const_reverse_instr_iterator I,
DenseSet<const MachineBasicBlock *> &Visited) {
for (auto E = MBB->instr_rend(); I != E; ++I) {
// No need to look at parent BUNDLE instructions.
if (I->isBundle())
continue;

switch (IsHazard(State, *I)) {
case HazardFound:
return true;
case HazardExpired:
return false;
default:
// Continue search
break;
const MachineBasicBlock *InitialMBB,
MachineBasicBlock::const_reverse_instr_iterator InitialI) {
SmallVector<std::pair<const MachineBasicBlock *, unsigned>> Worklist;
SmallDenseSet<std::pair<const MachineBasicBlock *, unsigned>> Visited;
SmallVector<std::pair<unsigned, unsigned>, 1> Collisions;
SmallDenseMap<unsigned, unsigned> StateHash2Idx;
SmallVector<StateT> States;

// States contains a vector of unique state structures.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels overcomplicated and it's also similar to what MapVector does internally. I think you could probably reimplement this with a MapVector<StateT, void>, using MapVector::getArrayRef to convert between states and StateIdxs. This would have the (I claim) benefit that StateT would have to support standard DenseMap traits instead of the less standard getHashValue and isEqual methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stepping back a bit, why do you want this level of indirection in the first place? Is it "just" a performance thing, to avoid copying around states which could be large, and avoid storing multiple copies of identical states? The MapVector solution would still store two copies of each state, since MapVector has a SmallVector of keys+values as well as a DenseMap of the keys.

Another possibility might be to have a DenseMap of pointers to states hashed by the state itself not the pointer value, a bit like this does for MachineInstrs: **

/// Special DenseMapInfo traits to compare MachineInstr* by *value* of the

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could probably reimplement MapVector<StateT, void>, using MapVector::getArrayRef
getArrayRef only provides access to the values, not the keys.

Stepping back a bit, why do you want this level of indirection in the first place?
Avoiding copying for performance is the primary reason. Because the data is backed by a SmallVector pointers are also not an option (will invalid).

DenseMap allows for alternate Key and LookupKey types specified via traits.
I have reimplemented using this mechanism; however, the actual key type needs to be a pointer to the backing SmallVector and an index into it, because traits methods are static -- if they were not static I could capture the SmallVector reference and keys would only need to be indexes into the array.

Overall implementation is more code, but runs with comparable performance to previous version in this PR. Slightly worse, but still net improvement against current code without this patch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however, the actual key type needs to be a pointer to the backing SmallVector and an index into it, because traits methods are static -- if they were not static I could capture the SmallVector reference and keys would only need to be indexes into the array.

If you use std::deque then you can work with pointers to the StateT objects, not indexes, because deque promises not to move existing allocations as it grows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can put up a version that uses deque and SmallDenseMap from pointer to pointer if you prefer?
It doesn't look that different to this, but it is measurably a little slower.

A lot of the benefit of this refactor comes from the fact that most hasHazard calls can be handled within a block.
Hence avoiding touching any data structures and doing any memory allocation is beneficial -- most std::deque implementation do alloca on init AFAIK.

Beyond improved correctness the secondary goal of this patch is to try and claw back some of the impact of #138663.
Using std::deque certainly negates a lot of that.
Version prior to current was still the best.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I still don't like the hand-rolled hashtable implementation, but I guess I won't object if it's measurably faster and if you can find someone to review it for correctness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you accept the current implementation using SmallVector and SmallDenseMap?
To re-iterate, this version while not the best is still better than current implementation in both performance and correctness.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll take another look at the current version.

// StateT is hashed via getHashValue() and StateHash2Idx maps each hash
// to an index in the States vector.
// In the unlikely event of a hash collision the Collision vector provides
// additional hash to index associations which must be retrieved by a linear
// scan.

// Retrieve unique constant index for a StateT structure in the States vector.
auto ResolveStateIdx = [&](const StateT State) {
unsigned StateHash = State.getHashValue();
unsigned StateIdx;
if (!StateHash2Idx.contains(StateHash)) {
StateIdx = States.size();
States.push_back(State);
StateHash2Idx[StateHash] = StateIdx;
} else {
StateIdx = StateHash2Idx[StateHash];
if (LLVM_UNLIKELY(!StateT::isEqual(State, States[StateIdx]))) {
// Hash collision
auto Collision = llvm::find_if(Collisions, [&](auto &C) {
return C.first == StateHash &&
StateT::isEqual(State, States[C.second]);
});
if (Collision != Collisions.end()) {
StateIdx = Collision->second;
} else {
StateIdx = States.size();
States.push_back(State);
Collisions.emplace_back(StateHash, StateIdx);
}
}
}
return StateIdx;
};

if (I->isInlineAsm() || I->isMetaInstruction())
continue;
const MachineBasicBlock *MBB = InitialMBB;
StateT State = InitialState;
auto I = InitialI;

UpdateState(State, *I);
}
for (;;) {
bool Expired = false;
for (auto E = MBB->instr_rend(); I != E; ++I) {
// No need to look at parent BUNDLE instructions.
if (I->isBundle())
continue;

for (MachineBasicBlock *Pred : MBB->predecessors()) {
if (!Visited.insert(Pred).second)
continue;
auto Result = IsHazard(State, *I);
if (Result == HazardFound)
return true;
if (Result == HazardExpired) {
Expired = true;
break;
}

if (hasHazard(State, IsHazard, UpdateState, Pred, Pred->instr_rbegin(),
Visited))
return true;
if (I->isInlineAsm() || I->isMetaInstruction())
continue;

UpdateState(State, *I);
}

if (!Expired) {
unsigned StateIdx = ResolveStateIdx(State);
for (MachineBasicBlock *Pred : MBB->predecessors()) {
if (!Visited.insert(std::pair(Pred, StateIdx)).second)
continue;
Worklist.emplace_back(Pred, StateIdx);
}
}

if (Worklist.empty())
break;

unsigned StateIdx;
std::tie(MBB, StateIdx) = Worklist.pop_back_val();
State = States[StateIdx];
I = MBB->instr_rbegin();
}

return false;
Expand Down Expand Up @@ -1641,6 +1695,14 @@ bool GCNHazardRecognizer::fixVALUPartialForwardingHazard(MachineInstr *MI) {
SmallDenseMap<Register, int, 4> DefPos;
int ExecPos = std::numeric_limits<int>::max();
int VALUs = 0;

unsigned getHashValue() const {
return hash_combine(ExecPos, VALUs, hash_combine_range(DefPos));
}
static bool isEqual(const StateType &LHS, const StateType &RHS) {
return LHS.DefPos == RHS.DefPos && LHS.ExecPos == RHS.ExecPos &&
LHS.VALUs == RHS.VALUs;
}
};

StateType State;
Expand Down Expand Up @@ -1735,9 +1797,8 @@ bool GCNHazardRecognizer::fixVALUPartialForwardingHazard(MachineInstr *MI) {
State.VALUs += 1;
};

DenseSet<const MachineBasicBlock *> Visited;
if (!hasHazard<StateType>(State, IsHazardFn, UpdateStateFn, MI->getParent(),
std::next(MI->getReverseIterator()), Visited))
std::next(MI->getReverseIterator())))
return false;

BuildMI(*MI->getParent(), MI, MI->getDebugLoc(),
Expand Down Expand Up @@ -1778,6 +1839,11 @@ bool GCNHazardRecognizer::fixVALUTransUseHazard(MachineInstr *MI) {
struct StateType {
int VALUs = 0;
int TRANS = 0;

unsigned getHashValue() const { return hash_combine(VALUs, TRANS); }
static bool isEqual(const StateType &LHS, const StateType &RHS) {
return LHS.VALUs == RHS.VALUs && LHS.TRANS == RHS.TRANS;
}
};

StateType State;
Expand Down Expand Up @@ -1813,9 +1879,8 @@ bool GCNHazardRecognizer::fixVALUTransUseHazard(MachineInstr *MI) {
State.TRANS += 1;
};

DenseSet<const MachineBasicBlock *> Visited;
if (!hasHazard<StateType>(State, IsHazardFn, UpdateStateFn, MI->getParent(),
std::next(MI->getReverseIterator()), Visited))
std::next(MI->getReverseIterator())))
return false;

// Hazard is observed - insert a wait on va_dst counter to ensure hazard is
Expand Down