92 changes: 92 additions & 0 deletions llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -27,7 +27,9 @@

#include "AMDGPU.h"
#include "GCNSubtarget.h"
#include "SIRegisterInfo.h"
#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/Analysis/CaptureTracking.h"
#include "llvm/Analysis/InstSimplifyFolder.h"
@@ -36,6 +38,7 @@
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/IntrinsicsR600.h"
@@ -45,6 +48,9 @@
#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/Utils/SSAUpdater.h"

#include <algorithm>
#include <unordered_set>

#define DEBUG_TYPE "amdgpu-promote-alloca"

using namespace llvm;
@@ -100,6 +106,14 @@ class AMDGPUPromoteAllocaImpl {
  unsigned VGPRBudgetRatio;
  unsigned MaxVectorRegs;

  std::unordered_map<BasicBlock *, std::unordered_set<Instruction *>>
Member

Depending on the density of your map, you could use getNumber() on BasicBlock to get a unique number for indexing.
E.g., this could be a SmallVector of sets with a size of F.getMaxBlockNumber().
For the set, I'd suggest using a SmallPtrSet instead.

Contributor Author

That may perform slightly better, but it would be harder to read. Perhaps it would be possible to encapsulate this algorithm into a set data structure somehow.

Contributor Author

@tgymnich I can change it to SmallPtrSet for the value type, but the problem with using a vector is that I need to know whether the map contains a value. That would require making a vector of pointers to the SmallPtrSet instead of a vector of SmallPtrSets. I'm not sure if that would perform better than what I currently have.
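
One way to reconcile the two positions in this thread is a dense, number-indexed table whose slots can be empty. The sketch below is standalone C++, not LLVM code: `LiveInTable` and `Inst` are hypothetical names, and `std::vector`, `std::optional`, and `std::unordered_set` stand in for `SmallVector` and `SmallPtrSet` so the example compiles on its own. It assumes every block has a unique number below a known maximum, as the `getNumber()` suggestion implies.

```cpp
#include <cstddef>
#include <optional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR instruction; only its address is used.
struct Inst {};

// Dense per-block live-in table indexed by block number.
// An empty optional means "no live-in set has been recorded for this block",
// which answers the "does the map contain a value" question without a
// vector of pointers.
class LiveInTable {
  std::vector<std::optional<std::unordered_set<const Inst *>>> Slots;

public:
  explicit LiveInTable(std::size_t MaxBlockNumber) : Slots(MaxBlockNumber) {}

  // True once a live-in set exists for the block, even if that set is empty.
  bool contains(std::size_t BlockNumber) const {
    return Slots[BlockNumber].has_value();
  }

  // Returns the block's live-in set, creating an empty one on first use.
  std::unordered_set<const Inst *> &getOrCreate(std::size_t BlockNumber) {
    if (!Slots[BlockNumber])
      Slots[BlockNumber].emplace();
    return *Slots[BlockNumber];
  }
};
```

Whether this is faster than the hash map depends on how many blocks end up with an entry, which is the density caveat raised in the first comment.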

      SGPRLiveIns;
  size_t getSGPRPressureEstimate(AllocaInst &I);

  std::unordered_map<BasicBlock *, std::unordered_set<Instruction *>>
      VGPRLiveIns;
  size_t getVGPRPressureEstimate(AllocaInst &I);

  bool IsAMDGCN = false;
  bool IsAMDHSA = false;


@@ -1471,9 +1485,87 @@ bool AMDGPUPromoteAllocaImpl::hasSufficientLocalMem(const Function &F) {
  return true;
}

size_t AMDGPUPromoteAllocaImpl::getSGPRPressureEstimate(AllocaInst &I) {
  Function &F = *I.getFunction();
  size_t MaxLive = 0;
  for (BasicBlock *BB : post_order(&F)) {
    if (SGPRLiveIns.count(BB))
      continue;

    std::unordered_set<Instruction *> CurrentlyLive;
    for (BasicBlock *SuccBB : successors(BB))
      if (SGPRLiveIns.count(SuccBB))
        for (const auto &R : SGPRLiveIns[SuccBB])
          CurrentlyLive.insert(R);

    for (auto RIt = BB->rbegin(); RIt != BB->rend(); RIt++) {
      if (&*RIt == &I)
        return MaxLive;

      MaxLive = std::max(MaxLive, CurrentlyLive.size());

      for (auto &Op : RIt->operands())
        if (!Op.get()->getType()->isVectorTy())
          if (Instruction *U = dyn_cast<Instruction>(Op))
            CurrentlyLive.insert(U);

      if (!RIt->getType()->isVectorTy())
Contributor

SGPR does not mean "not an IR vector type"

Contributor Author

@arsenm do you know of a better heuristic for estimating SGPR pressure at the IR level?
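
For readers following the thread, the per-block scan that this heuristic builds on can be modelled outside LLVM. The sketch below is a toy, not the patch's code: `ToyInst` and `maxLiveInBlock` are hypothetical names, values are plain integers, and it handles a single block with a given live-out set. Unlike the patch, it also counts the set that is live into the block once the scan finishes.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical toy instruction: defines value `Def` and reads `Uses`.
struct ToyInst {
  int Def;
  std::vector<int> Uses;
};

// Walk one block bottom-up, starting from the values live out of it, and
// return the size of the largest live set observed at any program point.
std::size_t maxLiveInBlock(const std::vector<ToyInst> &Block,
                           std::unordered_set<int> Live) {
  std::size_t MaxLive = 0;
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    // `Live` currently holds the values live just after this instruction.
    MaxLive = std::max(MaxLive, Live.size());
    // Operands become live above the instruction...
    for (int U : It->Uses)
      Live.insert(U);
    // ...and the value it defines is dead above its definition.
    Live.erase(It->Def);
  }
  // Also account for the values live into the block.
  return std::max(MaxLive, Live.size());
}
```

For a block computing `v1`, `v2`, `v3 = f(v1, v2)`, `v4 = g(v3, v1)` with only `v4` live out, the function returns 2. The count is in IR values, so it says nothing about which register file a value lands in or how many registers it occupies.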

        CurrentlyLive.erase(&*RIt);
    }

    SGPRLiveIns[BB] = CurrentlyLive;
  }

  llvm_unreachable("Woops, we fell off the edge of the world. Bye bye.");
}

size_t AMDGPUPromoteAllocaImpl::getVGPRPressureEstimate(AllocaInst &I) {
Contributor

The IR has no knowledge of SGPRs or VGPRs, and you're missing out on all the pressures exposed by legalization

  Function &F = *I.getParent()->getParent();
  size_t MaxLive = 0;
  for (BasicBlock *BB : post_order(&F)) {
    if (VGPRLiveIns.count(BB))
      continue;

    std::unordered_set<Instruction *> CurrentlyLive;
    for (BasicBlock *SuccBB : successors(BB))
      if (VGPRLiveIns.count(SuccBB))
        for (const auto &R : VGPRLiveIns[SuccBB])
          CurrentlyLive.insert(R);

    for (auto RIt = BB->rbegin(); RIt != BB->rend(); RIt++) {
      if (&*RIt == &I)
        return MaxLive;

      MaxLive = std::max(MaxLive, CurrentlyLive.size());

      for (auto &Op : RIt->operands())
        if (Op.get()->getType()->isVectorTy())
          if (Instruction *U = dyn_cast<Instruction>(Op))
            CurrentlyLive.insert(U);

      if (RIt->getType()->isVectorTy())
Contributor

VGPR doesn't mean "IR vector type"

Contributor Author

@linuxrocks123, Aug 15, 2025

@arsenm do you know of a better heuristic for estimating VGPR pressure at the IR level?

Contributor

No, but the type has nothing to do with it. We don't have any real or precise attempts at pressure heuristics in IR.
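
The objection can be illustrated with a toy model. On AMDGPU, a value can be kept in scalar registers only when it is uniform across the wave, while per-lane (divergent) values need vector registers, so a scalar-typed IR value may still occupy a VGPR. The sketch below is standalone and hypothetical: `ToyValue`, `splitByType`, and `splitByDivergence` are illustrative names, and `IsDivergent` is an assumed input flag. It only shows that the two classifications disagree and is not offered as a pressure heuristic.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy value. `IsVectorTy` mirrors the IR type test used in the
// patch; `IsDivergent` is an assumed flag meaning lanes may hold different
// values.
struct ToyValue {
  bool IsVectorTy;
  bool IsDivergent;
};

struct Split {
  std::size_t Scalar = 0; // values counted towards SGPRs
  std::size_t Vector = 0; // values counted towards VGPRs
};

// Classification in the patch: IR vector types count as VGPR, the rest as SGPR.
Split splitByType(const std::vector<ToyValue> &Values) {
  Split S;
  for (const ToyValue &V : Values)
    (V.IsVectorTy ? S.Vector : S.Scalar) += 1;
  return S;
}

// Toy alternative: divergent values count as VGPR, uniform ones as SGPR.
Split splitByDivergence(const std::vector<ToyValue> &Values) {
  Split S;
  for (const ToyValue &V : Values)
    (V.IsDivergent ? S.Vector : S.Scalar) += 1;
  return S;
}
```

With two divergent scalar-typed values and one uniform vector-typed value, the type-based split reports 2 scalar and 1 vector, while the divergence-based split reports 1 scalar and 2 vector. Neither count reflects the extra values that legalization introduces.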

        CurrentlyLive.erase(&*RIt);
    }

    VGPRLiveIns[BB] = CurrentlyLive;
  }

  llvm_unreachable("Woops, we fell off the edge of the world. Bye bye.");
}

// FIXME: Should try to pick the most likely to be profitable allocas first.
bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToLDS(AllocaInst &I,
                                                    bool SufficientLDS) {
  const unsigned SGPRPressureLimit = AMDGPU::SGPR_32RegClass.getNumRegs();
  const unsigned VGPRPressureLimit = AMDGPU::VGPR_32RegClass.getNumRegs();

  if (getSGPRPressureEstimate(I) > SGPRPressureLimit ||
      getVGPRPressureEstimate(I) > VGPRPressureLimit) {
    LLVM_DEBUG(dbgs() << "Declining to promote " << I
                      << " to LDS since pressure is relatively high.\n");
    return false;
  }

  LLVM_DEBUG(dbgs() << "Trying to promote to LDS: " << I << '\n');

  if (DisablePromoteAllocaToLDS) {