[AMDGPU][SIMemoryLegalizer] Combine all GFX6-9 CacheControl Classes #168052
@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

Merge the following classes into `SIGfx6CacheControl`:
- SIGfx7CacheControl
- SIGfx90ACacheControl
- SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a few more differences, but they are still manageable under one class because the general behavior is the same.

This removes 500 lines of code and puts everything in a single place, which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions.

There is still a lot of room for improvement, but I think this patch is already big enough as is and I don't want to bundle too much into one review.

Patch is 51.60 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/168052.diff

1 Files Affected:
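The refactoring pattern can be sketched outside of LLVM: instead of one subclass per GPU generation overriding a virtual method, a single class branches on subtarget feature queries. The types and bit values below are simplified stand-ins, not the real AMDGPU definitions.

```cpp
#include <cassert>

// Illustrative stand-ins for CPol cache-policy bits (values are made up;
// the real definitions live elsewhere in the AMDGPU backend).
enum CPolBits : unsigned { GLC = 1u << 0, SLC = 1u << 1, SC0 = 1u << 4, SC1 = 1u << 5 };

// Hypothetical subtarget carrying the feature flags the merged class keys on.
struct Subtarget {
  bool HasGFX90AInsts = false;
  bool HasGFX940Insts = false;
};

// One function for all pre-GFX10 targets: the per-generation differences that
// used to live in SIGfx90ACacheControl/SIGfx940CacheControl overrides become
// runtime branches, mirroring the shape of the merged enableLoadCacheBypass.
unsigned loadCacheBypassBits(const Subtarget &ST, bool SystemScope) {
  if (ST.HasGFX940Insts)
    return SystemScope ? (SC0 | SC1) : SC1; // SC bits encode the scope.
  // Pre-GFX940: set GLC for L1 MISS_EVICT; no L2 bypass at the ISA level.
  return GLC;
}
```

The trade-off stated in the description shows up directly: each function grows a few branches, but the behavior of every generation is visible in one place.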
diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 6ab8d5521ebdb..b3c56c325e210 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -302,16 +302,17 @@ class SICacheControl {
SICacheControl(const GCNSubtarget &ST);
- /// Sets named bit \p BitName to "true" if present in instruction \p MI.
+ /// Sets CPol \p Bits to "true" if present in instruction \p MI.
/// \returns Returns true if \p MI is modified, false otherwise.
- bool enableNamedBit(const MachineBasicBlock::iterator MI,
- AMDGPU::CPol::CPol Bit) const;
+ bool enableCPolBits(const MachineBasicBlock::iterator MI,
+ unsigned Bits) const;
/// Check if any atomic operation on AS can affect memory accessible via the
/// global address space.
bool canAffectGlobalAddrSpace(SIAtomicAddrSpace AS) const;
public:
+ using CPol = AMDGPU::CPol::CPol;
/// Create a cache control for the subtarget \p ST.
static std::unique_ptr<SICacheControl> create(const GCNSubtarget &ST);
@@ -401,21 +402,9 @@ class SICacheControl {
virtual ~SICacheControl() = default;
};
+/// Generates code sequences for the memory model of all GFX targets below
+/// GFX10.
class SIGfx6CacheControl : public SICacheControl {
-protected:
-
- /// Sets GLC bit to "true" if present in \p MI. Returns true if \p MI
- /// is modified, false otherwise.
- bool enableGLCBit(const MachineBasicBlock::iterator &MI) const {
- return enableNamedBit(MI, AMDGPU::CPol::GLC);
- }
-
- /// Sets SLC bit to "true" if present in \p MI. Returns true if \p MI
- /// is modified, false otherwise.
- bool enableSLCBit(const MachineBasicBlock::iterator &MI) const {
- return enableNamedBit(MI, AMDGPU::CPol::SLC);
- }
-
public:
SIGfx6CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
@@ -454,114 +443,9 @@ class SIGfx6CacheControl : public SICacheControl {
Position Pos) const override;
};
-class SIGfx7CacheControl : public SIGfx6CacheControl {
-public:
-
- SIGfx7CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
-
- bool insertAcquire(MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace,
- Position Pos) const override;
-
-};
-
-class SIGfx90ACacheControl : public SIGfx7CacheControl {
-public:
-
- SIGfx90ACacheControl(const GCNSubtarget &ST) : SIGfx7CacheControl(ST) {}
-
- bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace) const override;
-
- bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace) const override;
-
- bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
- SIAtomicAddrSpace AddrSpace, SIMemOp Op,
- bool IsVolatile, bool IsNonTemporal,
- bool IsLastUse) const override;
-
- bool insertWait(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace, SIMemOp Op,
- bool IsCrossAddrSpaceOrdering, Position Pos,
- AtomicOrdering Order, bool AtomicsOnly) const override;
-
- bool insertAcquire(MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace,
- Position Pos) const override;
-
- bool insertRelease(MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace,
- bool IsCrossAddrSpaceOrdering,
- Position Pos) const override;
-};
-
-class SIGfx940CacheControl : public SIGfx90ACacheControl {
-protected:
-
- /// Sets SC0 bit to "true" if present in \p MI. Returns true if \p MI
- /// is modified, false otherwise.
- bool enableSC0Bit(const MachineBasicBlock::iterator &MI) const {
- return enableNamedBit(MI, AMDGPU::CPol::SC0);
- }
-
- /// Sets SC1 bit to "true" if present in \p MI. Returns true if \p MI
- /// is modified, false otherwise.
- bool enableSC1Bit(const MachineBasicBlock::iterator &MI) const {
- return enableNamedBit(MI, AMDGPU::CPol::SC1);
- }
-
- /// Sets NT bit to "true" if present in \p MI. Returns true if \p MI
- /// is modified, false otherwise.
- bool enableNTBit(const MachineBasicBlock::iterator &MI) const {
- return enableNamedBit(MI, AMDGPU::CPol::NT);
- }
-
-public:
- SIGfx940CacheControl(const GCNSubtarget &ST) : SIGfx90ACacheControl(ST) {};
-
- bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace) const override;
-
- bool enableStoreCacheBypass(const MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace) const override;
-
- bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace) const override;
-
- bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
- SIAtomicAddrSpace AddrSpace, SIMemOp Op,
- bool IsVolatile, bool IsNonTemporal,
- bool IsLastUse) const override;
-
- bool insertAcquire(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace, Position Pos) const override;
-
- bool insertRelease(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace, bool IsCrossAddrSpaceOrdering,
- Position Pos) const override;
-};
-
-class SIGfx10CacheControl : public SIGfx7CacheControl {
-protected:
-
- /// Sets DLC bit to "true" if present in \p MI. Returns true if \p MI
- /// is modified, false otherwise.
- bool enableDLCBit(const MachineBasicBlock::iterator &MI) const {
- return enableNamedBit(MI, AMDGPU::CPol::DLC);
- }
-
+class SIGfx10CacheControl : public SIGfx6CacheControl {
public:
-
- SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx7CacheControl(ST) {}
+ SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
SIAtomicScope Scope,
@@ -601,6 +485,7 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
// \returns Returns true if \p MI is modified, false otherwise.
bool setTH(const MachineBasicBlock::iterator MI,
AMDGPU::CPol::CPol Value) const;
+
// Sets Scope policy to \p Value if CPol operand is present in instruction \p
// MI. \returns Returns true if \p MI is modified, false otherwise.
bool setScope(const MachineBasicBlock::iterator MI,
@@ -1006,13 +891,13 @@ SICacheControl::SICacheControl(const GCNSubtarget &ST) : ST(ST) {
InsertCacheInv = !AmdgcnSkipCacheInvalidations;
}
-bool SICacheControl::enableNamedBit(const MachineBasicBlock::iterator MI,
- AMDGPU::CPol::CPol Bit) const {
+bool SICacheControl::enableCPolBits(const MachineBasicBlock::iterator MI,
+ unsigned Bits) const {
MachineOperand *CPol = TII->getNamedOperand(*MI, AMDGPU::OpName::cpol);
if (!CPol)
return false;
- CPol->setImm(CPol->getImm() | Bit);
+ CPol->setImm(CPol->getImm() | Bits);
return true;
}
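The rename from enableNamedBit to enableCPolBits generalizes the helper from a single enum bit to an arbitrary mask: since the implementation just ORs into the immediate, several bits can now be set with one call (e.g. CPol::SLC | CPol::GLC later in this patch). The core of that change, in isolation and with toy bit values rather than the real CPol encoding:

```cpp
#include <cassert>

// Toy stand-ins for two CPol bits; the real values live in AMDGPU::CPol.
enum : unsigned { GLC = 1u << 0, SLC = 1u << 1 };

// Mirrors the core of enableCPolBits: OR the requested mask into the
// existing cache-policy immediate. ORing an already-set bit is a no-op.
unsigned orCPolBits(unsigned Imm, unsigned Bits) { return Imm | Bits; }
```

What previously took two calls (enableGLCBit then enableSLCBit) is now a single call with a combined mask.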
@@ -1028,14 +913,8 @@ bool SICacheControl::canAffectGlobalAddrSpace(SIAtomicAddrSpace AS) const {
/* static */
std::unique_ptr<SICacheControl> SICacheControl::create(const GCNSubtarget &ST) {
GCNSubtarget::Generation Generation = ST.getGeneration();
- if (ST.hasGFX940Insts())
- return std::make_unique<SIGfx940CacheControl>(ST);
- if (ST.hasGFX90AInsts())
- return std::make_unique<SIGfx90ACacheControl>(ST);
- if (Generation <= AMDGPUSubtarget::SOUTHERN_ISLANDS)
- return std::make_unique<SIGfx6CacheControl>(ST);
if (Generation < AMDGPUSubtarget::GFX10)
- return std::make_unique<SIGfx7CacheControl>(ST);
+ return std::make_unique<SIGfx6CacheControl>(ST);
if (Generation < AMDGPUSubtarget::GFX11)
return std::make_unique<SIGfx10CacheControl>(ST);
if (Generation < AMDGPUSubtarget::GFX12)
@@ -1048,33 +927,61 @@ bool SIGfx6CacheControl::enableLoadCacheBypass(
SIAtomicScope Scope,
SIAtomicAddrSpace AddrSpace) const {
assert(MI->mayLoad() && !MI->mayStore());
- bool Changed = false;
- if (canAffectGlobalAddrSpace(AddrSpace)) {
- switch (Scope) {
- case SIAtomicScope::SYSTEM:
- case SIAtomicScope::AGENT:
+ if (!canAffectGlobalAddrSpace(AddrSpace)) {
+ /// The scratch address space does not need the global memory caches
+ /// to be bypassed as all memory operations by the same thread are
+ /// sequentially consistent, and no other thread can access scratch
+ /// memory.
+
+ /// Other address spaces do not have a cache.
+ return false;
+ }
+
+ bool Changed = false;
+ switch (Scope) {
+ case SIAtomicScope::SYSTEM:
+ if (ST.hasGFX940Insts()) {
+ // Set SC bits to indicate system scope.
+ Changed |= enableCPolBits(MI, CPol::SC0 | CPol::SC1);
+ break;
+ }
+ [[fallthrough]];
+ case SIAtomicScope::AGENT:
+ if (ST.hasGFX940Insts()) {
+ // Set SC bits to indicate agent scope.
+ Changed |= enableCPolBits(MI, CPol::SC1);
+ } else {
// Set L1 cache policy to MISS_EVICT.
// Note: there is no L2 cache bypass policy at the ISA level.
- Changed |= enableGLCBit(MI);
- break;
- case SIAtomicScope::WORKGROUP:
- case SIAtomicScope::WAVEFRONT:
- case SIAtomicScope::SINGLETHREAD:
- // No cache to bypass.
- break;
- default:
- llvm_unreachable("Unsupported synchronization scope");
+ Changed |= enableCPolBits(MI, CPol::GLC);
+ }
+ break;
+ case SIAtomicScope::WORKGROUP:
+ if (ST.hasGFX940Insts()) {
+ // In threadgroup split mode the waves of a work-group can be executing
+ // on different CUs. Therefore need to bypass the L1 which is per CU.
+ // Otherwise in non-threadgroup split mode all waves of a work-group are
+ // on the same CU, and so the L1 does not need to be bypassed. Setting
+ // SC bits to indicate work-group scope will do this automatically.
+ Changed |= enableCPolBits(MI, CPol::SC0);
+ } else if (ST.hasGFX90AInsts()) {
+ // In threadgroup split mode the waves of a work-group can be executing
+ // on different CUs. Therefore need to bypass the L1 which is per CU.
+ // Otherwise in non-threadgroup split mode all waves of a work-group are
+ // on the same CU, and so the L1 does not need to be bypassed.
+ if (ST.isTgSplitEnabled())
+ Changed |= enableCPolBits(MI, CPol::GLC);
}
+ break;
+ case SIAtomicScope::WAVEFRONT:
+ case SIAtomicScope::SINGLETHREAD:
+ // No cache to bypass.
+ break;
+ default:
+ llvm_unreachable("Unsupported synchronization scope");
}
- /// The scratch address space does not need the global memory caches
- /// to be bypassed as all memory operations by the same thread are
- /// sequentially consistent, and no other thread can access scratch
- /// memory.
-
- /// Other address spaces do not have a cache.
-
return Changed;
}
@@ -1085,8 +992,39 @@ bool SIGfx6CacheControl::enableStoreCacheBypass(
assert(!MI->mayLoad() && MI->mayStore());
bool Changed = false;
- /// The L1 cache is write through so does not need to be bypassed. There is no
- /// bypass control for the L2 cache at the isa level.
+ /// For targets other than GFX940, the L1 cache is write through so does not
+ /// need to be bypassed. There is no bypass control for the L2 cache at the
+ /// isa level.
+
+ if (ST.hasGFX940Insts() && canAffectGlobalAddrSpace(AddrSpace)) {
+ switch (Scope) {
+ case SIAtomicScope::SYSTEM:
+ // Set SC bits to indicate system scope.
+ Changed |= enableCPolBits(MI, CPol::SC0 | CPol::SC1);
+ break;
+ case SIAtomicScope::AGENT:
+ // Set SC bits to indicate agent scope.
+ Changed |= enableCPolBits(MI, CPol::SC1);
+ break;
+ case SIAtomicScope::WORKGROUP:
+ // Set SC bits to indicate workgroup scope.
+ Changed |= enableCPolBits(MI, CPol::SC0);
+ break;
+ case SIAtomicScope::WAVEFRONT:
+ case SIAtomicScope::SINGLETHREAD:
+ // Leave SC bits unset to indicate wavefront scope.
+ break;
+ default:
+ llvm_unreachable("Unsupported synchronization scope");
+ }
+
+ /// The scratch address space does not need the global memory caches
+ /// to be bypassed as all memory operations by the same thread are
+ /// sequentially consistent, and no other thread can access scratch
+ /// memory.
+
+ /// Other address spaces do not have a cache.
+ }
return Changed;
}
@@ -1098,10 +1036,31 @@ bool SIGfx6CacheControl::enableRMWCacheBypass(
assert(MI->mayLoad() && MI->mayStore());
bool Changed = false;
- /// Do not set GLC for RMW atomic operations as L0/L1 cache is automatically
- /// bypassed, and the GLC bit is instead used to indicate if they are
- /// return or no-return.
- /// Note: there is no L2 cache coherent bypass control at the ISA level.
+ /// For targets other than GFX940, do not set GLC for RMW atomic operations as
+ /// L0/L1 cache is automatically bypassed, and the GLC bit is instead used to
+ /// indicate if they are return or no-return. Note: there is no L2 cache
+ /// coherent bypass control at the ISA level.
+ /// For GFX90A+, RMW atomics implicitly bypass the L1 cache.
+
+ if (ST.hasGFX940Insts() && canAffectGlobalAddrSpace(AddrSpace)) {
+ switch (Scope) {
+ case SIAtomicScope::SYSTEM:
+ // Set SC1 bit to indicate system scope.
+ Changed |= enableCPolBits(MI, CPol::SC1);
+ break;
+ case SIAtomicScope::AGENT:
+ case SIAtomicScope::WORKGROUP:
+ case SIAtomicScope::WAVEFRONT:
+ case SIAtomicScope::SINGLETHREAD:
+ // RMW atomic operations implicitly bypass the L1 cache and only use SC1
+ // to indicate system or agent scope. The SC0 bit is used to indicate if
+ // they are return or no-return. Leave SC1 bit unset to indicate agent
+ // scope.
+ break;
+ default:
+ llvm_unreachable("Unsupported synchronization scope");
+ }
+ }
return Changed;
}
@@ -1123,11 +1082,15 @@ bool SIGfx6CacheControl::enableVolatileAndOrNonTemporal(
bool Changed = false;
if (IsVolatile) {
- // Set L1 cache policy to be MISS_EVICT for load instructions
- // and MISS_LRU for store instructions.
- // Note: there is no L2 cache bypass policy at the ISA level.
- if (Op == SIMemOp::LOAD)
- Changed |= enableGLCBit(MI);
+ if (ST.hasGFX940Insts()) {
+ // Set SC bits to indicate system scope.
+ Changed |= enableCPolBits(MI, CPol::SC0 | CPol::SC1);
+ } else if (Op == SIMemOp::LOAD) {
+ // Set L1 cache policy to be MISS_EVICT for load instructions
+ // and MISS_LRU for store instructions.
+ // Note: there is no L2 cache bypass policy at the ISA level.
+ Changed |= enableCPolBits(MI, CPol::GLC);
+ }
// Ensure operation has completed at system scope to cause all volatile
// operations to be visible outside the program in a global order. Do not
@@ -1142,10 +1105,13 @@ bool SIGfx6CacheControl::enableVolatileAndOrNonTemporal(
}
if (IsNonTemporal) {
- // Setting both GLC and SLC configures L1 cache policy to MISS_EVICT
- // for both loads and stores, and the L2 cache policy to STREAM.
- Changed |= enableGLCBit(MI);
- Changed |= enableSLCBit(MI);
+ if (ST.hasGFX940Insts()) {
+ Changed |= enableCPolBits(MI, CPol::NT);
+ } else {
+ // Setting both GLC and SLC configures L1 cache policy to MISS_EVICT
+ // for both loads and stores, and the L2 cache policy to STREAM.
+ Changed |= enableCPolBits(MI, CPol::SLC | CPol::GLC);
+ }
return Changed;
}
@@ -1166,6 +1132,26 @@ bool SIGfx6CacheControl::insertWait(MachineBasicBlock::iterator &MI,
if (Pos == Position::AFTER)
++MI;
+ // GFX90A+
+ if (ST.hasGFX90AInsts() && ST.isTgSplitEnabled()) {
+ // In threadgroup split mode the waves of a work-group can be executing on
+ // different CUs. Therefore need to wait for global or GDS memory operations
+ // to complete to ensure they are visible to waves in the other CUs.
+ // Otherwise in non-threadgroup split mode all waves of a work-group are on
+ // the same CU, so no need to wait for global memory as all waves in the
+ // work-group access the same the L1, nor wait for GDS as access are ordered
+ // on a CU.
+ if (((AddrSpace & (SIAtomicAddrSpace::GLOBAL | SIAtomicAddrSpace::SCRATCH |
+ SIAtomicAddrSpace::GDS)) != SIAtomicAddrSpace::NONE) &&
+ (Scope == SIAtomicScope::WORKGROUP)) {
+ // Same as <GFX90A at AGENT scope;
+ Scope = SIAtomicScope::AGENT;
+ }
+ // In threadgroup split mode LDS cannot be allocated so no need to wait for
+ // LDS memory operations.
+ AddrSpace &= ~SIAtomicAddrSpace::LDS;
+ }
+
bool VMCnt = false;
bool LGKMCnt = false;
@@ -1260,6 +1246,12 @@ bool SIGfx6CacheControl::insertWait(MachineBasicBlock::iterator &MI,
return Changed;
}
+static bool canUseBUFFER_WBINVL1_VOL(const GCNSubtarget &ST) {
+ if (ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS)
+ return false;
+ return !(ST.isAmdPalOS() || ST.isMesa3DOS());
+}
+
bool SIGfx6CacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
SIAtomicScope Scope,
SIAtomicAddrSpace AddrSpace,
@@ -1275,17 +1267,92 @@ bool SIGfx6CacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
if (Pos == Position::AFTER)
++MI;
+ const unsigned InvalidateL1 = canUseBUFFER_WBINVL1_VOL(ST)
+ ? AMDGPU::BUFFER_WBINVL1_VOL
+ : AMDGPU::BUFFER_WBINVL1;
+
if (canAffectGlobalAddrSpace(AddrSpace)) {
switch (Scope) {
case SIAtomicScope::SYSTEM:
+ if (ST.hasGFX940Insts()) {
+ // Ensures that following loads will not see stale remote VMEM data or
+ // stale local VMEM data with MTYPE NC. Local VMEM data with MTYPE RW
+ // and CC will never be stale due to the local memory probes.
+ BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_INV))
+ // Set SC bits to indicate system scope.
+ .addImm(AMDGPU::CPol::SC0 | AMDGPU::CPol::SC1);
+ // Inserting a "S_WAITCNT vmcnt(0)" after is not required because the
+ // hardware does not reorder memory operations by the same wave with
+ // respect to a preceding "BUFFER_INV". The invalidate is guaranteed to
+ // remove any cache lines of earlier writes by the same wave and ensures
+ // later reads by the same wave will refetch the cache lines.
+ Changed = true;
+ break;
+ }
+
+ if (ST.hasGFX90AInsts()) {
+ // Ensures that following loads will not see stale remote VMEM data or
+ // stale local VMEM data with MTYPE NC. Local VMEM data with MTYPE RW
+ // and CC will never be stale due to the local memory probes.
+ BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_INVL2));
+ BuildMI(MBB, MI, DL, TII->get(InvalidateL1));
+ // Inserting a "S_WAITCNT vmcnt(0)" after is not required because the
+ // hardware does not reorder memory operations by the same wave with
+ // respect to a preceding "BUFFER_INVL2". The invalidate is guaranteed
+ // to remove any cache lines of earlier writes by the same wave and
+ // ensures later reads by the same wave will refetch the cache lines.
+ Changed = true;
+ break;
+ }
+ [[fallthrough]];
case SIAtomicScope::AGENT:
- BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_WBINVL1));
+ if (ST.hasGFX940Insts()) {
+ // Ensures that following loads will not...
[truncated]
jayfoad left a comment:

LGTM overall, but I have not reviewed the implementation in detail.
static bool canUseBUFFER_WBINVL1_VOL(const GCNSubtarget &ST) {
  if (ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS)
    return false;
  return !(ST.isAmdPalOS() || ST.isMesa3DOS());
}

Suggested change:
- return !(ST.isAmdPalOS() || ST.isMesa3DOS());
+ return !ST.isAmdPalOS() && !ST.isMesa3DOS();
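The suggested change is purely stylistic: by De Morgan's laws, !(a || b) and !a && !b are the same predicate. A standalone check, independent of the LLVM code:

```cpp
#include <cassert>
#include <initializer_list>

// Form in the patch as posted.
bool originalForm(bool IsAmdPalOS, bool IsMesa3DOS) {
  return !(IsAmdPalOS || IsMesa3DOS);
}

// Form suggested in review.
bool suggestedForm(bool IsAmdPalOS, bool IsMesa3DOS) {
  return !IsAmdPalOS && !IsMesa3DOS;
}

// Exhaustively compare the two over all four input combinations.
bool formsAgree() {
  for (bool A : {false, true})
    for (bool B : {false, true})
      if (originalForm(A, B) != suggestedForm(A, B))
        return false;
  return true;
}
```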
Merge the following classes into `SIGfx6CacheControl`: - SIGfx7CacheControl - SIGfx90ACacheControl - SIGfx940CacheControl They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same. This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions. There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.
Force-pushed from 6900721 to b092e9c.
LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/27420
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/27577 Here is the relevant piece of the build log for the reference |
