Skip to content

Conversation

@Pierre-vh
Copy link
Contributor

Merge the following classes into SIGfx6CacheControl:

  • SIGfx7CacheControl
  • SIGfx90ACacheControl
  • SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same.

This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions.

There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.

Copy link
Contributor Author

Pierre-vh commented Nov 14, 2025

@Pierre-vh Pierre-vh marked this pull request as ready for review November 14, 2025 13:12
@llvmbot
Copy link
Member

llvmbot commented Nov 14, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

Merge the following classes into SIGfx6CacheControl:

  • SIGfx7CacheControl
  • SIGfx90ACacheControl
  • SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same.

This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions.

There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.


Patch is 51.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/168052.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp (+294-790)
diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 6ab8d5521ebdb..b3c56c325e210 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -302,16 +302,17 @@ class SICacheControl {
 
   SICacheControl(const GCNSubtarget &ST);
 
-  /// Sets named bit \p BitName to "true" if present in instruction \p MI.
+  /// Sets CPol \p Bits to "true" if present in instruction \p MI.
   /// \returns Returns true if \p MI is modified, false otherwise.
-  bool enableNamedBit(const MachineBasicBlock::iterator MI,
-                      AMDGPU::CPol::CPol Bit) const;
+  bool enableCPolBits(const MachineBasicBlock::iterator MI,
+                      unsigned Bits) const;
 
   /// Check if any atomic operation on AS can affect memory accessible via the
   /// global address space.
   bool canAffectGlobalAddrSpace(SIAtomicAddrSpace AS) const;
 
 public:
+  using CPol = AMDGPU::CPol::CPol;
 
   /// Create a cache control for the subtarget \p ST.
   static std::unique_ptr<SICacheControl> create(const GCNSubtarget &ST);
@@ -401,21 +402,9 @@ class SICacheControl {
   virtual ~SICacheControl() = default;
 };
 
+/// Generates code sequences for the memory model of all GFX targets below
+/// GFX10.
 class SIGfx6CacheControl : public SICacheControl {
-protected:
-
-  /// Sets GLC bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableGLCBit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::GLC);
-  }
-
-  /// Sets SLC bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableSLCBit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::SLC);
-  }
-
 public:
 
   SIGfx6CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
@@ -454,114 +443,9 @@ class SIGfx6CacheControl : public SICacheControl {
                      Position Pos) const override;
 };
 
-class SIGfx7CacheControl : public SIGfx6CacheControl {
-public:
-
-  SIGfx7CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
-
-  bool insertAcquire(MachineBasicBlock::iterator &MI,
-                     SIAtomicScope Scope,
-                     SIAtomicAddrSpace AddrSpace,
-                     Position Pos) const override;
-
-};
-
-class SIGfx90ACacheControl : public SIGfx7CacheControl {
-public:
-
-  SIGfx90ACacheControl(const GCNSubtarget &ST) : SIGfx7CacheControl(ST) {}
-
-  bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
-                             SIAtomicScope Scope,
-                             SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
-                            SIAtomicScope Scope,
-                            SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
-                                      SIAtomicAddrSpace AddrSpace, SIMemOp Op,
-                                      bool IsVolatile, bool IsNonTemporal,
-                                      bool IsLastUse) const override;
-
-  bool insertWait(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
-                  SIAtomicAddrSpace AddrSpace, SIMemOp Op,
-                  bool IsCrossAddrSpaceOrdering, Position Pos,
-                  AtomicOrdering Order, bool AtomicsOnly) const override;
-
-  bool insertAcquire(MachineBasicBlock::iterator &MI,
-                     SIAtomicScope Scope,
-                     SIAtomicAddrSpace AddrSpace,
-                     Position Pos) const override;
-
-  bool insertRelease(MachineBasicBlock::iterator &MI,
-                     SIAtomicScope Scope,
-                     SIAtomicAddrSpace AddrSpace,
-                     bool IsCrossAddrSpaceOrdering,
-                     Position Pos) const override;
-};
-
-class SIGfx940CacheControl : public SIGfx90ACacheControl {
-protected:
-
-  /// Sets SC0 bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableSC0Bit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::SC0);
-  }
-
-  /// Sets SC1 bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableSC1Bit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::SC1);
-  }
-
-  /// Sets NT bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableNTBit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::NT);
-  }
-
-public:
-  SIGfx940CacheControl(const GCNSubtarget &ST) : SIGfx90ACacheControl(ST) {};
-
-  bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
-                             SIAtomicScope Scope,
-                             SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableStoreCacheBypass(const MachineBasicBlock::iterator &MI,
-                              SIAtomicScope Scope,
-                              SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
-                            SIAtomicScope Scope,
-                            SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
-                                      SIAtomicAddrSpace AddrSpace, SIMemOp Op,
-                                      bool IsVolatile, bool IsNonTemporal,
-                                      bool IsLastUse) const override;
-
-  bool insertAcquire(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
-                     SIAtomicAddrSpace AddrSpace, Position Pos) const override;
-
-  bool insertRelease(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
-                     SIAtomicAddrSpace AddrSpace, bool IsCrossAddrSpaceOrdering,
-                     Position Pos) const override;
-};
-
-class SIGfx10CacheControl : public SIGfx7CacheControl {
-protected:
-
-  /// Sets DLC bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableDLCBit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::DLC);
-  }
-
+class SIGfx10CacheControl : public SIGfx6CacheControl {
 public:
-
-  SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx7CacheControl(ST) {}
+  SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
 
   bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
                              SIAtomicScope Scope,
@@ -601,6 +485,7 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   // \returns Returns true if \p MI is modified, false otherwise.
   bool setTH(const MachineBasicBlock::iterator MI,
              AMDGPU::CPol::CPol Value) const;
+
   // Sets Scope policy to \p Value if CPol operand is present in instruction \p
   // MI. \returns Returns true if \p MI is modified, false otherwise.
   bool setScope(const MachineBasicBlock::iterator MI,
@@ -1006,13 +891,13 @@ SICacheControl::SICacheControl(const GCNSubtarget &ST) : ST(ST) {
   InsertCacheInv = !AmdgcnSkipCacheInvalidations;
 }
 
-bool SICacheControl::enableNamedBit(const MachineBasicBlock::iterator MI,
-                                    AMDGPU::CPol::CPol Bit) const {
+bool SICacheControl::enableCPolBits(const MachineBasicBlock::iterator MI,
+                                    unsigned Bits) const {
   MachineOperand *CPol = TII->getNamedOperand(*MI, AMDGPU::OpName::cpol);
   if (!CPol)
     return false;
 
-  CPol->setImm(CPol->getImm() | Bit);
+  CPol->setImm(CPol->getImm() | Bits);
   return true;
 }
 
@@ -1028,14 +913,8 @@ bool SICacheControl::canAffectGlobalAddrSpace(SIAtomicAddrSpace AS) const {
 /* static */
 std::unique_ptr<SICacheControl> SICacheControl::create(const GCNSubtarget &ST) {
   GCNSubtarget::Generation Generation = ST.getGeneration();
-  if (ST.hasGFX940Insts())
-    return std::make_unique<SIGfx940CacheControl>(ST);
-  if (ST.hasGFX90AInsts())
-    return std::make_unique<SIGfx90ACacheControl>(ST);
-  if (Generation <= AMDGPUSubtarget::SOUTHERN_ISLANDS)
-    return std::make_unique<SIGfx6CacheControl>(ST);
   if (Generation < AMDGPUSubtarget::GFX10)
-    return std::make_unique<SIGfx7CacheControl>(ST);
+    return std::make_unique<SIGfx6CacheControl>(ST);
   if (Generation < AMDGPUSubtarget::GFX11)
     return std::make_unique<SIGfx10CacheControl>(ST);
   if (Generation < AMDGPUSubtarget::GFX12)
@@ -1048,33 +927,61 @@ bool SIGfx6CacheControl::enableLoadCacheBypass(
     SIAtomicScope Scope,
     SIAtomicAddrSpace AddrSpace) const {
   assert(MI->mayLoad() && !MI->mayStore());
-  bool Changed = false;
 
-  if (canAffectGlobalAddrSpace(AddrSpace)) {
-    switch (Scope) {
-    case SIAtomicScope::SYSTEM:
-    case SIAtomicScope::AGENT:
+  if (!canAffectGlobalAddrSpace(AddrSpace)) {
+    /// The scratch address space does not need the global memory caches
+    /// to be bypassed as all memory operations by the same thread are
+    /// sequentially consistent, and no other thread can access scratch
+    /// memory.
+
+    /// Other address spaces do not have a cache.
+    return false;
+  }
+
+  bool Changed = false;
+  switch (Scope) {
+  case SIAtomicScope::SYSTEM:
+    if (ST.hasGFX940Insts()) {
+      // Set SC bits to indicate system scope.
+      Changed |= enableCPolBits(MI, CPol::SC0 | CPol::SC1);
+      break;
+    }
+    [[fallthrough]];
+  case SIAtomicScope::AGENT:
+    if (ST.hasGFX940Insts()) {
+      // Set SC bits to indicate agent scope.
+      Changed |= enableCPolBits(MI, CPol::SC1);
+    } else {
       // Set L1 cache policy to MISS_EVICT.
       // Note: there is no L2 cache bypass policy at the ISA level.
-      Changed |= enableGLCBit(MI);
-      break;
-    case SIAtomicScope::WORKGROUP:
-    case SIAtomicScope::WAVEFRONT:
-    case SIAtomicScope::SINGLETHREAD:
-      // No cache to bypass.
-      break;
-    default:
-      llvm_unreachable("Unsupported synchronization scope");
+      Changed |= enableCPolBits(MI, CPol::GLC);
+    }
+    break;
+  case SIAtomicScope::WORKGROUP:
+    if (ST.hasGFX940Insts()) {
+      // In threadgroup split mode the waves of a work-group can be executing
+      // on different CUs. Therefore need to bypass the L1 which is per CU.
+      // Otherwise in non-threadgroup split mode all waves of a work-group are
+      // on the same CU, and so the L1 does not need to be bypassed. Setting
+      // SC bits to indicate work-group scope will do this automatically.
+      Changed |= enableCPolBits(MI, CPol::SC0);
+    } else if (ST.hasGFX90AInsts()) {
+      // In threadgroup split mode the waves of a work-group can be executing
+      // on different CUs. Therefore need to bypass the L1 which is per CU.
+      // Otherwise in non-threadgroup split mode all waves of a work-group are
+      // on the same CU, and so the L1 does not need to be bypassed.
+      if (ST.isTgSplitEnabled())
+        Changed |= enableCPolBits(MI, CPol::GLC);
     }
+    break;
+  case SIAtomicScope::WAVEFRONT:
+  case SIAtomicScope::SINGLETHREAD:
+    // No cache to bypass.
+    break;
+  default:
+    llvm_unreachable("Unsupported synchronization scope");
   }
 
-  /// The scratch address space does not need the global memory caches
-  /// to be bypassed as all memory operations by the same thread are
-  /// sequentially consistent, and no other thread can access scratch
-  /// memory.
-
-  /// Other address spaces do not have a cache.
-
   return Changed;
 }
 
@@ -1085,8 +992,39 @@ bool SIGfx6CacheControl::enableStoreCacheBypass(
   assert(!MI->mayLoad() && MI->mayStore());
   bool Changed = false;
 
-  /// The L1 cache is write through so does not need to be bypassed. There is no
-  /// bypass control for the L2 cache at the isa level.
+  /// For targets other than GFX940, the L1 cache is write through so does not
+  /// need to be bypassed. There is no bypass control for the L2 cache at the
+  /// isa level.
+
+  if (ST.hasGFX940Insts() && canAffectGlobalAddrSpace(AddrSpace)) {
+    switch (Scope) {
+    case SIAtomicScope::SYSTEM:
+      // Set SC bits to indicate system scope.
+      Changed |= enableCPolBits(MI, CPol::SC0 | CPol::SC1);
+      break;
+    case SIAtomicScope::AGENT:
+      // Set SC bits to indicate agent scope.
+      Changed |= enableCPolBits(MI, CPol::SC1);
+      break;
+    case SIAtomicScope::WORKGROUP:
+      // Set SC bits to indicate workgroup scope.
+      Changed |= enableCPolBits(MI, CPol::SC0);
+      break;
+    case SIAtomicScope::WAVEFRONT:
+    case SIAtomicScope::SINGLETHREAD:
+      // Leave SC bits unset to indicate wavefront scope.
+      break;
+    default:
+      llvm_unreachable("Unsupported synchronization scope");
+    }
+
+    /// The scratch address space does not need the global memory caches
+    /// to be bypassed as all memory operations by the same thread are
+    /// sequentially consistent, and no other thread can access scratch
+    /// memory.
+
+    /// Other address spaces do not have a cache.
+  }
 
   return Changed;
 }
@@ -1098,10 +1036,31 @@ bool SIGfx6CacheControl::enableRMWCacheBypass(
   assert(MI->mayLoad() && MI->mayStore());
   bool Changed = false;
 
-  /// Do not set GLC for RMW atomic operations as L0/L1 cache is automatically
-  /// bypassed, and the GLC bit is instead used to indicate if they are
-  /// return or no-return.
-  /// Note: there is no L2 cache coherent bypass control at the ISA level.
+  /// For targets other than GFX940, do not set GLC for RMW atomic operations as
+  /// L0/L1 cache is automatically bypassed, and the GLC bit is instead used to
+  /// indicate if they are return or no-return. Note: there is no L2 cache
+  /// coherent bypass control at the ISA level.
+  ///       For GFX90A+, RMW atomics implicitly bypass the L1 cache.
+
+  if (ST.hasGFX940Insts() && canAffectGlobalAddrSpace(AddrSpace)) {
+    switch (Scope) {
+    case SIAtomicScope::SYSTEM:
+      // Set SC1 bit to indicate system scope.
+      Changed |= enableCPolBits(MI, CPol::SC1);
+      break;
+    case SIAtomicScope::AGENT:
+    case SIAtomicScope::WORKGROUP:
+    case SIAtomicScope::WAVEFRONT:
+    case SIAtomicScope::SINGLETHREAD:
+      // RMW atomic operations implicitly bypass the L1 cache and only use SC1
+      // to indicate system or agent scope. The SC0 bit is used to indicate if
+      // they are return or no-return. Leave SC1 bit unset to indicate agent
+      // scope.
+      break;
+    default:
+      llvm_unreachable("Unsupported synchronization scope");
+    }
+  }
 
   return Changed;
 }
@@ -1123,11 +1082,15 @@ bool SIGfx6CacheControl::enableVolatileAndOrNonTemporal(
   bool Changed = false;
 
   if (IsVolatile) {
-    // Set L1 cache policy to be MISS_EVICT for load instructions
-    // and MISS_LRU for store instructions.
-    // Note: there is no L2 cache bypass policy at the ISA level.
-    if (Op == SIMemOp::LOAD)
-      Changed |= enableGLCBit(MI);
+    if (ST.hasGFX940Insts()) {
+      // Set SC bits to indicate system scope.
+      Changed |= enableCPolBits(MI, CPol::SC0 | CPol::SC1);
+    } else if (Op == SIMemOp::LOAD) {
+      // Set L1 cache policy to be MISS_EVICT for load instructions
+      // and MISS_LRU for store instructions.
+      // Note: there is no L2 cache bypass policy at the ISA level.
+      Changed |= enableCPolBits(MI, CPol::GLC);
+    }
 
     // Ensure operation has completed at system scope to cause all volatile
     // operations to be visible outside the program in a global order. Do not
@@ -1142,10 +1105,13 @@ bool SIGfx6CacheControl::enableVolatileAndOrNonTemporal(
   }
 
   if (IsNonTemporal) {
-    // Setting both GLC and SLC configures L1 cache policy to MISS_EVICT
-    // for both loads and stores, and the L2 cache policy to STREAM.
-    Changed |= enableGLCBit(MI);
-    Changed |= enableSLCBit(MI);
+    if (ST.hasGFX940Insts()) {
+      Changed |= enableCPolBits(MI, CPol::NT);
+    } else {
+      // Setting both GLC and SLC configures L1 cache policy to MISS_EVICT
+      // for both loads and stores, and the L2 cache policy to STREAM.
+      Changed |= enableCPolBits(MI, CPol::SLC | CPol::GLC);
+    }
     return Changed;
   }
 
@@ -1166,6 +1132,26 @@ bool SIGfx6CacheControl::insertWait(MachineBasicBlock::iterator &MI,
   if (Pos == Position::AFTER)
     ++MI;
 
+  // GFX90A+
+  if (ST.hasGFX90AInsts() && ST.isTgSplitEnabled()) {
+    // In threadgroup split mode the waves of a work-group can be executing on
+    // different CUs. Therefore need to wait for global or GDS memory operations
+    // to complete to ensure they are visible to waves in the other CUs.
+    // Otherwise in non-threadgroup split mode all waves of a work-group are on
+    // the same CU, so no need to wait for global memory as all waves in the
+    // work-group access the same the L1, nor wait for GDS as access are ordered
+    // on a CU.
+    if (((AddrSpace & (SIAtomicAddrSpace::GLOBAL | SIAtomicAddrSpace::SCRATCH |
+                       SIAtomicAddrSpace::GDS)) != SIAtomicAddrSpace::NONE) &&
+        (Scope == SIAtomicScope::WORKGROUP)) {
+      // Same as <GFX90A at AGENT scope;
+      Scope = SIAtomicScope::AGENT;
+    }
+    // In threadgroup split mode LDS cannot be allocated so no need to wait for
+    // LDS memory operations.
+    AddrSpace &= ~SIAtomicAddrSpace::LDS;
+  }
+
   bool VMCnt = false;
   bool LGKMCnt = false;
 
@@ -1260,6 +1246,12 @@ bool SIGfx6CacheControl::insertWait(MachineBasicBlock::iterator &MI,
   return Changed;
 }
 
+static bool canUseBUFFER_WBINVL1_VOL(const GCNSubtarget &ST) {
+  if (ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS)
+    return false;
+  return !(ST.isAmdPalOS() || ST.isMesa3DOS());
+}
+
 bool SIGfx6CacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
                                        SIAtomicScope Scope,
                                        SIAtomicAddrSpace AddrSpace,
@@ -1275,17 +1267,92 @@ bool SIGfx6CacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
   if (Pos == Position::AFTER)
     ++MI;
 
+  const unsigned InvalidateL1 = canUseBUFFER_WBINVL1_VOL(ST)
+                                    ? AMDGPU::BUFFER_WBINVL1_VOL
+                                    : AMDGPU::BUFFER_WBINVL1;
+
   if (canAffectGlobalAddrSpace(AddrSpace)) {
     switch (Scope) {
     case SIAtomicScope::SYSTEM:
+      if (ST.hasGFX940Insts()) {
+        // Ensures that following loads will not see stale remote VMEM data or
+        // stale local VMEM data with MTYPE NC. Local VMEM data with MTYPE RW
+        // and CC will never be stale due to the local memory probes.
+        BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_INV))
+            // Set SC bits to indicate system scope.
+            .addImm(AMDGPU::CPol::SC0 | AMDGPU::CPol::SC1);
+        // Inserting a "S_WAITCNT vmcnt(0)" after is not required because the
+        // hardware does not reorder memory operations by the same wave with
+        // respect to a preceding "BUFFER_INV". The invalidate is guaranteed to
+        // remove any cache lines of earlier writes by the same wave and ensures
+        // later reads by the same wave will refetch the cache lines.
+        Changed = true;
+        break;
+      }
+
+      if (ST.hasGFX90AInsts()) {
+        // Ensures that following loads will not see stale remote VMEM data or
+        // stale local VMEM data with MTYPE NC. Local VMEM data with MTYPE RW
+        // and CC will never be stale due to the local memory probes.
+        BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_INVL2));
+        BuildMI(MBB, MI, DL, TII->get(InvalidateL1));
+        // Inserting a "S_WAITCNT vmcnt(0)" after is not required because the
+        // hardware does not reorder memory operations by the same wave with
+        // respect to a preceding "BUFFER_INVL2". The invalidate is guaranteed
+        // to remove any cache lines of earlier writes by the same wave and
+        // ensures later reads by the same wave will refetch the cache lines.
+        Changed = true;
+        break;
+      }
+      [[fallthrough]];
     case SIAtomicScope::AGENT:
-      BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_WBINVL1));
+      if (ST.hasGFX940Insts()) {
+        // Ensures that following loads will not...
[truncated]

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, but I have not reviewed the implementation in detail.

static bool canUseBUFFER_WBINVL1_VOL(const GCNSubtarget &ST) {
if (ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS)
return false;
return !(ST.isAmdPalOS() || ST.isMesa3DOS());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return !(ST.isAmdPalOS() || ST.isMesa3DOS());
return !ST.isAmdPalOS() && !ST.isMesa3DOS();

Merge the following classes into `SIGfx6CacheControl`:
- SIGfx7CacheControl
- SIGfx90ACacheControl
- SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same.

This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions.

There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/reduce-cachecontrol-classes branch from 6900721 to b092e9c Compare November 17, 2025 09:06
@Pierre-vh Pierre-vh merged commit b07bfdb into main Nov 17, 2025
7 of 10 checks passed
@Pierre-vh Pierre-vh deleted the users/pierre-vh/reduce-cachecontrol-classes branch November 17, 2025 09:07
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 17, 2025

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/27420

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: commands/log/basic/TestLogHandlers.py (188 of 2383)
PASS: lldb-api :: commands/help/TestHelp.py (189 of 2383)
PASS: lldb-api :: commands/log/invalid-args/TestInvalidArgsLog.py (190 of 2383)
PASS: lldb-api :: commands/frame/var/TestFrameVar.py (191 of 2383)
PASS: lldb-api :: commands/platform/basic/TestPlatformCommand.py (192 of 2383)
PASS: lldb-api :: commands/memory/write/TestMemoryWrite.py (193 of 2383)
PASS: lldb-api :: commands/platform/basic/TestPlatformPython.py (194 of 2383)
PASS: lldb-api :: commands/platform/file/close/TestPlatformFileClose.py (195 of 2383)
PASS: lldb-api :: commands/platform/file/read/TestPlatformFileRead.py (196 of 2383)
UNRESOLVED: lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py (197 of 2383)
******************** TEST 'lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/commands/gui/spawn-threads -p TestGuiSpawnThreads.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision b07bfdb6fbfba261a46c9bc1a2edc62c4cf21bd4)
  clang revision b07bfdb6fbfba261a46c9bc1a2edc62c4cf21bd4
  llvm revision b07bfdb6fbfba261a46c9bc1a2edc62c4cf21bd4
Skipping the following test categories: ['libc++', 'msvcstl', 'dsym', 'pdb', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_gui (TestGuiSpawnThreads.TestGuiSpawnThreadsTest)
======================================================================
ERROR: test_gui (TestGuiSpawnThreads.TestGuiSpawnThreadsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 156, in wrapper
    return func(*args, **kwargs)
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/commands/gui/spawn-threads/TestGuiSpawnThreads.py", line 44, in test_gui
    self.child.expect_exact(f"thread #{i + 2}: tid =")
  File "/usr/local/lib/python3.10/dist-packages/pexpect/spawnbase.py", line 432, in expect_exact
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 179, in expect_loop
    return self.eof(e)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 122, in eof
    raise exc
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
<pexpect.pty_spawn.spawn object at 0xfa5e853cd420>
command: /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb
args: ['/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb', '--no-lldbinit', '--no-use-colors', '-O', 'settings clear --all', '-O', 'settings set symbols.enable-external-lookup false', '-O', 'settings set target.inherit-tcc true', '-O', 'settings set target.disable-aslr false', '-O', 'settings set target.detach-on-error false', '-O', 'settings set target.auto-apply-fixits false', '-O', 'settings set plugin.process.gdb-remote.packet-timeout 60', '-O', 'settings set symbols.clang-modules-cache-path "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"', '-O', 'settings set use-color false', '-O', 'settings set show-statusline false', '--file', '/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/commands/gui/spawn-threads/TestGuiSpawnThreads.test_gui/a.out']
buffer (last 100 chars): b''
before (last 100 chars): b'thread_create.c:442:8\n#20 0x0000efa0293c9e9c ./misc/../sysdeps/unix/sysv/linux/aarch64/clone.S:82:0\n'
after: <class 'pexpect.exceptions.EOF'>

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 17, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-gcc-ubuntu running on sie-linux-worker3 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/27577

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'SanitizerCommon-msan-x86_64-Linux :: Linux/soft_rss_limit_mb_test.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/./bin/clang  --driver-mode=g++ -gline-tables-only -fsanitize=memory  -m64 -funwind-tables  -I/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test -ldl -O2 /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
# executed command: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=memory -m64 -funwind-tables -I/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test -ldl -O2 /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
# note: command had no output on stdout or stderr
# RUN: at line 5
env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1      /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
# executed command: env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1 /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
# note: command had no output on stdout or stderr
# executed command: FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
# note: command had no output on stdout or stderr
# RUN: at line 6
env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=0 not  /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_0 --implicit-check-not="returned null"
# executed command: env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=0 not /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
# note: command had no output on stdout or stderr
# executed command: FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_0 '--implicit-check-not=returned null'
# note: command had no output on stdout or stderr
# RUN: at line 10
env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=0:can_use_proc_maps_statm=0 not  /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_0 --implicit-check-not="returned null"
# executed command: env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=0:can_use_proc_maps_statm=0 not /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
# note: command had no output on stdout or stderr
# executed command: FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_0 '--implicit-check-not=returned null'
# .---command stderr------------
# | �[1m/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp:72:24: �[0m�[0;1;31merror: �[0m�[1mCHECK_MAY_RETURN_0: expected string not found in input
�[0m# | �[1m�[0m// CHECK_MAY_RETURN_0: Some of the malloc calls returned non-null:
# | �[0;1;32m                       ^
�[0m# | �[0;1;32m�[0m�[1m<stdin>:1:24: �[0m�[0;1;30mnote: �[0m�[1mscanning from here
�[0m# | �[1m�[0m[0] allocating 32 times
# | �[0;1;32m                       ^
�[0m# | �[0;1;32m�[0m�[1m<stdin>:12:55: �[0m�[0;1;30mnote: �[0m�[1mpossible intended match here
�[0m# | �[1m�[0m==3121617==HINT: if you don't care about these errors you may set allocator_may_return_null=1
# | �[0;1;32m                                                      ^
�[0m# | �[0;1;32m�[0m
# | Input file: <stdin>
# | Check file: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# | �[1m�[0m�[0;1;30m            1: �[0m�[1m�[0;1;46m[0] �[0mallocating 32 times�[0;1;46m �[0m
# | �[0;1;32mcheck:71           ^~~~~~~~~~~~~~~~~~~
�[0m# | �[0;1;32m�[0m�[0;1;32mnot:imp1       X~~~
�[0m# | �[0;1;32m�[0m�[0;1;31mcheck:72'0                            X error: no match found
�[0m# | �[0;1;31m�[0m�[0;1;30m            2: �[0m�[1m�[0;1;46m==3121617==MemorySanitizer: soft rss limit exhausted (220Mb vs 238Mb) �[0m
# | �[0;1;31mcheck:72'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...

aadeshps-mcw pushed a commit to aadeshps-mcw/llvm-project that referenced this pull request Nov 26, 2025
…lvm#168052)

Merge the following classes into `SIGfx6CacheControl`:
- SIGfx7CacheControl
- SIGfx90ACacheControl
- SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just
to implement one or two codegen differences. GFX90A/GFX940 have a bit
more differences, but they're still manageable under one class because
the general behavior is the same.

This removes 500 lines of code and puts everything into a single place
which I think makes it a lot easier to maintain, at the cost of a slight
increase in complexity for some functions.

There is still a lot of room for improvement but I think this patch is
already big enough as is and I don't want to bundle too much into one
review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants