[AMDGPU][SIMemoryLegalizer] Combine all GFX6-9 CacheControl Classes #168052

Pierre-vh · 2025-11-14T13:11:50Z

Merge the following classes into SIGfx6CacheControl:

SIGfx7CacheControl
SIGfx90ACacheControl
SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same.

This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions.

There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.

Pierre-vh · 2025-11-14T13:12:04Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

llvmbot · 2025-11-14T13:13:18Z

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

Merge the following classes into SIGfx6CacheControl:

SIGfx7CacheControl
SIGfx90ACacheControl
SIGfx940CacheControl

They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same.

This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions.

There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.

Patch is 51.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/168052.diff

1 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp (+294-790)

diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 6ab8d5521ebdb..b3c56c325e210 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -302,16 +302,17 @@ class SICacheControl {
 
   SICacheControl(const GCNSubtarget &ST);
 
-  /// Sets named bit \p BitName to "true" if present in instruction \p MI.
+  /// Sets CPol \p Bits to "true" if present in instruction \p MI.
   /// \returns Returns true if \p MI is modified, false otherwise.
-  bool enableNamedBit(const MachineBasicBlock::iterator MI,
-                      AMDGPU::CPol::CPol Bit) const;
+  bool enableCPolBits(const MachineBasicBlock::iterator MI,
+                      unsigned Bits) const;
 
   /// Check if any atomic operation on AS can affect memory accessible via the
   /// global address space.
   bool canAffectGlobalAddrSpace(SIAtomicAddrSpace AS) const;
 
 public:
+  using CPol = AMDGPU::CPol::CPol;
 
   /// Create a cache control for the subtarget \p ST.
   static std::unique_ptr<SICacheControl> create(const GCNSubtarget &ST);
@@ -401,21 +402,9 @@ class SICacheControl {
   virtual ~SICacheControl() = default;
 };
 
+/// Generates code sequences for the memory model of all GFX targets below
+/// GFX10.
 class SIGfx6CacheControl : public SICacheControl {
-protected:
-
-  /// Sets GLC bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableGLCBit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::GLC);
-  }
-
-  /// Sets SLC bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableSLCBit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::SLC);
-  }
-
 public:
 
   SIGfx6CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
@@ -454,114 +443,9 @@ class SIGfx6CacheControl : public SICacheControl {
                      Position Pos) const override;
 };
 
-class SIGfx7CacheControl : public SIGfx6CacheControl {
-public:
-
-  SIGfx7CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
-
-  bool insertAcquire(MachineBasicBlock::iterator &MI,
-                     SIAtomicScope Scope,
-                     SIAtomicAddrSpace AddrSpace,
-                     Position Pos) const override;
-
-};
-
-class SIGfx90ACacheControl : public SIGfx7CacheControl {
-public:
-
-  SIGfx90ACacheControl(const GCNSubtarget &ST) : SIGfx7CacheControl(ST) {}
-
-  bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
-                             SIAtomicScope Scope,
-                             SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
-                            SIAtomicScope Scope,
-                            SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
-                                      SIAtomicAddrSpace AddrSpace, SIMemOp Op,
-                                      bool IsVolatile, bool IsNonTemporal,
-                                      bool IsLastUse) const override;
-
-  bool insertWait(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
-                  SIAtomicAddrSpace AddrSpace, SIMemOp Op,
-                  bool IsCrossAddrSpaceOrdering, Position Pos,
-                  AtomicOrdering Order, bool AtomicsOnly) const override;
-
-  bool insertAcquire(MachineBasicBlock::iterator &MI,
-                     SIAtomicScope Scope,
-                     SIAtomicAddrSpace AddrSpace,
-                     Position Pos) const override;
-
-  bool insertRelease(MachineBasicBlock::iterator &MI,
-                     SIAtomicScope Scope,
-                     SIAtomicAddrSpace AddrSpace,
-                     bool IsCrossAddrSpaceOrdering,
-                     Position Pos) const override;
-};
-
-class SIGfx940CacheControl : public SIGfx90ACacheControl {
-protected:
-
-  /// Sets SC0 bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableSC0Bit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::SC0);
-  }
-
-  /// Sets SC1 bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableSC1Bit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::SC1);
-  }
-
-  /// Sets NT bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableNTBit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::NT);
-  }
-
-public:
-  SIGfx940CacheControl(const GCNSubtarget &ST) : SIGfx90ACacheControl(ST) {};
-
-  bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
-                             SIAtomicScope Scope,
-                             SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableStoreCacheBypass(const MachineBasicBlock::iterator &MI,
-                              SIAtomicScope Scope,
-                              SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
-                            SIAtomicScope Scope,
-                            SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
-                                      SIAtomicAddrSpace AddrSpace, SIMemOp Op,
-                                      bool IsVolatile, bool IsNonTemporal,
-                                      bool IsLastUse) const override;
-
-  bool insertAcquire(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
-                     SIAtomicAddrSpace AddrSpace, Position Pos) const override;
-
-  bool insertRelease(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
-                     SIAtomicAddrSpace AddrSpace, bool IsCrossAddrSpaceOrdering,
-                     Position Pos) const override;
-};
-
-class SIGfx10CacheControl : public SIGfx7CacheControl {
-protected:
-
-  /// Sets DLC bit to "true" if present in \p MI. Returns true if \p MI
-  /// is modified, false otherwise.
-  bool enableDLCBit(const MachineBasicBlock::iterator &MI) const {
-    return enableNamedBit(MI, AMDGPU::CPol::DLC);
-  }
-
+class SIGfx10CacheControl : public SIGfx6CacheControl {
 public:
-
-  SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx7CacheControl(ST) {}
+  SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
 
   bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
                              SIAtomicScope Scope,
@@ -601,6 +485,7 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   // \returns Returns true if \p MI is modified, false otherwise.
   bool setTH(const MachineBasicBlock::iterator MI,
              AMDGPU::CPol::CPol Value) const;
+
   // Sets Scope policy to \p Value if CPol operand is present in instruction \p
   // MI. \returns Returns true if \p MI is modified, false otherwise.
   bool setScope(const MachineBasicBlock::iterator MI,
@@ -1006,13 +891,13 @@ SICacheControl::SICacheControl(const GCNSubtarget &ST) : ST(ST) {
   InsertCacheInv = !AmdgcnSkipCacheInvalidations;
 }
 
-bool SICacheControl::enableNamedBit(const MachineBasicBlock::iterator MI,
-                                    AMDGPU::CPol::CPol Bit) const {
+bool SICacheControl::enableCPolBits(const MachineBasicBlock::iterator MI,
+                                    unsigned Bits) const {
   MachineOperand *CPol = TII->getNamedOperand(*MI, AMDGPU::OpName::cpol);
   if (!CPol)
     return false;
 
-  CPol->setImm(CPol->getImm() | Bit);
+  CPol->setImm(CPol->getImm() | Bits);
   return true;
 }
 
@@ -1028,14 +913,8 @@ bool SICacheControl::canAffectGlobalAddrSpace(SIAtomicAddrSpace AS) const {
 /* static */
 std::unique_ptr<SICacheControl> SICacheControl::create(const GCNSubtarget &ST) {
   GCNSubtarget::Generation Generation = ST.getGeneration();
-  if (ST.hasGFX940Insts())
-    return std::make_unique<SIGfx940CacheControl>(ST);
-  if (ST.hasGFX90AInsts())
-    return std::make_unique<SIGfx90ACacheControl>(ST);
-  if (Generation <= AMDGPUSubtarget::SOUTHERN_ISLANDS)
-    return std::make_unique<SIGfx6CacheControl>(ST);
   if (Generation < AMDGPUSubtarget::GFX10)
-    return std::make_unique<SIGfx7CacheControl>(ST);
+    return std::make_unique<SIGfx6CacheControl>(ST);
   if (Generation < AMDGPUSubtarget::GFX11)
     return std::make_unique<SIGfx10CacheControl>(ST);
   if (Generation < AMDGPUSubtarget::GFX12)
@@ -1048,33 +927,61 @@ bool SIGfx6CacheControl::enableLoadCacheBypass(
     SIAtomicScope Scope,
     SIAtomicAddrSpace AddrSpace) const {
   assert(MI->mayLoad() && !MI->mayStore());
-  bool Changed = false;
 
-  if (canAffectGlobalAddrSpace(AddrSpace)) {
-    switch (Scope) {
-    case SIAtomicScope::SYSTEM:
-    case SIAtomicScope::AGENT:
+  if (!canAffectGlobalAddrSpace(AddrSpace)) {
+    /// The scratch address space does not need the global memory caches
+    /// to be bypassed as all memory operations by the same thread are
+    /// sequentially consistent, and no other thread can access scratch
+    /// memory.
+
+    /// Other address spaces do not have a cache.
+    return false;
+  }
+
+  bool Changed = false;
+  switch (Scope) {
+  case SIAtomicScope::SYSTEM:
+    if (ST.hasGFX940Insts()) {
+      // Set SC bits to indicate system scope.
+      Changed |= enableCPolBits(MI, CPol::SC0 | CPol::SC1);
+      break;
+    }
+    [[fallthrough]];
+  case SIAtomicScope::AGENT:
+    if (ST.hasGFX940Insts()) {
+      // Set SC bits to indicate agent scope.
+      Changed |= enableCPolBits(MI, CPol::SC1);
+    } else {
       // Set L1 cache policy to MISS_EVICT.
       // Note: there is no L2 cache bypass policy at the ISA level.
-      Changed |= enableGLCBit(MI);
-      break;
-    case SIAtomicScope::WORKGROUP:
-    case SIAtomicScope::WAVEFRONT:
-    case SIAtomicScope::SINGLETHREAD:
-      // No cache to bypass.
-      break;
-    default:
-      llvm_unreachable("Unsupported synchronization scope");
+      Changed |= enableCPolBits(MI, CPol::GLC);
+    }
+    break;
+  case SIAtomicScope::WORKGROUP:
+    if (ST.hasGFX940Insts()) {
+      // In threadgroup split mode the waves of a work-group can be executing
+      // on different CUs. Therefore need to bypass the L1 which is per CU.
+      // Otherwise in non-threadgroup split mode all waves of a work-group are
+      // on the same CU, and so the L1 does not need to be bypassed. Setting
+      // SC bits to indicate work-group scope will do this automatically.
+      Changed |= enableCPolBits(MI, CPol::SC0);
+    } else if (ST.hasGFX90AInsts()) {
+      // In threadgroup split mode the waves of a work-group can be executing
+      // on different CUs. Therefore need to bypass the L1 which is per CU.
+      // Otherwise in non-threadgroup split mode all waves of a work-group are
+      // on the same CU, and so the L1 does not need to be bypassed.
+      if (ST.isTgSplitEnabled())
+        Changed |= enableCPolBits(MI, CPol::GLC);
     }
+    break;
+  case SIAtomicScope::WAVEFRONT:
+  case SIAtomicScope::SINGLETHREAD:
+    // No cache to bypass.
+    break;
+  default:
+    llvm_unreachable("Unsupported synchronization scope");
   }
 
-  /// The scratch address space does not need the global memory caches
-  /// to be bypassed as all memory operations by the same thread are
-  /// sequentially consistent, and no other thread can access scratch
-  /// memory.
-
-  /// Other address spaces do not have a cache.
-
   return Changed;
 }
 
@@ -1085,8 +992,39 @@ bool SIGfx6CacheControl::enableStoreCacheBypass(
   assert(!MI->mayLoad() && MI->mayStore());
   bool Changed = false;
 
-  /// The L1 cache is write through so does not need to be bypassed. There is no
-  /// bypass control for the L2 cache at the isa level.
+  /// For targets other than GFX940, the L1 cache is write through so does not
+  /// need to be bypassed. There is no bypass control for the L2 cache at the
+  /// isa level.
+
+  if (ST.hasGFX940Insts() && canAffectGlobalAddrSpace(AddrSpace)) {
+    switch (Scope) {
+    case SIAtomicScope::SYSTEM:
+      // Set SC bits to indicate system scope.
+      Changed |= enableCPolBits(MI, CPol::SC0 | CPol::SC1);
+      break;
+    case SIAtomicScope::AGENT:
+      // Set SC bits to indicate agent scope.
+      Changed |= enableCPolBits(MI, CPol::SC1);
+      break;
+    case SIAtomicScope::WORKGROUP:
+      // Set SC bits to indicate workgroup scope.
+      Changed |= enableCPolBits(MI, CPol::SC0);
+      break;
+    case SIAtomicScope::WAVEFRONT:
+    case SIAtomicScope::SINGLETHREAD:
+      // Leave SC bits unset to indicate wavefront scope.
+      break;
+    default:
+      llvm_unreachable("Unsupported synchronization scope");
+    }
+
+    /// The scratch address space does not need the global memory caches
+    /// to be bypassed as all memory operations by the same thread are
+    /// sequentially consistent, and no other thread can access scratch
+    /// memory.
+
+    /// Other address spaces do not have a cache.
+  }
 
   return Changed;
 }
@@ -1098,10 +1036,31 @@ bool SIGfx6CacheControl::enableRMWCacheBypass(
   assert(MI->mayLoad() && MI->mayStore());
   bool Changed = false;
 
-  /// Do not set GLC for RMW atomic operations as L0/L1 cache is automatically
-  /// bypassed, and the GLC bit is instead used to indicate if they are
-  /// return or no-return.
-  /// Note: there is no L2 cache coherent bypass control at the ISA level.
+  /// For targets other than GFX940, do not set GLC for RMW atomic operations as
+  /// L0/L1 cache is automatically bypassed, and the GLC bit is instead used to
+  /// indicate if they are return or no-return. Note: there is no L2 cache
+  /// coherent bypass control at the ISA level.
+  ///       For GFX90A+, RMW atomics implicitly bypass the L1 cache.
+
+  if (ST.hasGFX940Insts() && canAffectGlobalAddrSpace(AddrSpace)) {
+    switch (Scope) {
+    case SIAtomicScope::SYSTEM:
+      // Set SC1 bit to indicate system scope.
+      Changed |= enableCPolBits(MI, CPol::SC1);
+      break;
+    case SIAtomicScope::AGENT:
+    case SIAtomicScope::WORKGROUP:
+    case SIAtomicScope::WAVEFRONT:
+    case SIAtomicScope::SINGLETHREAD:
+      // RMW atomic operations implicitly bypass the L1 cache and only use SC1
+      // to indicate system or agent scope. The SC0 bit is used to indicate if
+      // they are return or no-return. Leave SC1 bit unset to indicate agent
+      // scope.
+      break;
+    default:
+      llvm_unreachable("Unsupported synchronization scope");
+    }
+  }
 
   return Changed;
 }
@@ -1123,11 +1082,15 @@ bool SIGfx6CacheControl::enableVolatileAndOrNonTemporal(
   bool Changed = false;
 
   if (IsVolatile) {
-    // Set L1 cache policy to be MISS_EVICT for load instructions
-    // and MISS_LRU for store instructions.
-    // Note: there is no L2 cache bypass policy at the ISA level.
-    if (Op == SIMemOp::LOAD)
-      Changed |= enableGLCBit(MI);
+    if (ST.hasGFX940Insts()) {
+      // Set SC bits to indicate system scope.
+      Changed |= enableCPolBits(MI, CPol::SC0 | CPol::SC1);
+    } else if (Op == SIMemOp::LOAD) {
+      // Set L1 cache policy to be MISS_EVICT for load instructions
+      // and MISS_LRU for store instructions.
+      // Note: there is no L2 cache bypass policy at the ISA level.
+      Changed |= enableCPolBits(MI, CPol::GLC);
+    }
 
     // Ensure operation has completed at system scope to cause all volatile
     // operations to be visible outside the program in a global order. Do not
@@ -1142,10 +1105,13 @@ bool SIGfx6CacheControl::enableVolatileAndOrNonTemporal(
   }
 
   if (IsNonTemporal) {
-    // Setting both GLC and SLC configures L1 cache policy to MISS_EVICT
-    // for both loads and stores, and the L2 cache policy to STREAM.
-    Changed |= enableGLCBit(MI);
-    Changed |= enableSLCBit(MI);
+    if (ST.hasGFX940Insts()) {
+      Changed |= enableCPolBits(MI, CPol::NT);
+    } else {
+      // Setting both GLC and SLC configures L1 cache policy to MISS_EVICT
+      // for both loads and stores, and the L2 cache policy to STREAM.
+      Changed |= enableCPolBits(MI, CPol::SLC | CPol::GLC);
+    }
     return Changed;
   }
 
@@ -1166,6 +1132,26 @@ bool SIGfx6CacheControl::insertWait(MachineBasicBlock::iterator &MI,
   if (Pos == Position::AFTER)
     ++MI;
 
+  // GFX90A+
+  if (ST.hasGFX90AInsts() && ST.isTgSplitEnabled()) {
+    // In threadgroup split mode the waves of a work-group can be executing on
+    // different CUs. Therefore need to wait for global or GDS memory operations
+    // to complete to ensure they are visible to waves in the other CUs.
+    // Otherwise in non-threadgroup split mode all waves of a work-group are on
+    // the same CU, so no need to wait for global memory as all waves in the
+    // work-group access the same the L1, nor wait for GDS as access are ordered
+    // on a CU.
+    if (((AddrSpace & (SIAtomicAddrSpace::GLOBAL | SIAtomicAddrSpace::SCRATCH |
+                       SIAtomicAddrSpace::GDS)) != SIAtomicAddrSpace::NONE) &&
+        (Scope == SIAtomicScope::WORKGROUP)) {
+      // Same as <GFX90A at AGENT scope;
+      Scope = SIAtomicScope::AGENT;
+    }
+    // In threadgroup split mode LDS cannot be allocated so no need to wait for
+    // LDS memory operations.
+    AddrSpace &= ~SIAtomicAddrSpace::LDS;
+  }
+
   bool VMCnt = false;
   bool LGKMCnt = false;
 
@@ -1260,6 +1246,12 @@ bool SIGfx6CacheControl::insertWait(MachineBasicBlock::iterator &MI,
   return Changed;
 }
 
+static bool canUseBUFFER_WBINVL1_VOL(const GCNSubtarget &ST) {
+  if (ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS)
+    return false;
+  return !(ST.isAmdPalOS() || ST.isMesa3DOS());
+}
+
 bool SIGfx6CacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
                                        SIAtomicScope Scope,
                                        SIAtomicAddrSpace AddrSpace,
@@ -1275,17 +1267,92 @@ bool SIGfx6CacheControl::insertAcquire(MachineBasicBlock::iterator &MI,
   if (Pos == Position::AFTER)
     ++MI;
 
+  const unsigned InvalidateL1 = canUseBUFFER_WBINVL1_VOL(ST)
+                                    ? AMDGPU::BUFFER_WBINVL1_VOL
+                                    : AMDGPU::BUFFER_WBINVL1;
+
   if (canAffectGlobalAddrSpace(AddrSpace)) {
     switch (Scope) {
     case SIAtomicScope::SYSTEM:
+      if (ST.hasGFX940Insts()) {
+        // Ensures that following loads will not see stale remote VMEM data or
+        // stale local VMEM data with MTYPE NC. Local VMEM data with MTYPE RW
+        // and CC will never be stale due to the local memory probes.
+        BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_INV))
+            // Set SC bits to indicate system scope.
+            .addImm(AMDGPU::CPol::SC0 | AMDGPU::CPol::SC1);
+        // Inserting a "S_WAITCNT vmcnt(0)" after is not required because the
+        // hardware does not reorder memory operations by the same wave with
+        // respect to a preceding "BUFFER_INV". The invalidate is guaranteed to
+        // remove any cache lines of earlier writes by the same wave and ensures
+        // later reads by the same wave will refetch the cache lines.
+        Changed = true;
+        break;
+      }
+
+      if (ST.hasGFX90AInsts()) {
+        // Ensures that following loads will not see stale remote VMEM data or
+        // stale local VMEM data with MTYPE NC. Local VMEM data with MTYPE RW
+        // and CC will never be stale due to the local memory probes.
+        BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_INVL2));
+        BuildMI(MBB, MI, DL, TII->get(InvalidateL1));
+        // Inserting a "S_WAITCNT vmcnt(0)" after is not required because the
+        // hardware does not reorder memory operations by the same wave with
+        // respect to a preceding "BUFFER_INVL2". The invalidate is guaranteed
+        // to remove any cache lines of earlier writes by the same wave and
+        // ensures later reads by the same wave will refetch the cache lines.
+        Changed = true;
+        break;
+      }
+      [[fallthrough]];
     case SIAtomicScope::AGENT:
-      BuildMI(MBB, MI, DL, TII->get(AMDGPU::BUFFER_WBINVL1));
+      if (ST.hasGFX940Insts()) {
+        // Ensures that following loads will not...
[truncated]

jayfoad

LGTM overall, but I have not reviewed the implementation in detail.

arsenm · 2025-11-14T22:36:47Z

llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp

+static bool canUseBUFFER_WBINVL1_VOL(const GCNSubtarget &ST) {
+  if (ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS)
+    return false;
+  return !(ST.isAmdPalOS() || ST.isMesa3DOS());


Suggested change

return !(ST.isAmdPalOS() || ST.isMesa3DOS());

return !ST.isAmdPalOS() && !ST.isMesa3DOS();

Merge the following classes into `SIGfx6CacheControl`: - SIGfx7CacheControl - SIGfx90ACacheControl - SIGfx940CacheControl They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same. This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions. There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.

llvm-ci · 2025-11-17T09:24:26Z

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/27420

Here is the relevant piece of the build log for the reference

Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: commands/log/basic/TestLogHandlers.py (188 of 2383)
PASS: lldb-api :: commands/help/TestHelp.py (189 of 2383)
PASS: lldb-api :: commands/log/invalid-args/TestInvalidArgsLog.py (190 of 2383)
PASS: lldb-api :: commands/frame/var/TestFrameVar.py (191 of 2383)
PASS: lldb-api :: commands/platform/basic/TestPlatformCommand.py (192 of 2383)
PASS: lldb-api :: commands/memory/write/TestMemoryWrite.py (193 of 2383)
PASS: lldb-api :: commands/platform/basic/TestPlatformPython.py (194 of 2383)
PASS: lldb-api :: commands/platform/file/close/TestPlatformFileClose.py (195 of 2383)
PASS: lldb-api :: commands/platform/file/read/TestPlatformFileRead.py (196 of 2383)
UNRESOLVED: lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py (197 of 2383)
******************** TEST 'lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/commands/gui/spawn-threads -p TestGuiSpawnThreads.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision b07bfdb6fbfba261a46c9bc1a2edc62c4cf21bd4)
  clang revision b07bfdb6fbfba261a46c9bc1a2edc62c4cf21bd4
  llvm revision b07bfdb6fbfba261a46c9bc1a2edc62c4cf21bd4
Skipping the following test categories: ['libc++', 'msvcstl', 'dsym', 'pdb', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_gui (TestGuiSpawnThreads.TestGuiSpawnThreadsTest)
======================================================================
ERROR: test_gui (TestGuiSpawnThreads.TestGuiSpawnThreadsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 156, in wrapper
    return func(*args, **kwargs)
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/commands/gui/spawn-threads/TestGuiSpawnThreads.py", line 44, in test_gui
    self.child.expect_exact(f"thread #{i + 2}: tid =")
  File "/usr/local/lib/python3.10/dist-packages/pexpect/spawnbase.py", line 432, in expect_exact
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 179, in expect_loop
    return self.eof(e)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 122, in eof
    raise exc
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
<pexpect.pty_spawn.spawn object at 0xfa5e853cd420>
command: /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb
args: ['/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb', '--no-lldbinit', '--no-use-colors', '-O', 'settings clear --all', '-O', 'settings set symbols.enable-external-lookup false', '-O', 'settings set target.inherit-tcc true', '-O', 'settings set target.disable-aslr false', '-O', 'settings set target.detach-on-error false', '-O', 'settings set target.auto-apply-fixits false', '-O', 'settings set plugin.process.gdb-remote.packet-timeout 60', '-O', 'settings set symbols.clang-modules-cache-path "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"', '-O', 'settings set use-color false', '-O', 'settings set show-statusline false', '--file', '/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/commands/gui/spawn-threads/TestGuiSpawnThreads.test_gui/a.out']
buffer (last 100 chars): b''
before (last 100 chars): b'thread_create.c:442:8\n#20 0x0000efa0293c9e9c ./misc/../sysdeps/unix/sysv/linux/aarch64/clone.S:82:0\n'
after: <class 'pexpect.exceptions.EOF'>

llvm-ci · 2025-11-17T09:32:11Z

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-gcc-ubuntu running on sie-linux-worker3 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/27577

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'SanitizerCommon-msan-x86_64-Linux :: Linux/soft_rss_limit_mb_test.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/./bin/clang  --driver-mode=g++ -gline-tables-only -fsanitize=memory  -m64 -funwind-tables  -I/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test -ldl -O2 /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
# executed command: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=memory -m64 -funwind-tables -I/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test -ldl -O2 /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
# note: command had no output on stdout or stderr
# RUN: at line 5
env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1      /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
# executed command: env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1 /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
# note: command had no output on stdout or stderr
# executed command: FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
# note: command had no output on stdout or stderr
# RUN: at line 6
env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=0 not  /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_0 --implicit-check-not="returned null"
# executed command: env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=0 not /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
# note: command had no output on stdout or stderr
# executed command: FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_0 '--implicit-check-not=returned null'
# note: command had no output on stdout or stderr
# RUN: at line 10
env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=0:can_use_proc_maps_statm=0 not  /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_0 --implicit-check-not="returned null"
# executed command: env MSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=0:can_use_proc_maps_statm=0 not /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/msan-x86_64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
# note: command had no output on stdout or stderr
# executed command: FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_0 '--implicit-check-not=returned null'
# .---command stderr------------
# | �[1m/home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp:72:24: �[0m�[0;1;31merror: �[0m�[1mCHECK_MAY_RETURN_0: expected string not found in input
�[0m# | �[1m�[0m// CHECK_MAY_RETURN_0: Some of the malloc calls returned non-null:
# | �[0;1;32m                       ^
�[0m# | �[0;1;32m�[0m�[1m<stdin>:1:24: �[0m�[0;1;30mnote: �[0m�[1mscanning from here
�[0m# | �[1m�[0m[0] allocating 32 times
# | �[0;1;32m                       ^
�[0m# | �[0;1;32m�[0m�[1m<stdin>:12:55: �[0m�[0;1;30mnote: �[0m�[1mpossible intended match here
�[0m# | �[1m�[0m==3121617==HINT: if you don't care about these errors you may set allocator_may_return_null=1
# | �[0;1;32m                                                      ^
�[0m# | �[0;1;32m�[0m
# | Input file: <stdin>
# | Check file: /home/buildbot/buildbot-root/llvm-clang-x86_64-gcc-ubuntu/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# | �[1m�[0m�[0;1;30m            1: �[0m�[1m�[0;1;46m[0] �[0mallocating 32 times�[0;1;46m �[0m
# | �[0;1;32mcheck:71           ^~~~~~~~~~~~~~~~~~~
�[0m# | �[0;1;32m�[0m�[0;1;32mnot:imp1       X~~~
�[0m# | �[0;1;32m�[0m�[0;1;31mcheck:72'0                            X error: no match found
�[0m# | �[0;1;31m�[0m�[0;1;30m            2: �[0m�[1m�[0;1;46m==3121617==MemorySanitizer: soft rss limit exhausted (220Mb vs 238Mb) �[0m
# | �[0;1;31mcheck:72'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...

…lvm#168052) Merge the following classes into `SIGfx6CacheControl`: - SIGfx7CacheControl - SIGfx90ACacheControl - SIGfx940CacheControl They were all very similar and had a lot of duplicated boilerplate just to implement one or two codegen differences. GFX90A/GFX940 have a bit more differences, but they're still manageable under one class because the general behavior is the same. This removes 500 lines of code and puts everything into a single place which I think makes it a lot easier to maintain, at the cost of a slight increase in complexity for some functions. There is still a lot of room for improvement but I think this patch is already big enough as is and I don't want to bundle too much into one review.

Pierre-vh requested review from jayfoad, nhaehnle, ritter-x2a, ssahasra and t-tye November 14, 2025 13:12

Pierre-vh marked this pull request as ready for review November 14, 2025 13:12

llvmbot added the backend:AMDGPU label Nov 14, 2025

jayfoad reviewed Nov 14, 2025

View reviewed changes

Pierre-vh mentioned this pull request Nov 14, 2025

[AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes #168058

Merged

arsenm approved these changes Nov 14, 2025

View reviewed changes

Pierre-vh added 2 commits November 17, 2025 10:06

nit

b092e9c

Pierre-vh force-pushed the users/pierre-vh/reduce-cachecontrol-classes branch from 6900721 to b092e9c Compare November 17, 2025 09:06

Pierre-vh merged commit b07bfdb into main Nov 17, 2025
7 of 10 checks passed

Pierre-vh deleted the users/pierre-vh/reduce-cachecontrol-classes branch November 17, 2025 09:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU][SIMemoryLegalizer] Combine all GFX6-9 CacheControl Classes #168052

[AMDGPU][SIMemoryLegalizer] Combine all GFX6-9 CacheControl Classes #168052

Uh oh!

Pierre-vh commented Nov 14, 2025

Uh oh!

Pierre-vh commented Nov 14, 2025 •

edited

Loading

Uh oh!

llvmbot commented Nov 14, 2025

Uh oh!

jayfoad left a comment

Uh oh!

arsenm Nov 14, 2025

Uh oh!

Uh oh!

llvm-ci commented Nov 17, 2025

Uh oh!

llvm-ci commented Nov 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

	return !(ST.isAmdPalOS() \|\| ST.isMesa3DOS());
	return !ST.isAmdPalOS() && !ST.isMesa3DOS();

[AMDGPU][SIMemoryLegalizer] Combine all GFX6-9 CacheControl Classes #168052

[AMDGPU][SIMemoryLegalizer] Combine all GFX6-9 CacheControl Classes #168052

Uh oh!

Conversation

Pierre-vh commented Nov 14, 2025

Uh oh!

Pierre-vh commented Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 14, 2025

Uh oh!

jayfoad left a comment

Choose a reason for hiding this comment

Uh oh!

arsenm Nov 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Nov 17, 2025

Uh oh!

llvm-ci commented Nov 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Pierre-vh commented Nov 14, 2025 •

edited

Loading