Commit 7a8af30

[AMDGPU] Make max dwords of memory cluster configurable
We find it helpful to increase this value for graphics workloads. Make it configurable so we can experiment with different values. A per-function value might be even more helpful, but I am not sure how that can be done properly.
1 parent a295907 commit 7a8af30

1 file changed: +12 -5 lines changed

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Lines changed: 12 additions & 5 deletions
@@ -60,6 +60,11 @@ static cl::opt<bool> Fix16BitCopies(
   cl::init(true),
   cl::ReallyHidden);

+static cl::opt<unsigned> MaxMemoryClusterDWORDS(
+    "amdgpu-max-memory-cluster-dwords", cl::Hidden, cl::init(8),
+    cl::desc(
+        "Restrict the maximum dwords for memory cluster during scheduler"));
+
 SIInstrInfo::SIInstrInfo(const GCNSubtarget &ST)
   : AMDGPUGenInstrInfo(AMDGPU::ADJCALLSTACKUP, AMDGPU::ADJCALLSTACKDOWN),
     RI(ST), ST(ST) {
@@ -565,20 +570,22 @@ bool SIInstrInfo::shouldClusterMemOps(ArrayRef<const MachineOperand *> BaseOps1,
   }

   // In order to avoid register pressure, on an average, the number of DWORDS
-  // loaded together by all clustered mem ops should not exceed 8. This is an
-  // empirical value based on certain observations and performance related
-  // experiments.
+  // loaded together by all clustered mem ops should not exceed
+  // MaxMemoryClusterDWORDS. This is an empirical value based on certain
+  // observations and performance related experiments.
   // The good thing about this heuristic is - it avoids clustering of too many
   // sub-word loads, and also avoids clustering of wide loads. Below is the
-  // brief summary of how the heuristic behaves for various `LoadSize`.
+  // brief summary of how the heuristic behaves for various `LoadSize` when
+  // MaxMemoryClusterDWORDS is 8.
+  //
   // (1) 1 <= LoadSize <= 4: cluster at max 8 mem ops
   // (2) 5 <= LoadSize <= 8: cluster at max 4 mem ops
   // (3) 9 <= LoadSize <= 12: cluster at max 2 mem ops
   // (4) 13 <= LoadSize <= 16: cluster at max 2 mem ops
   // (5) LoadSize >= 17: do not cluster
   const unsigned LoadSize = NumBytes / ClusterSize;
   const unsigned NumDWORDs = ((LoadSize + 3) / 4) * ClusterSize;
-  return NumDWORDs <= 8;
+  return NumDWORDs <= MaxMemoryClusterDWORDS;
 }

 // FIXME: This behaves strangely. If, for example, you have 32 load + stores,
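
Note on the heuristic: to see how the limit translates into the cluster sizes listed in the comment, the final check in shouldClusterMemOps can be restated as a small standalone sketch. This is not code from the commit. The names `withinClusterBudget` and `maxClusterSize` are hypothetical, the `maxDwords` parameter stands in for the `MaxMemoryClusterDWORDS` option, and all ops in a cluster are assumed to have the same size.

```cpp
// Standalone restatement of the check at the end of shouldClusterMemOps.
// numBytes:    total bytes accessed by the candidate cluster
// clusterSize: number of mem ops in the candidate cluster
// maxDwords:   stand-in for the MaxMemoryClusterDWORDS option (assumption)
bool withinClusterBudget(unsigned numBytes, unsigned clusterSize,
                         unsigned maxDwords) {
  // Average bytes per op, using integer division as in the diff.
  const unsigned loadSize = numBytes / clusterSize;
  // Round each op up to whole dwords, then sum over the cluster.
  const unsigned numDwords = ((loadSize + 3) / 4) * clusterSize;
  return numDwords <= maxDwords;
}

// Hypothetical helper: the largest cluster of equally sized ops that still
// passes the check. Returns 0 when even a pair is rejected (no clustering).
unsigned maxClusterSize(unsigned loadSize, unsigned maxDwords) {
  unsigned best = 0;
  // Every op needs at least one dword, so maxDwords bounds the op count.
  for (unsigned n = 2; n <= maxDwords; ++n)
    if (withinClusterBudget(loadSize * n, n, maxDwords))
      best = n;
  return best;
}
```

Under these assumptions, `maxClusterSize(12, 8)` evaluates to 2, matching case (3) in the comment, while `maxClusterSize(12, 16)` evaluates to 5, which shows how raising the option widens the clusters the check accepts.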
