Commit bf93440
[AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (#162077)
The pass was already "reinventing" the concept just to deal with 16 bit
registers. Clean up the entire tracking logic to only use register
units.
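
To illustrate why tracking by register unit handles 16-bit registers naturally, here is a minimal toy sketch. It is not the pass's code and does not use LLVM's API: the `unitsOf` helper, the `Half` enum, and the "two units per 32-bit register" numbering are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical toy model, not LLVM's API: 32-bit register N is made of two
// 16-bit units, 2*N (low half) and 2*N + 1 (high half).
using RegUnit = uint32_t;

enum class Half { Lo, Hi, Full };

// Return the units covered by (a part of) register RegNo.
std::vector<RegUnit> unitsOf(uint32_t RegNo, Half H) {
  switch (H) {
  case Half::Lo:
    return {2 * RegNo};
  case Half::Hi:
    return {2 * RegNo + 1};
  case Half::Full:
    return {2 * RegNo, 2 * RegNo + 1};
  }
  return {};
}

// Pending-event scores keyed by unit; a unit that is absent has score 0.
struct UnitScores {
  std::unordered_map<RegUnit, unsigned> Scores;

  void setScore(uint32_t RegNo, Half H, unsigned Score) {
    assert(Score != 0 && "0 is reserved for 'nothing pending'");
    for (RegUnit U : unitsOf(RegNo, H))
      Scores[U] = Score;
  }

  // The score of a register is the maximum over the units it covers, so a
  // write to one half is visible through the full register but not through
  // the other half.
  unsigned getScore(uint32_t RegNo, Half H) const {
    unsigned Max = 0;
    for (RegUnit U : unitsOf(RegNo, H)) {
      auto It = Scores.find(U);
      if (It != Scores.end() && It->second > Max)
        Max = It->second;
    }
    return Max;
  }
};
```

With this model, no special case is needed for sub-registers: aliasing falls out of which units two registers share.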
There are no test changes because functionality didn't change, except:
- We can now track more LDS DMA IDs if we need it (up to `1 << 16`)
- The debug prints also changed a bit because we now talk in terms of
register units.
This also changes the tracking to use a DenseMap instead of a massive
fixed size table. This trades a bit of access speed for a smaller memory
footprint. Allocating and memsetting a huge table to zero caused a
non-negligible performance impact (I've observed up to 50% of the time
in the pass spent in the `memcpy` built-in on a big test file).
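
The trade-off described above can be sketched roughly as follows. This is an illustration under assumptions, not the pass's implementation: `std::unordered_map` stands in for `DenseMap` so the example is self-contained, and `kNumSlots`, `DenseTable`, and `SparseTable` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

constexpr std::size_t kNumSlots = 1u << 16; // hypothetical table size

// Fixed-size variant: O(1) indexed access, but every instance carries
// kNumSlots entries that must be zeroed on construction and copied in full.
struct DenseTable {
  std::array<uint32_t, kNumSlots> Scores{}; // value-initialized to zero

  uint32_t get(uint32_t Key) const {
    assert(Key < kNumSlots && "key out of range");
    return Scores[Key];
  }
  void set(uint32_t Key, uint32_t V) {
    assert(Key < kNumSlots && "key out of range");
    Scores[Key] = V;
  }
};

// Map-based variant: each access pays for a hash lookup, but only non-zero
// entries are stored, so construction and copies scale with live entries.
struct SparseTable {
  std::unordered_map<uint32_t, uint32_t> Scores;

  // A missing key reads as 0, matching the zero-initialized table.
  uint32_t get(uint32_t Key) const {
    auto It = Scores.find(Key);
    return It == Scores.end() ? 0 : It->second;
  }
  // Storing 0 erases the entry so the map stays small.
  void set(uint32_t Key, uint32_t V) {
    if (V == 0)
      Scores.erase(Key);
    else
      Scores[Key] = V;
  }
  std::size_t liveEntries() const { return Scores.size(); }
};
```

Both variants answer the same queries; they differ in what a fresh or copied instance costs, which is where the fixed-size table was observed to hurt.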
I also think we don't access these often enough to really justify using
a vector. We do a few accesses per instruction, but not much more. In a
huge 120MB LL file, I can barely see the trace of the DenseMap accesses.
1 parent b1ef2db
2 files changed: +314 -285 lines changed
- llvm/lib/Target/AMDGPU
- llvm/test/CodeGen/AMDGPU