
Commit bf93440

[AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (#162077)
The pass was already "reinventing" the concept of register units just to deal with 16-bit registers. This cleans up the entire tracking logic so that it only uses register units.

There are no test changes because functionality didn't change, except:

- We can now track more LDS DMA IDs if we need them (up to `1 << 16`).
- The debug prints changed a bit because we now talk in terms of register units.

This also changes the tracking to use a DenseMap instead of a massive fixed-size table, trading a bit of access speed for a smaller memory footprint. Allocating and memsetting a huge table to zero had a non-negligible performance impact (I've observed up to 50% of the time in the pass spent in the `memcpy` built-in on a big test file).

I also think we don't access these often enough to justify using a vector. We do a few accesses per instruction, but not much more. In a huge 120MB LL file, I can barely see the DenseMap accesses in the trace.
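A minimal C++ sketch of the two ideas above, under stated assumptions, and not the pass's actual code: scores are keyed by register unit, so a write to only the low 16 bits of a register touches one unit, and the table is a hash map that only holds units with a pending event. The names `ScoreTracker`, `Part`, and `unitsOf` are hypothetical; the numbering (VGPR `n` owns units `2n` and `2n+1`) is an illustrative assumption (LLVM derives real units from the target's register info); `std::unordered_map` stands in for `llvm::DenseMap`; and "a register's score is the maximum over its units" is an assumed simplification.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical unit numbering: VGPR n owns two 16-bit register units,
// 2n (lo16) and 2n+1 (hi16). This is only a stand-in for the real mapping.
using RegUnit = std::uint32_t;

enum class Part { Lo16, Hi16, Full };

std::vector<RegUnit> unitsOf(unsigned Vgpr, Part P) {
  const RegUnit Lo = Vgpr * 2, Hi = Vgpr * 2 + 1;
  switch (P) {
  case Part::Lo16:
    return {Lo};
  case Part::Hi16:
    return {Hi};
  case Part::Full:
    return {Lo, Hi};
  }
  return {};
}

// Sparse score table: only units with a pending event get an entry.
// std::unordered_map stands in for llvm::DenseMap in this sketch.
class ScoreTracker {
  std::unordered_map<RegUnit, unsigned> Scores; // missing entry == score 0

public:
  // Record that the given (part of a) register is written by an event
  // with the given score.
  void setScore(unsigned Vgpr, Part P, unsigned Score) {
    for (RegUnit U : unitsOf(Vgpr, P))
      Scores[U] = Score;
  }

  // Assumed rule: a register's score is the highest score among its units,
  // so a full-width read still waits on a pending lo16-only write.
  unsigned getScore(unsigned Vgpr, Part P) const {
    unsigned Max = 0;
    for (RegUnit U : unitsOf(Vgpr, P)) {
      auto It = Scores.find(U);
      if (It != Scores.end() && It->second > Max)
        Max = It->second;
    }
    return Max;
  }

  std::size_t trackedUnits() const { return Scores.size(); }
};
```

For example, after `setScore(5, Part::Lo16, 3)` and `setScore(200, Part::Full, 7)`, a full-width read of register 5 sees score 3, its hi16 half sees 0, and the map holds only three entries no matter how many registers the target has. A fixed-size table would instead reserve and zero one slot per possible unit up front, which is the cost the message describes.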
1 parent b1ef2db commit bf93440


2 files changed

+314 -285 lines changed

