Commit 70baf17

[AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking
Clean up the tracking logic to rely on register units. The pass was already "reinventing" the concept just to deal with 16 bit registers.

There are no test changes, functionality is the same, except we can now track more LDS DMA IDs if we need it. The debug prints also changed a bit because we now talk in terms of register units.

This also changes the tracking to use a DenseMap instead of a massive fixed size table. This trades a bit of access speed for a smaller memory footprint. Allocating and memsetting a huge table to zero caused a non-negligible performance impact (I've observed up to 50% of the time in the pass spent in the `memcpy` built-in).

I also think we don't access these often enough to really justify using a vector. We do a few accesses per instruction, but not much more. In a huge 120MB LL file, I can barely see the trace of the DenseMap accesses.

This still isn't as clean as I'd like it to be though. There is a mix of "VMEMID", "LDS DMA ID", "SGPR RegUnit" and "PhysReg" in the API of WaitCntBrackets. There is no type safety to avoid mix-ups as these are all integers. We could add another layer of abstraction on top, but I feel like it's going to add too much code/boilerplate for such a small issue.
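
The table-versus-map trade-off described above can be illustrated with a small standalone sketch. This is not the code from the pass: `std::unordered_map` stands in for LLVM's `DenseMap` so the example compiles without LLVM headers, and `kNumSlots`, `FixedTableScores`, and `SparseScores` are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical slot count; the real table size in the pass is not stated here.
constexpr std::size_t kNumSlots = 1 << 16;

// Fixed-size table: O(1) indexed access, but every instance is zero-filled
// on construction and copied in full, no matter how few slots are in use.
struct FixedTableScores {
  std::array<std::uint32_t, kNumSlots> Scores{};

  std::uint32_t get(std::uint32_t Unit) const { return Scores[Unit]; }
  void set(std::uint32_t Unit, std::uint32_t Score) { Scores[Unit] = Score; }
};

// Sparse map: only units with a pending (non-zero) score are stored, so
// construction and copies cost time proportional to the live entries.
struct SparseScores {
  std::unordered_map<std::uint32_t, std::uint32_t> Scores;

  // A missing entry is treated as score 0, matching the zeroed table.
  std::uint32_t get(std::uint32_t Unit) const {
    auto It = Scores.find(Unit);
    return It == Scores.end() ? 0 : It->second;
  }

  // Storing 0 erases the entry so the map stays small.
  void set(std::uint32_t Unit, std::uint32_t Score) {
    if (Score == 0)
      Scores.erase(Unit);
    else
      Scores[Unit] = Score;
  }

  std::size_t size() const { return Scores.size(); }
};
```

With only a few accesses per instruction, the hashing cost on each lookup is likely to matter less than the cost of zeroing and copying a table of `kNumSlots` entries per state object, which is the trade the message argues for.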
1 parent 5eef98b commit 70baf17

1 file changed (+226, -272 lines changed)
