Skip to content

Conversation

@adam-yang
Copy link
Contributor

@adam-yang adam-yang commented Feb 8, 2025

Note: Before this is ready for review, I intend to remove AMDGPUMirSyncDependency and AMDGPUMirDivergenceAnalysis. They were basically forks of the IR versions of the passes from before the upstream MIR versions were available.

I have the code path that uses MachineUniformityInfo and the divergence related unit test passes with it, but I'd like to do more testing before deleting the other ones.

@github-actions
Copy link

github-actions bot commented Feb 8, 2025

✅ With the latest revision this PR passed the undef deprecator.

@github-actions
Copy link

github-actions bot commented Feb 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@dstutt dstutt requested review from jayfoad and perlfu February 26, 2025 17:30
@dstutt
Copy link
Collaborator

dstutt commented Feb 26, 2025

Added Jay and Carl fyi (still needs cleanup)

@dstutt dstutt self-requested a review February 26, 2025 17:31
@perlfu
Copy link
Contributor

perlfu commented Feb 27, 2025

Since this is only a draft and MirDivergence parts are going away I only I had a quick look.
I wonder if parts like AMDGPULatencyTracker should be full analysis passes, rather than just utilities?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to #include <cmath> for this. But perhaps it would be better to use an integer-only power operation (not sure if we already have one anywhere in LLVM).

@jayfoad
Copy link
Contributor

jayfoad commented Feb 27, 2025

Is this intended to be run just before pre-RA MachineScheduler?

Update: I see the pass currently requires SSA form, so it would have to be run earlier than that. Where would you put it in the pass pipeline?

@dstutt
Copy link
Collaborator

dstutt commented Feb 27, 2025

We've got some work in the pipeline to allow pre-RA machinescheduler to run in SSA form as well, so that might be ok.
Also, this might work well with some other work we're considering to add spilling as a separate pass before RA (but in SSA form).

@adam-yang adam-yang force-pushed the amdgpu_hot_block_remat branch from ede9e76 to 971e556 Compare March 18, 2025 03:24
@adam-yang adam-yang marked this pull request as ready for review March 18, 2025 05:21
@jayfoad
Copy link
Contributor

jayfoad commented Mar 25, 2025

Hi @adam-yang, I've done some testing with this patch and noticed that options -amdgpu-remat-enable-hot-block-remat-aggressive and -amdgpu-remat-enable-sub-exp-remat seem to cause many crashes in the compiler. Are you interested in test cases for these? Or are these options not expected to be working yet?

@adam-yang
Copy link
Contributor Author

Hi @adam-yang, I've done some testing with this patch and noticed that options -amdgpu-remat-enable-hot-block-remat-aggressive and -amdgpu-remat-enable-sub-exp-remat seem to cause many crashes in the compiler. Are you interested in test cases for these? Or are these options not expected to be working yet?

Yes could you send them to me please.

@jayfoad
Copy link
Contributor

jayfoad commented Mar 26, 2025

OK, here's a somewhat reduced test case for a crash with -amdgpu-remat-enable-sub-exp-remat. The verifier reports "Bad machine code: Found PHI instruction after non-PHI".
r1.txt

@jayfoad
Copy link
Contributor

jayfoad commented Mar 26, 2025

Here's one for -amdgpu-remat-enable-hot-block-remat-aggressive which fails an assertion "Invalid RC for virtual register": r2.txt

@adam-yang
Copy link
Contributor Author

Here's one for -amdgpu-remat-enable-hot-block-remat-aggressive which fails an assertion "Invalid RC for virtual register": r2.txt

This revealed some incorrect assumptions in this PR. Lanes are assumed to be 32-bit, but new 16-bit subregs have been added so lanes are now 16-bit. I'll have to go through the change to fix some more issues related to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants