[NVIDIA] Replace some NVGPU ops with equivalent NVVM ops (part 2) #7471
peterbell10
left a comment
Test failure looks real
Can you show me the PTX that NVVM generates for ldmatrix and stmatrix?
The test fails because the code generates
Yes, you can just update the test.
Out of curiosity, does
It is not supported in LLVM yet, but I have proposed a new PR. If it is merged, we will be able to use the new m16n8 ops.
Amazing, thank you!
lezcano
left a comment
LGTM, but let's wait for @peterbell10's review
We need to add support for the new ldmatrix m16n16 on Blackwell. I don't see it in LLVM right now. I think I'll have to reintroduce some of the ldmatrix ops unless this is already supported.
In fact, the NVPTX backend implementation is already complete. It just lacks some interfaces to generate the corresponding PTX code. As mentioned in the comments on LLVM PR #148377, I am currently working on it.
I don't see ldmatrix support in there?
I just wrote it today. See LLVM PR #148783.
This change replaces some NVGPU ops with the corresponding NVVM ops. It aligns with previous discussions in PR #7420.
For some ops, such as NVGPU::FenceAsyncSharedOp, there is no corresponding intrinsic, so the LLVM side still ends up emitting PTX for them. In the long run, however, I think it is better to hand responsibility for code generation to LLVM than to hard-code PTX at the NVGPU layer.
The ConvertNVVMToLLVMPass has been added to the pipeline and build system so that NVVM ops are correctly lowered to LLVM IR.
New contributor declaration
I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these rules.
I have run pre-commit run --from-ref origin/main --to-ref HEAD.
Select one of the following.
/test for lit tests
/unittest for C++ tests
/python/test for end-to-end tests
Select one of the following.
lit tests.
lit tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)