@Pecco-314
Contributor

@Pecco-314 Pecco-314 commented Jul 11, 2025

This change replaces some NVGPU ops with the corresponding NVVM ops. It aligns with previous discussions in PR #7420.
For some ops, such as NVGPU::FenceAsyncSharedOp, there is no corresponding intrinsic, so LLVM will also generate PTX for them. In the long run, however, I think it is better to hand responsibility for code generation to LLVM than to hard-code PTX at the NVGPU layer.
The ConvertNVVMToLLVMPass has been added to the pipeline and build system so that NVVM ops are correctly lowered to LLVM IR.

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because it does not contain new features.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)

@Pecco-314 Pecco-314 requested a review from ptillet as a code owner July 11, 2025 03:53
@Pecco-314 Pecco-314 changed the title from "[NVIDIA] Replace the NVGPU::FenceAsyncSharedOp with the equivalent NVVM Op" to "[NVIDIA] Replace some NVGPU ops with equivalent NVVM ops (part 2)" Jul 11, 2025
Contributor

@peterbell10 peterbell10 left a comment

Test failure looks real

@Jokeren
Contributor

Jokeren commented Jul 11, 2025

Can you show me the PTX that NVVM generates for ldmatrix and stmatrix?

@Pecco-314
Contributor Author

Pecco-314 commented Jul 11, 2025

The test fails because the code generates stmatrix.sync.aligned.x4.m8n8.shared.b16 while the assertion expects stmatrix.sync.aligned.m8n8.x4.shared.b16. Note that ptxas accepts both formats, though the latter is more canonical. I've submitted PR #148250 for LLVM, but alternatively we could simply update the test.
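
To make the mismatch concrete, here is a minimal Python sketch that treats the dotted suffixes of each mnemonic as an unordered set. The helper name `split_mnemonic` is hypothetical and exists only for illustration; it says nothing about how ptxas parses modifiers in general. It only shows that the two strings quoted above contain the same modifiers, with `.x4` and `.m8n8` swapped.

```python
def split_mnemonic(instr: str) -> tuple[str, frozenset[str]]:
    """Split a dotted PTX mnemonic into its opcode and an unordered set of modifiers.

    Hypothetical helper for illustration only; not part of Triton or LLVM.
    """
    opcode, *modifiers = instr.split(".")
    return opcode, frozenset(modifiers)


# The two spellings quoted in the comment above.
generated = "stmatrix.sync.aligned.x4.m8n8.shared.b16"
expected = "stmatrix.sync.aligned.m8n8.x4.shared.b16"

# An exact string match (what the test assertion effectively does) fails ...
exact_match = generated == expected

# ... while an order-insensitive comparison of the modifiers succeeds.
same_modifiers = split_mnemonic(generated) == split_mnemonic(expected)

print(exact_match, same_modifiers)
```

Because the test assertion matches the literal text, either the emitted string or the expected string has to change, which is the choice described in the comment.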

@peterbell10
Contributor

Yes, you can just update the test.

@lezcano
Contributor

lezcano commented Jul 11, 2025

Out of curiosity, does nvvm.ldmatrix/stmatrix support the new Blackwell ops? We may want to use them at some point

@Pecco-314
Contributor Author

Out of curiosity, does nvvm.ldmatrix/stmatrix support the new Blackwell ops? We may want to use them at some point

It is not yet supported in LLVM, but I have opened a new PR for it. Once it is merged, we will be able to use the new m16n8 ops.

@lezcano
Contributor

lezcano commented Jul 12, 2025

Amazing, thank you!

Contributor

@lezcano lezcano left a comment

LGTM, but let's wait for @peterbell10's review

@peterbell10 peterbell10 merged commit 0560390 into triton-lang:main Jul 14, 2025
9 checks passed
@ThomasRaoux
Collaborator

We need to add support for the new ldmatrix m16n16 on Blackwell. I don't see it in LLVM right now. I think I'll have to reintroduce some of the ldmatrix ops unless this is already supported.

@Pecco-314
Contributor Author

Pecco-314 commented Jul 15, 2025

We need to add support for the new ldmatrix m16n16 on Blackwell. I don't see it in LLVM right now. I think I'll have to reintroduce some of the ldmatrix ops unless this is already supported.

In fact, the NVPTX backend implementation is already complete; it just lacks some interfaces to generate the corresponding PTX code. As mentioned in the comments on LLVM PR #148377, I am currently working on it.

@ThomasRaoux
Collaborator

I don't see ldmatrix support in there?

@Pecco-314
Contributor Author

I just wrote it today. See LLVM PR #148783.

@Pecco-314 Pecco-314 deleted the nvvm-2 branch July 23, 2025 06:53

5 participants