[NVIDIA] Replace some NVGPU ops with equivalent NVVM ops (part 2) #7471

Pecco-314 · 2025-07-11T03:53:55Z

This change replaces some NVGPU ops with the corresponding NVVM ops, in line with the earlier discussion in PR #7420.
For some ops, such as NVGPU::FenceAsyncSharedOp, there is no corresponding intrinsic, so LLVM also ends up emitting PTX directly. Even so, in the long run I think it is better to hand responsibility for code generation to LLVM than to hard-code PTX at the NVGPU layer.
ConvertNVVMToLLVMPass has been added to the pipeline and the build system so that NVVM ops are correctly lowered to LLVM IR.

New contributor declaration

I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these rules.
I have run pre-commit run --from-ref origin/main --to-ref HEAD.
Select one of the following.
- I have added tests.
  - /test for lit tests
  - /unittest for C++ tests
  - /python/test for end-to-end tests
- This PR does not need a test because it does not contain new features.
Select one of the following.
- I have not added any lit tests.
- The lit tests I have added follow these best practices,
  including the "tests should be minimal" section. (Usually running Python code
  and using the instructions it generates is not minimal.)

peterbell10

Test failure looks real

Jokeren · 2025-07-11T15:24:17Z

Can you show me what code has been generated by nvvm in ptx for ldmatrix and stmatrix?

Pecco-314 · 2025-07-11T16:11:37Z

The test fails because the code generates stmatrix.sync.aligned.x4.m8n8.shared.b16 while the assertion expects stmatrix.sync.aligned.m8n8.x4.shared.b16. Note that ptxas accepts both formats, though the latter is more canonical. I've submitted PR #148250 for LLVM, but alternatively we could simply update the test.
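
As a rough illustration of the mismatch, a check that tolerates both modifier orders could look like the sketch below. The helper name `has_stmatrix_x4` and the sample PTX strings are hypothetical and are not taken from the Triton test suite; the actual fix in this PR was simply to update the expected string.

```python
import re

# Hypothetical helper (not part of Triton): matches the x4 stmatrix
# instruction regardless of whether ".x4" comes before or after ".m8n8".
_STMATRIX_X4_RE = re.compile(
    r"stmatrix\.sync\.aligned\.(?:x4\.m8n8|m8n8\.x4)\.shared\.b16"
)


def has_stmatrix_x4(ptx: str) -> bool:
    """Return True if the PTX text contains an x4 m8n8 b16 stmatrix."""
    return _STMATRIX_X4_RE.search(ptx) is not None


# Illustrative PTX fragments only; operands are made up.
emitted_by_llvm = "stmatrix.sync.aligned.x4.m8n8.shared.b16 [%r1], {%r2, %r3, %r4, %r5};"
expected_by_test = "stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1], {%r2, %r3, %r4, %r5};"

assert has_stmatrix_x4(emitted_by_llvm)
assert has_stmatrix_x4(expected_by_test)
```

A pattern like this keeps a test stable across both spellings, at the cost of being less strict than an exact substring match.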

peterbell10 · 2025-07-11T16:28:23Z

Yes you can just update the test

python/test/unit/language/test_tensor_descriptor.py

lezcano · 2025-07-11T19:20:20Z

Out of curiosity, does nvvm.ldmatrix/stmatrix support the new Blackwell ops? We may want to use them at some point

Pecco-314 · 2025-07-12T17:25:22Z

> Out of curiosity, does nvvm.ldmatrix/stmatrix support the new Blackwell ops? We may want to use them at some point

It is not supported in LLVM yet, but I have opened a new PR to add it. If it is merged, we will be able to use the new m16n8 ops.

lezcano · 2025-07-12T17:55:10Z

Amazing, thank you!

lezcano

LGTM, but let's wait for @peterbell10's review

ThomasRaoux · 2025-07-15T02:17:03Z

We need to add support for the new ldmatrix m16n16 on Blackwell. I don't see it in LLVM right now. I think I'll have to reintroduce some of the ldmatrix op unless this is already supported.

Pecco-314 · 2025-07-15T02:26:04Z

> We need to add support for the new ldmatrix m16n16 on Blackwell. I don't see it in LLVM right now. I think I'll have to reintroduce some of the ldmatrix op unless this is already supported.

In fact, the NVPTX backend implementation has been completed. It just lacks some interfaces to generate the corresponding PTX code. As mentioned in comments of LLVM PR #148377, I am currently working on it.

ThomasRaoux · 2025-07-15T02:28:32Z

> > We need to add support for the new ldmatrix m16n16 on Blackwell. I don't see it in LLVM right now. I think I'll have to reintroduce some of the ldmatrix op unless this is already supported.
>
> In fact, the NVPTX backend implementation has been completed. It just lacks some interfaces to generate the corresponding PTX code. As mentioned in comments of this LLVM PR, I am currently working on it.

I don't see ldmatrix support in there?

Pecco-314 · 2025-07-15T06:31:36Z

> > > We need to add support for the new ldmatrix m16n16 on Blackwell. I don't see it in LLVM right now. I think I'll have to reintroduce some of the ldmatrix op unless this is already supported.
> >
> > In fact, the NVPTX backend implementation has been completed. It just lacks some interfaces to generate the corresponding PTX code. As mentioned in comments of this LLVM PR, I am currently working on it.
>
> I don't see ldmatrix support in there?

I just wrote it today. See LLVM PR #148783.

Pecco-314 added 2 commits July 11, 2025 11:37

Replace FenceAsyncSharedOp

b840317

Add ConvertNVVMToLLVMPass to the pipeline

e408e1a

Pecco-314 requested a review from ptillet as a code owner July 11, 2025 03:53

Replace LoadMatrixOp and StoreMatrixOp

838195c

Pecco-314 changed the title ~~[NVIDIA] Replace the NVGPU::FenceAsyncSharedOp with the equivalent NVVM Op~~ [NVIDIA] Replace some NVGPU ops with equivalent NVVM ops (part 2) Jul 11, 2025

peterbell10 reviewed Jul 11, 2025

Update the test

be2ce55

lezcano reviewed Jul 11, 2025

python/test/unit/language/test_tensor_descriptor.py

lezcano approved these changes Jul 12, 2025

peterbell10 merged commit 0560390 into triton-lang:main Jul 14, 2025
9 checks passed

Pecco-314 deleted the nvvm-2 branch July 23, 2025 06:53

5 participants