
Conversation

@castigli commented Sep 4, 2025

This PR aims to fix the nvdsl examples, which got out of sync because they are not tested in CI.

The fixed bugs were related to the following PRs:

  • move to nanobind #118583
  • split gpu module initialization #135478

github-actions bot commented Sep 4, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "pinging" the PR, i.e. adding a comment that says "Ping". The common courtesy ping rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@castigli changed the title from "Fix nvdsl examples" to "[NVGPU] Fix nvdsl examples" on Sep 4, 2025
@Wolfram70 (Contributor)

Thanks for bringing this to our attention!

I looked into this a bit, and it does look like the crash is occurring in the --gpu-module-to-binary pass.
For Ch4.py, dumping the .mlir file and extracting the LLVM IR generated during --gpu-module-to-binary gives
extracted-llvmir.txt. Running llc -mtriple=nvptx64 -mcpu=sm_90a -mattr=+ptx80 on it reproduces the crash exactly, so it seems to be an issue during codegen. I am not sure why this occurs in this specific case.
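
For anyone reproducing locally, a small driver along the lines of the steps above (a sketch only; the file names assume the attached extracted-llvmir.txt and the before.mlir from the stack dump below are in the current directory):

import subprocess

# Lowering the dumped MLIR with --gpu-module-to-binary triggers the crash.
subprocess.run(["mlir-opt", "before.mlir", "--gpu-module-to-binary"], check=True)

# Equivalently, reproduce straight from the LLVM IR extracted during that pass.
subprocess.run(
    ["llc", "-mtriple=nvptx64", "-mcpu=sm_90a", "-mattr=+ptx80", "extracted-llvmir.txt"],
    check=True,
)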

@durga4github @abhilash1910 Do you have any idea why this might be happening?

@abhilash1910 (Contributor) commented Sep 5, 2025

Taking a look at the codegen. Thanks for highlighting. The IR does not seem incorrect at first glance, though.
Edit: Fix is in progress.

durga4github pushed a commit that referenced this pull request Sep 24, 2025
Context: Highlighted from #156830, this is an ISel lowering issue in the NVPTX backend for the prefetch.tensormap intrinsic.

It is caused by an unchecked pattern rewrite during the infer-address-space pass. This intrinsic is valid only for the const, param, and generic address spaces; any other address space is invalid. Currently, the intrinsic gets falsely rewritten to target AS(1) when its pointer argument comes in as an argument of a kernel function.

So, this patch adds a check for the correct address spaces before rewriting them.

cc @durga4github 

FYI: @Wolfram70 @rupprecht  @castigli
@castigli marked this pull request as ready for review October 2, 2025 09:36
@castigli requested a review from grypp as a code owner October 2, 2025 09:36
@llvmbot (Member) commented Oct 2, 2025

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-nvgpu

@llvm/pr-subscribers-mlir-gpu

Author: Giacomo Castiglioni (castigli)

Changes

This PR aims to fix the nvdsl examples, which got out of sync because they are not tested in CI.

The fixed bugs were related to the following PRs:

  • move to nanobind #118583
  • split gpu module initialization #135478
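
For context on the second item above, a minimal sketch of the JIT flow after the initialization split, as I read the change (the explicit initialize() call mirrors the nvgpucompiler.py fix in the diff below):

from mlir import execution_engine, ir


def make_engine(module: ir.Module) -> execution_engine.ExecutionEngine:
    # Assumption drawn from the fix below: construction alone no longer runs
    # module initialization, so initialize() must be called before invocation.
    ee = execution_engine.ExecutionEngine(module, opt_level=3, shared_libs=[])
    ee.initialize()
    return ee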

There is one remaining bug, which I think #153134 introduced. When running Ch4 and Ch5, the nvvm.prefetch.tensormap intrinsic leads to the following error on sm_90a:

LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.prefetch.tensormap
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: mlir-opt before.mlir --gpu-module-to-binary
1.	Running pass 'Function Pass Manager' on module 'LLVMDialectModule'.
2.	Running pass 'NVPTX DAG->DAG Pattern Instruction Selection' on function '@gemm_multistage_kernel'
...

Perhaps @Wolfram70 or @grypp could help me out with the last bug? Could the solution be to momentarily revert to inline PTX?
[edit] This was resolved in #159253.


Full diff: https://github.com/llvm/llvm-project/pull/156830.diff

3 Files Affected:

  • (modified) mlir/test/Examples/NVGPU/Ch5.py (+1-1)
  • (modified) mlir/test/Examples/NVGPU/tools/nvdsl.py (+3-4)
  • (modified) mlir/test/Examples/NVGPU/tools/nvgpucompiler.py (+3-1)
diff --git a/mlir/test/Examples/NVGPU/Ch5.py b/mlir/test/Examples/NVGPU/Ch5.py
index f98cfd758a75f..91c346c837dda 100644
--- a/mlir/test/Examples/NVGPU/Ch5.py
+++ b/mlir/test/Examples/NVGPU/Ch5.py
@@ -156,7 +156,7 @@ def producer_loop(
 ):
     phase = const(True, ty=T.bool())
 
-    for iv, phase in scf.for_(0, (K // TILE_K), 1, [phase]):
+    for iv, phase, _ in scf.for_(0, (K // TILE_K), 1, [phase]):
         stage = iv % num_stages
         # Wait MMA to be done
         mbar_mma[stage].try_wait(phase)
diff --git a/mlir/test/Examples/NVGPU/tools/nvdsl.py b/mlir/test/Examples/NVGPU/tools/nvdsl.py
index 90dbb2355e1c8..d4c50fc9bc28d 100644
--- a/mlir/test/Examples/NVGPU/tools/nvdsl.py
+++ b/mlir/test/Examples/NVGPU/tools/nvdsl.py
@@ -84,8 +84,7 @@ def arrive(self, txcount: int = 0, predicate=None):
                 self.mbar_group_op, txcount_op, self.id_op, predicate=predicate
             )
         else:
-            nvgpu.mbarrier_arrive(
-                ir.Type.parse("!nvgpu.mbarrier.token"), self.mbar_group_op, self.id_op
+            nvgpu.mbarrier_arrive(self.mbar_group_op, self.id_op
             )
 
     def try_wait(self, phase: bool = False, ticks: int = 10000000):
@@ -144,7 +143,7 @@ def create_descriptor(self, device_ptr):
             device_ptr,
         )
         self.tma_descriptor = nvgpu.TmaCreateDescriptorOp(
-            tma_descriptor_ty, device_unranked_memref, map(const, self.tma_box_shape)
+            tma_descriptor_ty, device_unranked_memref, list(map(const, self.tma_box_shape))
         )
         return self.tma_descriptor.result
 
@@ -156,7 +155,7 @@ def load(self, dest, mbarrier: Mbarriers, coords=[0], predicate=None):
             dest,
             mbarrier.mbar_group_op,
             self.tma_descriptor,
-            coordinates=map(const, coords),
+            coordinates=list(map(const, coords)),
             mbarId=mbarrier.id_op,
             predicate=predicate,
         )
diff --git a/mlir/test/Examples/NVGPU/tools/nvgpucompiler.py b/mlir/test/Examples/NVGPU/tools/nvgpucompiler.py
index 1c9cc74fcd169..4b661f8df6a9f 100644
--- a/mlir/test/Examples/NVGPU/tools/nvgpucompiler.py
+++ b/mlir/test/Examples/NVGPU/tools/nvgpucompiler.py
@@ -35,9 +35,11 @@ def compile(self, module: ir.Module):
 
     def jit(self, module: ir.Module) -> execution_engine.ExecutionEngine:
         """Wraps the module in a JIT execution engine."""
-        return execution_engine.ExecutionEngine(
+        ee = execution_engine.ExecutionEngine(
             module, opt_level=self.opt_level, shared_libs=self.shared_libs
         )
+        ee.initialize()
+        return ee
 
     def compile_and_jit(self, module: ir.Module) -> execution_engine.ExecutionEngine:
         """Compiles and jits the module."""

@durga4github (Contributor)

@castigli , I updated the commit-msg since the prefetch.tensormap issue is resolved.

Could you please rebase and push once? I can initiate the workflows to run, to get CI results.

@castigli (Author) commented Oct 8, 2025

@castigli , I updated the commit-msg since the prefetch.tensormap issue is resolved.

Could you please rebase and push once? I can initiate the workflows to run, to get CI results.

Done!

@durga4github (Contributor)

@castigli , I updated the commit-msg since the prefetch.tensormap issue is resolved.
Could you please rebase and push once? I can initiate the workflows to run, to get CI results.

Done!

Thanks, initiated the CI.

github-actions bot commented Oct 8, 2025

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r origin/main...HEAD mlir/test/Examples/NVGPU/Ch5.py mlir/test/Examples/NVGPU/tools/nvdsl.py mlir/test/Examples/NVGPU/tools/nvgpucompiler.py

⚠️ The reproduction instructions above might return results for more than one PR in a stack if you are using a stacked PR workflow. You can limit the results by changing origin/main to the base branch/commit you want to compare against.

View the diff from darker here.
--- tools/nvdsl.py	2025-10-08 11:51:06.000000 +0000
+++ tools/nvdsl.py	2025-10-08 13:01:05.815271 +0000
@@ -82,12 +82,11 @@
             txcount_op = const(txcount)
             nvgpu.mbarrier_arrive_expect_tx(
                 self.mbar_group_op, txcount_op, self.id_op, predicate=predicate
             )
         else:
-            nvgpu.mbarrier_arrive(self.mbar_group_op, self.id_op
-            )
+            nvgpu.mbarrier_arrive(self.mbar_group_op, self.id_op)
 
     def try_wait(self, phase: bool = False, ticks: int = 10000000):
         ticks_op = const(ticks)
         phase_op = const(phase, T.bool())
         nvgpu.MBarrierTryWaitParityOp(
@@ -141,11 +140,13 @@
                 self.memref_ty.element_type, self.memref_ty.memory_space
             ),
             device_ptr,
         )
         self.tma_descriptor = nvgpu.TmaCreateDescriptorOp(
-            tma_descriptor_ty, device_unranked_memref, list(map(const, self.tma_box_shape))
+            tma_descriptor_ty,
+            device_unranked_memref,
+            list(map(const, self.tma_box_shape)),
         )
         return self.tma_descriptor.result
 
     def prefetch(self, predicate=None):
         nvgpu.tma_prefetch_descriptor(self.tma_descriptor, predicate=predicate)

@durga4github (Contributor)

@grypp , Kindly take a look when you get a chance.

@grypp (Member) left a comment

Thanks for fixing these

@grypp (Member) commented Oct 8, 2025

I feel like we should enable LIT testing for these tests without running them, so that at least they get compiled.

@castigli (Author) commented Oct 8, 2025

I feel like we should enable LIT testing for these tests without running them, so that at least they get compiled.

In principle I agree, but I don't have a good way to support both compile-only and compile-and-run without mucking up the code too much.
What about something like:

...
# RUN: env MLIR_RUN_CUDA_SM90_TESTS=%mlir_run_cuda_sm90_tests
...
if run_if_cuda_sm90_enabled(lambda: saxpy(x, y, alpha)) is None:
    #  4. Verify MLIR with reference computation
    ref = np.ones((M, N), np.float32)
    ref += x * alpha
    np.testing.assert_allclose(y, ref, rtol=5e-03, atol=1e-01)
print("PASS")
# CHECK: PASS

with the util defined as

import os


def run_if_cuda_sm90_enabled(func, *args, **kwargs):
    """Execute a function if CUDA SM90 tests are enabled, otherwise print a warning."""
    mlir_run_cuda_sm90_tests = os.getenv("MLIR_RUN_CUDA_SM90_TESTS")
    # Run by default (variable unset) or when the substitution expands to "1".
    if mlir_run_cuda_sm90_tests == "1" or mlir_run_cuda_sm90_tests is None:
        return func(*args, **kwargs)
    else:
        # Non-None sentinel so callers can tell a skip from an actual run.
        print("warning: skipping test execution")
        return -1
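
For the RUN line above to mean anything, lit would also have to expand %mlir_run_cuda_sm90_tests. A hypothetical wiring in the lit site configuration (the config attribute name is an assumption, modeled on the existing MLIR_RUN_*_TESTS options; not an existing API):

# lit.site.cfg.py sketch (hypothetical): map the CMake-level switch to the
# substitution used in the RUN line, so execution can be toggled per site.
config.substitutions.append(
    ("%mlir_run_cuda_sm90_tests", "1" if config.run_cuda_sm90_tests else "0")
)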
