The following script crashes the compiler with an ambiguous error message:
```
import torch
import triton
import triton.language as tl
from triton.experimental import gluon
from triton.experimental.gluon import language as ttgl
from triton.experimental.gluon.language.nvidia import blackwell
from triton.experimental.gluon.language.nvidia.blackwell import mbarrier, tma
@gluon.jit
def async_tma_kernel(input_desc, XBLOCK: ttgl.constexpr, smem_layout: ttgl.constexpr):
    # Intended usage: allocate the TMA data buffer with the NVMMA shared layout.
    # smem = ttgl.allocate_shared_memory(ttgl.float32, [XBLOCK, XBLOCK], smem_layout)
    # Bug being reproduced: the mbarrier layout is mis-used for the data buffer.
    smem = ttgl.allocate_shared_memory(ttgl.float32, [XBLOCK, XBLOCK], mbarrier.MBarrierLayout())
    bar = ttgl.allocate_shared_memory(ttgl.int64, [1], mbarrier.MBarrierLayout())
    mbarrier.init(bar, count=1)
    mbarrier.expect(bar, XBLOCK * XBLOCK * ttgl.float32.primitive_bitwidth // 8)
    tma.async_copy_global_to_shared(input_desc, [0, 0], bar, smem)
    mbarrier.wait(bar, 0)
    mbarrier.invalidate(bar)
    tma.async_copy_shared_to_global(input_desc, [0, 0], smem)
    tma.store_wait(0)


def test_async_tma():
    input = torch.randn((1024, 1024), device="cuda", dtype=torch.float32)
    XBLOCK = 128
    shared_layout = ttgl.NVMMASharedLayout(swizzle_byte_width=128, element_bitwidth=32, rank=2)
    input_desc = gluon.nvidia.hopper.TensorDescriptor.from_tensor(input, [XBLOCK, XBLOCK], shared_layout)
    h = async_tma_kernel[(1, )](input_desc, XBLOCK, shared_layout, num_warps=1)
    print(f"input_desc: {input_desc}")


test_async_tma()
```
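For reference, the commented-out line in the kernel above shows the intended allocation: passing the `smem_layout` constexpr (an `NVMMASharedLayout`) instead of `mbarrier.MBarrierLayout()` for the data buffer avoids the crash.

```
# Intended allocation for the TMA data buffer (from the commented-out line above):
smem = ttgl.allocate_shared_memory(ttgl.float32, [XBLOCK, XBLOCK], smem_layout)
```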
The root cause is that `smem` mis-uses a gluon layout: the data buffer is allocated with `mbarrier.MBarrierLayout()` rather than the shared memory layout that TMA expects. This PR adds a check at the gluon language level so the failure surfaces as a clear error message instead of a compiler crash. Adding the check to `async_tma_copy_global_to_local`'s verifier instead would break Triton's own lowering path (`triton-nvidia-tma-lowering`), so the check is added only for gluon; a sketch of the idea follows.
<!---
The core Triton is a small number of people, and we receive many PRs (thank
you!). To help us review your code more quickly, **if you are a new
contributor (less than 3 PRs merged) we ask that you complete the following
tasks and include the filled-out checklist in your PR description.**

Complete the following tasks before sending your PR, and replace `[ ]` with
`[x]` to indicate you have done them.
-->
# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [x] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [ ] This PR does not need a test because `error message is improved`.
- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these
    [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
    including the "tests should be minimal" section. (Usually running Python
    code and using the instructions it generates is not minimal.)