Conversation

@whitneywhtsang commented Sep 30, 2024

This PR changes the Triton base from e7ec3fe to 6c3e953 (Sept 30).
Pass rate: 98.99%

Please do not squash and merge this PR.

ptillet and others added 8 commits September 27, 2024 10:02
Currently we sleep between each rep for Triton kernels, but not for the
cuBLAS kernel. Adding the same sleep for cuBLAS may improve its fp8
performance by avoiding thermal throttling.
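
A minimal sketch of the idea (the benchmark harness itself is not shown in this PR excerpt; the function, rep count, and sleep duration below are illustrative):

```python
import time

def bench_with_sleep(fn, reps=100, sleep_s=0.01):
    # `fn` is assumed to launch and synchronize the kernel being measured.
    times = []
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
        # Sleeping after every rep keeps thermal conditions comparable
        # between the Triton and cuBLAS measurements.
        time.sleep(sleep_s)
    return min(times)
```
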
Fixing a compile error like the one below when passing the dtype through a
kernel argument to `tl._experimental_descriptor_load`:

 AttributeError: 'constexpr' object has no attribute 'to_ir'
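
A hypothetical reproducer (kernel and argument names are illustrative, not taken from the PR; the key point is that the dtype arrives wrapped in `tl.constexpr`):

```python
import triton
import triton.language as tl

@triton.jit
def descriptor_load_kernel(desc_ptr, out_ptr,
                           BLOCK: tl.constexpr,
                           DTYPE: tl.constexpr):
    # Before the fix, passing DTYPE through a kernel argument raised
    # "AttributeError: 'constexpr' object has no attribute 'to_ir'"
    # because the constexpr wrapper was not unwrapped before lowering.
    block = tl._experimental_descriptor_load(desc_ptr, [0], [BLOCK], DTYPE)
    tl.store(out_ptr + tl.arange(0, BLOCK), block)
```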

This helps the writeout use `global_store_dwordx2`.

Along the way, this PR:
- Fixed the issue with getOrder for mfma layout
- Fixed the issue with reduceOp when dealing with mfma.transposed layout

In general, getOrder and getThreadOrder can return different values, and
this is the case for mfma.transposed layout. Therefore, we shouldn't
assume order and threadOrder are always the same.
LLD is not supported on macOS. This addresses failures like

> clang: error: invalid linker name in argument '-fuse-ld=lld'

See https://github.com/triton-lang/triton/actions/runs/11099205977/job/30833066194#step:10:61
…idia_gpu ops (#4686)

If you want to dump layouts read from an MLIR file, and that file
contains ops like `triton_nvidia_gpu.warp_group_dot`, this tool needs to
know about the `triton_nvidia_gpu` dialect, or else it will throw an
error about not finding the dialect.
I think we should always set the right alignment on the
`maskedload`/`maskedstore` instructions.
Pinned LLVM to v19; cannot do the same for LLD though.

This allows us to revert #4827.
@whitneywhtsang self-assigned this Sep 30, 2024
davidberard98 and others added 3 commits September 30, 2024 17:28
Previously, if an arg inside the loop was marked as a depArg, a new
iter_arg would be added to the for loop to handle it; but any uses of
these variables _after_ the for loop would not be updated, so those uses
would get the wrong value. This PR fixes this by updating the return
mapping. See the comment added in StreamPipeline.cpp for an example.

Co-authored-by: Hongtao Yu <[email protected]>
…IDIA GPUs (#4674)

This PR adds the "cupti_pcsampling" backend for collecting and
attributing instruction samples to the corresponding GPU code, including
the file path, function name, and line number. It currently serializes
kernel execution so that kernel runtime and GPU samples can be collected
in the same pass.
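
A minimal usage sketch, assuming the proton Python API exposes the new backend under the name given in this commit (the exact `proton.start` arguments may differ):

```python
import triton.profiler as proton

# Start profiling with the PC-sampling backend; kernel execution is
# serialized so runtimes and GPU samples are collected in the same pass.
proton.start("pc_sampling_run", backend="cupti_pcsampling")

# ... launch Triton kernels here ...

proton.finalize()
```
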
…ocal_alloc` ops (#4763)

This PR enables the use of `stmatrix` for `local_alloc` ops through
linear layout and removes the legacy code from the `TargetInfo` class.
@whitneywhtsang changed the title from "Merge OpenAI Triton commit 1df64d1" to "Merge OpenAI Triton commit 6af74b2" Oct 1, 2024
@whitneywhtsang reopened this Oct 1, 2024
@whitneywhtsang marked this pull request as ready for review October 1, 2024 03:07
It's very common that we need to figure out the exact commit from which
the currently installed triton package was built. Right now it just shows
a version number like `3.0.0`, which isn't very helpful.

With this commit we have

```
> pip show triton
Name: triton
Version: 3.0.0+git78e4f837
```
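
The same information can be read programmatically from the installed package metadata (a small illustrative check, not part of this PR):

```python
from importlib.metadata import version

# Reads the same metadata that `pip show` reports,
# e.g. "3.0.0+git78e4f837" after this change.
print(version("triton"))
```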
@whitneywhtsang changed the title from "Merge OpenAI Triton commit 6af74b2" to "Merge OpenAI Triton commit 6c3e953" Oct 1, 2024
@whitneywhtsang merged commit 17dac54 into main Oct 1, 2024
4 checks passed
@whitneywhtsang deleted the whitneywhtsang/merge branch October 1, 2024 17:58