Merge OpenAI Triton commit 6c3e953
#2388
Merged
Currently we sleep between reps for the Triton kernels, but not for the cuBLAS kernel. Adding the same sleep for cuBLAS may improve its fp8 performance, which can suffer from thermal issues.
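The idea of applying the same cool-down to every kernel being compared can be sketched in plain Python. This is only an illustration, not the benchmark code touched by the commit: `bench`, `reps`, and `sleep_s` are hypothetical names, and a real GPU benchmark would also need to synchronize the device before reading the clock.

```python
import time


def bench(fn, reps=5, sleep_s=0.0):
    """Time fn() `reps` times, pausing `sleep_s` seconds between reps.

    Hypothetical helper: the pause sits outside the timed region, so it
    only gives the hardware time to cool down and is never counted.
    """
    times = []
    for i in range(reps):
        if i > 0 and sleep_s > 0:
            time.sleep(sleep_s)  # cool-down between reps, not timed
        start = time.perf_counter()
        fn()  # for a GPU kernel, synchronize here before stopping the clock
        times.append(time.perf_counter() - start)
    return times


# Use the same sleep policy for every candidate so the comparison is fair.
candidates = {"kernel_a": lambda: sum(range(1000)), "kernel_b": lambda: sum(range(1000))}
results = {name: bench(fn, reps=3, sleep_s=0.001) for name, fn in candidates.items()}
```

Running both candidates through one helper keeps the measurement conditions identical, which is the point of the change described above.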
Fixes a compile error like the one below when passing a dtype through a kernel arg for `tl._experimental_descriptor_load`: `AttributeError: 'constexpr' object has no attribute 'to_ir'`
This helps the writeout use `global_store_dwordx2`. Along the way, this PR:
- Fixed the issue with getOrder for the mfma layout
- Fixed the issue with reduceOp when dealing with the mfma.transposed layout

In general, getOrder and getThreadOrder can return different values, and this is the case for the mfma.transposed layout. Therefore, we shouldn't assume order and threadOrder are always the same.
LLD is not supported on macOS. This addresses failures like

> clang: error: invalid linker name in argument '-fuse-ld=lld'

See https://github.com/triton-lang/triton/actions/runs/11099205977/job/30833066194#step:10:61
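A platform guard of the kind this commit implies might look like the sketch below. It is an assumption about the shape of the fix, not the actual build script: `linker_flags` is a hypothetical helper, and only the `-fuse-ld=lld` flag itself comes from the error message above.

```python
import platform


def linker_flags(system=None):
    """Return extra linker flags, skipping LLD on macOS.

    Hypothetical helper: `system` defaults to the running platform and can
    be overridden to check other targets.
    """
    system = system or platform.system()
    if system == "Darwin":
        # Requesting LLD here fails with "invalid linker name", so fall
        # back to the default linker by adding no flag.
        return []
    return ["-fuse-ld=lld"]


flags_macos = linker_flags("Darwin")
flags_linux = linker_flags("Linux")
```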
…idia_gpu ops (#4686) If you want to dump layouts read from an MLIR file, and that file contains ops like `triton_nvidia_gpu.warp_group_dot`, this tool needs to know about the `triton_nvidia_gpu` dialect, or else it will throw an error about not finding the dialect.
I think we should always set the right alignment on the `maskedload`/`maskedstore` instructions.
Pinned LLVM to v19; cannot do the same for LLD though. This allows us to revert #4827.
Previously, if an arg inside the loop was marked as a depArg, then a new iter_arg would be added to the for loop to handle the arg; but any usages of these variables _after_ the for loop would not be updated, so those usages would get the wrong value. This PR fixes this by updating the return mapping. See the comment added in StreamPipeline.cpp for an example.

Co-authored-by: Hongtao Yu <[email protected]>
…IDIA GPUs (#4674) This PR adds the "cupti_pcsampling" backend for collecting and attributing instruction samples to the corresponding GPU code, including the file path, function name, and line number. It currently serializes kernel execution so that kernel runtime and GPU samples can be collected in the same pass.
…ocal_alloc` ops (#4763) This PR enables the use of `stmatrix` for `local_alloc` ops through linear layout and removes the legacy code from the `TargetInfo` class.
aa1436d to
9bda03d
1df64d1..6af74b2
It's very common that we need to figure out the exact commit from which the currently installed triton package was compiled. Right now it just shows a version number like `3.0.0`, which isn't very helpful. With this commit we have

```
> pip show triton
Name: triton
Version: 3.0.0+git78e4f837
```
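One way to produce a version string of this shape is sketched below. It is a minimal illustration under assumptions, not the commit's actual packaging code: `local_version` and `current_git_hash` are hypothetical helper names, and the 8-character hash length is inferred only from the example output above.

```python
import subprocess


def current_git_hash(cwd=".", length=8):
    """Return the short hash of HEAD, or "" when git or the repo is unavailable."""
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", f"--short={length}", "HEAD"],
            cwd=cwd,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return out.decode().strip()


def local_version(base, git_hash):
    """Append a "+git<hash>" local version label when a hash is known."""
    return f"{base}+git{git_hash}" if git_hash else base


# Hash taken from the example output above; in a build script it would
# come from current_git_hash().
version = local_version("3.0.0", "78e4f837")
```

Falling back to the bare base version when no hash is available keeps builds from source tarballs (with no `.git` directory) working.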
pbchekin approved these changes on Oct 1, 2024
2dbc39c to
17dac54
6af74b2..6c3e953
This PR changes the Triton base from e7ec3fe to 6c3e953 (Sept 30).
Pass rate: 98.99%
Please do not squash and merge this PR.