Merge OpenAI Triton commit 00cf53f
#5424
Open
whitneywhtsang wants to merge 12 commits into main from whitneywhtsang/merge
This commit adds tests for `unrealized_conversion_cast` and makes the visitor in the axis analysis more robust. The input AxisInfo should not be propagated if doing so would inject an AxisInfo with an incorrect rank, which could cause a crash. `unrealized_conversion_cast` ops typically appear during a dialect conversion, but they are also useful for debugging and rapid prototyping. They allow programmers to hand-write the expected low-level IR and connect it with high-level IR that will be lowered as usual. In such a scenario, programmers write IR such as "Case 2" in the added test case. Such IR currently crashes the axis analysis. Also fix a crash when a function call with a multi-dimensional function argument is analyzed. (This is now triggered due to the improved `unrealized_conversion_cast` handling.)
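The rank guard described above can be sketched in a few lines. This is a simplified Python model, not the C++ code in the commit: the `AxisInfo` dataclass, `pessimistic`, and `propagate_through_cast` are hypothetical names, and the fallback of all-ones values is an assumption about what a conservative state looks like.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AxisInfo:
    """Simplified per-dimension axis properties (hypothetical model)."""
    contiguity: Tuple[int, ...]
    divisibility: Tuple[int, ...]
    constancy: Tuple[int, ...]

    @property
    def rank(self) -> int:
        return len(self.contiguity)

    @staticmethod
    def pessimistic(rank: int) -> "AxisInfo":
        # Assumption: the conservative state knows nothing, so every
        # per-dimension value is 1.
        ones = (1,) * rank
        return AxisInfo(ones, ones, ones)


def propagate_through_cast(input_info: AxisInfo, result_rank: int) -> AxisInfo:
    """Forward the input info only when its rank matches the result's rank."""
    if input_info.rank == result_rank:
        return input_info
    # A rank mismatch would inject malformed info downstream, so fall back.
    return AxisInfo.pessimistic(result_rank)
```

With a rank-2 input and a rank-1 result, the sketch returns the conservative state instead of forwarding the rank-2 info.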
This PR adds device-side TMA support for gluon. cc @ThomasRaoux
Use the correct condition value when other value exists.
We fix a number of cases where the constancy analysis could be improved. The code is quite messy, and the whole pass could do with a full rewrite, but we are not doing so ATM. This PR was mostly vibecoded, with a cleaning pass afterwards from me.
…s. (#8512)

A few tensor descriptor tests hard-code "cuda" as the device instead of taking the `device` fixture as a test argument. This PR changes those tests to use the `device` fixture so that third-party users without a CUDA runtime can run them.

<!--- The core Triton is a small number of people, and we receive many PRs (thank you!). To help us review your code more quickly, **if you are a new contributor (less than 3 PRs merged) we ask that you complete the following tasks and include the filled-out checklist in your PR description.** Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them. -->

# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because it is editing the test file only.
- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

Co-authored-by: Micah Weston <[email protected]>
Add the shared memory capacity for `gfx1250`, which is 320 kbyte.
This PR fixes the layout and lowering for wmma scaled with a small k dim, where the tensor's k dimension is smaller than a single wmma scaled instruction's k dimension. Add corresponding lit tests for common cases.
Prevent crash when lowering memdesc of pointer
…n (#8493) TP > 1 is not supported in this mode
b61c0b1 to 2ae6dd8
chengjunlu approved these changes Nov 4, 2025
Signed-off-by: Whitney Tsang <[email protected]>
2ff238c to 0f499b9
This PR changes the Triton base from c172d53 to 00cf53f (Oct 23).
Pass rate: 94.55%->94.54%