Conversation

@whitneywhtsang
Contributor

@whitneywhtsang whitneywhtsang commented Oct 24, 2024

This PR changes the Triton base from a19f324 to 1064b59 (Oct 22).
Pass rate: 98.99% -> 98.98%

Please do not squash and merge this PR.

giuseros and others added 7 commits October 21, 2024 09:13
This PR introduces support for two new AMDGPU-specific operations:
- `amdgpu.buffer_load`: loads from global memory via a pointer and a
tensor offset
- `amdgpu.buffer_store`: stores a `value` to global memory via a
pointer and a tensor offset

I am also adding conversion patterns in `LoadStoreOpToLLVM.cpp`. These
are similar to the ones for `tt.load` and `tt.store`, but different
enough to deserve a specific rewrite. I tried to hoist common
functionality among the four patterns to reduce duplication.
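As an illustrative sketch only (not the actual MLIR lowering, and these helper names are hypothetical), the semantics of a buffer load/store that takes a base pointer plus a tensor of offsets can be modeled in plain NumPy:

```python
import numpy as np

def buffer_load(base, offsets, mask=None, other=0):
    """Model of amdgpu.buffer_load: gather base[offsets]; masked-off lanes get `other`."""
    result = np.full(offsets.shape, other, dtype=base.dtype)
    valid = np.ones(offsets.shape, dtype=bool) if mask is None else mask
    result[valid] = base[offsets[valid]]
    return result

def buffer_store(base, offsets, value, mask=None):
    """Model of amdgpu.buffer_store: scatter `value` into base[offsets]."""
    valid = np.ones(offsets.shape, dtype=bool) if mask is None else mask
    base[offsets[valid]] = value[valid]

# "Global memory" stand-in: store three values, then load them back.
mem = np.zeros(8, dtype=np.int32)
buffer_store(mem, np.array([1, 3, 5]), np.array([10, 30, 50], dtype=np.int32))
loaded = buffer_load(mem, np.array([1, 3, 5]))
```

The real ops carry the base pointer as a scalar and the offsets as a tensor, which is what distinguishes them from the plain `tt.load`/`tt.store` pointer-tensor form.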
The core Triton is a small number of people, and we receive many PRs
(thank you!). To help us review your code more quickly, **if you are a
new contributor (less than 3 PRs merged) we ask that you complete the
following tasks and include the filled-out checklist in your PR
description.**

Complete the following tasks before sending your PR, and replace `[ ]`
with `[x]` to indicate you have done them.

- [ ] I am not making a trivial change, such as fixing a typo in a
comment.

- [ ] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).

- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because `it simply removes the
definition of an unused variable`.

- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best
practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
    including the "tests should be minimal" section. (Usually running
    Python code and using the instructions it generates is not minimal.)
Provide slide and video links for the conference.

We allow `DotOperand` within `MemoryOpToLLVM` in the buggy Ampere case
via LLs. This allows us to remove two workarounds that we added in a
previous PR.

We add tests in `test_pipeliner.py`.

We also remove some implementation-defined behaviour (overflows / NaNs)
in `test_core.py`, thus making the tests more resilient and realistic,
and remove the outdated performance tests.
We can also add various float8 types and move `scaled_dot` tests here.
Signed-off-by: Anatoly Myachev <[email protected]>
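To illustrate the idea behind removing implementation-defined behaviour from the tests (this is a hypothetical sketch, not the actual `test_core.py` change): bound integer inputs so products cannot overflow, and exclude NaN positions from float comparisons, since overflow wrap-around and NaN ordering differ across backends.

```python
import numpy as np

rng = np.random.default_rng(0)

# int32 inputs bounded to |x| < 2**15, so |a * b| < 2**30 and the
# product cannot overflow int32 on any backend.
a = rng.integers(-(2**15), 2**15, size=128, dtype=np.int32)
b = rng.integers(-(2**15), 2**15, size=128, dtype=np.int32)
product = a.astype(np.int64) * b  # reference computed in a wider type

# Float comparison that treats NaN positions as "don't care", instead
# of relying on implementation-defined NaN propagation.
x = np.array([1.0, np.nan, 2.0])
y = np.array([1.0, np.nan, 2.0])
finite = ~np.isnan(x) & ~np.isnan(y)
```

A test written this way checks only well-defined arithmetic, which is what makes it "resilient and realistic" across hardware.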
…3973)

For a matmul followed by arithmetic operations, such as `acc +=
tl.dot(a, b)`, the mma layout of the `dot` result currently isn't
propagated into the subsequent `add`. As a result, when the dot is
inside a loop, there is a repeated layout conversion from mma to
blocked. I'm fixing this by allowing the mma layout to be propagated so
that it can be reused.
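The pattern in question can be sketched with a NumPy stand-in for the Triton kernel (illustrative only; in the real kernel each iteration's `tl.dot` result carries an mma layout, and without propagation the `+=` forces a conversion back to blocked on every trip through the loop):

```python
import numpy as np

# NumPy stand-in for the K-loop pattern `acc += tl.dot(a, b)`.
M, N, K, BLOCK_K = 4, 4, 16, 4
a = np.arange(M * K, dtype=np.float32).reshape(M, K)
b = np.arange(K * N, dtype=np.float32).reshape(K, N)

acc = np.zeros((M, N), dtype=np.float32)
for k0 in range(0, K, BLOCK_K):
    # In Triton, the result of this dot has an mma layout; if the add
    # does not accept that layout, each iteration pays an mma->blocked
    # layout conversion.
    acc += a[:, k0:k0 + BLOCK_K] @ b[k0:k0 + BLOCK_K, :]
```

Propagating the mma layout into the accumulator add keeps `acc` in the dot's native layout across iterations, so the conversion is hoisted out of the loop entirely.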
@whitneywhtsang whitneywhtsang self-assigned this Oct 24, 2024
@whitneywhtsang whitneywhtsang force-pushed the whitneywhtsang/merge2 branch 2 times, most recently from bc41fdf to b385d79 Compare October 24, 2024 13:48
@whitneywhtsang whitneywhtsang marked this pull request as ready for review October 24, 2024 13:48
@whitneywhtsang whitneywhtsang merged commit d665a99 into main Oct 24, 2024
4 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/merge2 branch October 24, 2024 16:06