Conversation

@whitneywhtsang
Contributor

@whitneywhtsang whitneywhtsang commented Oct 24, 2024

This PR changes the Triton base from a19f324 to 1064b59 (Oct 22).
Pass rate: 98.99% -> 98.98%

Please do not squash and merge this PR.

giuseros and others added 7 commits October 21, 2024 09:13
This PR introduces support for two new AMDGPU-specific operations:
- `amdgpu.buffer_load`: loads from global memory via a pointer and a
tensor offset
- `amdgpu.buffer_store`: stores a `value` to global memory via a
pointer and a tensor offset

I am also adding conversion patterns in `LoadStoreOpToLLVM.cpp`. These
are similar to the ones for `tt.load` and `tt.store`, but different
enough to deserve a specific rewrite. I tried to hoist common
functionality among the four patterns to reduce duplication.
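As an illustrative sketch only (not the actual MLIR lowering, and these helper names are hypothetical), the semantics of a buffer load/store that takes a base pointer plus a tensor of offsets can be modeled in plain NumPy:

```python
import numpy as np

def buffer_load(base, offsets, mask=None, other=0):
    """Model of amdgpu.buffer_load: gather base[offsets]; masked-off lanes get `other`."""
    result = np.full(offsets.shape, other, dtype=base.dtype)
    valid = np.ones(offsets.shape, dtype=bool) if mask is None else mask
    result[valid] = base[offsets[valid]]
    return result

def buffer_store(base, offsets, value, mask=None):
    """Model of amdgpu.buffer_store: scatter `value` into base[offsets]."""
    valid = np.ones(offsets.shape, dtype=bool) if mask is None else mask
    base[offsets[valid]] = value[valid]

# "Global memory" stand-in: store three values, then load them back.
mem = np.zeros(8, dtype=np.int32)
buffer_store(mem, np.array([1, 3, 5]), np.array([10, 30, 50], dtype=np.int32))
loaded = buffer_load(mem, np.array([1, 3, 5]))
```

The real ops carry the base pointer as a scalar and the offsets as a tensor, which is what distinguishes them from the plain `tt.load`/`tt.store` pointer-tensor form.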
The core Triton is a small number of people, and we receive many PRs
(thank you!). To help us review your code more quickly, **if you are a
new contributor (less than 3 PRs merged) we ask that you complete the
following tasks and include the filled-out checklist in your PR
description.**

Complete the following tasks before sending your PR, and replace `[ ]`
with `[x]` to indicate you have done them.

- [ ] I am not making a trivial change, such as fixing a typo in a
comment.

- [ ] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).

- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because `it simply removes the
definition of an unused variable`.

- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best
practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
    including the "tests should be minimal" section. (Usually running
    Python code and using the instructions it generates is not minimal.)
Provide slide and video links for the conference.

We allow `DotOperand` within `MemoryOpToLLVM` in the buggy Ampere case
via LLs. This allows us to remove two workarounds that we added in a
previous PR.

We add tests in `test_pipeliner.py`.

We also remove some implementation-defined behaviour (overflows / NaNs)
in `test_core.py`, thus making the tests more resilient and realistic,
and remove the outdated performance tests.
We can also add various float8 types and move `scaled_dot` tests here.
Signed-off-by: Anatoly Myachev <[email protected]>
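To illustrate the idea behind removing implementation-defined behaviour from the tests (this is a hypothetical sketch, not the actual `test_core.py` change): bound integer inputs so products cannot overflow, and exclude NaN positions from float comparisons, since overflow wrap-around and NaN ordering differ across backends.

```python
import numpy as np

rng = np.random.default_rng(0)

# int32 inputs bounded to |x| < 2**15, so |a * b| < 2**30 and the
# product cannot overflow int32 on any backend.
a = rng.integers(-(2**15), 2**15, size=128, dtype=np.int32)
b = rng.integers(-(2**15), 2**15, size=128, dtype=np.int32)
product = a.astype(np.int64) * b  # reference computed in a wider type

# Float comparison that treats NaN positions as "don't care", instead
# of relying on implementation-defined NaN propagation.
x = np.array([1.0, np.nan, 2.0])
y = np.array([1.0, np.nan, 2.0])
finite = ~np.isnan(x) & ~np.isnan(y)
```

A test written this way checks only well-defined arithmetic, which is what makes it "resilient and realistic" across hardware.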
…3973)

For a matmul followed by arithmetic operations, such as `acc +=
tl.dot(a, b)`, the mma layout of the `dot` result currently isn't
propagated into the subsequent `add`. As a result, when the dot is
inside a loop, there is a repeated layout conversion from mma to
blocked. I'm fixing this by allowing the mma layout to be propagated so
that it can be reused.
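The pattern in question can be sketched with a NumPy stand-in for the Triton kernel (illustrative only; in the real kernel each iteration's `tl.dot` result carries an mma layout, and without propagation the `+=` forces a conversion back to blocked on every trip through the loop):

```python
import numpy as np

# NumPy stand-in for the K-loop pattern `acc += tl.dot(a, b)`.
M, N, K, BLOCK_K = 4, 4, 16, 4
a = np.arange(M * K, dtype=np.float32).reshape(M, K)
b = np.arange(K * N, dtype=np.float32).reshape(K, N)

acc = np.zeros((M, N), dtype=np.float32)
for k0 in range(0, K, BLOCK_K):
    # In Triton, the result of this dot has an mma layout; if the add
    # does not accept that layout, each iteration pays an mma->blocked
    # layout conversion.
    acc += a[:, k0:k0 + BLOCK_K] @ b[k0:k0 + BLOCK_K, :]
```

Propagating the mma layout into the accumulator add keeps `acc` in the dot's native layout across iterations, so the conversion is hoisted out of the loop entirely.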
@whitneywhtsang whitneywhtsang self-assigned this Oct 24, 2024
@whitneywhtsang whitneywhtsang force-pushed the whitneywhtsang/merge2 branch 2 times, most recently from bc41fdf to b385d79 Compare October 24, 2024 13:48
@whitneywhtsang whitneywhtsang marked this pull request as ready for review October 24, 2024 13:48
@whitneywhtsang whitneywhtsang merged commit d665a99 into main Oct 24, 2024
4 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/merge2 branch October 24, 2024 16:06