
Conversation


whitneywhtsang commented Dec 10, 2024

This PR changes the Triton base from 89c0b0a to 4d2e9e5 (Dec 9).
Pass rate: 99.84%->99.82% (#2980)

Please do not squash and merge this PR.

antiagainst and others added 11 commits December 9, 2024 00:32
…5362)

This relands triton-lang/triton#5139:

Adding a shortcut case for fp8 MFMA to dot operand layout conversion
that avoids using shared memory, to speed up FP8 attention kernels.

---------

Co-authored-by: ilia-cher <[email protected]>
…ls on (#5286)

This pull request updates all tutorials except `09-persistent-matmul.py`, which relies heavily on CUDA-specific functions.

---------

Signed-off-by: Anatoly Myachev <[email protected]>
All deleted libraries are either in `${triton_libs}` or in
`${conversion_libs}`.

Signed-off-by: Anatoly Myachev <[email protected]>
`local_load` should be in the same stage as the `subview` that it uses.
1. Fix the problem where [m, k, n] instead of [m, n, k] is returned on the NVIDIA backend
2. Check both int8 and float8
3. Add a new compiler error test
4. Fix the dtype check in the AMD backend
whitneywhtsang self-assigned this Dec 10, 2024
whitneywhtsang marked this pull request as ready for review December 10, 2024 05:22
whitneywhtsang merged commit 3f4fdd1 into main Dec 10, 2024
6 checks passed
whitneywhtsang deleted the whitneywhtsang/merge branch December 10, 2024 06:09