
Conversation

@whitneywhtsang
Contributor

This PR changes the Triton base from 13594bb to 152ef2d (Oct 24).
Pass rate: 98.98%

Please do not squash and merge this PR.

knwng and others added 7 commits October 24, 2024 08:55
This PR adds a `fast_expf` operator under libdevice for AMD hardware.

In line with the other operators in the exp family, the handling of
denormal inputs is controlled by `__HIP_FTZ`, which is currently fixed
to 1.

- If `__HIP_FTZ = 1`, the operator uses `llvm.amdgcn.exp2.f32`, which
  flushes denorms in inputs and outputs;
- If `__HIP_FTZ = 0`, the operator uses `llvm.exp2.f32`, which does not
  flush denorms.

Ref:
https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/cuda2gcn/src/precision.cl

Fixes ROCm/triton-internal#314
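
A minimal Python sketch of the flush-to-zero distinction described above (a model, not the actual LLVM intrinsics; `ftz_input` and `exp2_f32` are hypothetical names):

```python
import math

# Smallest positive normal float32 magnitude; anything smaller is denormal.
FLT32_MIN_NORMAL = 2.0 ** -126

def ftz_input(x, ftz=True):
    """Return the input as a flush-to-zero (FTZ) implementation would
    see it: denormal inputs are flushed to zero when ftz is enabled."""
    if ftz and 0.0 < abs(x) < FLT32_MIN_NORMAL:
        return 0.0
    return x

def exp2_f32(x, ftz=True):
    """Model of exp2: ftz=True mimics llvm.amdgcn.exp2.f32 (denormal
    inputs flushed), ftz=False mimics llvm.exp2.f32 (denorms kept)."""
    return math.pow(2.0, ftz_input(x, ftz))

denorm = 2.0 ** -130  # denormal in float32 (below 2**-126)
print(exp2_f32(denorm, ftz=True))  # exp2(0.0) == 1.0 exactly
```

With FTZ enabled the denormal input is treated as 0, so the result is exactly 1.0; without FTZ the tiny input is kept as-is.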
… use it for vectorized atomics (#4982)

Vectorized atomics on NVIDIA
(triton-lang/triton#4971) are only available on
Hopper (>=sm90) and PTX >= 8.1. It's possible to be running with PTX 8.0
on a Hopper machine. This PR passes ptx-version to the ttgir->llir
conversion pass for NVIDIA, and uses the ptx version to determine
whether vectorized atomics should be used.
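
The gating logic can be sketched as a simple predicate (a hypothetical helper, not the actual pass code; `supports_vectorized_atomics` and the integer encodings are assumptions):

```python
def supports_vectorized_atomics(sm_arch: int, ptx_version: int) -> bool:
    """Vectorized atomics require both Hopper (sm >= 90) and
    PTX >= 8.1 (encoded here as 81), per the description above."""
    return sm_arch >= 90 and ptx_version >= 81

print(supports_vectorized_atomics(90, 81))  # True
print(supports_vectorized_atomics(90, 80))  # False: PTX 8.0 on Hopper
```

This captures the case the PR fixes: a Hopper machine running an older PTX 8.0 toolchain must not emit vectorized atomics.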
`add_optimize_dot_operands` may introduce an immutable shared buffer for
transposed dot operands. Our stream-pipeliner then replaces the
immutable buffer with a mutable buffer to be able to reuse it across
iterations (pre-fetching). This will then produce incorrect transOps
because the input is mutable but the result is immutable.
This PR rewrites those transOps to output a mutable layout.
…CES` is set (#4986)

Based on the feedback from AMD, the device mapping problem has to be
addressed by the ROCm team, so we emit an error for now.
This PR is only introducing a ttgir pass to convert `tt.load`/`tt.store`
to `amdgpu.buffer_load`/`amdgpu.buffer_store`, _when this is possible_:
this means we need to check for 3 conditions:
1. The pointer arithmetic has been canonicalized
   (`scalarPtr->splat->addptr->load/store`)
2. The offsets are 32-bits
3. The offsets are non-negative. We use a mix of analysis and
   assumptions to verify this condition

Right now the functionality is gated behind an `AMDGCN_USE_BUFFER_OPS`
flag, which now also covers the pointer canonicalization pass that is
mostly meant to enable this conversion.
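
Conditions 2 and 3 above can be sketched as a standalone check (a hypothetical helper for illustration; condition 1 is a structural IR property and is not modeled here):

```python
def offsets_ok_for_buffer_ops(offsets):
    """Return True if every offset fits in a signed 32-bit integer
    and is non-negative, mirroring conditions 2 and 3 above."""
    INT32_MAX = 2 ** 31 - 1
    return all(0 <= off <= INT32_MAX for off in offsets)

print(offsets_ok_for_buffer_ops([0, 4, 2 ** 31 - 1]))  # True
print(offsets_ok_for_buffer_ops([-4, 0]))              # False: negative
print(offsets_ok_for_buffer_ops([2 ** 31]))            # False: > 32 bits
```

In the real pass the non-negativity condition cannot always be proven statically, which is why the description mentions a mix of analysis and user-provided assumptions.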
…#4983)

This PR:
- Introduces a fallback from the normal TTG->LLVM converter in case it
does not support a given local_load.
- Enables conversion of the MFMA dot layout to Linear Layout in the
local_load pattern.
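
Linear layouts describe a layout as an XOR-linear map from hardware indices to tensor coordinates via basis vectors. A minimal sketch of that core idea (not the actual Triton `LinearLayout` implementation; `linear_layout_apply` is a hypothetical name):

```python
def linear_layout_apply(bases, hw_index):
    """XOR-combine the basis vectors selected by the set bits of
    hw_index: the GF(2)-linear mapping behind linear layouts."""
    out = 0
    for bit, basis in enumerate(bases):
        if (hw_index >> bit) & 1:
            out ^= basis
    return out

# Identity layout over 4 elements: bases are powers of two.
print([linear_layout_apply([1, 2], i) for i in range(4)])  # [0, 1, 2, 3]
# Swapping the bases permutes the coordinates.
print(linear_layout_apply([2, 1], 1))  # 2
```

Because any layout expressible this way composes and inverts algebraically, representing the MFMA dot layout as a linear layout lets a generic lowering handle local_loads that the hand-written converter does not.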
@whitneywhtsang whitneywhtsang self-assigned this Oct 28, 2024
@whitneywhtsang whitneywhtsang marked this pull request as ready for review October 28, 2024 16:52
@whitneywhtsang whitneywhtsang merged commit 1bc283c into main Oct 28, 2024
8 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/merge2 branch October 28, 2024 17:59