
Conversation

jataylo and others added 13 commits December 4, 2024 08:55
Cherry pick list:
- triton-lang#4925
- triton-lang#5053 
- triton-lang#5019 
- triton-lang#5002 
- triton-lang#4935 - required additional cherry picks triton-lang#4991 and triton-lang#4951
- triton-lang#4998 
- triton-lang#5281 
- triton-lang#5308 
- All previous LLVM hash PRs before triton-lang#5308

---------

Co-authored-by: Ilya V <[email protected]>
Co-authored-by: Lei Zhang <[email protected]>
Co-authored-by: Lixun Zhang <[email protected]>
Co-authored-by: Keren Zhou <[email protected]>
Co-authored-by: Alexander Efimov <[email protected]>
Co-authored-by: Kyle Wang <[email protected]>
Co-authored-by: Jungwook Park <[email protected]>
Co-authored-by: peterbell10 <[email protected]>
Co-authored-by: Hongtao Yu <[email protected]>
Reverts triton-lang#5191 due to MLIR errors in PyTorch unit tests.

Smaller set of cherry picks:
- triton-lang#5308 (and previous LLVM upgrades)
- triton-lang#5281 
- triton-lang#4925 
- triton-lang#5053 
- triton-lang#5019 
- triton-lang#4998

---------

Co-authored-by: Jungwook Park <[email protected]>
Co-authored-by: peterbell10 <[email protected]>
Co-authored-by: Hongtao Yu <[email protected]>
Co-authored-by: Lei Zhang <[email protected]>
Co-authored-by: Ilya V <[email protected]>
Co-authored-by: Kyle Wang <[email protected]>
In the case of 16-bit float operands for tt::AtomicRMWOp, construct
only one LLVM::AtomicRMWOp, but operate on a vector of elements.
This approach allows generating packed intrinsics and processing two
elements at once.
Added a lit test for the vectorized f16 case.

(cherry picked from commit 78c8054)
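
As a rough illustration (not taken from the PR itself), the idea is that a single atomic over a packed `<2 x half>` value can replace two scalar f16 atomics; on AMDGPU hardware that supports it, LLVM can lower this to a packed atomic instruction. A minimal LLVM IR sketch, assuming an atomic fadd into global memory (the function name and sync scope are illustrative):

```llvm
; Sketch only: one atomicrmw over two packed f16 elements instead of
; two scalar atomics, enabling a packed lowering on AMDGPU.
define <2 x half> @packed_f16_atomic_add(ptr addrspace(1) %ptr, <2 x half> %val) {
  %old = atomicrmw fadd ptr addrspace(1) %ptr, <2 x half> %val syncscope("agent") monotonic
  ret <2 x half> %old
}
```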
This commit adds support for warp-level reduction
with DPP instructions, which can improve performance by exchanging
data between lanes without round-trips through shared memory.

See https://gpuopen.com/learn/amd-gcn-assembly-cross-lane-operations/

(cherry picked from commit 21119e3)
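
As a rough illustration (not taken from the PR itself), one step of a DPP-based reduction combines each lane's value with a value shifted in from a neighboring lane via the llvm.amdgcn.update.dpp intrinsic. A minimal LLVM IR sketch, assuming an integer add reduction and the row_shr:1 DPP control (encoded as 0x111), with full row/bank masks and bound_ctrl set:

```llvm
; Sketch only: a single reduction step; a full warp reduction repeats
; this with progressively larger shifts. dpp_ctrl 273 = 0x111 = row_shr:1,
; row_mask/bank_mask = 0xF, bound_ctrl = true (out-of-range lanes read 0,
; the identity for add).
declare i32 @llvm.amdgcn.update.dpp.i32(i32, i32, i32 immarg, i32 immarg, i32 immarg, i1 immarg)

define i32 @dpp_reduce_step(i32 %v) {
  %neighbor = call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %v, i32 273, i32 15, i32 15, i1 true)
  %sum = add i32 %v, %neighbor
  ret i32 %sum
}
```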
TritonAMDGPUTransforms now depends on it.

(cherry picked from commit 0b443ce)
This relands triton-lang#5392
to enable the new arch target now that backend support has been
added. It does not depend on the reverted LLVM upgrade in
triton-lang#5341; the basic
enablement needed is already included in the current LLVM
version we're using.

(cherry picked from commit f257479)
Bumping LLVM (triton-lang#5064) to include a loop unroller fix:
llvm/llvm-project#114573. This is needed for
subsequent loop unroller upstreaming work.

(cherry picked from commit 3c296ab)
This pulls in llvm/llvm-project@bd9145c8c213
to enable ASan on the AMD backend.

(cherry picked from commit 0bd30a2)
@jataylo (Contributor, Author) commented Dec 12, 2024

@atalman (Collaborator) commented Dec 12, 2024

@jataylo this is a lot of feature work, not landed yet on pytorch/main. Normally after branch cut we accept only critical fixes. cc @malfet @bertmaher

@jataylo jataylo marked this pull request as draft December 12, 2024 20:58
@jataylo jataylo closed this Dec 13, 2024