Merge OpenAI Triton commit `fa229d1` #2513

whitneywhtsang · 2024-10-18T17:50:46Z

This PR changes the Triton base from 8966e5c to fa229d1 (Oct 14).
Pass rate: 98.98%

Please do not squash and merge this PR.

…4819) Advanced software pipelining may require fine-grained adjustments to instruction scheduling in the main `tt.dot` loop to achieve higher performance. Such adjustments require detailed information about the number of issued `v_mfma`, `ds_read`, `ds_write` and `global_load` instructions. This PR extends the Triton AMDGPU backend by adding instruction counting during `TritonAMDGPUToLLVM` pass execution. An example of instruction counting and instruction scheduling is demonstrated in the `createCKV3Schedule` method, which implements [CK's V3 software pipelining](https://github.com/ROCm/composable_kernel/blob/de3e3b642402eac5b4a466f6a2fa5e9f022ba680/include/ck/tensor_operation/gpu/block/blockwise_gemm_pipeline_xdlops_v3.hpp#L160-L263). This change is an experimental step toward better GEMM performance. The design is not final and may be subject to change in the future.
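To illustrate the kind of bookkeeping described above, here is a minimal Python sketch that tallies the four instruction families over a list of mnemonics. It is a toy model, not the backend's code: the real counting happens in C++ during `TritonAMDGPUToLLVM`, and the prefix matching, the `count_tracked` helper, and the sample loop body are all assumptions made up for this example.

```python
from collections import Counter

# Instruction families named in the commit message.
# Matching by mnemonic prefix is an assumption of this sketch.
TRACKED_PREFIXES = ("v_mfma", "ds_read", "ds_write", "global_load")


def count_tracked(instructions):
    """Return how many instructions fall into each tracked family."""
    counts = Counter({prefix: 0 for prefix in TRACKED_PREFIXES})
    for mnemonic in instructions:
        for prefix in TRACKED_PREFIXES:
            if mnemonic.startswith(prefix):
                counts[prefix] += 1
                break  # an instruction belongs to at most one family
    return dict(counts)


# Hypothetical loop body, invented for illustration only.
loop_body = [
    "global_load_dwordx4",
    "ds_write_b128",
    "ds_read_b128",
    "ds_read_b128",
    "v_mfma_f32_32x32x8f16",
    "v_mfma_f32_32x32x8f16",
    "s_waitcnt",  # untracked, so it is ignored
]

print(count_tracked(loop_body))
```

A scheduler could use ratios between such counts (for example, how many `v_mfma` instructions are available per `ds_read`) to decide how to interleave instruction groups, which is presumably the information `createCKV3Schedule` relies on.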

I ran into an error in the RewriteTensorPointer pass. In my IR, there's an scf.if that produces a non-pointer result. rewriteIfOp() created a new scf.if, but the uses of the scf.if result still referenced the old op, which caused a compile error. This patch updates all uses of the old scf.if to refer to the results of the new if-op.
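The fix described above is an instance of the usual "replace all uses" step that must accompany creating a replacement op. The following is a toy Python model of that step, assuming a simplified IR with explicit use lists. The `Value`, `Op`, and `replace_all_uses` names are hypothetical and do not correspond to the MLIR C++ API used in the actual patch.

```python
class Value:
    """A result produced by an op; records the ops that consume it."""

    def __init__(self, name):
        self.name = name
        self.users = []


class Op:
    """A toy operation with operands and results."""

    def __init__(self, name, operands=(), num_results=0):
        self.name = name
        self.operands = list(operands)
        self.results = [Value(f"{name}#{i}") for i in range(num_results)]
        for value in self.operands:
            value.users.append(self)


def replace_all_uses(old_op, new_op):
    """Redirect every use of old_op's results to the matching result of new_op."""
    assert len(old_op.results) == len(new_op.results)
    for old, new in zip(old_op.results, new_op.results):
        for user in list(old.users):
            user.operands = [new if v is old else v for v in user.operands]
            new.users.append(user)
        old.users.clear()


# The old scf.if yields one non-pointer result that a later op consumes.
old_if = Op("scf.if.old", num_results=1)
consumer = Op("arith.addi", operands=[old_if.results[0]])

# The rewrite creates a new if-op; without the next call, `consumer`
# would still point at the old op's result.
new_if = Op("scf.if.new", num_results=1)
replace_all_uses(old_if, new_if)

print(consumer.operands[0].name)
```

Once the old op has no remaining users, it can be erased safely, which is the state the patch restores for non-pointer results.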

…rectory and update name (#4899) If no install location is set, CMake by default puts all shared libraries in triton/_C. This PR moves the instrumentation test/example out of the triton install directory into its own standalone directory that can be populated with future development examples, and gives it a more useful name.

Adjust the placement of LDS writes and reads to immediately follow the definition of their operands in the case where the LDS write is in the loop but its operand is not. This is a heuristic for optimizing fused attention by hoisting Q tensor LDS read/write operations out of the loop, as Q is loop invariant and can be loaded once before entering the loop. In the previous implementation, the heuristic incorrectly assumed that the operand of the LDS write had to be a load operation, which is unnecessary. Additionally, there was no explicit check to verify that the LDS write was in the loop while its defining operand was not. This PR addresses both issues. --------- Co-authored-by: Ognjen Plavsic <[email protected]>
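The condition described above can be sketched in Python on a flat list of ops. This is a simplified model under stated assumptions, not the pass itself: the `Op` record, the `kind` strings, and `hoist_lds_writes` are hypothetical, ops are assumed to have unique names and a single operand, and only the LDS write (not the matching read) is modelled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Op:
    name: str
    kind: str                      # e.g. "load", "convert", "lds_write", "dot"
    in_loop: bool
    operand: Optional[str] = None  # name of the op that defines the operand


def hoist_lds_writes(ops):
    """Move each in-loop LDS write whose operand is defined outside the
    loop so that it immediately follows that definition."""
    by_name = {op.name: op for op in ops}
    result = list(ops)
    for op in ops:
        if op.kind != "lds_write" or op.operand is None:
            continue
        producer = by_name[op.operand]
        # Explicit check: the write is in the loop, its operand is not.
        # Note that the producer does not have to be a load.
        if op.in_loop and not producer.in_loop:
            result.remove(op)
            hoisted = replace(op, in_loop=False)
            result.insert(result.index(producer) + 1, hoisted)
    return result


# Hypothetical fused-attention-like sequence: Q is loop invariant, K is not.
ops = [
    Op("q_load", "load", in_loop=False),
    Op("q_cvt", "convert", in_loop=False, operand="q_load"),
    Op("k_load", "load", in_loop=True),
    Op("q_lds_write", "lds_write", in_loop=True, operand="q_cvt"),
    Op("k_lds_write", "lds_write", in_loop=True, operand="k_load"),
    Op("dot", "dot", in_loop=True),
]

print([op.name for op in hoist_lds_writes(ops)])
```

In this example the Q write is fed by a conversion rather than a load, so it is still hoisted, while the K write stays in the loop because its operand is produced inside the loop.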

This helps the backend interleave global load and mfma instructions and can reduce global load issue latency.

triton-lang/triton#4589 mistakenly deactivated these and reverted to the previous always-cast-to-int32 semantics.

pbchekin

Looks good but we need to fix the instrumentation test.

ravil-mobile and others added 6 commits October 12, 2024 22:56

[AMD] Sink the 2nd tt.load after local_load's (#4823)

664ac51

This helps the backend interleave global load and mfma instructions and can reduce global load issue latency.

Re-enable NumPy 2.0 semantics for add, sub, mul. (#4905)

fa229d1

triton-lang/triton#4589 mistakenly deactivated these and reverted to the previous always-cast-to-int32 semantics.

whitneywhtsang requested a review from pbchekin October 18, 2024 17:50

whitneywhtsang self-assigned this Oct 18, 2024

whitneywhtsang changed the title ~~Merge OpenAI Triton commit 8966e5c~~ Merge OpenAI Triton commit fa229d1 Oct 18, 2024

pbchekin approved these changes Oct 18, 2024

Merge commit 'fa229d1c4bee16c094be9427334575ec1e79f66c'

f213106

whitneywhtsang force-pushed the whitneywhtsang/merge branch from 8085aea to f213106 Compare October 18, 2024 23:10

whitneywhtsang marked this pull request as ready for review October 19, 2024 01:24

whitneywhtsang merged commit f213106 into main Oct 19, 2024
4 checks passed

whitneywhtsang deleted the whitneywhtsang/merge branch October 19, 2024 01:24

whitneywhtsang mentioned this pull request Oct 27, 2024

Merge OpenAI Triton till Oct 25th #2483

Closed
