Merge OpenAI Triton commit fa229d1
#2513
Merged
…4819) Advanced software pipelining may require fine-grained adjustments to instruction scheduling in the main `tt.dot` loop to achieve higher performance. Such adjustments require detailed information about the number of issued `v_mfma`, `ds_read`, `ds_write` and `global_load` instructions. This PR extends the Triton AMDGPU backend by adding instruction counting during `TritonAMDGPUToLLVM` pass execution. An example of instruction counting and instruction scheduling is demonstrated in the `createCKV3Schedule` method, which implements [CK's V3 software pipelining](https://github.com/ROCm/composable_kernel/blob/de3e3b642402eac5b4a466f6a2fa5e9f022ba680/include/ck/tensor_operation/gpu/block/blockwise_gemm_pipeline_xdlops_v3.hpp#L160-L263). This change is experimental and aims at better GEMM performance. The design is not final and may be subject to change in the future.
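As a rough illustration of the kind of bookkeeping described above, the following Python sketch tallies the four instruction categories in a list of mnemonic strings. The `count_instructions` helper and the sample loop body are assumptions made for illustration only; the actual counting is done inside the `TritonAMDGPUToLLVM` pass and its interface is not shown here.

```python
# Instruction categories named in the commit message.
CATEGORIES = ("v_mfma", "ds_read", "ds_write", "global_load")


def count_instructions(mnemonics):
    """Count mnemonics per category using a simple prefix match.

    Hypothetical helper: mnemonics that match no category are ignored.
    """
    counts = {category: 0 for category in CATEGORIES}
    for mnemonic in mnemonics:
        for category in CATEGORIES:
            if mnemonic.startswith(category):
                counts[category] += 1
                break
    return counts


# Hypothetical loop body, used only to exercise the helper.
loop_body = [
    "global_load_dwordx4",
    "ds_write_b128",
    "ds_read_b128",
    "ds_read_b128",
    "v_mfma_f32_32x32x8f16",
    "v_mfma_f32_32x32x8f16",
    "s_waitcnt",
]
counts = count_instructions(loop_body)
```

A scheduler could use such per-category counts to decide, for example, how many `v_mfma` instructions to place between consecutive memory instructions.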
I ran into an error in the RewriteTensorPointer pass. In my IR, there is an scf.if that produces a non-pointer result. rewriteIfOp() created a new scf.if, but the uses of the scf.if result still referenced the old op, which caused a compile error. In this patch, I updated all uses of the old scf.if to use the results of the new if-op.
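The fix described above follows the usual "replace all uses" pattern. The toy Python model below sketches that pattern; `ToyValue`, `ToyOperation`, and `replace_all_uses` are hypothetical names, not MLIR or Triton APIs, and the sketch assumes the old and new ops have results that correspond one-to-one, which may not hold in the real pass.

```python
class ToyValue:
    """A result value produced by an operation (toy model)."""

    def __init__(self, name):
        self.name = name


class ToyOperation:
    """An operation with operands and freshly created results (toy model)."""

    def __init__(self, name, operands, num_results):
        self.name = name
        self.operands = list(operands)
        self.results = [ToyValue(f"{name}#{i}") for i in range(num_results)]


def replace_all_uses(ops, old_op, new_op):
    """Redirect every operand that refers to a result of old_op to the
    corresponding result of new_op (assumes a one-to-one result mapping)."""
    mapping = {id(old): new for old, new in zip(old_op.results, new_op.results)}
    for op in ops:
        op.operands = [mapping.get(id(value), value) for value in op.operands]


old_if = ToyOperation("old_if", [], 1)
new_if = ToyOperation("new_if", [], 1)
user = ToyOperation("user", [old_if.results[0]], 0)

# Without this step, `user` would still reference the result of `old_if`.
replace_all_uses([user], old_if, new_if)
```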
…rectory and update name (#4899) If no install location is set, CMake by default puts all shared libraries in triton/_C. This PR moves the instrumentation test/example out of the triton install directory into its own standalone directory, which can be populated with future development examples, and gives it a more useful name.
Adjust the placement of LDS writes and reads so that they immediately follow the definition of their operands when the LDS write is in the loop but its operand is not. This is a heuristic for optimizing fused attention by hoisting the Q tensor LDS read/write operations out of the loop, since Q is loop invariant and can be loaded once before entering the loop. In the previous implementation, the heuristic incorrectly assumed that the operand of the LDS write had to be a load operation, which is unnecessary. Additionally, there was no explicit check to verify that the LDS write was in the loop while its defining operand was not. This PR addresses both issues. --------- Co-authored-by: Ognjen Plavsic <[email protected]>
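The condition stated above can be sketched as a small predicate. This is a toy Python model under stated assumptions: `ToyOp`, its `in_loop` flag, and `should_hoist_lds_write` are hypothetical and do not mirror the backend's actual data structures. Note that the predicate does not care what kind of operation defines the operand, only where it is located.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyOp:
    """Toy operation: a name, whether it sits inside the loop, and the
    operation that defines its (single) operand, if any."""

    name: str
    in_loop: bool
    operand_def: Optional["ToyOp"] = None


def should_hoist_lds_write(write: ToyOp) -> bool:
    """True when the LDS write is inside the loop but its operand is
    defined outside of it; the defining op need not be a load."""
    source = write.operand_def
    return write.in_loop and source is not None and not source.in_loop


# Q is loop invariant: its producer lives outside the loop.
q_producer = ToyOp("q_producer", in_loop=False)
q_write = ToyOp("q_lds_write", in_loop=True, operand_def=q_producer)

# K is produced inside the loop, so its LDS write stays where it is.
k_producer = ToyOp("k_producer", in_loop=True)
k_write = ToyOp("k_lds_write", in_loop=True, operand_def=k_producer)
```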
This helps the backend interleave global load and mfma instructions and can reduce global load issue latency.
triton-lang/triton#4589 mistakenly deactivated these and reverted to the previous always-cast-to-int32 semantics.
8966e5c to fa229d1
pbchekin (Contributor) approved these changes on Oct 18, 2024 and left a comment:
Looks good but we need to fix the instrumentation test.
8085aea to f213106
This PR changes the Triton base from 8966e5c to fa229d1 (Oct 14).
Pass rate: 98.98%
Please do not squash and merge this PR.