Merge OpenAI Triton commit 78c8054
#2595
Conversation
…ly (#4958)

Change to improve platform independence. How it works:

On Windows:

```python
>>> import sysconfig
>>> sysconfig.get_config_var("EXT_SUFFIX")
'.cp310-win_amd64.pyd'
>>> sysconfig.get_config_var("EXT_SUFFIX").split(".")[-1]
'pyd'
```

On Linux:

```python
>>> import sysconfig
>>> sysconfig.get_config_var("EXT_SUFFIX")
'.cpython-310-x86_64-linux-gnu.so'
>>> sysconfig.get_config_var("EXT_SUFFIX").split(".")[-1]
'so'
```

Signed-off-by: Anatoly Myachev <[email protected]>
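For context, a minimal sketch of how a build script could use this (the helper name is hypothetical; the actual code changed by the PR may differ):

```python
import sysconfig

def native_ext_suffix() -> str:
    """Return the native extension suffix ('pyd' on Windows, 'so' on Linux)
    instead of hard-coding one of them."""
    return sysconfig.get_config_var("EXT_SUFFIX").split(".")[-1]

# e.g. building an output filename platform-independently
module_name = "libtriton"
output = f"{module_name}.{native_ext_suffix()}"
```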
Specifically, it fixes problems where `srcLayout` and `dstLayout` have a different number of registers but the same number of non-free registers. We solved the problem by padding free registers onto either `srcLayout` or `dstLayout`, but this could be improved by fixing the `invertAndCompose` function. A rough sketch of the padding idea follows.
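A minimal Python sketch of the padding workaround (not the Triton C++ code; names and data shapes are illustrative):

```python
def pad_free_registers(src_regs, dst_regs):
    """Append zero basis vectors ("free" registers, which map to nothing)
    to the shorter register basis so both layouts describe the same
    number of registers before invertAndCompose is applied."""
    zero = (0, 0)
    diff = len(dst_regs) - len(src_regs)
    if diff > 0:
        src_regs = src_regs + [zero] * diff
    elif diff < 0:
        dst_regs = dst_regs + [zero] * (-diff)
    return src_regs, dst_regs

# 3 src register bases vs. 4 dst register bases -> pad src with one free reg
src = [(0, 1), (0, 2), (0, 4)]
dst = [(0, 1), (0, 2), (0, 4), (0, 8)]
print(pad_free_registers(src, dst))
```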
This adds float16 to the list of dtypes tested in test_tensor_atomic_rmw. Note that the numerics were previously bad for this test when run in float16; this PR "fixes" the numerics by internally doing the sum in float32 (upcast, sum, downcast). Since the purpose is to test the atomic_rmw, and the numerical issues of doing sums in low-precision dtypes are generally known, I think this strategy should be fine for this test.
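The upcast-sum-downcast idea in isolation (a NumPy sketch with assumed values, separate from the actual test code):

```python
import numpy as np

x = np.full(4096, 0.1, dtype=np.float16)

# Accumulating directly in float16: once the running sum grows large,
# adding 0.1 rounds to no change, so the result stalls far below the truth.
naive = x.sum(dtype=np.float16)

# Upcast, sum in float32, downcast: the strategy described above.
stable = x.astype(np.float32).sum().astype(np.float16)

print(naive, stable)  # the float16 accumulator stalls; the float32 path stays accurate
```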
In the case of 16-bit float operands for tt::AtomicRMWOp, construct only one LLVM::AtomicRMWOp but use a vector of elements. This approach allows generating packed intrinsics and processing 2 elements at once. Added a lit test for the f16 vectorized case.
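To see what "packed" means here, a small Python illustration of fitting two f16 values in one 32-bit word, the granularity a packed f16x2 atomic updates at once (conceptual only, not the lowering code):

```python
import struct

def pack_two_f16(a: float, b: float) -> int:
    """Pack two IEEE half-precision values into a single 32-bit word."""
    return int.from_bytes(struct.pack("<ee", a, b), "little")

print(hex(pack_two_f16(1.0, 2.0)))  # 0x40003c00: 0x3c00 = 1.0, 0x4000 = 2.0
```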
lib/Analysis/Utility.cpp
Outdated
```diff
  // comp describes the layout function to create dst from src.
- LinearLayout comp = dstLayout->invertAndCompose(*srcLayout);
+ LinearLayout comp =
+     dstLayoutWithFreeRegs.invertAndCompose(srcLayoutWithFreeRegs);
```
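For intuition about what `comp` represents, here is a toy model of invert-and-compose (assumed semantics, sketched in plain Python; the real `LinearLayout` works over GF(2) bases):

```python
def invert_and_compose(dst, src):
    """Given dst: hw index -> element and src: hw index -> element,
    return comp: src index -> dst index, i.e. where each element that
    src places at index i lives under dst."""
    inv_dst = {elem: idx for idx, elem in dst.items()}
    return {i: inv_dst[elem] for i, elem in src.items()}

src = {0: "a", 1: "b", 2: "c", 3: "d"}
dst = {0: "b", 1: "a", 2: "d", 3: "c"}
print(invert_and_compose(dst, src))  # {0: 1, 1: 0, 2: 3, 3: 2}
```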
Hi @victor-eds! As far as I understand, our code is not ready to work with layouts of the same size. Could you suggest anything?
Some examples of layout representations before and after `resize`:

```
minimalCvtLayout
numSrcRegs: 8
numDstRegs: 16

srcLayout:
 - register=1 -> (0, 1)
   register=2 -> (0, 2)
   register=4 -> (0, 4)
 - lane=1 -> (0, 8)
   lane=2 -> (0, 16)
   lane=4 -> (0, 32)
   lane=8 -> (0, 0)
   lane=16 -> (0, 0)
 - warp=1 -> (0, 0)
   warp=2 -> (0, 0)
 - block is a size 1 dimension
where out dims are: [dim0 (size 1), dim1 (size 64)]

srcLayoutWithFreeRegs:
 - register=1 -> (0, 1)
   register=2 -> (0, 2)
   register=4 -> (0, 4)
   register=8 -> (0, 0)
 - lane=1 -> (0, 8)
   lane=2 -> (0, 16)
   lane=4 -> (0, 32)
   lane=8 -> (0, 0)
   lane=16 -> (0, 0)
 - warp=1 -> (0, 0)
   warp=2 -> (0, 0)
 - block is a size 1 dimension
where out dims are: [dim0 (size 1), dim1 (size 64)]

dstLayout:
 - register=1 -> (0, 1)
   register=2 -> (0, 2)
   register=4 -> (0, 4)
   register=8 -> (0, 8)
 - lane=1 -> (0, 16)
   lane=2 -> (0, 32)
   lane=4 -> (0, 0)
   lane=8 -> (0, 0)
   lane=16 -> (0, 0)
 - warp=1 -> (0, 0)
   warp=2 -> (0, 0)
 - block is a size 1 dimension
where out dims are: [dim0 (size 1), dim1 (size 64)]

dstLayoutWithFreeRegs:
 - register=1 -> (0, 1)
   register=2 -> (0, 2)
   register=4 -> (0, 4)
   register=8 -> (0, 8)
 - lane=1 -> (0, 16)
   lane=2 -> (0, 32)
   lane=4 -> (0, 0)
   lane=8 -> (0, 0)
   lane=16 -> (0, 0)
 - warp=1 -> (0, 0)
   warp=2 -> (0, 0)
 - block is a size 1 dimension
where out dims are: [dim0 (size 1), dim1 (size 64)]
```
I decided to revert it for now.
I'll create a new issue after this pull request is merged.
Force-pushed from 86db591 to c8d44f1
Force-pushed from c8d44f1 to 77f98f0
This PR changes the Triton base from 152ef2d to 78c8054 (Oct 27).
Pass rate: 99.84%

Please do not squash and merge this PR.