Merge OpenAI Triton commit 78c8054
#2604
…ly (#4958)

Change to improve platform independence. How it works?

On Windows:
```python
>>> import sysconfig
>>> sysconfig.get_config_var("EXT_SUFFIX")
'.cp310-win_amd64.pyd'
>>> sysconfig.get_config_var("EXT_SUFFIX").split(".")[-1]
'pyd'
```

On Linux:
```python
>>> import sysconfig
>>> sysconfig.get_config_var("EXT_SUFFIX")
'.cpython-310-x86_64-linux-gnu.so'
>>> sysconfig.get_config_var("EXT_SUFFIX").split(".")[-1]
'so'
```

---------

Signed-off-by: Anatoly Myachev <[email protected]>
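As a minimal sketch of how the `EXT_SUFFIX` lookup above could be wrapped for reuse: the helper names `native_extension_suffix` and `shared_library_name` are hypothetical and do not come from the commit, and the fallback behavior when `EXT_SUFFIX` is unset is an assumption.

```python
import sysconfig


def native_extension_suffix() -> str:
    """Return the last dot-separated part of EXT_SUFFIX, e.g. 'so' or 'pyd'.

    Hypothetical helper for illustration; not taken from the Triton sources.
    """
    ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")
    if not ext_suffix:
        # Assumption: treat a missing EXT_SUFFIX as an error instead of guessing.
        raise RuntimeError("EXT_SUFFIX is not defined for this interpreter")
    return ext_suffix.split(".")[-1]


def shared_library_name(stem: str) -> str:
    """Build a file name such as 'my_module.so' or 'my_module.pyd'."""
    return f"{stem}.{native_extension_suffix()}"


print(shared_library_name("my_module"))
```

Querying the interpreter avoids hard-coding `so`, which is the platform dependence the commit message describes.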
Specifically, it fixes problems when `srcLayout` and `dstLayout` have a different number of registers but the same number of non-free registers. We solved the problem by padding free registers onto either `srcLayout` or `dstLayout`, but this could be improved by fixing the `invertAndCompose` function.
This adds float16 to the list of dtypes tested in test_tensor_atomic_rmw. Note that the numerics were previously bad for this test when run in float16; this PR "fixes" the numerics by internally doing the sum in float32 (upcast, sum, downcast). Since the purpose is to test the atomic_rmw, and the numerical issues of doing sums in low-precision dtypes are generally known, I think this strategy should be fine for this test.
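The upcast, sum, downcast idea can be illustrated with a small standalone sketch. This is not the test's code: it emulates float16 and float32 rounding with the standard library's `struct` module (formats `"e"` and `"f"`), and the helper names are hypothetical.

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_in_f16(values):
    """Accumulate sequentially, rounding to float16 after every addition."""
    acc = 0.0
    for v in values:
        acc = to_f16(acc + to_f16(v))
    return acc


def sum_upcast_f32(values):
    """Upcast each float16 input to float32, sum, then downcast once."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(to_f16(v)))
    return to_f16(acc)


values = [0.1] * 4096
reference = sum(to_f16(v) for v in values)  # double-precision reference
print("float16 accumulation:", sum_in_f16(values))
print("float32 accumulation:", sum_upcast_f32(values))
print("reference:           ", reference)
```

The float16 accumulator stops growing once each addend falls below half of its spacing between representable values, while the float32 accumulator stays close to the reference before the single final downcast.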
For 16-bit float operands of tt::AtomicRMWOp, construct only one LLVM::AtomicRMWOp but use a vector of elements. This approach allows packed intrinsics to be generated, processing 2 elements at once. Added a lit test for the f16 vectorized case.
@whitneywhtsang @pbchekin
Yes, usually we enable merge commits temporarily for PRs like this.
We can always use the command line to merge instead of the UI.
FYI @anmyachev, usually I remove the second half of the auto-generated message from like
This PR changes the Triton base from 152ef2d to 78c8054 (Oct 27).
Pass rate: 99.84%
Please do not squash and merge this PR.
Repeating #2595.