
Conversation

@anmyachev commented Oct 30, 2024

This PR changes the Triton base from 152ef2d to 78c8054 (Oct 27).
Pass rate: 99.84%

Please do not squash and merge this PR.

anmyachev and others added 4 commits October 24, 2024 22:04
…ly (#4958)

Change to improve platform independence.

How does it work?

On Windows:
```python
>>> import sysconfig
>>> sysconfig.get_config_var("EXT_SUFFIX")
'.cp310-win_amd64.pyd'
>>> sysconfig.get_config_var("EXT_SUFFIX").split(".")[-1]
'pyd'
```

On Linux:
```python
>>> import sysconfig
>>> sysconfig.get_config_var("EXT_SUFFIX")
'.cpython-310-x86_64-linux-gnu.so'
>>> sysconfig.get_config_var("EXT_SUFFIX").split(".")[-1]
'so'
```
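For context, a build script could use the same trick to form the native extension filename in a platform-independent way. A minimal sketch (the helper name `native_ext_suffix` is hypothetical, not part of this PR):

```python
import sysconfig

def native_ext_suffix() -> str:
    # Last component of EXT_SUFFIX: 'pyd' on Windows, 'so' on Linux.
    return sysconfig.get_config_var("EXT_SUFFIX").split(".")[-1]

print(f"libtriton.{native_ext_suffix()}")  # e.g. 'libtriton.so' or 'libtriton.pyd'
```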

---------

Signed-off-by: Anatoly Myachev <[email protected]>
Specifically, it fixes problems when `srcLayout` and `dstLayout` have
a different number of registers but the same number of non-free registers.
We solve the problem by padding either `srcLayout` or `dstLayout` with
free registers; this could be improved further by fixing the
`invertAndCompose` function.
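To illustrate the padding idea in isolation, here is a rough plain-Python sketch (it models layouts as simple lists of basis vectors, not Triton's actual `LinearLayout` API): a "free" register is one whose basis maps to zero, and padding appends such zero bases until both layouts have the same number of registers.

```python
# Each layout's register dimension is modeled as a list of basis vectors
# (output coordinates); a basis of (0, 0) is a "free" register.
src_regs = [(0, 1), (0, 2), (0, 4)]          # 3 registers, all non-free
dst_regs = [(0, 1), (0, 2), (0, 4), (0, 8)]  # 4 registers, all non-free

def pad_with_free_registers(regs, target_len):
    """Append zero (free) bases until the register dimension has target_len entries."""
    return regs + [(0, 0)] * (target_len - len(regs))

n = max(len(src_regs), len(dst_regs))
src_padded = pad_with_free_registers(src_regs, n)
dst_padded = pad_with_free_registers(dst_regs, n)
assert len(src_padded) == len(dst_padded) == 4
```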
This adds float16 to the list of dtypes tested in
test_tensor_atomic_rmw. Note that the numerics were previously bad for
this test when run in float16; this PR "fixes" the numerics by
internally doing the sum in float32 (upcast, sum, downcast). Since the
purpose is to test atomic_rmw, and the numerical issues of doing sums
in low-precision dtypes are generally known, I think this strategy is
fine for this test.
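A minimal NumPy sketch of the upcast-sum-downcast pattern described above (illustrative only, not the actual test code):

```python
import numpy as np

x = np.random.rand(1 << 14).astype(np.float16)

# Accumulating directly in float16 can lose precision.
naive = x.sum(dtype=np.float16)

# Upcast to float32, sum, then downcast back to float16.
ref = np.float16(x.astype(np.float32).sum())

print(naive, ref)
```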
In the case of 16-bit float operands for tt::AtomicRMWOp, construct
only one LLVM::AtomicRMWOp but operate on a vector of elements.
This approach allows generating packed intrinsics and processing 2
elements at once.
Added a lit test for the f16 vectorized case.
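For illustration, a Triton kernel along these lines would issue contiguous float16 atomic adds that the backend can then pack two at a time (a hedged sketch, not the lit test added by this commit):

```python
import triton
import triton.language as tl

@triton.jit
def atomic_add_f16_kernel(out_ptr, val_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    vals = tl.load(val_ptr + offs, mask=mask)
    # Contiguous f16 atomic adds: the backend may lower pairs of these
    # into a single packed atomic operation.
    tl.atomic_add(out_ptr + offs, vals, mask=mask)
```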
```cpp
// comp describes the layout function to create dst from src.
LinearLayout comp = dstLayout->invertAndCompose(*srcLayout);
LinearLayout comp =
    dstLayoutWithFreeRegs.invertAndCompose(srcLayoutWithFreeRegs);
```
@anmyachev commented Oct 30, 2024:
Hi @victor-eds! As far as I understand, our code is not ready to work with layouts of the same size. Could you suggest anything?

Some examples of layout representations before and after `resize`
```
minimalCvtLayout
numSrcRegs: 8
numDstRegs: 16
srcLayout: 
 - register=1 -> (0, 1)
   register=2 -> (0, 2)
   register=4 -> (0, 4)
 - lane=1 -> (0, 8)
   lane=2 -> (0, 16)
   lane=4 -> (0, 32)
   lane=8 -> (0, 0)
   lane=16 -> (0, 0)
 - warp=1 -> (0, 0)
   warp=2 -> (0, 0)
 - block is a size 1 dimension
where out dims are: [dim0 (size 1), dim1 (size 64)]

srcLayoutWithFreeRegs: 
 - register=1 -> (0, 1)
   register=2 -> (0, 2)
   register=4 -> (0, 4)
   register=8 -> (0, 0)
 - lane=1 -> (0, 8)
   lane=2 -> (0, 16)
   lane=4 -> (0, 32)
   lane=8 -> (0, 0)
   lane=16 -> (0, 0)
 - warp=1 -> (0, 0)
   warp=2 -> (0, 0)
 - block is a size 1 dimension
where out dims are: [dim0 (size 1), dim1 (size 64)]

dstLayout: 
 - register=1 -> (0, 1)
   register=2 -> (0, 2)
   register=4 -> (0, 4)
   register=8 -> (0, 8)
 - lane=1 -> (0, 16)
   lane=2 -> (0, 32)
   lane=4 -> (0, 0)
   lane=8 -> (0, 0)
   lane=16 -> (0, 0)
 - warp=1 -> (0, 0)
   warp=2 -> (0, 0)
 - block is a size 1 dimension
where out dims are: [dim0 (size 1), dim1 (size 64)]

dstLayoutWithFreeRegs: 
 - register=1 -> (0, 1)
   register=2 -> (0, 2)
   register=4 -> (0, 4)
   register=8 -> (0, 8)
 - lane=1 -> (0, 16)
   lane=2 -> (0, 32)
   lane=4 -> (0, 0)
   lane=8 -> (0, 0)
   lane=16 -> (0, 0)
 - warp=1 -> (0, 0)
   warp=2 -> (0, 0)
 - block is a size 1 dimension
where out dims are: [dim0 (size 1), dim1 (size 64)]
```

@anmyachev:
I decided to revert it for now.

@anmyachev:
I'll create a new issue after this pull request is merged.

@vlad-penkin vlad-penkin linked an issue Oct 30, 2024 that may be closed by this pull request
@anmyachev anmyachev marked this pull request as ready for review October 31, 2024 14:41
@anmyachev anmyachev requested a review from pbchekin October 31, 2024 14:45
@anmyachev anmyachev merged commit ef407fb into main Oct 31, 2024
4 checks passed
@anmyachev anmyachev deleted the amyachev/merge0 branch October 31, 2024 15:00
anmyachev added a commit that referenced this pull request Oct 31, 2024
@anmyachev anmyachev restored the amyachev/merge0 branch October 31, 2024 15:05
anmyachev added a commit that referenced this pull request Oct 31, 2024
This reverts commit ef407fb.

I accidentally squashed and merged. After this revert I will simply repeat
#2595 and merge normally. Sorry for that.
anmyachev added a commit that referenced this pull request Nov 1, 2024
This PR changes the Triton base from 152ef2d to 78c8054 (Oct 27).
Pass rate: `99.84%`

Please do not squash and merge this PR.

Repeating #2595.
Linked issue that may be closed by this pull request: Merge OpenAI Triton till Nov 8th