Reland upstream commit f9688ab
#2517
fa229d1 f9688ab
}
if (auto dotLayout = dyn_cast<DotOperandEncodingAttr>(layout)) {
auto rank = getWarpsPerCTA(dotLayout.getParent()).size();
if (dyn_cast<intel::DpasEncodingAttr>(dotLayout.getParent())) {
@whitneywhtsang this is the only change needed to fix the lit tests. In the new code, the swap happens only conditionally (if (opIdx == 1)), which apparently does not work for DPAS, so I restored the unconditional swap.
Thanks for the quick update. I wonder if the logic could be added in Intel-specific files instead. Do you have any suggestions, @chengjunlu?
I think we can propose making getOrderForDotOperand part of the MmaTraits interface.
I found that an AMD engineer refactored this code in PR ff02a46. However, their comments about the order are not general and don't make sense for Intel GPUs. The order should be overridable by the parent layout of the DotOp layout.
> I think we can propose making getOrderForDotOperand part of the MmaTraits interface.

@chengjunlu with this approach, the BlockedEncodingAttr type needs to be handled separately, since it does not inherit the MmaTraits interface. Tests in CI are now failing because of this (should be fixed in the last commit).
What can be done about this?
There is no easy way to unify BlockedEncodingAttr and MmaTraits unless we make it inherit MmaTraits as well, but I am not sure that is worth doing for now.
To keep the changes simple, we can handle it separately and check the feedback from public (upstream) Triton.
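A minimal plain-C++ sketch of the direction discussed above follows. It assumes the proposal would behave roughly like an overridable virtual method. Parent layouts that implement the MMA trait may override the dot-operand order. Parents that do not implement it, such as a blocked layout, go through a separate fallback branch.

Real Triton defines these interfaces through MLIR/TableGen and uses `dyn_cast`. Every class and function name here (`MmaTraitSketch`, `DefaultMmaLike`, `DpasLikeMma`, `BlockedLike`, `dotOperandOrder`) is hypothetical.

- **Default order:** mirrors the conditional `opIdx == 1` swap described earlier in this thread.
- **DPAS-like override:** returns the same row-major order for both operands. This is an assumption based on the later comment that A and B are both row-major in registers, not the actual Intel implementation.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

using Order = std::vector<unsigned>;

// Free-function default (hypothetical model of the upstream helper):
// start from [rank-1, ..., 0] and swap the two fastest dimensions for
// operand B (opIdx == 1).
Order defaultDotOperandOrder(unsigned opIdx, unsigned rank) {
  assert((rank == 2 || rank == 3) && (opIdx == 0 || opIdx == 1));
  Order order(rank);
  std::iota(order.rbegin(), order.rend(), 0u); // {1, 0} for rank 2
  if (opIdx == 1)
    std::swap(order[0], order[1]);
  return order;
}

// Polymorphic base of every hypothetical parent layout.
struct LayoutSketch {
  virtual ~LayoutSketch() = default;
};

// Hypothetical stand-in for the MMA trait: the order becomes an
// overridable hook whose default is the free function above.
struct MmaTraitSketch : LayoutSketch {
  virtual Order getOrderForDotOperand(unsigned opIdx, unsigned rank) const {
    return defaultDotOperandOrder(opIdx, rank);
  }
};

// An MMA-like parent that keeps the default behavior.
struct DefaultMmaLike : MmaTraitSketch {};

// Assumed Intel-style override: A and B share the same row-major order.
struct DpasLikeMma : MmaTraitSketch {
  Order getOrderForDotOperand(unsigned /*opIdx*/,
                              unsigned rank) const override {
    Order order(rank);
    std::iota(order.rbegin(), order.rend(), 0u);
    return order;
  }
};

// Not an MMA layout, so it does not implement the trait.
struct BlockedLike : LayoutSketch {};

// Dispatch on the parent of a dot-operand layout.
Order dotOperandOrder(const LayoutSketch &parent, unsigned opIdx,
                      unsigned rank) {
  if (auto *mma = dynamic_cast<const MmaTraitSketch *>(&parent))
    return mma->getOrderForDotOperand(opIdx, rank);
  // Parents without the trait (e.g. blocked) need this separate branch.
  return defaultDotOperandOrder(opIdx, rank);
}
```

Because `BlockedLike` does not derive from the trait, `dotOperandOrder` needs an explicit fallback branch. Making it derive from the trait would remove the branch, but it would also tie a non-MMA layout to an MMA interface. That is the trade-off raised in the thread.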
1b6689e to 0464571
f9688ab
1061078 to 4d53f8e
FYI @anmyachev, I squashed the last few commits so that it is easier to isolate the changes that need review.
victor-eds left a comment
Changes look sensible to me. Just a question: why does our encoding have a different order compared to the rest? What would be the cost of modifying it so the order matches other dot encodings?
victor-eds left a comment
As I said, the changes LGTM. I am only revoking my approval until we have an assessment of the costs: changing our order vs. upstreaming this change (we may simply be asked to change our order instead).
4d53f8e to 284b824
Good question. There is a chance this was discussed earlier, during the initial implementation. Let's ask Triton developers more experienced than me :) @chengjunlu @whitneywhtsang, do you have an answer to this question?
If nobody knows, I can research the implementation history myself to find out why it was done this way and not differently (any pointers or links to code would speed this up). However, that will take time. Wouldn't it be better to merge this pull request now to simplify merging subsequent commits? @whitneywhtsang
P.S. After the rebase the tests don't pass; I'll take a look. (Fixed: my changes were lost during the rebase, and I restored them.)
a9e6384 to b00713d
Actually, all DotOp layouts had the same order before PR ff02a46. (The linear ID of the
The code before AMD's change:
I think the AMD engineer has a different interpretation, based on their comments about the new code:
For Intel GPUs, the layout is only used to describe how values are laid out in registers, and matrices A and B are both row-major in registers.
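For readers less familiar with the `order` convention, here is a small sketch of what "row-major" means in terms of an order vector. It assumes the usual Triton convention that `order[0]` is the fastest-varying dimension. `linearOffset` and `rowMajorOrder` are hypothetical helpers, not Triton APIs.

Under that convention, a rank-2 row-major matrix has order `{1, 0}`. This is exactly what `std::iota(order.rbegin(), order.rend(), 0)` produces. An order of `{0, 1}` describes a column-major matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Linear offset of `index` in a tensor of the given `shape`, where
// order[0] is the fastest-varying dimension and order.back() the slowest.
std::size_t linearOffset(const std::vector<std::size_t> &index,
                         const std::vector<std::size_t> &shape,
                         const std::vector<unsigned> &order) {
  assert(index.size() == shape.size() && shape.size() == order.size());
  std::size_t offset = 0;
  std::size_t stride = 1;
  for (unsigned dim : order) { // walk dimensions from fastest to slowest
    offset += index[dim] * stride;
    stride *= shape[dim];
  }
  return offset;
}

// [rank-1, ..., 0]: the last dimension varies fastest, i.e. row-major.
std::vector<unsigned> rowMajorOrder(unsigned rank) {
  std::vector<unsigned> order(rank);
  std::iota(order.rbegin(), order.rend(), 0u);
  return order;
}
```

Consider element (2, 3) of a 4x8 matrix. Order `{1, 0}` places it at offset 2 * 8 + 3 = 19 (row-major). Order `{0, 1}` places it at 2 + 3 * 4 = 14 (column-major). The sketch only illustrates the convention; it does not model how DPAS actually distributes values across registers.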
lib/Dialect/TritonGPU/IR/Dialect.cpp (outdated)
return getOrderForDotOperand(dotLayout.getOpIdx(), rank);
} else {
std::iota(order.rbegin(), order.rend(), 0);
if (auto mmaParent = dyn_cast<MmaEncodingTrait>(dotLayout.getParent())) {
I didn't review this carefully the first time. If the original code change was made only for AMD, then we can keep the register-layout order of all those DotOp layouts unchanged.
On the other hand, it is good that third-party extensions can override getOrder for a DotOp layout through MmaEncodingTrait. I am neutral on this change.
…)""
This reverts commit 25a7cba.
Signed-off-by: Whitney Tsang <[email protected]>
b00713d to be7965c
I suggest we speed up merging this pull request, keeping only the workaround for now. We can try to upstream the interface function afterwards. @victor-eds @whitneywhtsang @chengjunlu please approve if this makes sense.
I think it makes sense. Let's remove the last two commits and add a FIXME comment in f9ccfeb.
a7ba67a to e33df88
Done.
Signed-off-by: Anatoly Myachev <[email protected]>
e33df88 to c637c07
I'm fine with this merge 👍
Please do not squash and merge this PR.