Conversation

@whitneywhtsang (Contributor) commented Oct 21, 2024:

Please do not squash and merge this PR.

@whitneywhtsang self-assigned this Oct 21, 2024
@whitneywhtsang changed the title from "Merge OpenAI Triton commit fa229d1" to "Merge OpenAI Triton commit f9688ab" Oct 21, 2024
Review thread on the following code:

    }
    if (auto dotLayout = dyn_cast<DotOperandEncodingAttr>(layout)) {
      auto rank = getWarpsPerCTA(dotLayout.getParent()).size();
      if (dyn_cast<intel::DpasEncodingAttr>(dotLayout.getParent())) {
Contributor:
@whitneywhtsang this is the only change needed to fix the lit tests. In the new code the swap occurs conditionally (if (opIdx == 1)), which apparently did not work for DPAS, so I restored the unconditional swap.
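
For context, here is a minimal self-contained sketch of the two behaviors under discussion (hypothetical helper names, not the actual Triton code; the rank == 2 comments show the concrete effect):

    #include <numeric>
    #include <utility>
    #include <vector>

    // New upstream behavior (sketch): start from the descending, row-major
    // order [rank-1, ..., 1, 0] and swap the two fastest-varying dimensions
    // only for the B operand (opIdx == 1), i.e. treat B as column-major.
    std::vector<unsigned> orderConditionalSwap(unsigned opIdx, unsigned rank) {
      std::vector<unsigned> order(rank);
      std::iota(order.rbegin(), order.rend(), 0); // rank == 2 -> [1, 0]
      if (opIdx == 1)
        std::swap(order[0], order[1]);            // rank == 2 -> [0, 1]
      return order;
    }

    // Restored behavior for DPAS (sketch): the descending order is used
    // unconditionally, so A and B share the same row-major register order.
    std::vector<unsigned> orderUnconditional(unsigned /*opIdx*/, unsigned rank) {
      std::vector<unsigned> order(rank);
      std::iota(order.rbegin(), order.rend(), 0); // rank == 2 -> [1, 0]
      return order;
    }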

@whitneywhtsang (Contributor Author):
Thanks for the quick update. I wonder if the logic can be added in Intel-specific files instead. Do you have any suggestions, @chengjunlu?

@chengjunlu (Contributor) commented Oct 22, 2024:

I think we can propose changes to make getOrderForDotOperand an interface of the MmaTraits.

I found that the AMD engineers refactored this code in PR ff02a46, but their comments about the order are not general and do not make sense for Intel GPUs. The order should be overridable by the parent layout of the DotOp layout.
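
As a rough illustration of that proposal, here is a hedged sketch in plain C++ (hypothetical type and method names; the real change would hang off MmaEncodingTrait as an MLIR attribute interface, not a virtual base class):

    #include <numeric>
    #include <utility>
    #include <vector>

    // Hypothetical stand-in for the MMA trait interface: each parent layout
    // decides the register order of its DotOperand children.
    struct MmaTraitsLike {
      virtual ~MmaTraitsLike() = default;
      // Default: descending (row-major) order for both operands.
      virtual std::vector<unsigned> getOrderForDotOperand(unsigned opIdx,
                                                          unsigned rank) const {
        std::vector<unsigned> order(rank);
        std::iota(order.rbegin(), order.rend(), 0);
        return order;
      }
    };

    // An AMD-style parent could override the hook to treat B as column-major,
    // while a DPAS-style parent would simply keep the row-major default.
    struct AmdMfmaLike : MmaTraitsLike {
      std::vector<unsigned> getOrderForDotOperand(unsigned opIdx,
                                                  unsigned rank) const override {
        auto order = MmaTraitsLike::getOrderForDotOperand(opIdx, rank);
        if (opIdx == 1 && rank >= 2)
          std::swap(order[0], order[1]);
        return order;
      }
    };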

@anmyachev (Contributor) commented Oct 23, 2024:

> I think we can propose changes to make getOrderForDotOperand an interface of the MmaTraits.

@chengjunlu with this approach, the BlockedEncodingAttr type needs to be handled separately, since it does not inherit the MmaTraits interface. Tests in CI are now failing because of this (should be fixed in the last commit).

What can be done about this?

Contributor:

There is no easy way to unify BlockedEncodingAttr and the MmaTraits unless we let it inherit the MmaTraits as well. But I am not sure whether it is worth doing so for now.

For the simplicity of the changes, we can handle it separately and check the feedback from public Triton.

@whitneywhtsang linked an issue ("Reland upstream commit f9688ab") Oct 22, 2024 that may be closed by this pull request
@whitneywhtsang changed the title from "Merge OpenAI Triton commit f9688ab" to "Reland upstream commit f9688ab" Oct 22, 2024
@whitneywhtsang (Contributor Author):

FYI @anmyachev, I squashed the last few commits, so it is easier to isolate the changes that need review.

@anmyachev marked this pull request as ready for review October 24, 2024 11:58
@etiotto requested a review from @victor-eds October 25, 2024 15:14
@victor-eds (Contributor) left a comment:

Changes look sensible to me. Just a question: why does our encoding have a different order compared to the rest? What would be the cost of modifying it so the order matches other dot encodings?

@victor-eds (Contributor) left a comment:

As I said, changes LGTM. I am just revoking approval until we get an assessment of costs: changing our order vs. upstreaming this change (we may just be asked to change our order instead).

@victor-eds self-requested a review October 28, 2024 13:06
@anmyachev force-pushed the whitneywhtsang/merge branch from 4d53f8e to 284b824 October 28, 2024 14:53
@anmyachev (Contributor) commented Oct 28, 2024:

> Just a question: why does our encoding have a different order compared to the rest? What would be the cost of modifying it so the order matches other dot encodings?

Good question. There is a chance that this was discussed earlier, during the initial implementation. Let's ask Triton developers more experienced than me :) @chengjunlu @whitneywhtsang do you have an answer to this question?

> I am just revoking approval until we get an assessment of costs: changing our order vs. upstreaming this change (we may just be asked to change our order instead).

If there is no answer to this question, then I can research the implementation history myself and look for the reasons why it was done this way and not differently (any pointers and links to code would speed up the process). However, this is not fast; wouldn't it be better to merge this pull request to simplify merging subsequent commits? @whitneywhtsang

P.S. After the rebase the tests don't pass; I'll have a look. (Fixed: my changes were lost after the rebase, and I have restored them.)

@anmyachev force-pushed the whitneywhtsang/merge branch 2 times, most recently from a9e6384 to b00713d October 28, 2024 18:46
@chengjunlu (Contributor) commented Oct 29, 2024:

> Changes look sensible to me. Just a question: why does our encoding have a different order compared to the rest? What would be the cost of modifying it so the order matches other dot encodings?

Actually, all the DotOp layouts had the same order before PR ff02a46. (The linear ID of the register dimension maps to coordinates in row-major order.)

The code before AMD's change:

    else if (auto dotLayout = dyn_cast<DotOperandEncodingAttr>(layout)) {
      auto rank = getWarpsPerCTA(dotLayout.getParent()).size();
      SmallVector<unsigned> order(rank);
      for (auto i = 0; i < rank; ++i)
        order[i] = rank - 1 - i;
      return order;
    }
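
For example, with rank == 2 this loop yields order = [1, 0] for both A and B, and with rank == 3 it yields [2, 1, 0]: the same descending, row-major order regardless of opIdx.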

I think the AMD engineers have a different interpretation, based on their comments in the new code:

    // The 'order' field typically represents a descending sorted array of
    // dimensions based on contiguity. For instance, in axisInfo utilities that
    // retrieve tensor contiguity, it's assumed that the dimension with the
    // highest contiguity corresponds to order[0].
    //
    // The relation between contiguity and order is only relevant if the layout
    // interfaces with HBM, as is the case when we load tensor from HBM to
    // registers in the dot layout to bypass LDS. When bypassing LDS, we make the
    // following assumptions about tensor layouts:
    // - Tensor A (opIdx == 0) is considered to be row-major.
    // - Tensor B (opIdx == 1) is considered to be column-major.
    //
    // Based on these assumptions, we define the following orders:
    // - For opIdx == 0, we assume an order of [1, 0].
    // - For opIdx == 1, we assume an order of [0, 1].

For Intel GPUs, the layout is only used to describe the layout of values in registers, and matrices A and B are both row-major in registers.
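
Concretely, for rank == 2: under the AMD assumptions the order is [1, 0] for A (opIdx == 0) and [0, 1] for B (opIdx == 1), whereas the Intel DPAS layout keeps the row-major [1, 0] for both operands.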

Review thread on the following diff context:

      return getOrderForDotOperand(dotLayout.getOpIdx(), rank);
    } else {
      std::iota(order.rbegin(), order.rend(), 0);
    if (auto mmaParent = dyn_cast<MmaEncodingTrait>(dotLayout.getParent())) {
Contributor:

I didn't do a careful review the first time. If the original code change was only made for AMD, then we can keep all the DotOp register layout orders unchanged.

Contributor:

On the other hand, it is good that a third-party extension can override getOrder for the DotOp layout via MmaEncodingTrait. I am neutral on this change.

@anmyachev force-pushed the whitneywhtsang/merge branch from b00713d to be7965c November 3, 2024 14:42
@anmyachev (Contributor) commented Nov 3, 2024:

I suggest speeding up the merge of this pull request, leaving only the workaround for now. We can try to upstream the interface function getOrderForDotOperand separately.

@victor-eds @whitneywhtsang @chengjunlu please approve if this makes sense.

@whitneywhtsang (Contributor Author):

> I suggest speeding up the merge of this pull request, leaving only the workaround for now. We can try to upstream the interface function getOrderForDotOperand separately.
>
> @victor-eds @whitneywhtsang @chengjunlu please approve if this makes sense.

I think it makes sense; let's remove the last two commits and add a FIXME comment in f9ccfeb.

@anmyachev force-pushed the whitneywhtsang/merge branch from a7ba67a to e33df88 November 3, 2024 17:55
@anmyachev (Contributor):

>> I suggest speeding up the merge of this pull request, leaving only the workaround for now. We can try to upstream the interface function getOrderForDotOperand separately.
>> @victor-eds @whitneywhtsang @chengjunlu please approve if this makes sense.
>
> I think it makes sense; let's remove the last two commits and add a FIXME comment in f9ccfeb.

Done.

Signed-off-by: Anatoly Myachev <[email protected]>
@anmyachev force-pushed the whitneywhtsang/merge branch from e33df88 to c637c07 November 3, 2024 18:13
@whitneywhtsang merged commit c637c07 into main Nov 4, 2024
4 checks passed
@whitneywhtsang deleted the whitneywhtsang/merge branch November 4, 2024 00:13
@victor-eds (Contributor):

I'm fine with this merge 👍
