

@the-strawhat

Related RFC

[RFC][AMD] Optimizations for Paged Attention: Proposal with Multiple Features (#8281)

[Feature 2] Implicit Layout Conversion

Problem
We found that partitioning the seq_len dimension of matrix K and the head_dim dimension of matrix V across different wavefronts yields significant benefits, because K and V can then be loaded without going through shared memory.
However, the AMD backend of the Triton compiler does not support loading data directly from global memory into registers in a DotOperandEncodingAttr layout and feeding it to a DotOp. Instead, the data must be routed through shared memory via a ConvertLayoutOp.
The goal of implicit layout conversion is to load data directly in the layout that the matrix multiplication requires, eliminating these unnecessary conversions.
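A small pure-Python sketch of why this partitioning avoids data sharing between wavefronts. In both attention GEMMs, K and V are the B operand: for S = Q·Kᵀ the N dimension is seq_len, and for O = P·V the N dimension is head_dim. When N is split across wavefronts, each wavefront computes its own column block of the output and reads only the matching slice of B, so no wavefront needs data that another wavefront loaded. The helper names (`matmul`, `split_cols`, `wavefront_matmul`), the even split, and the tiny shapes are illustrative assumptions, not Triton code.

```python
def matmul(a, b):
    """Plain row-major matmul: a is M x K, b is K x N."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def split_cols(n, num_wavefronts):
    """Column ranges of the N dimension owned by each wavefront (assumed even split)."""
    assert n % num_wavefronts == 0
    width = n // num_wavefronts
    return [range(w * width, (w + 1) * width) for w in range(num_wavefronts)]


def wavefront_matmul(a, b, num_wavefronts):
    """Emulate N-partitioned wavefronts: each one reads only its own columns
    of b (private, so no shared memory is needed to exchange them) and writes
    only its own columns of the result. a is read in full by every wavefront."""
    m, n = len(a), len(b[0])
    out = [[0] * n for _ in range(m)]
    for cols in split_cols(n, num_wavefronts):
        b_slice = [[row[j] for j in cols] for row in b]  # this wavefront's slice of B
        part = matmul(a, b_slice)
        for i in range(m):
            for local_j, j in enumerate(cols):
                out[i][j] = part[i][local_j]
    return out


# Q is 2 x head_dim(4); K^T is head_dim(4) x seq_len(8), split over 4 wavefronts.
q = [[1, 2, 3, 4], [5, 6, 7, 8]]
kt = [[(r * 8 + c) % 5 for c in range(8)] for r in range(4)]
assert wavefront_matmul(q, kt, 4) == matmul(q, kt)
```

The same argument applies to P·V with head_dim as N. Note that the A operand (Q or P) is still needed in full by every wavefront; the benefit described above is specific to the B operand.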

Core Process

  1. Traverse operations until a DotOp is matched.
  2. For each of the DotOp's two operands, trace backward until a LoadOp is matched, and decide from the kinds of operations along the path whether implicit layout conversion is feasible (in short, it cannot be applied if any operation on the path copies data).
    • For ConvertLayoutOp and LocalLoadOp-like operations, check whether data copying occurs. If so, implicit conversion is not allowed.
    • For other operations, currently only TransOp, ReshapeOp, LoadOp, and ElementwiseOp are supported.
  3. Extract the innermost vectorization size (vecSize) from the LayoutAttr of the matched LoadOp, and update the kWidth in the LayoutAttr of the DotOp operands.
    • Certain constraints must be applied to the value of vecSize.
  4. Back-propagate the updated LayoutAttr from the DotOp operands to all operations along the path to the LoadOp.
  5. Forward-propagate the updated LayoutAttr to all subsequent operations.
    • If a ConvertLayoutOp is present, remove it (Step 4 guarantees that all operations along the path already have consistent layouts).
    • Insert ConvertLayoutOp before and after each operation to ensure legality; these will be removed later by RemoveLayoutConversions.
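The steps above can be sketched on a toy, single-producer IR. Everything here is hypothetical: `Op`, `trace_to_load`, `pick_k_width`, and `apply_implicit_conversion` are illustrative names, not Triton's C++ API; the `copies_data` flag is assumed to be precomputed; and since the proposal does not spell out the vecSize constraints, the sketch simply assumes kWidth must be the largest value from a caller-supplied list that divides vecSize. Only the first bullet of Step 5 (dropping converts on the path) is modeled; forward propagation to later users and the legality converts are not.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, heavily simplified stand-ins for TTGIR ops (not Triton's C++ API).
SUPPORTED = {"load", "trans", "reshape", "elementwise", "convert_layout", "local_load"}
COPY_CAPABLE = {"convert_layout", "local_load"}


@dataclass(eq=False)
class Op:
    kind: str
    src: Optional["Op"] = None     # single producer, to keep the sketch small
    layout: Optional[dict] = None
    copies_data: bool = False      # assumed precomputed for convert_layout/local_load


def trace_to_load(operand: Op) -> Optional[List[Op]]:
    """Step 2: walk back from a dot operand to its load.

    Returns the path ordered load-first, or None if an op on the path is
    unsupported or copies data (implicit conversion is then not allowed)."""
    path, cur = [], operand
    while cur is not None:
        if cur.kind not in SUPPORTED:
            return None
        if cur.kind in COPY_CAPABLE and cur.copies_data:
            return None
        path.append(cur)
        if cur.kind == "load":
            return path[::-1]
        cur = cur.src
    return None  # ran out of producers without reaching a load


def pick_k_width(vec_size: int, allowed: List[int]) -> Optional[int]:
    """Step 3: derive kWidth from the load's innermost vecSize.

    Hypothetical constraint: the largest allowed kWidth that divides vecSize."""
    fits = [k for k in allowed if vec_size % k == 0]
    return max(fits) if fits else None


def apply_implicit_conversion(dot_operand: Op, allowed: List[int]) -> Optional[Op]:
    """Steps 2-5 for one dot operand; returns the op the dot should now consume."""
    path = trace_to_load(dot_operand)
    if path is None:
        return None
    k_width = pick_k_width(path[0].layout["vec_size"], allowed)
    if k_width is None:
        return None
    # Step 4: every op from the load up to the dot operand gets the dot layout.
    for op in path:
        op.layout = {"dot_operand": True, "k_width": k_width}
    # Step 5 (first bullet): converts on the path are now no-ops; bypass them.
    kept = [op for op in path if op.kind != "convert_layout"]
    for prev, op in zip(kept, kept[1:]):
        op.src = prev
    return kept[-1]
```

For example, a chain `load (vecSize 8) -> trans -> convert_layout` with `allowed=[4, 8]` ends with the dot consuming the `trans` directly, and the load carrying a dot-operand layout with kWidth 8; a `local_load` that copies data makes the function return `None`.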

Contributor

@lezcano lezcano left a comment


If this is to be done (@antiagainst should judge here), it should be split into first hoisting the convert_layout next to the load, as we do in hoistConvertDotOperand, and only then performing a local transformation, rather than trying to do everything in one go.
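The two-phase split suggested here can be illustrated on a toy chain of op kinds. This is not the implementation of `hoistConvertDotOperand`; the functions `hoist_convert_to_load` and `fold_load_convert`, the `HOISTABLE` set, and the `load[dot_operand]` marker are hypothetical. The sketch tracks only op order: a real hoist must also rewrite the convert's target layout when it crosses an op (for example, transposing it when crossing a trans) and handle multi-operand elementwise ops.

```python
# Hypothetical toy IR: a single-use chain of op kinds, producer first.
HOISTABLE = {"elementwise", "trans", "reshape"}


def hoist_convert_to_load(chain):
    """Phase 1: bubble each convert_layout upward past hoistable ops until it
    sits directly after its load, or stops at an op it cannot cross.
    (Layout rewriting required when crossing ops is not modeled.)"""
    chain = list(chain)
    for i, kind in enumerate(chain):
        if kind != "convert_layout":
            continue
        j = i
        while j > 0 and chain[j - 1] in HOISTABLE:
            chain[j - 1], chain[j] = chain[j], chain[j - 1]
            j -= 1
    return chain


def fold_load_convert(chain):
    """Phase 2: a purely local rewrite; a load immediately followed by a
    convert_layout becomes one load in the dot-operand layout."""
    out = []
    for kind in chain:
        if kind == "convert_layout" and out and out[-1] == "load":
            out[-1] = "load[dot_operand]"
        else:
            out.append(kind)
    return out


hoisted = hoist_convert_to_load(["load", "elementwise", "trans", "convert_layout", "dot"])
print(hoisted)                     # ['load', 'convert_layout', 'elementwise', 'trans', 'dot']
print(fold_load_convert(hoisted))  # ['load[dot_operand]', 'elementwise', 'trans', 'dot']
```

Keeping the phases separate means the second step only ever has to inspect a load and its immediate user, while an op that cannot be crossed (such as a local_load) simply leaves the convert where it is.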
