-
Notifications
You must be signed in to change notification settings - Fork 45
XeGPU RFC update: Add matrix_desc and operations for share local memory access #1092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
docs/rfcs/XeGPU.md
Outdated
@@ -329,6 +329,20 @@ Attribute `Memory_kind` describes the memory kind. "global" means the global mem | |||
|
|||
`nbarrier` and `fence` operations lower to uniform instructions, so there is no need to specify the `sg_map`. | |||
|
|||
## XeGPU operations to access share local memory | |||
Users must create `matrix_desc` to hold a matrix in the share local memory. The matrix must be row-major. The matrix can attach a attribute for its memory layout, for example, a blocked layout or just original non-blocked row-major layout (aka. linear layout). | |||
User can get a subview of an existing `matrix_desc` to get a new `matrix_desc`, potentially having a stride. Then user can use load_matrix and store_matrix to move the matrix data between share local memory and vectors (registers). The matrix is typically 2d and but can be multi-dimension. XeGPU's load_matrix and store_matrix works at workgroup level only. It uses xegpu.layout to describe how the matrix is decomposed to data fragments and maps to work items. The workgroup level operation loads the entire matrix to vector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're talking about WG-level here I think #1033 should me merged before this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
Users create a `matrix_desc` to represent a matrix stored in shared local memory (SLM). The operation takes a memory buffer (1D int8 memref with empty layout) and create a structured representation of the share local memory. The result matrix_desc has proper information including shape, element type, and memory layout attributes (@block and @strides). The @block attribute indicates that the matrix follows a blocked layout, enabling optimized lowering to 1D block loads. The @strides attribute specifies the logical strides of each dimension and is typically used to support chunked loads. | ||
|
||
When there is no input memref operand, it allocates SLM for the matrix, assuming a row-major contiguous layout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of making memref operand optional?
Please review these guidelines to help with the review process: