[MLIR][Conversion] XeGPU to XeVM: create_nd_tdesc - Add support for base memory rank > 2 #164701
Conversation
Any specific reason for this? Why not increase the payload size to accommodate the additional ranks' information? I think having a payload is useful to retrieve info at the consumer. I would reconsider this unless we plan to get rid of createNd and move everything to the consumer side.
Several reasons: …
FYI, @dchigarev
I see. In that case, can you clarify how it is handled when the tensor_desc is a block arg or a func arg? Sorry, I did not have time to take a closer look at the changes yet. I feel like after this change create_nd does not really give us any value; rather, it becomes a burden because we have to look it up during lowering. We could simply move all the info to loadNd to simplify things. We should consider removing create_nd in that case.
For example, if passed as a block arg or func arg.
Short answer is, the lowering pattern will not handle those cases. The block arg case can be an issue, since block args are used for loops and control flow. I will update the PR to search for create_nd_tdesc through block args, but not interprocedurally.
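For illustration, a minimal sketch of the block-arg case (op and type syntax are approximate; the function and value names are only for this example): the tensor_desc consumed inside the loop is an iter_arg, so the lowering has to trace it back to the defining create_nd_tdesc.

```mlir
// Sketch only: the tensor_desc reaching load_nd is a block argument (%desc),
// so the conversion must look through the scf.for iter_arg to find the
// defining create_nd_tdesc and recover base memory information from it.
func.func @desc_through_iter_arg(%src: memref<256x256xf16>) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c8 = arith.constant 8 : index
  %init = xegpu.create_nd_tdesc %src[0, 0]
      : memref<256x256xf16> -> !xegpu.tensor_desc<8x16xf16>
  %res = scf.for %i = %c0 to %c8 step %c1
      iter_args(%desc = %init) -> (!xegpu.tensor_desc<8x16xf16>) {
    %v = xegpu.load_nd %desc : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>
    %next = xegpu.update_nd_offset %desc, [0, 16] : !xegpu.tensor_desc<8x16xf16>
    scf.yield %next : !xegpu.tensor_desc<8x16xf16>
  }
  return
}
```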
Agree with @charithaintc that … There is an issue with …: this PR tries to work around that issue by forwarding static values from the operands/attributes of create_nd_tdesc directly to the consumer ops, to compensate for the disconnected static-information channel. But as @charithaintc pointed out, forwarding becomes challenging in the presence of control flow and function calls. That is why a runtime payload/structure is used as the general solution for the memref type. Deprecating …
I think it is not worth the complexity to support 3D+ offsets in the core lowering. The requirement was only to create a 2D tensor tile out of a 3D+ tensor and offsets, not to walk the tensor tile with 3D+ offsets. So it can be addressed by asking the user to do a subview upfront down to a 2D tensor; xegpu would then only allow a 2D tensor tile on top of a 2D flattened tensor, unless future HW relaxes it.
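For illustration, a rough sketch of that alternative (syntax is approximate, and whether create_nd_tdesc accepts the strided result layout here is an assumption): the 3D base is sliced to a 2D memref with memref.subview before the tile is created.

```mlir
// Sketch only: take a 2D plane out of the 3D buffer up front, so xegpu
// never sees a base memref of rank > 2.
func.func @subview_then_tile(%src: memref<16x128x128xf16>, %b: index) {
  // Rank-reducing subview: one 128x128 plane selected by the dynamic index %b.
  %plane = memref.subview %src[%b, 0, 0] [1, 128, 128] [1, 1, 1]
      : memref<16x128x128xf16>
        to memref<128x128xf16, strided<[128, 1], offset: ?>>
  // 2D tensor tile on top of the 2D (flattened) base.
  %tdesc = xegpu.create_nd_tdesc %plane[0, 0]
      : memref<128x128xf16, strided<[128, 1], offset: ?>>
        -> !xegpu.tensor_desc<8x16xf16>
  return
}
```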
Closing the PR, as an alternative solution was suggested above.
create_nd_tdesc is currently lowered to a fixed-size vector that encodes the 2D shape and strides of the base memory.
Supporting base memory of rank > 2 requires a different approach.
Consumers of the create_nd_tdesc op (load_nd, store_nd, prefetch_nd) now get the base memory information directly from create_nd_tdesc instead of going through the fixed payload.
As a result of this change, the result type of create_nd_tdesc is lowered to a single i64 value.
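As a rough illustration (op syntax is approximate, and the lowered form is sketched with generic upstream ops rather than the actual XeVM output), the input IR and what the single i64 conceptually carries:

```mlir
func.func @lowering_sketch(%src: memref<256x256xf16>, %x: index, %y: index) {
  // XeGPU input (approximate syntax).
  %tdesc = xegpu.create_nd_tdesc %src[%x, %y]
      : memref<256x256xf16> -> !xegpu.tensor_desc<8x16xf16>
  %v = xegpu.load_nd %tdesc : !xegpu.tensor_desc<8x16xf16> -> vector<8x16xf16>

  // Conceptually, after this change the lowered tensor_desc value is just the
  // base address as an i64 (sketched below with generic ops, not the real
  // XeVM lowering); shape, strides, and offsets are recovered at the consumers
  // (load_nd / store_nd / prefetch_nd) from the defining create_nd_tdesc
  // instead of from a fixed payload vector.
  %ptr  = memref.extract_aligned_pointer_as_index %src : memref<256x256xf16> -> index
  %base = arith.index_cast %ptr : index to i64
  return
}
```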