
Conversation

dchigarev
Contributor

As suggested here, this adds an optional layout attribute to the LoadGather/StoreScatter ops.
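As a sketch of what this looks like in IR (operand names are illustrative; the attribute syntax mirrors the test snippets quoted later in this conversation), the layout is attached directly to the op alongside the existing inline attributes:

```mlir
// Illustrative only: a scattered load carrying a permanent `layout`
// attribute instead of relying on a temporary `layout_result_0`
// attribute assigned by the propagation analysis.
%v = xegpu.load %src[%offsets], %mask <{chunk_size = 8,
       layout = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>}>
       : memref<256xf16>, vector<16xindex>, vector<16xi1> -> vector<16x8xf16>
```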

@dchigarev
Contributor Author

Some tests are failing after rebasing on the main branch. Fixing those...


xegpu::DistributeLayoutAttr getDistributeLayout() {
  xegpu::DistributeLayoutAttr layout = nullptr;
  if (auto tdescType = getTensorDescType()) {
Contributor

We are deprecating the load_gather form that takes a tensor descriptor, so there is no need to check it here.


xegpu::DistributeLayoutAttr getDistributeLayout() {
  xegpu::DistributeLayoutAttr layout = nullptr;
  if (auto tdescType = getTensorDescType()) {
Contributor

no need to support the tdesc form.

                      xegpu::CachePolicyAttr l2_hint,
                      xegpu::CachePolicyAttr l3_hint,
                      DistributeLayoutAttr layout) {
  build(builder, state, valueType, source, Value(), mask, IntegerAttr(),
Contributor

Why do we have this form: a load without offsets?

                      xegpu::CachePolicyAttr l2_hint,
                      xegpu::CachePolicyAttr l3_hint,
                      DistributeLayoutAttr layout) {
  build(builder, state, value, dest, Value(), mask, IntegerAttr(), l1_hint,
Contributor

Also no offsets here?

// CHECK-NEXT: %[[T2:.*]] = xegpu.create_tdesc %[[ARG1]], %[[CST]] : memref<256xf16>, vector<16xindex> ->
// CHECK-SAME: !xegpu.tensor_desc<16x16xf16, #xegpu.scatter_tdesc_attr<chunk_size = 16 : i64>, #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>>
// CHECK-NEXT: %{{.*}} = xegpu.load %[[T2]], %[[CST0]] {layout_result_0 = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>}
// CHECK-NEXT: %{{.*}} = xegpu.load %[[T2]], %[[CST0]] <{layout = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>}>
Contributor

I don't think you need to change the layout here.
This test checks whether the propagation set the temporary layout attribute for the load result correctly.

Contributor

You may create a separate test that checks how the propagation honors the user's setting. Say, the user sets a different layout, like
layout = #xegpu.layout<lane_layout = [4, 4], lane_data = [1, 2]>
for the store and expects it to propagate from the store to the load.

Once the user sets it, the propagation should honor the user's setting instead of using its default one.

Note that this xegpu.load variant is to be deprecated. Please just focus on the xegpu.load variant that takes a memref as input.
Also, the test should not use chunk_size. We don't really expect users to use the chunk load.
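A minimal sketch of such a propagation test, with hypothetical function and value names (exact shapes and the store operand order may need adjusting to satisfy the op verifiers), could look like:

```mlir
// Illustrative only: the user pins a non-default layout on the store;
// the propagation analysis is then expected to carry that layout
// backwards to the load result instead of assigning its default one.
func.func @propagate_user_layout(%src: memref<256xf16>, %dst: memref<256xf16>) {
  %offsets = arith.constant dense<0> : vector<16xindex>
  %mask = arith.constant dense<1> : vector<16xi1>
  %v = xegpu.load %src[%offsets], %mask
         : memref<256xf16>, vector<16xindex>, vector<16xi1> -> vector<16xf16>
  xegpu.store %v, %dst[%offsets], %mask
         <{layout = #xegpu.layout<lane_layout = [4, 4], lane_data = [1, 2]>}>
         : vector<16xf16>, memref<256xf16>, vector<16xindex>, vector<16xi1>
  return
}
// CHECK: xegpu.load {{.*}}layout_result_0 = #xegpu.layout<lane_layout = [4, 4], lane_data = [1, 2]>
```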

layout_result_0 = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>
} : memref<256xf16>, vector<16xindex>, vector<16xi1> -> vector<16x8xf16>
%3 = xegpu.load %src[%offset], %1 <{chunk_size=8,
layout = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 2]>
Contributor

@Jianhui-Li Jianhui-Li Oct 20, 2025

I would leave these two tests as is.

%offset = arith.constant {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 4]>} dense<0> : vector<256x16xindex>
%mask = arith.constant {layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 4]>} dense<1> : vector<256x16xi1>
%load = xegpu.load %src[%offset], %mask {chunk_size = 1, layout_result_0 = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 4]>, l1_hint = #xegpu.cache_hint<cached>}
%load = xegpu.load %src[%offset], %mask {chunk_size = 1, layout = #xegpu.layout<sg_layout = [8, 4], sg_data = [32, 4]>, l1_hint = #xegpu.cache_hint<cached>}
Contributor

I don't think you need to touch these tests.
These tests assume the temporary layout attributes have been assigned, so they are not related to the permanent layout.
