[MLIR][XeVM] Add xevm blockload and blockstore op definition. #158118
@llvm/pr-subscribers-mlir-llvm @llvm/pr-subscribers-mlir

Author: Sang Ik Lee (silee2)

Changes

Add op definition for subgroup block load and store ops:
xevm.blockload and xevm.blockstore

Full diff: https://github.com/llvm/llvm-project/pull/158118.diff

2 Files Affected:
diff --git a/mlir/include/mlir/Dialect/LLVMIR/XeVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/XeVMOps.td
index f457f47d56219..5b7814c37bbd1 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/XeVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/XeVMOps.td
@@ -187,6 +187,78 @@ def XeVM_StoreCacheControlAttr
let assemblyFormat = "`<` $value `>`";
}
+def XeVM_BlockLoadOp
+ : XeVM_Op<"blockload">,
+ Results<(outs FixedVectorOfRankAndType<[1], [XeVM_ElemType]>:$res)>,
+ Arguments<(ins Arg<LLVM_AnyPointer, "", [MemRead]>:$ptr,
+ OptionalAttr<XeVM_LoadCacheControlAttr>:$cache_control)> {
+ let summary = "subgroup block load";
+ let description = [{
+ Reads one or more components of Result data for each invocation
+ in the subgroup from the specified `ptr` as a block operation.
+ The data is read strided, so the first value read is:
+ ```
+ ptr[ SubgroupLocalInvocationId ]
+ ```
+ and the second value read is:
+ ```
+ ptr[ SubgroupLocalInvocationId + SubgroupMaxSize ]
+ ```
+ Result type may be a scalar or vector type of scalar element type.
+
+ The parameters are:
+ * `ptr` - the base address to load from
+ * `cache_control` - an enumerator that sets the cache behaviour
+
+ Example:
+ ```mlir
+ %loaded_a = xevm.blockload %src
+ <{cache_control=#xevm.load_cache_control<L1uc_L2uc_L3uc>}>
+ : (!llvm.ptr<1>) -> vector<4xi16>
+ ```
+ }];
+ let assemblyFormat = [{
+ operands prop-dict attr-dict `:` functional-type(operands, results)
+ }];
+}
+
+def XeVM_BlockStoreOp
+ : XeVM_Op<"blockstore">,
+ Arguments<(ins Arg<LLVM_AnyPointer, "", [MemWrite]>:$ptr,
+ FixedVectorOfRankAndType<[1], [XeVM_ElemType]>:$val,
+ OptionalAttr<XeVM_StoreCacheControlAttr>:$cache_control)> {
+ let summary = "subgroup block store";
+ let description = [{
+ Writes one or more components of `val` for each invocation
+ in the subgroup to the specified `ptr` as a block operation.
+ The data is written strided, so the first value is written to:
+ ```
+ ptr[ SubgroupLocalInvocationId ]
+ ```
+ and the second value is written to:
+ ```
+ ptr[ SubgroupLocalInvocationId + SubgroupMaxSize ]
+ ```
+ `val` type may be a scalar or vector type of scalar element type.
+
+ The parameters are:
+ * `ptr` - the base address to store to
+ * `val` - the value to store
+ * `cache_control` - an enumerator that sets the cache behaviour
+
+ Example:
+ ```mlir
+ xevm.blockstore %ptr, %val
+ <{cache_control=#xevm.store_cache_control<L1uc_L2uc_L3uc>}>
+ : (!llvm.ptr<1>, vector<4xi16>)
+ ```
+ }];
+
+ let assemblyFormat = [{
+ operands prop-dict attr-dict `:` `(` type(operands) `)`
+ }];
+}
+
def XeVM_BlockLoad2dOp
: XeVM_Op<"blockload2d">,
Results<(outs FixedVectorOfRankAndType<[1], [XeVM_ElemType]>:$res)>,
diff --git a/mlir/test/Dialect/LLVMIR/xevm.mlir b/mlir/test/Dialect/LLVMIR/xevm.mlir
index 3dd5f872f898c..bb1f650a1cd12 100644
--- a/mlir/test/Dialect/LLVMIR/xevm.mlir
+++ b/mlir/test/Dialect/LLVMIR/xevm.mlir
@@ -58,6 +58,29 @@ func.func @blockprefetch2d(%ptr: !llvm.ptr<1>, %base_width: i32, %base_height: i
return
}
+// -----
+// CHECK-LABEL: func.func @blockload(
+// CHECK-SAME: %[[ARG0:.*]]: !llvm.ptr<1>)
+func.func @blockload(%ptr: !llvm.ptr<1>) -> vector<4xi16> {
+ // CHECK: %[[VAR0:.*]] = xevm.blockload %[[ARG0]]
+ // CHECK-SAME: cache_control = #xevm.load_cache_control<L1uc_L2uc_L3uc>
+ // CHECK-SAME: (!llvm.ptr<1>) -> vector<4xi16>
+ %loaded = xevm.blockload %ptr <{cache_control=#xevm.load_cache_control<L1uc_L2uc_L3uc>}>
+ : (!llvm.ptr<1>) -> vector<4xi16>
+ return %loaded : vector<4xi16>
+}
+
+// -----
+// CHECK-LABEL: func.func @blockstore(
+// CHECK-SAME: %[[ARG0:.*]]: !llvm.ptr<1>,
+// CHECK-SAME: %[[ARG1:.*]]: vector<4xi32>)
+func.func @blockstore(%ptr: !llvm.ptr<1>, %value: vector<4xi32>) {
+ // CHECK: xevm.blockstore %[[ARG0]], %[[ARG1]]
+ // CHECK-SAME: (!llvm.ptr<1>, vector<4xi32>)
+ xevm.blockstore %ptr, %value : (!llvm.ptr<1>, vector<4xi32>)
+ return
+}
+
// -----
// CHECK-LABEL: func.func @mma(
// CHECK-SAME: %[[ARG0:.*]]: vector<8xf32>, %[[ARG1:.*]]: vector<8xi16>, %[[ARG2:.*]]: vector<8xi32>)
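The op descriptions give the addresses of the first two values each lane reads or writes: `ptr[SubgroupLocalInvocationId]` and `ptr[SubgroupLocalInvocationId + SubgroupMaxSize]`. Assuming the same stride continues for later vector elements, element k of lane l sits at `ptr[l + k * SubgroupMaxSize]`. The C++ sketch below is a host-side model of that layout, not XeVM code. All function names are hypothetical, `mem` stands in for the memory behind the uniform `ptr`, and `sgSize` stands in for SubgroupMaxSize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: element k of lane `laneId`'s vector lives at
// ptr[laneId + k * sgSize] (assumes the stride shown for the first two
// values continues for all later elements).
std::size_t blockElemIndex(std::size_t laneId, std::size_t k, std::size_t sgSize) {
  return laneId + k * sgSize;
}

// Simulates a subgroup block load. Every lane shares the same base (`mem`),
// and each lane receives a vector of `vecLen` elements.
template <typename T>
std::vector<std::vector<T>> simulateBlockLoad(const std::vector<T> &mem,
                                              std::size_t sgSize,
                                              std::size_t vecLen) {
  std::vector<std::vector<T>> lanes(sgSize, std::vector<T>(vecLen));
  for (std::size_t lane = 0; lane < sgSize; ++lane)
    for (std::size_t k = 0; k < vecLen; ++k)
      lanes[lane][k] = mem.at(blockElemIndex(lane, k, sgSize));
  return lanes;
}

// Simulates a subgroup block store. It uses the same strided layout, so
// storing what was loaded reproduces the original memory contents.
template <typename T>
void simulateBlockStore(std::vector<T> &mem,
                        const std::vector<std::vector<T>> &lanes) {
  const std::size_t sgSize = lanes.size();
  for (std::size_t lane = 0; lane < sgSize; ++lane)
    for (std::size_t k = 0; k < lanes[lane].size(); ++k)
      mem.at(blockElemIndex(lane, k, sgSize)) = lanes[lane][k];
}
```

Example: with an assumed 16-lane subgroup and `vector<4xi16>` per lane, lane 3 gets the elements at offsets 3, 19, 35 and 51 of the block.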
Result type may be a scalar or vector type of scalar element type.

The parameters are:
* `ptr` - the base address to load from
How about adding: it must be uniform across subgroup.
Updated description.
FYI, links to related specs:
LGTM
vTy = op.getVal().getType();
int elemTySize = vTy.getElementType().getIntOrFloatBitWidth() / 8;
if (elemTySize == 1) {
  llvm::SmallSet<int, 5> validSizes{1, 2, 4, 8, 16};
nit: seems target specific? add a TODO or move to a dedicated location for HW specifics.
It is not specific to a target architecture or chip. The restrictions come from the OpenCL / SPIR-V Intel extensions.
In that sense, they apply to all Intel HW and are not target specific.
I put links to related specs above.
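The verifier excerpt under discussion allows only per-lane vector lengths 1, 2, 4, 8 and 16 when the element is 1 byte wide. The C++ sketch below is a standalone version of that one rule. It uses `std::set` in place of `llvm::SmallSet`, and the name `isValidBlockVecLen` is hypothetical. The excerpt does not show the rules for other element sizes, so the sketch returns `std::nullopt` for them rather than guessing.

```cpp
#include <cassert>
#include <optional>
#include <set>

// Hypothetical standalone form of the check quoted in the review.
// Returns true/false for 1-byte elements, and std::nullopt for element
// sizes whose rules are not shown in the excerpt.
std::optional<bool> isValidBlockVecLen(unsigned elemBitWidth, unsigned vecLen) {
  const unsigned elemTySize = elemBitWidth / 8;  // element size in bytes
  if (elemTySize == 1) {
    static const std::set<unsigned> validSizes{1, 2, 4, 8, 16};
    return validSizes.count(vecLen) != 0;
  }
  return std::nullopt;  // not covered by the excerpt; do not guess
}
```

Because the extension-derived limits live in a single function, they could later move to a dedicated location for HW specifics, as the review nit suggests, without touching the op definitions.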
// CHECK-SAME: cache_control = #xevm.load_cache_control<L1uc_L2uc_L3uc>
// CHECK-SAME: (!llvm.ptr<1>) -> vector<4xi16>
%loaded = xevm.blockload %ptr <{cache_control=#xevm.load_cache_control<L1uc_L2uc_L3uc>}>
    : (!llvm.ptr<1>) -> vector<4xi16>
Why is the output not a multiple of SG size?
Output is distributed to work item lanes.
The vector size represents how many elements are gathered per work item lane.
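Because the result vector is per lane, the block the whole subgroup touches in memory is the per-lane length times the subgroup size. That block is always a multiple of the subgroup size, even when the per-lane vector length is not. The C++ arithmetic below is illustrative only. The helper names are hypothetical, and the subgroup sizes used in the example are assumptions, not values taken from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of elements covered by one subgroup block
// access when each of `sgSize` lanes holds `vecLen` elements.
std::size_t blockElems(std::size_t vecLen, std::size_t sgSize) {
  return vecLen * sgSize;
}

// Same footprint in bytes, for an element type `elemBits` wide.
std::size_t blockBytes(std::size_t vecLen, std::size_t elemBits, std::size_t sgSize) {
  return blockElems(vecLen, sgSize) * (elemBits / 8);
}
```

For the `vector<4xi16>` test case with an assumed 16-lane subgroup, the block is 4 × 16 = 64 `i16` elements (128 bytes). For the `vector<4xi32>` store test, it is 64 `i32` elements (256 bytes).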
Add op definition for subgroup block load and store ops:
xevm.blockload and xevm.blockstore