%element = vector.extract %0[%i] : f32 from vector<8xf32>
%updated = arith.addf %partial, %element : f32
scf.yield %updated : f32
}
@@ -145,7 +145,7 @@ linalg.generic {
%c0 = arith.constant 0.0 : f32
%0 = arith.cmpf ogt, %in_one, %c0 : f32
%1 = arith.select %0, %in_one, %c0 : f32
-linalg.yield %1 : f32
+linalg.yield %1 : f32
}
```
@@ -185,7 +185,7 @@ In the case of `linalg.generic` operations, the iteration space is implicit and
For example, tiling the matrix multiplication presented above with tile sizes `(2, 8)`, we obtain a loop nest around a `linalg.generic` expressing the same operation on a `2x8` tensor.
```mlir
-// A special "multi-for" loop that supports tensor-insertion semantics
+// A special "multi-for" loop that supports tensor-insertion semantics
// as opposed to implicit updates. The resulting 8x16 tensor will be produced
// by this loop.
// The trip count of iterators is computed by dividing the original tensor size,
@@ -202,9 +202,9 @@ For example, tiling the matrix multiplication presented above with tile sizes `(
// Take slices of inputs and outputs. Only the "i" and "j" dimensions are sliced.
@@ -238,15 +238,15 @@ After materializing loops with tiling, another key code generation transformatio
1. the subset (slice) of the operand that is used by the tile, and
2. the tensor-level structured operation producing the whole tensor that is being sliced.
-By inverting the `indexing_map` and applying it to the set of elements accessed through the slice, we can compute the part of the iteration space of the operation defining the full tensor necessary to compute the tile. Thus fusion boils down to replacing the `tensor.extract_slice` operation with the tile of the `linalg.generic` producing the original operand.
+By inverting the `indexing_map` and applying it to the set of elements accessed through the slice, we can compute the part of the iteration space of the operation defining the full tensor necessary to compute the tile. Thus fusion boils down to replacing the `tensor.extract_slice` operation with the tile of the `linalg.generic` producing the original operand.
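To make the slice-inversion idea concrete, here is a minimal sketch of replacing a `tensor.extract_slice` of the matmul result with a tile of the producing matmul. The value names, the tile position, the `8x16` result shape, and the `10`-element reduction dimension are assumed for illustration (they are not taken from the original document), and the named `linalg.matmul` form is used for brevity instead of the equivalent `linalg.generic`.

```mlir
// Before fusion: the tiled consumer reads a 2x8 slice of the full matmul
// result. %row and %col are tile offsets already scaled by the tile sizes.
%consumer_in = tensor.extract_slice %matmul_result[%row, %col] [2, 8] [1, 1]
    : tensor<8x16xf32> to tensor<2x8xf32>

// After fusion: the same 2x8 tile is recomputed in place. Slices of the
// matmul operands are taken along the "i" and "j" dimensions only; the
// reduction dimension (assumed to be 10 here) is used in its entirety.
%lhs_slice  = tensor.extract_slice %lhs[%row, 0] [2, 10] [1, 1]
    : tensor<8x10xf32> to tensor<2x10xf32>
%rhs_slice  = tensor.extract_slice %rhs[0, %col] [10, 8] [1, 1]
    : tensor<10x16xf32> to tensor<10x8xf32>
%acc_slice  = tensor.extract_slice %acc[%row, %col] [2, 8] [1, 1]
    : tensor<8x16xf32> to tensor<2x8xf32>
%fused_tile = linalg.matmul
    ins(%lhs_slice, %rhs_slice : tensor<2x10xf32>, tensor<10x8xf32>)
    outs(%acc_slice : tensor<2x8xf32>) -> tensor<2x8xf32>
```

After this rewrite, the `tensor.extract_slice` of the result is gone and every use of `%consumer_in` inside the tile is replaced with `%fused_tile`.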
Let us assume that the matrix multiplication operation is followed by another operation that multiplies each element of the resulting matrix with itself. This trailing elementwise operation has a 2D iteration space, unlike the 3D one in matrix multiplication. Nevertheless, it is possible to tile the trailing operation and then fuse the producer of its operand, the matmul, into the loop generated by tiling. The untiled dimension will be used in its entirety.
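For reference, a sketch of what that trailing elementwise operation could look like before tiling and fusion; the `8x16` shape and the value names are assumed for illustration.

```mlir
// Multiply every element of the matmul result by itself.
%squared = linalg.generic {
  indexing_maps = [affine_map<(i, j) -> (i, j)>,
                   affine_map<(i, j) -> (i, j)>],
  iterator_types = ["parallel", "parallel"]
} ins(%matmul_result : tensor<8x16xf32>)
  outs(%init : tensor<8x16xf32>) {
^bb0(%in: f32, %out: f32):
  %sq = arith.mulf %in, %in : f32
  linalg.yield %sq : f32
} -> tensor<8x16xf32>
```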
```mlir
// Same loop as before.
-%0 = scf.forall (%i, %j) in (4, 2)
-     shared_outs(%shared = %init)
+%0 = scf.forall (%i, %j) in (4, 2)
+     shared_outs(%shared = %init)
-> (tensor<8x16xf32>, tensor<8x16xf32>) {
// Scale the loop induction variables by the tile sizes.