* adding ExtractSliceOp
* adding CreateOp
* adding InsertSliceOp
* separating dist-business from PTensorType-attributes using DistTensorType; no more dist in PTensorType
* function boundary handling for Dist
* adding and using EasyValue
* restructuring Dist-Ops to largely accept ValueRanges instead of memrefs
* enabling n-dimensional tensors
docs/rfcs/20220804-ptensor/README.md (11 additions, 13 deletions)
@@ -26,9 +26,7 @@ Additionally we propose appropriate passes
 3. Converting __intel-sycl.device_region__ to appropriate runtime calls

 ### ptensor Type
-Since operations are expected to execute in the same location as its input tensors, it is necessary to carry the tensor-location from the point of its allocation to the point of the operation. For this, we introduce a type which logically extends the `mlir::tensor` type with two boolean attributes:
-*`device`: indicates if it should live on a device
-*`dist`: indicates if it should be distributed
+Since operations are expected to execute in the same location as their input tensors, it is necessary to carry the tensor location from the point of its allocation to the point of the operation. For this, we introduce a type which logically extends `mlir::MemRefType` with a boolean attribute `device`, indicating if it should live on a device.

 The actual device and distributed team can be assigned by the appropriate operands of creation operations (see below).
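To make the proposed type concrete, here is a hedged sketch (not part of the diff): the RFC does not fix a textual syntax, so the `!ptensor.ptensor` spelling and the `ptensor.create` op below are illustrative assumptions.

```mlir
// Hypothetical textual form of the proposed type: the shaped element part
// of a memref plus the boolean `device` attribute described above.
%host = "ptensor.create"() : () -> !ptensor.ptensor<16x16xf64, device = false>

// Same shape, but requesting that the tensor live on a device.
%dev = "ptensor.create"() : () -> !ptensor.ptensor<16x16xf64, device = true>
```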
@@ -37,7 +35,7 @@ The tensors themselves are assumed to eventually lower to memrefs.
 Notice: by default, device and distribution support is disabled, which yields conventional host operations.

 ### __PTensor__ Operations
-The initial set of operations matches the requirements of the core of [array-API](https://data-apis.org/array-api/latest/API_specification/index.html). The operations in the PTensor dialect operate on ptensors. To allow operations on standard tensors and memrefs the PTensor dialect provides the operation `from_ranked` to convert MemRefs and RankedTensors to ptensors with default `device` and `team`.
+The initial set of operations matches the requirements of the core of the [array-API](https://data-apis.org/array-api/latest/API_specification/index.html). The operations in the PTensor dialect operate on ptensors. To allow operations on standard memrefs, the PTensor dialect provides the operation `from_ranked` to convert MemRefs to ptensors with default `device` and `team`.

 Notice: some of the operations mutate existing ptensors.
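As a hedged illustration of `from_ranked` (the op name comes from the text above, but the assembly format shown is an assumption), converting a plain memref into a ptensor with default `device` and `team` might look like:

```mlir
// Hypothetical: wrap an ordinary memref into a ptensor; `device` and `team`
// take their defaults, i.e. host execution and no distribution.
func.func @wrap(%m: memref<4x4xf32>) -> !ptensor.ptensor<4x4xf32> {
  %p = ptensor.from_ranked %m : memref<4x4xf32> -> !ptensor.ptensor<4x4xf32>
  return %p : !ptensor.ptensor<4x4xf32>
}
```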
@@ -50,7 +48,7 @@ It constitutes an error if an operation has multiple (input and output) argument
 Similarly, it constitutes an error if an operation has multiple (input and output) arguments of type ptensor and their `team` attributes differ.

 #### Broadcasting/Ranked Tensors
-PTensor operates on ranked tensors. In rare cases the shape of input tensor(s) needs to be known as well. Unranked tensors are not supported.
+PTensor operates on MemRefs. In rare cases the shape of the input tensor(s) needs to be known as well. Unranked memrefs are not supported.

 PTensor operations follow the [broadcasting semantics of the array-API](https://data-apis.org/array-api/latest/API_specification/broadcasting.html).
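For instance, array-API broadcasting stretches size-1 dimensions, so shapes 3x1 and 1x4 combine to 3x4. In hypothetical PTensor IR (the `ptensor.add` name and syntax are illustrative):

```mlir
// Hypothetical elementwise add under array-API broadcasting rules:
// the 3x1 and 1x4 operands broadcast to a 3x4 result.
%c = ptensor.add %a, %b
     : (!ptensor.ptensor<3x1xf64>, !ptensor.ptensor<1x4xf64>)
    -> !ptensor.ptensor<3x4xf64>
```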
@@ -76,7 +74,7 @@ The below set of operations accrues from the following rules:
 The Dist dialect provides operations dealing with tensors which are partitioned and distributed across multiple processes. The operations assume some kind of runtime which handles aspects like communication and partitioning.
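Tying this to the creation operands mentioned in the hunk above, a heavily hedged sketch of how a team handle might flow into a creation op (none of these names are fixed by this diff; `dist.default_team` is invented for illustration):

```mlir
// Hypothetical: obtain a team handle from the distributed runtime and pass
// it to a creation op, binding the new ptensor to that process team.
%team = "dist.default_team"() : () -> i64
%d = "ptensor.create"(%team) : (i64) -> !ptensor.ptensor<1024xf64>
```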
@@ -163,9 +161,9 @@ All passes which consume `ptensor`s and -operations comply to compute-follows-da
 #### --convert-ptensor-to-linalg
 This pass completely lowers ptensor operations:
--__Tensor__: `ptensor.ptensor` will be type-converted to a RankedTensor
+-__Tensor__: `ptensor.ptensor` will be type-converted to a MemRef
 - Within the pass each PTensor gets "instantiated" by an `init_ptensor` which also accepts `team`, `handle` and `device`. This allows accessing device and distributed runtime attributes during lowering.
-- function boundaries are currently not handled explicitly. device and dist information will be lost and normal RankedTensors are exchanged.
+- function boundaries are currently not handled explicitly. Device and dist information will be lost and normal MemRefs are exchanged.
 -__Linalg__: The actual functionality will be represented by one or more operations of the Linalg dialect.
 -__intel_sycl__: An appropriate `intel_sycl.device_region` will be wrapped around operations which have inputs of type `ptensor.ptensor` with a non-null `device` attribute.
 - utility dialects like __memref__, __shape__, __affine__, __func__ and __arith__
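A hedged before/after sketch of this lowering for an elementwise op on device ptensors; the `ptensor.add` input, the `linalg.map` payload, and the exact region syntax are illustrative, not the pass's verbatim output:

```mlir
// Before (hypothetical PTensor-level IR): elementwise add on device ptensors.
%r = ptensor.add %a, %b
     : (!ptensor.ptensor<8xf32, device = true>, !ptensor.ptensor<8xf32, device = true>)
    -> !ptensor.ptensor<8xf32, device = true>

// After --convert-ptensor-to-linalg (sketch): the ptensor type has become a
// MemRef, the functionality is expressed in linalg, and a device region
// wraps the work because `device` was set.
"intel_sycl.device_region"() ({
  linalg.map { arith.addf }
      ins(%a, %b : memref<8xf32>, memref<8xf32>)
      outs(%r : memref<8xf32>)
}) : () -> ()
```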