@@ -101,26 +101,45 @@ bufferization strategy would be unacceptable for high-performance codegen. When
 choosing an already existing buffer, we must be careful not to accidentally
 overwrite data that is still needed later in the program.
 
-To simplify this problem, One-Shot Bufferize was designed for ops that are in
-*destination-passing style*. For every tensor result, such ops have a tensor
-operand, whose buffer could be utilized for storing the result of the op in the
-absence of other conflicts. We call such tensor operands the *destination*.
+To simplify this problem, One-Shot Bufferize was designed to take advantage of
+*destination-passing style*. This form exists independently of bufferization
+and is tied to SSA semantics: many ops "update" part of their input SSA
+variable. For example, the LLVM instruction
+[`insertelement`](https://llvm.org/docs/LangRef.html#insertelement-instruction)
+inserts an element into a vector. Since SSA values are immutable, the operation
+returns a copy of the input vector with the element inserted. Another example
+in MLIR is `linalg.generic`, which always has an extra `outs` operand that
+provides the initial values to update (for example, when the operation
+expresses a reduction).
+
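+For illustration, a reduction that sums the elements of a 1-D tensor could be
+written roughly as follows (a sketch; `%t` and `%acc` are placeholder values).
+The `outs` operand `%acc` carries the initial value of the accumulation:
+
+```mlir
+#id = affine_map<(i) -> (i)>
+#scalar = affine_map<(i) -> ()>
+// %acc provides the initial value of the sum; it is the op's "destination".
+%sum = linalg.generic
+    {indexing_maps = [#id, #scalar], iterator_types = ["reduction"]}
+    ins(%t : tensor<?xf32>) outs(%acc : tensor<f32>) {
+  ^bb0(%in: f32, %cur: f32):
+    %0 = arith.addf %in, %cur : f32
+    linalg.yield %0 : f32
+} -> tensor<f32>
+```
+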
+This extra operand is referred to as the "destination" in the following (the
+quotes are important, as the operand is not modified in place but copied). In
+the context of bufferization, the "destination" serves as a possible "anchor"
+for the bufferization algorithm: by carefully choosing the SSA value used as
+the "destination", the user can shape the input IR in a form that guarantees a
+close-to-optimal bufferization result.
+
+For every tensor result, a destination-passing style op has a corresponding
+tensor operand. If there are no other uses of this tensor, the bufferization
+can alias it with the op result and perform the operation "in-place" by reusing
+the buffer allocated for this "destination" input.
 
 As an example, consider the following op: `%0 = tensor.insert %cst into
 %t[%idx] : tensor<?xf32>`
 
-`%t` is the destination in this example. When choosing a buffer for the result
+`%t` is the "destination" in this example. When choosing a buffer for the result
 `%0`, denoted as `buffer(%0)`, One-Shot Bufferize considers only two options:
 
-1. `buffer(%0) = buffer(%t)`, or
+1. `buffer(%0) = buffer(%t)`: alias the "destination" tensor with the result
+   and perform the operation in-place (see the sketch below), or
 2. `buffer(%0)` is a newly allocated buffer.
 
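+In the in-place case, the `tensor.insert` above bufferizes to a plain store
+into the existing buffer, roughly (a sketch; `%t_buffer` is a placeholder name
+for `buffer(%t)`):
+
+```mlir
+// The result shares the storage of the "destination": no new allocation is
+// made, the element is written directly into buffer(%t).
+memref.store %cst, %t_buffer[%idx] : memref<?xf32>
+```
+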
 There may be other buffers in the same function that could potentially be used
 for `buffer(%0)`, but those are not considered by One-Shot Bufferize to keep the
 bufferization simple. One-Shot Bufferize could be extended to consider such
 buffers in the future to achieve a better quality of bufferization.
 
-Tensor ops that are not in destination-passing style always bufferize to a
+Tensor ops that are not in destination-passing style are always bufferized to a
 memory allocation. E.g.:
 
 ```mlir
@@ -131,10 +150,10 @@ memory allocation. E.g.:
 } : tensor<?xf32>
 ```
 
-The result of `tensor.generate` does not have a destination operand, so
+The result of `tensor.generate` does not have a "destination" operand, so
 bufferization allocates a new buffer. This could be avoided by choosing an
 op such as `linalg.generic`, which can express the same computation with a
-destination operand, as specified behind outputs (`outs`):
+"destination" operand, as specified behind outputs (`outs`):
 
 ```mlir
 #map = affine_map<(i) -> (i)>
@@ -159,14 +178,13 @@ slice of a tensor:
 ```
 
 The above example bufferizes to a `memref.subview`, followed by a
-"`linalg.generic` on memrefs" that overwrites the memory of the subview. The
-`tensor.insert_slice` bufferizes to a no-op (in the absence of RaW conflicts
-such as a subsequent read of `%s`).
+"`linalg.generic` on memrefs" that overwrites the memory of the subview,
+assuming that the slice `%t` has no other user. The `tensor.insert_slice` then
+bufferizes to a no-op (in the absence of RaW conflicts such as a subsequent
+read of `%s`).
 
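+In buffer form, the in-place result looks roughly like the following sketch
+(not the exact output of the pass; `%s_buffer`, `%idx` and `%sz` are
+placeholder names, and the payload shown is an arbitrary fill):
+
+```mlir
+#map = affine_map<(i) -> (i)>
+// Aliases the slice of buffer(%s); no new allocation is made.
+%v = memref.subview %s_buffer[%idx] [%sz] [1]
+    : memref<?xf32> to memref<?xf32, strided<[1], offset: ?>>
+// "linalg.generic on memrefs": writes directly into the subview, i.e., into
+// buffer(%s). The tensor.insert_slice has become a no-op and disappears.
+linalg.generic {indexing_maps = [#map], iterator_types = ["parallel"]}
+    outs(%v : memref<?xf32, strided<[1], offset: ?>>) {
+  ^bb0(%out: f32):
+    %cst = arith.constant 0.0 : f32
+    linalg.yield %cst : f32
+}
+```
+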
 RaW conflicts are detected with an analysis of SSA use-def chains (details
 later). One-Shot Bufferize works best if there is a single SSA use-def chain,
-where the result of a tensor op is the destination operand of the next tensor
-ops, e.g.:
+where the result of a tensor op is the operand of the next tensor op, e.g.:
 
 ```mlir
 %0 = "my_dialect.some_op"(%t) : (tensor<?xf32>) -> (tensor<?xf32>)