Perform an atomic maximum on the value stored at dst with an optional memory-order.
- :param dst: Destination buffer where the atomic maximum will be performed
+ If memory_order is None, the runtime extern "AtomicMax" is called without an explicit memory-order id; otherwise the provided memory_order string is mapped to a numeric id using the module's memory-order map and passed to the extern.
+
+ :param dst: Destination buffer/address to apply the atomic max.
  :type dst: Buffer
- :param value: Value to be atomically added
+ :param value: Value to compare/store atomically.
  :type value: PrimExpr
+ :param memory_order: Optional memory-order name (e.g. "relaxed", "acquire", "seq_cst").
+ If provided, it is translated to the corresponding numeric memory-order id before the call.
+ :type memory_order: str | None

- :returns: Handle to the atomic maximum operation
+ :returns: A handle/expression representing the issued atomic maximum operation.
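The `memory_order` handling described in the added lines can be pictured with a small stand-alone sketch. It does not call TileLang; `MEMORY_ORDER_IDS` and `atomic_max_extern_args` are hypothetical names, and the numeric ids are an assumption that follows the enumerator order of C++ `std::memory_order`, since the module's actual map is not shown in this diff.

```python
from typing import Optional, Tuple

# Assumption: ids follow the enumerator order of C++ std::memory_order.
# The module's real memory-order map is not visible in this diff.
MEMORY_ORDER_IDS = {
    "relaxed": 0,
    "consume": 1,
    "acquire": 2,
    "release": 3,
    "acq_rel": 4,
    "seq_cst": 5,
}


def atomic_max_extern_args(dst: str, value: int,
                           memory_order: Optional[str] = None) -> Tuple:
    """Build the argument tuple a hypothetical extern call would receive."""
    if memory_order is None:
        # No explicit memory-order id is passed to the extern.
        return ("AtomicMax", dst, value)
    if memory_order not in MEMORY_ORDER_IDS:
        raise ValueError(f"unknown memory order: {memory_order!r}")
    # The string name is translated to its numeric id before the call.
    return ("AtomicMax", dst, value, MEMORY_ORDER_IDS[memory_order])
```

Rejecting unknown names up front is a design choice of this sketch; the diff does not say how the real function treats them.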
Atomically add `value` into `dst`, returning a handle to the operation.
- :param dst: Destination buffer where the atomic addition will be performed
- :type dst: Buffer
- :param value: Value to be atomically added
- :type value: PrimExpr
+ Supports scalar/addressed extern atomic add when neither argument exposes extents, or tile-region-based atomic add for Buffer/BufferRegion/BufferLoad inputs. If both arguments are plain Buffers, their shapes must be structurally equal. If at least one side exposes extents, extents are aligned (missing dimensions are treated as size 1); an assertion is raised if extents cannot be deduced. The optional `memory_order` (one of "relaxed", "consume", "acquire", "release", "acq_rel", "seq_cst") is used only for the direct extern `AtomicAdd` path when no extents are available; otherwise the tile-region path ignores `memory_order`.

- :returns: Handle to the atomic addition operation
+ :returns: A handle representing the atomic addition operation.
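The path selection and extent alignment described for atomic add can be sketched in plain Python. The helper names `choose_path` and `align_extents` are hypothetical, and padding missing dimensions with leading 1s (broadcast-style) is an assumption: the docstring only says missing dimensions are treated as size 1.

```python
from typing import List, Optional, Sequence, Tuple

Extents = Optional[Sequence[int]]


def choose_path(dst_extents: Extents, value_extents: Extents) -> str:
    """Pick the extern path only when neither side exposes extents."""
    if dst_extents is None and value_extents is None:
        return "extern"       # memory_order is honored on this path
    return "tile-region"      # memory_order is ignored on this path


def align_extents(dst_extents: Extents,
                  value_extents: Extents) -> Tuple[List[int], List[int]]:
    """Pad both extent lists to a common rank, treating missing dims as 1."""
    known = [e for e in (dst_extents, value_extents) if e is not None]
    assert known, "extents cannot be deduced"
    rank = max(len(e) for e in known)

    def pad(extents: Extents) -> List[int]:
        dims = list(extents) if extents is not None else []
        # Assumption: missing dimensions are added on the leading side.
        return [1] * (rank - len(dims)) + dims

    return pad(dst_extents), pad(value_extents)
```

For example, `align_extents([4, 8], [8])` pads the second argument to rank 2 so both sides describe a two-dimensional region.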
:param dtype: New dtype for the buffer. Defaults to None.
- :type dtype: Union[str, None], optional
-
- :returns: A new buffer view with the specified shape and dtype
- :rtype: Buffer
+ If `shape` is None, the source buffer's shape is used; if `dtype` is None, the source buffer's dtype is used. The returned buffer shares the same underlying data as `src` (no copy).
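The defaulting and no-copy behaviour in the added line can be illustrated with a toy model. `ToyBuffer` and `toy_view` are hypothetical stand-ins written for this sketch, not TileLang or TVM types; a Python list plays the role of the shared storage.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ToyBuffer:
    data: List[float]        # shared storage (stand-in for the data pointer)
    shape: Tuple[int, ...]
    dtype: str


def toy_view(src: ToyBuffer,
             shape: Optional[Sequence[int]] = None,
             dtype: Optional[str] = None) -> ToyBuffer:
    """Return a new view over src's storage, defaulting shape and dtype."""
    return ToyBuffer(
        data=src.data,  # same list object: no copy is made
        shape=src.shape if shape is None else tuple(shape),
        dtype=src.dtype if dtype is None else dtype,
    )


src = ToyBuffer(data=[0.0] * 6, shape=(2, 3), dtype="float32")
flat = toy_view(src, shape=(6,))   # dtype falls back to src.dtype
flat.data[0] = 9.0                 # the write is visible through src as well
```

Because the view aliases the source storage, a write through either handle is observed by the other, which is the property the "no copy" note points at.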
Perform cumulative sum on input buffer, store the result to output buffer.
-
- :param src: The input buffer
- :type src: tir.Buffer
- :param dst: The output buffer. Defaults to None.
- :type dst: tir.Buffer, optional
- :param dim: The dimension to perform cumulative sum on. Defaults to 0.
- :type dim: int, optional
- :param reverse: Whether to perform reverse cumulative sum. Defaults to False.
- :type reverse: bool, optional
-
- :returns: Handle to the cumulative sum operation
+ Compute the cumulative sum of `src` along `dim`, writing results to `dst`.
+
+ Negative `dim` indices are normalized (Python-style). If `dst` is None, the operation is performed in place on `src`. Raises ValueError when `dim` is out of bounds for `src.shape`. When `src.scope() == "local.fragment"`, this delegates to `cumsum_fragment`; otherwise it emits the `tl.cumsum` intrinsic.
+
+ :returns: A handle to the emitted cumulative-sum operation.
  :rtype: tir.Call
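The `dim` handling in the new docstring can be sketched without TileLang. `normalize_dim` and `cumsum_1d` are hypothetical helpers; the meaning given to `reverse` (each output is the sum of the element and everything after it) is an assumption, since the diff only names the parameter in the removed lines.

```python
from typing import List


def normalize_dim(dim: int, ndim: int) -> int:
    """Map a possibly negative dim to [0, ndim), Python-style."""
    if not -ndim <= dim < ndim:
        raise ValueError(f"dim {dim} is out of bounds for rank {ndim}")
    return dim + ndim if dim < 0 else dim


def cumsum_1d(values: List[int], reverse: bool = False) -> List[int]:
    """Reference cumulative sum along a single axis."""
    # Assumption: reverse accumulates from the last element backwards.
    seq = values[::-1] if reverse else values
    out, total = [], 0
    for v in seq:
        total += v
        out.append(total)
    return out[::-1] if reverse else out
```

With a rank-2 source, `normalize_dim(-1, 2)` selects the last axis, and an index such as `2` raises `ValueError`, mirroring the bounds check the docstring mentions.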
  .. py:function:: finalize_reducer(reducer)

- Finalize the reducer buffer.
+ Finalize a reducer buffer by emitting the `tl.finalize_reducer` intrinsic.
+
+ This returns a TVM `tir.Call` handle that finalizes the given reducer using its writable pointer.
+ The call does not modify Python objects directly; it produces the low-level intrinsic call used by the IR.

- :param reducer: The reducer buffer
+ :param reducer: Reducer buffer whose writable pointer will be finalized.
  :type reducer: tir.Buffer

- :returns: Handle to the finalize reducer operation
+ :returns: Handle to the finalize reducer intrinsic call.
_sources/autoapi/tilelang/transform/index.rst.txt
+10 -2 (10 additions & 2 deletions)
@@ -372,13 +372,21 @@ Package Contents
  .. py:function:: LowerDeviceKernelLaunch()

- LowerDeviceKernelLaunch
+ Create and return a transform pass that lowers device kernel launch constructs to target-specific IR.

+ This pass transforms high-level device kernel launch and related intrinsics into lower-level
+ IR suitable for backend code generation and device-side lowering.
+
+ :returns: The transform pass that performs device kernel launch lowering.
+ :rtype: tvm.transform.Pass


  .. py:function:: LayoutReducer()

- LayoutReducer
+ Return a TVM transform pass that performs layout reduction/normalization.
+
+ This wrapper delegates to the underlying FFI implementation and returns a pass object suitable for use in a PassContext or pass pipeline. The pass is intended to simplify or reduce tensor/layout-related representations during relay/tile transformations.

+ :returns: The transform pass object produced by the FFI backend.
<td><p>Bind target information and progressively legalize and lower frontend Tile IR into a form suitable for downstream optimization and codegen.</p></td>
  <span class="sig-prename descclassname"><span class="pre">tilelang.engine.phase.</span></span><span class="sig-name descname"><span class="pre">LowerAndLegalize</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">mod</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">target</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#tilelang.engine.phase.LowerAndLegalize" title="Link to this definition">¶</a></dt>
- <dd><dl class="field-list simple">
+ <dd><p>Bind target information and progressively legalize and lower frontend Tile IR into a form suitable for downstream optimization and codegen.</p>
+ <p>This pass pipeline:
+ - Binds the provided target to the module.
+ - Legalizes frontend Tile IR into TVM-compatible constructs.
+ - Simplifies expressions.
+ - Configures reducer layouts and performs layout inference for fragments and shared memory.
+ - Lowers high-level tile operations and L2 persistent maps.
+ - Legalizes vectorized loops and inserts safety checks for memory accesses.
+ - Re-simplifies to remove redundancies introduced by safety checks.
+ - Attempts loop vectorization for dynamic-shaped loops.</p>
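The ordering of the stages listed above can be pictured as a plain sequential pipeline. This is only a schematic: the stage names below are hypothetical labels paraphrasing the bullets, not TileLang pass names, and the "module" is modelled as a list that records which stages have run.

```python
from typing import Callable, List

Stage = Callable[[List[str]], List[str]]


def make_stage(name: str) -> Stage:
    """Create a placeholder stage that records its name in the trace."""
    def stage(trace: List[str]) -> List[str]:
        return trace + [name]
    return stage


# Hypothetical labels, in the order the docstring lists the steps.
STAGE_NAMES = [
    "bind_target",
    "legalize_frontend_ir",
    "simplify",
    "configure_reducer_layouts_and_infer_layouts",
    "lower_tile_ops_and_l2_persistent_maps",
    "legalize_vectorized_loops_and_insert_safety_checks",
    "simplify_again",
    "vectorize_dynamic_shaped_loops",
]


def run_pipeline(trace: List[str], stages: List[Stage]) -> List[str]:
    """Apply each stage in order, feeding each result to the next stage."""
    for stage in stages:
        trace = stage(trace)
    return trace


trace = run_pipeline([], [make_stage(n) for n in STAGE_NAMES])
```

The second simplification step comes after the safety-check insertion because, per the docstring, it exists to remove redundancies those checks introduce.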