Commit b48a907

Update docs
1 parent ae9335a commit b48a907

5 files changed: +213 -3 lines changed

_sources/autoapi/tilelang/language/reduce/index.rst.txt

Lines changed: 80 additions & 0 deletions
@@ -26,6 +26,11 @@ Functions
 tilelang.language.reduce.cumsum_fragment
 tilelang.language.reduce.cumsum
 tilelang.language.reduce.finalize_reducer
+tilelang.language.reduce.warp_reduce_sum
+tilelang.language.reduce.warp_reduce_max
+tilelang.language.reduce.warp_reduce_min
+tilelang.language.reduce.warp_reduce_bitand
+tilelang.language.reduce.warp_reduce_bitor


 Module Contents
@@ -236,3 +241,78 @@ Module Contents
 :rtype: tir.Call


+.. py:function:: warp_reduce_sum(value)
+
+Perform warp reduction sum on a register value.
+
+This function reduces a value across all threads in a warp using shuffle operations.
+Each thread provides a register `value`, and after the reduction, all threads
+will have the sum of all values across the warp.
+
+:param value: The input register value to reduce
+:type value: tir.PrimExpr
+
+:returns: The reduced sum value (same on all threads in the warp)
+:rtype: tir.PrimExpr
+
+
+.. py:function:: warp_reduce_max(value)
+
+Perform warp reduction max on a register value.
+
+This function reduces a value across all threads in a warp using shuffle operations.
+Each thread provides a register `value`, and after the reduction, all threads
+will have the max of all values across the warp.
+
+:param value: The input register value to reduce
+:type value: tir.PrimExpr
+
+:returns: The reduced max value (same on all threads in the warp)
+:rtype: tir.PrimExpr
+
+
+.. py:function:: warp_reduce_min(value)
+
+Perform warp reduction min on a register value.
+
+This function reduces a value across all threads in a warp using shuffle operations.
+Each thread provides a register `value`, and after the reduction, all threads
+will have the min of all values across the warp.
+
+:param value: The input register value to reduce
+:type value: tir.PrimExpr
+
+:returns: The reduced min value (same on all threads in the warp)
+:rtype: tir.PrimExpr
+
+
+.. py:function:: warp_reduce_bitand(value)
+
+Perform warp reduction bitwise-and on a register value.
+
+This function reduces a value across all threads in a warp using shuffle operations.
+Each thread provides a register `value`, and after the reduction, all threads
+will have the bitwise-and of all values across the warp.
+
+:param value: The input register value to reduce
+:type value: tir.PrimExpr
+
+:returns: The reduced bitwise-and value (same on all threads in the warp)
+:rtype: tir.PrimExpr
+
+
+.. py:function:: warp_reduce_bitor(value)
+
+Perform warp reduction bitwise-or on a register value.
+
+This function reduces a value across all threads in a warp using shuffle operations.
+Each thread provides a register `value`, and after the reduction, all threads
+will have the bitwise-or of all values across the warp.
+
+:param value: The input register value to reduce
+:type value: tir.PrimExpr
+
+:returns: The reduced bitwise-or value (same on all threads in the warp)
+:rtype: tir.PrimExpr
+
+

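Note on the added docstrings: they state that every thread ends up with the same reduced value "using shuffle operations" but do not say which shuffle pattern is used. The sketch below is a plain-Python simulation of one common pattern, an XOR (butterfly) all-reduce, and is only an illustration of the documented result. The helper name `butterfly_reduce` and the 32-lane warp size are assumptions made for this example; neither comes from the commit, and the code does not call tilelang.

```python
from operator import add, and_, or_

WARP_SIZE = 32  # assumption: a 32-lane warp; not stated in the commit


def butterfly_reduce(lane_values, op):
    """Simulate an XOR-shuffle (butterfly) all-reduce across the lanes of one warp.

    Hypothetical helper for illustration only; it is not part of tilelang.
    `op` must be associative and commutative (sum, max, min, and, or all are).
    """
    vals = list(lane_values)
    n = len(vals)
    assert n & (n - 1) == 0, "lane count must be a power of two"
    offset = n // 2
    while offset > 0:
        # Each lane i combines its value with the one held by lane (i XOR offset).
        vals = [op(vals[i], vals[i ^ offset]) for i in range(n)]
        offset //= 2
    return vals


lanes = list(range(WARP_SIZE))           # lane i contributes the value i
sums = butterfly_reduce(lanes, add)      # counterpart of warp_reduce_sum
maxs = butterfly_reduce(lanes, max)      # counterpart of warp_reduce_max
mins = butterfly_reduce(lanes, min)      # counterpart of warp_reduce_min
ands = butterfly_reduce(lanes, and_)     # counterpart of warp_reduce_bitand
ors = butterfly_reduce(lanes, or_)       # counterpart of warp_reduce_bitor
```

After log2(32) = 5 exchange steps every simulated lane holds the full reduction, which matches the "same on all threads in the warp" wording of the `:returns:` fields.
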
autoapi/tilelang/language/reduce/index.html

Lines changed: 120 additions & 0 deletions
@@ -541,6 +541,21 @@ <h2>Functions<a class="headerlink" href="#functions" title="Link to this heading
 <tr class="row-even"><td><p><a class="reference internal" href="#tilelang.language.reduce.finalize_reducer" title="tilelang.language.reduce.finalize_reducer"><code class="xref py py-obj docutils literal notranslate"><span class="pre">finalize_reducer</span></code></a>(reducer)</p></td>
 <td><p>Finalize a reducer buffer by emitting the <cite>tl.finalize_reducer</cite> intrinsic.</p></td>
 </tr>
+<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.language.reduce.warp_reduce_sum" title="tilelang.language.reduce.warp_reduce_sum"><code class="xref py py-obj docutils literal notranslate"><span class="pre">warp_reduce_sum</span></code></a>(value)</p></td>
+<td><p>Perform warp reduction sum on a register value.</p></td>
+</tr>
+<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.language.reduce.warp_reduce_max" title="tilelang.language.reduce.warp_reduce_max"><code class="xref py py-obj docutils literal notranslate"><span class="pre">warp_reduce_max</span></code></a>(value)</p></td>
+<td><p>Perform warp reduction max on a register value.</p></td>
+</tr>
+<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.language.reduce.warp_reduce_min" title="tilelang.language.reduce.warp_reduce_min"><code class="xref py py-obj docutils literal notranslate"><span class="pre">warp_reduce_min</span></code></a>(value)</p></td>
+<td><p>Perform warp reduction min on a register value.</p></td>
+</tr>
+<tr class="row-even"><td><p><a class="reference internal" href="#tilelang.language.reduce.warp_reduce_bitand" title="tilelang.language.reduce.warp_reduce_bitand"><code class="xref py py-obj docutils literal notranslate"><span class="pre">warp_reduce_bitand</span></code></a>(value)</p></td>
+<td><p>Perform warp reduction bitwise-and on a register value.</p></td>
+</tr>
+<tr class="row-odd"><td><p><a class="reference internal" href="#tilelang.language.reduce.warp_reduce_bitor" title="tilelang.language.reduce.warp_reduce_bitor"><code class="xref py py-obj docutils literal notranslate"><span class="pre">warp_reduce_bitor</span></code></a>(value)</p></td>
+<td><p>Perform warp reduction bitwise-or on a register value.</p></td>
+</tr>
 </tbody>
 </table>
 </div>
@@ -853,6 +868,106 @@ <h2>Module Contents<a class="headerlink" href="#module-contents" title="Link to
 </dl>
 </dd></dl>

+<dl class="py function">
+<dt class="sig sig-object py" id="tilelang.language.reduce.warp_reduce_sum">
+<span class="sig-prename descclassname"><span class="pre">tilelang.language.reduce.</span></span><span class="sig-name descname"><span class="pre">warp_reduce_sum</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">value</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#tilelang.language.reduce.warp_reduce_sum" title="Link to this definition"></a></dt>
+<dd><p>Perform warp reduction sum on a register value.</p>
+<p>This function reduces a value across all threads in a warp using shuffle operations.
+Each thread provides a register <cite>value</cite>, and after the reduction, all threads
+will have the sum of all values across the warp.</p>
+<dl class="field-list simple">
+<dt class="field-odd">Parameters<span class="colon">:</span></dt>
+<dd class="field-odd"><p><strong>value</strong> (<em>tir.PrimExpr</em>) – The input register value to reduce</p>
+</dd>
+<dt class="field-even">Returns<span class="colon">:</span></dt>
+<dd class="field-even"><p>The reduced sum value (same on all threads in the warp)</p>
+</dd>
+<dt class="field-odd">Return type<span class="colon">:</span></dt>
+<dd class="field-odd"><p>tir.PrimExpr</p>
+</dd>
+</dl>
+</dd></dl>
+
+<dl class="py function">
+<dt class="sig sig-object py" id="tilelang.language.reduce.warp_reduce_max">
+<span class="sig-prename descclassname"><span class="pre">tilelang.language.reduce.</span></span><span class="sig-name descname"><span class="pre">warp_reduce_max</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">value</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#tilelang.language.reduce.warp_reduce_max" title="Link to this definition"></a></dt>
+<dd><p>Perform warp reduction max on a register value.</p>
+<p>This function reduces a value across all threads in a warp using shuffle operations.
+Each thread provides a register <cite>value</cite>, and after the reduction, all threads
+will have the max of all values across the warp.</p>
+<dl class="field-list simple">
+<dt class="field-odd">Parameters<span class="colon">:</span></dt>
+<dd class="field-odd"><p><strong>value</strong> (<em>tir.PrimExpr</em>) – The input register value to reduce</p>
+</dd>
+<dt class="field-even">Returns<span class="colon">:</span></dt>
+<dd class="field-even"><p>The reduced max value (same on all threads in the warp)</p>
+</dd>
+<dt class="field-odd">Return type<span class="colon">:</span></dt>
+<dd class="field-odd"><p>tir.PrimExpr</p>
+</dd>
+</dl>
+</dd></dl>
+
+<dl class="py function">
+<dt class="sig sig-object py" id="tilelang.language.reduce.warp_reduce_min">
+<span class="sig-prename descclassname"><span class="pre">tilelang.language.reduce.</span></span><span class="sig-name descname"><span class="pre">warp_reduce_min</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">value</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#tilelang.language.reduce.warp_reduce_min" title="Link to this definition"></a></dt>
+<dd><p>Perform warp reduction min on a register value.</p>
+<p>This function reduces a value across all threads in a warp using shuffle operations.
+Each thread provides a register <cite>value</cite>, and after the reduction, all threads
+will have the min of all values across the warp.</p>
+<dl class="field-list simple">
+<dt class="field-odd">Parameters<span class="colon">:</span></dt>
+<dd class="field-odd"><p><strong>value</strong> (<em>tir.PrimExpr</em>) – The input register value to reduce</p>
+</dd>
+<dt class="field-even">Returns<span class="colon">:</span></dt>
+<dd class="field-even"><p>The reduced min value (same on all threads in the warp)</p>
+</dd>
+<dt class="field-odd">Return type<span class="colon">:</span></dt>
+<dd class="field-odd"><p>tir.PrimExpr</p>
+</dd>
+</dl>
+</dd></dl>
+
+<dl class="py function">
+<dt class="sig sig-object py" id="tilelang.language.reduce.warp_reduce_bitand">
+<span class="sig-prename descclassname"><span class="pre">tilelang.language.reduce.</span></span><span class="sig-name descname"><span class="pre">warp_reduce_bitand</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">value</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#tilelang.language.reduce.warp_reduce_bitand" title="Link to this definition"></a></dt>
+<dd><p>Perform warp reduction bitwise-and on a register value.</p>
+<p>This function reduces a value across all threads in a warp using shuffle operations.
+Each thread provides a register <cite>value</cite>, and after the reduction, all threads
+will have the bitwise-and of all values across the warp.</p>
+<dl class="field-list simple">
+<dt class="field-odd">Parameters<span class="colon">:</span></dt>
+<dd class="field-odd"><p><strong>value</strong> (<em>tir.PrimExpr</em>) – The input register value to reduce</p>
+</dd>
+<dt class="field-even">Returns<span class="colon">:</span></dt>
+<dd class="field-even"><p>The reduced bitwise-and value (same on all threads in the warp)</p>
+</dd>
+<dt class="field-odd">Return type<span class="colon">:</span></dt>
+<dd class="field-odd"><p>tir.PrimExpr</p>
+</dd>
+</dl>
+</dd></dl>
+
+<dl class="py function">
+<dt class="sig sig-object py" id="tilelang.language.reduce.warp_reduce_bitor">
+<span class="sig-prename descclassname"><span class="pre">tilelang.language.reduce.</span></span><span class="sig-name descname"><span class="pre">warp_reduce_bitor</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">value</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#tilelang.language.reduce.warp_reduce_bitor" title="Link to this definition"></a></dt>
+<dd><p>Perform warp reduction bitwise-or on a register value.</p>
+<p>This function reduces a value across all threads in a warp using shuffle operations.
+Each thread provides a register <cite>value</cite>, and after the reduction, all threads
+will have the bitwise-or of all values across the warp.</p>
+<dl class="field-list simple">
+<dt class="field-odd">Parameters<span class="colon">:</span></dt>
+<dd class="field-odd"><p><strong>value</strong> (<em>tir.PrimExpr</em>) – The input register value to reduce</p>
+</dd>
+<dt class="field-even">Returns<span class="colon">:</span></dt>
+<dd class="field-even"><p>The reduced bitwise-or value (same on all threads in the warp)</p>
+</dd>
+<dt class="field-odd">Return type<span class="colon">:</span></dt>
+<dd class="field-odd"><p>tir.PrimExpr</p>
+</dd>
+</dl>
+</dd></dl>
+
 </section>
 </section>

@@ -926,6 +1041,11 @@ <h2>Module Contents<a class="headerlink" href="#module-contents" title="Link to
 <li><a class="reference internal" href="#tilelang.language.reduce.cumsum_fragment"><code class="docutils literal notranslate"><span class="pre">cumsum_fragment()</span></code></a></li>
 <li><a class="reference internal" href="#tilelang.language.reduce.cumsum"><code class="docutils literal notranslate"><span class="pre">cumsum()</span></code></a></li>
 <li><a class="reference internal" href="#tilelang.language.reduce.finalize_reducer"><code class="docutils literal notranslate"><span class="pre">finalize_reducer()</span></code></a></li>
+<li><a class="reference internal" href="#tilelang.language.reduce.warp_reduce_sum"><code class="docutils literal notranslate"><span class="pre">warp_reduce_sum()</span></code></a></li>
+<li><a class="reference internal" href="#tilelang.language.reduce.warp_reduce_max"><code class="docutils literal notranslate"><span class="pre">warp_reduce_max()</span></code></a></li>
+<li><a class="reference internal" href="#tilelang.language.reduce.warp_reduce_min"><code class="docutils literal notranslate"><span class="pre">warp_reduce_min()</span></code></a></li>
+<li><a class="reference internal" href="#tilelang.language.reduce.warp_reduce_bitand"><code class="docutils literal notranslate"><span class="pre">warp_reduce_bitand()</span></code></a></li>
+<li><a class="reference internal" href="#tilelang.language.reduce.warp_reduce_bitor"><code class="docutils literal notranslate"><span class="pre">warp_reduce_bitor()</span></code></a></li>
 </ul>
 </li>
 </ul>

genindex.html

Lines changed: 12 additions & 2 deletions
@@ -7424,6 +7424,16 @@ <h2>W</h2>
 </li>
 </ul></li>
 <li><a href="autoapi/tilelang/intrinsics/mfma_macro_generator/index.html#tilelang.intrinsics.mfma_macro_generator.MatrixCoreIntrinEmitter.warp_cols">warp_cols (tilelang.intrinsics.mfma_macro_generator.MatrixCoreIntrinEmitter attribute)</a>
+</li>
+<li><a href="autoapi/tilelang/language/reduce/index.html#tilelang.language.reduce.warp_reduce_bitand">warp_reduce_bitand() (in module tilelang.language.reduce)</a>
+</li>
+<li><a href="autoapi/tilelang/language/reduce/index.html#tilelang.language.reduce.warp_reduce_bitor">warp_reduce_bitor() (in module tilelang.language.reduce)</a>
+</li>
+<li><a href="autoapi/tilelang/language/reduce/index.html#tilelang.language.reduce.warp_reduce_max">warp_reduce_max() (in module tilelang.language.reduce)</a>
+</li>
+<li><a href="autoapi/tilelang/language/reduce/index.html#tilelang.language.reduce.warp_reduce_min">warp_reduce_min() (in module tilelang.language.reduce)</a>
+</li>
+<li><a href="autoapi/tilelang/language/reduce/index.html#tilelang.language.reduce.warp_reduce_sum">warp_reduce_sum() (in module tilelang.language.reduce)</a>
 </li>
 <li><a href="autoapi/tilelang/intrinsics/mfma_macro_generator/index.html#tilelang.intrinsics.mfma_macro_generator.MatrixCoreIntrinEmitter.warp_row_tiles">warp_row_tiles (tilelang.intrinsics.mfma_macro_generator.MatrixCoreIntrinEmitter attribute)</a>

@@ -7457,14 +7467,14 @@ <h2>W</h2>
 </li>
 <li><a href="autoapi/tilelang/language/builtin/index.html#tilelang.language.builtin.warpgroup_commit_batch">warpgroup_commit_batch() (in module tilelang.language.builtin)</a>
 </li>
+</ul></td>
+<td style="width: 33%; vertical-align: top;"><ul>
 <li><a href="autoapi/tilelang/language/builtin/index.html#tilelang.language.builtin.warpgroup_fence_operand">warpgroup_fence_operand() (in module tilelang.language.builtin)</a>
 </li>
 <li><a href="autoapi/tilelang/language/builtin/index.html#tilelang.language.builtin.warpgroup_wait">warpgroup_wait() (in module tilelang.language.builtin)</a>
 </li>
 <li><a href="autoapi/tilelang/language/warpgroup/index.html#tilelang.language.warpgroup.WarpSpecialize">WarpSpecialize() (in module tilelang.language.warpgroup)</a>
 </li>
-</ul></td>
-<td style="width: 33%; vertical-align: top;"><ul>
 <li><a href="autoapi/tilelang/transform/index.html#tilelang.transform.WarpSpecialized">WarpSpecialized() (in module tilelang.transform)</a>
 </li>
 <li><a href="autoapi/tilelang/transform/index.html#tilelang.transform.WarpSpecializedPipeline">WarpSpecializedPipeline() (in module tilelang.transform)</a>

objects.inv

24 Bytes
Binary file not shown.

searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

0 commit comments