@@ -7,47 +7,48 @@ The examples are organized into the following categories:
 Basic Operations
 ~~~~~~~~~~~~~~~
 
-- ``add.py``: Element-wise addition with broadcasting support
-- ``exp.py``: Element-wise exponential function
-- ``sum.py``: Sum reduction along the last dimension
-- ``long_sum.py``: Efficient sum reduction along a long dimension
-- ``softmax.py``: Different implementations of the softmax function
+- :doc:`add.py <add>`: Element-wise addition with broadcasting support
+- :doc:`exp.py <exp>`: Element-wise exponential function
+- :doc:`sum.py <sum>`: Sum reduction along the last dimension
+- :doc:`long_sum.py <long_sum>`: Efficient sum reduction along a long dimension
+- :doc:`softmax.py <softmax>`: Different implementations of the softmax function
+
 
 Matrix Multiplication Operations
 ~~~~~~~~~~~~~~~~
 
-- ``matmul.py``: Basic matrix multiplication
-- ``bmm.py``: Batch matrix multiplication
-- ``matmul_split_k.py``: Matrix multiplication using split-K algorithm for better parallelism
-- ``matmul_layernorm.py``: Fused matrix multiplication and layer normalization
-- ``fp8_gemm.py``: Matrix multiplication using FP8 precision
+- :doc:`matmul.py <matmul>`: Basic matrix multiplication
+- :doc:`bmm.py <bmm>`: Batch matrix multiplication
+- :doc:`matmul_split_k.py <matmul_split_k>`: Matrix multiplication using split-K algorithm for better parallelism
+- :doc:`matmul_layernorm.py <matmul_layernorm>`: Fused matrix multiplication and layer normalization
+- :doc:`fp8_gemm.py <fp8_gemm>`: Matrix multiplication using FP8 precision
 
 Attention Operations
 ~~~~~~~~~~~~~~~~~~~
 
-- ``attention.py``: Scaled dot-product attention mechanism
-- ``fp8_attention.py``: Attention mechanism using FP8 precision
+- :doc:`attention.py <attention>`: Scaled dot-product attention mechanism
+- :doc:`fp8_attention.py <fp8_attention>`: Attention mechanism using FP8 precision
 
 Normalization
 ~~~~~~~~~~~~
 
-- ``rms_norm.py``: Root Mean Square (RMS) normalization
+- :doc:`rms_norm.py <rms_norm>`: Root Mean Square (RMS) normalization
 
 Sparse and Jagged Tensors
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
-- ``jagged_dense_add.py``: Addition between a jagged tensor and a dense tensor
-- ``jagged_mean.py``: Computing the mean of each row in a jagged tensor
-- ``segment_reduction.py``: Segmented reduction operation
-- ``moe_matmul_ogs.py``: Mixture-of-Experts matrix multiplication using Outer-Gather-Scatter
+- :doc:`jagged_dense_add.py <jagged_dense_add>`: Addition between a jagged tensor and a dense tensor
+- :doc:`jagged_mean.py <jagged_mean>`: Computing the mean of each row in a jagged tensor
+- :doc:`segment_reduction.py <segment_reduction>`: Segmented reduction operation
+- :doc:`moe_matmul_ogs.py <moe_matmul_ogs>`: Mixture-of-Experts matrix multiplication using Outer-Gather-Scatter
 
 Other Operations
 ~~~~~~~~~~~~~~~
 
-- ``concatenate.py``: Tensor concatenation along a dimension
-- ``cross_entropy.py``: Cross entropy loss function
-- ``embedding.py``: Embedding lookup operation
-- ``all_gather_matmul.py``: All-gather operation followed by matrix multiplication
+- :doc:`concatenate.py <concatenate>`: Tensor concatenation along a dimension
+- :doc:`cross_entropy.py <cross_entropy>`: Cross entropy loss function
+- :doc:`embedding.py <embedding>`: Embedding lookup operation
+- :doc:`all_gather_matmul.py <all_gather_matmul>`: All-gather operation followed by matrix multiplication
 
 .. toctree::
    :maxdepth: 2
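
For reference on the pattern this change adopts: ``:doc:`` is Sphinx's cross-document reference role. The name in angle brackets is a document name (the source file without its ``.rst`` extension, resolved relative to the current page), and the text before the brackets is the link title shown to the reader. Unlike the previous plain literal filenames, these entries now hyperlink to the rendered example pages, and Sphinx warns at build time if a target document is missing. A minimal sketch of the two standard forms, reusing the ``add`` entry from the hunk above (the bare-target form is not used in this change and is shown only for comparison)::

   .. Default form: the link text is the target page's own title
   - :doc:`add`

   .. Explicit-title form (the one used in this change): the link text is "add.py"
   - :doc:`add.py <add>`: Element-wise addition with broadcasting support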