| Section | Description | Datatype | AIE2 | AIE2P | Status | Design Example |
| :-------| :-----------| :--------| :----| :-----| :------| :--------------|
| [Element-wise Add](./aie_kernels/generic/add.cc) | Element-wise addition kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/elementwise_add/](./operators/elementwise_add/) |
| [Element-wise Mul](./aie_kernels/generic/mul.cc) | Element-wise multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/elementwise_mul/](./operators/elementwise_mul/) |
| [GEMM](./aie_kernels/aie2p/mm.cc) | General Matrix Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/gemm/](./operators/gemm/) |
| [GEMV](./aie_kernels/generic/mv.cc) | General Matrix-Vector Multiplication kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/gemv/](./operators/gemv/) |
| [GQA](./aie_kernels/aie2p/mha.cc) | Grouped Query Attention kernel (single pipeline) | bfloat16 | | ✓ | 🟢 | [operators/mha/](./operators/mha/) |
| [MHA](./aie_kernels/aie2p/mha.cc) | Multi-Head Attention kernel & Grouped Query Attention | bfloat16 | | ✓ | 🟢 | [operators/mha/](./operators/mha/) |
| [RMSNorm](./aie_kernels/aie2/rms_norm.cc) | RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/rms_norm/](./operators/rms_norm/) |
| [RoPE](./aie_kernels/generic/rope.cc) | Rotary Positional Embedding kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/rope/](./operators/rope/) |
| [SiLU](./aie_kernels/aie2/silu.cc) | Sigmoid Linear Unit activation kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/silu/](./operators/silu/) |
| [Softmax](./aie_kernels/aie2/softmax.cc) | Softmax kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/softmax/](./operators/softmax/) |
| [Weighted RMSNorm](./aie_kernels/aie2/rms_norm.cc) | Weighted RMSNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/rms_norm/](./operators/rms_norm/) |
| [Copy](./aie_kernels/generic/passThrough.cc) | Copy kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/mem_copy/](./operators/mem_copy/) |
| [Transpose](./aie_kernels/generic/transpose.cc) | Transpose kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/transpose/](./operators/transpose/) |
| [AXPY](./aie_kernels/generic/axpy.cc) | AXPY kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/axpy/](./operators/axpy/) |
| Reduction | Reduction kernel | bfloat16 | | | 🟡 | |
| [Dequant](./aie_kernels/generic/expand.cc) | Dequant Q4NX from [AWQ](https://github.com/mit-han-lab/llm-awq) to bfloat16 | bfloat16 | ✓ | ✓ | 🟢 | [operators/dequant/](./operators/dequant/) |
| [RELU](./aie_kernels/aie2/relu.cc) | RELU kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/relu/](./operators/relu/) |
| [Leaky RELU](./aie_kernels/aie2p/leaky_relu.cc) (WIP) | Leaky RELU kernel | bfloat16 | | ✓ | ⚪ | [operators/leaky_relu/](./operators/leaky_relu/) |
| [GELU](./aie_kernels/aie2/gelu.cc) | GELU kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/gelu/](./operators/gelu/) |
| [LayerNorm](./aie_kernels/aie2/layer_norm.cc) | LayerNorm kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/layer_norm/](./operators/layer_norm/) |
| Convolution | Convolution kernel | bfloat16 | | | 🟡 | |
| MaxPool | MaxPool kernel | bfloat16 | | | ⚪ | |
| AveragePool | AveragePool kernel | bfloat16 | | | ⚪ | |
| [Tanh](./aie_kernels/aie2/tanh.cc) | Tanh kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/tanh/](./operators/tanh/) |
| [Sigmoid](./aie_kernels/aie2/sigmoid.cc) | Sigmoid kernel | bfloat16 | ✓ | ✓ | 🟢 | [operators/sigmoid/](./operators/sigmoid/) |
> Use this dashboard to quickly check the status of each kernel and locate relevant setup, build, and usage information.
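
For the simpler kernels in the table, the underlying math is compact enough to restate as a host-side golden model, which is handy when checking NPU output. The sketch below is a minimal NumPy reference for the RMSNorm (plain and weighted) and SiLU rows; it is illustrative only and is not part of this repository's test harness. The names `rms_norm_ref` and `silu_ref`, the `eps` default, and the float32 arithmetic are assumptions made here: the kernels themselves operate on bfloat16, so any comparison should use a loose tolerance.

```python
# Hypothetical host-side reference for two kernels listed above; not the
# repository's own harness. Computed in float32 (NumPy has no bfloat16),
# so compare against NPU bfloat16 output with a loose tolerance.
import numpy as np

def rms_norm_ref(x, weight=None, eps=1e-6):
    """RMSNorm over the last axis: x / sqrt(mean(x^2) + eps).

    Passing `weight` gives the Weighted RMSNorm variant (elementwise scale).
    """
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
    y = x / rms
    return y if weight is None else y * weight

def silu_ref(x):
    """SiLU (Sigmoid Linear Unit): x * sigmoid(x)."""
    return x * (1.0 / (1.0 + np.exp(-x)))

if __name__ == "__main__":
    x = np.random.randn(4, 64).astype(np.float32)
    w = np.ones(64, dtype=np.float32)
    print(rms_norm_ref(x).shape)     # (4, 64) -- plain RMSNorm
    print(rms_norm_ref(x, w).shape)  # (4, 64) -- weighted variant
    print(silu_ref(x).shape)         # (4, 64)
```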