Labels: LLK, quasar, test-infra
Description
This is an offshoot of tenstorrent/tt-metal#37122.
I ran the model and then ran a script that lists every LLK invocation along with its data formats, tile shapes, and the other arguments we need to pass.
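The aggregation step can be sketched in Python. This is not the script that produced the table below. It assumes the profiler output has already been parsed into one record per LLK call, and the `LlkCall` record and `summarize` helper are hypothetical. It also assumes that "Configs" counts distinct argument combinations per LLK API, while "Total Invocations" counts every call.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LlkCall:
    """One recorded LLK invocation (hypothetical record layout).

    A real record would also carry tile dims, kernel defines, etc.
    """
    api: str            # e.g. "llk_unpack_AB"
    ttnn_op: str        # e.g. "BinaryNgDeviceOperation"
    in_format: str      # e.g. "Float16_b"
    out_format: str
    math_fidelity: str  # e.g. "LoFi"
    fp32_dest_acc: bool


def summarize(calls):
    """Group calls by LLK API; return {api: (distinct configs, total invocations)}."""
    configs = defaultdict(set)
    totals = defaultdict(int)
    for call in calls:
        # Everything except the API name is treated as the "config" key.
        key = (call.ttnn_op, call.in_format, call.out_format,
               call.math_fidelity, call.fp32_dest_acc)
        configs[call.api].add(key)
        totals[call.api] += 1
    return {api: (len(configs[api]), totals[api]) for api in totals}


if __name__ == "__main__":
    calls = [
        LlkCall("llk_unpack_AB", "BinaryNgDeviceOperation", "Float16_b", "Float16_b", "LoFi", False),
        LlkCall("llk_unpack_AB", "BinaryNgDeviceOperation", "Float16_b", "Float16_b", "LoFi", False),
        LlkCall("llk_unpack_AB", "BinaryNgDeviceOperation", "Bfp8_b", "Bfp8_b", "LoFi", False),
        LlkCall("llk_unpack_A", "TernaryDeviceOperation", "Float16_b", "Float16_b", "LoFi", False),
    ]
    for api, (n_configs, n_calls) in sorted(summarize(calls).items()):
        print(f"{api}: configs={n_configs}, invocations={n_calls}")
```

Repeated calls with identical arguments raise the invocation count without adding a config, which is why a row such as `llk_pack` can show 27 configs but 7013 invocations.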
So for the following test:
```shell
export HF_MODEL=/proj_sw/user_dev/llama32-data/Llama3.2-1B-Instruct
python -m tracy --op-support-count 10000 -r -m pytest models/tt_transformers/demo/simple_text_demo.py -k "\"performance and batch-1\""
```
Here is the result:
LLK API Cross-Model Summary
| LLK API | Configs | Total Invocations | TTNN Ops | Op Args | Input Data Formats | Output Data Formats | Tile Dims | Math Fidelity | Math Approx | FP32 Dest Accum | Dst Sync Mode | Kernel Defines |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| llk_math_eltwise_binary | 5 | 1688 | BinaryNgDeviceOperation | BINARY_OP=add_tiles, mul_tiles, sub_tiles; BINARY_OP_TYPE=EltwiseBinaryType::ELWADD, EltwiseBinaryType::ELWMUL, EltwiseBinaryType::ELWSUB | Bfp8_b, Float16_b | Bfp8_b, Float16_b | 32x32 | LoFi | False | False | SyncHalf | SFPU_OP_UNARY_COMP_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_add_int | 1 | 57 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=add_int_tile_init();; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>` | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_add_int_init | 1 | 57 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=add_int_tile_init();; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>` | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_gt_int32 | 1 | 8 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=gt_int32_tile_init();; BINARY_SFPU_OP=gt_int32_tile | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_gt_int32_init | 1 | 8 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=gt_int32_tile_init();; BINARY_SFPU_OP=gt_int32_tile | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_mul | 2 | 844 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=mul_binary_tile_init();; BINARY_SFPU_OP=mul_binary_tile | Bfp8_b, Float16_b | Bfp8_b, Float16_b | 32x32 | LoFi | False | False | SyncHalf | BCAST_INPUT=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_mul_init | 2 | 844 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=mul_binary_tile_init();; BINARY_SFPU_OP=mul_binary_tile | Bfp8_b, Float16_b | Bfp8_b, Float16_b | 32x32 | LoFi | False | False | SyncHalf | BCAST_INPUT=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_mul_int | 1 | 12 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=`mul_int_tile_init<DataFormat::Int32>();`; BINARY_SFPU_OP=`mul_int_tile<DataFormat::Int32>` | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_mul_int_init | 1 | 12 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=`mul_int_tile_init<DataFormat::Int32>();`; BINARY_SFPU_OP=`mul_int_tile<DataFormat::Int32>` | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_where | 2 | 12 | BinaryNgDeviceOperation, TernaryDeviceOperation | BINARY_SFPU_INIT=where_tile_init();; BINARY_SFPU_OP=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_FUNC=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_INIT=where_tile_init | Float16_b | Float16_b | 32x32 | LoFi | False | False | SyncHalf | BCAST_A=0; BCAST_B=0; BCAST_C=0; BCAST_INPUT=1; FILL_LLK=fill_tile; FILL_WITH_VALUE_FLOAT=1; WHERE_TST=0; WHERE_TTS=1 |
| llk_math_eltwise_binary_sfpu_where_init | 2 | 12 | BinaryNgDeviceOperation, TernaryDeviceOperation | BINARY_SFPU_INIT=where_tile_init();; BINARY_SFPU_OP=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_FUNC=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_INIT=where_tile_init | Float16_b | Float16_b | 32x32 | LoFi | False | False | SyncHalf | BCAST_A=0; BCAST_B=0; BCAST_C=0; BCAST_INPUT=1; FILL_LLK=fill_tile; FILL_WITH_VALUE_FLOAT=1; WHERE_TST=0; WHERE_TTS=1 |
| llk_math_eltwise_unary_datacopy | 7 | 933 | BinaryNgDeviceOperation, TernaryDeviceOperation | BINARY_SFPU_INIT=add_int_tile_init();, gt_int32_tile_init();, mul_binary_tile_init();, `mul_int_tile_init<DataFormat::Int32>();`, where_tile_init();; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>`, gt_int32_tile, mul_binary_tile, `mul_int_tile<DataFormat::Int32>`, `where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_FUNC=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_INIT=where_tile_init | Bfp8_b, Float16_b, Int32 | Bfp8_b, Float16_b, Int32 | 32x32 | LoFi | False | False, True | SyncHalf | BCAST_A=0; BCAST_B=0; BCAST_C=0; BCAST_INPUT=1; FILL_LLK=fill_tile; FILL_WITH_VALUE_FLOAT=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0, 1 |
| llk_math_matmul | 13 | 4331 | MatmulDeviceOperation | | Bfp4_b, Bfp8_b, Float16_b, Tf32 | Bfp8_b, Float16_b, Float32 | 32x32 | LoFi | False | False, True | SyncHalf | FP32_DEST_ACC_EN=1; MATMUL_DRAM_SHARDED=1; PACKER_L1_ACC=1 |
| llk_pack | 27 | 7013 | BinaryNgDeviceOperation, EmbeddingsDeviceOperation, MatmulDeviceOperation, TernaryDeviceOperation | BINARY_OP=add_tiles, mul_tiles, sub_tiles; BINARY_OP_TYPE=EltwiseBinaryType::ELWADD, EltwiseBinaryType::ELWMUL, EltwiseBinaryType::ELWSUB; BINARY_SFPU_INIT=add_int_tile_init();, gt_int32_tile_init();, mul_binary_tile_init();, `mul_int_tile_init<DataFormat::Int32>();`, where_tile_init();; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>`, gt_int32_tile, mul_binary_tile, `mul_int_tile<DataFormat::Int32>`, `where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_FUNC=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_INIT=where_tile_init | Bfp8_b, Float16_b, Float32, Int32, UInt32 | Bfp4_b, Bfp8_b, Float16_b, Float32, Int32, UInt32 | 32x32 | LoFi | False | False, True | SyncHalf | BCAST_A=0; BCAST_B=0; BCAST_C=0; BCAST_INPUT=1; FILL_LLK=fill_tile; FILL_WITH_VALUE_FLOAT=1; FP32_DEST_ACC_EN=1; MATMUL_DRAM_SHARDED=1; PACKER_L1_ACC=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; SFPU_OP_UNARY_COMP_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0, 1 |
| llk_pack_relu_config | 10 | 2609 | BinaryNgDeviceOperation | BINARY_OP=add_tiles, mul_tiles, sub_tiles; BINARY_OP_TYPE=EltwiseBinaryType::ELWADD, EltwiseBinaryType::ELWMUL, EltwiseBinaryType::ELWSUB; BINARY_SFPU_INIT=add_int_tile_init();, gt_int32_tile_init();, mul_binary_tile_init();, `mul_int_tile_init<DataFormat::Int32>();`; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>`, gt_int32_tile, mul_binary_tile, `mul_int_tile<DataFormat::Int32>` | Bfp8_b, Float16_b, Int32 | Bfp8_b, Float16_b, Int32 | 32x32 | LoFi | False | False, True | SyncHalf | BCAST_INPUT=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; SFPU_OP_UNARY_COMP_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0 |
| llk_unpack_A | 7 | 933 | BinaryNgDeviceOperation, TernaryDeviceOperation | BINARY_SFPU_INIT=add_int_tile_init();, gt_int32_tile_init();, mul_binary_tile_init();, `mul_int_tile_init<DataFormat::Int32>();`, where_tile_init();; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>`, gt_int32_tile, mul_binary_tile, `mul_int_tile<DataFormat::Int32>`, `where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_FUNC=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_INIT=where_tile_init | Bfp8_b, Float16_b, Int32 | Bfp8_b, Float16_b, Int32 | 32x32 | LoFi | False | False, True | SyncHalf | BCAST_A=0; BCAST_B=0; BCAST_C=0; BCAST_INPUT=1; FILL_LLK=fill_tile; FILL_WITH_VALUE_FLOAT=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0, 1 |
| llk_unpack_AB | 5 | 1688 | BinaryNgDeviceOperation | BINARY_OP=add_tiles, mul_tiles, sub_tiles; BINARY_OP_TYPE=EltwiseBinaryType::ELWADD, EltwiseBinaryType::ELWMUL, EltwiseBinaryType::ELWSUB | Bfp8_b, Float16_b | Bfp8_b, Float16_b | 32x32 | LoFi | False | False | SyncHalf | SFPU_OP_UNARY_COMP_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0 |
| llk_unpack_AB_matmul | 13 | 4331 | MatmulDeviceOperation | | Bfp4_b, Bfp8_b, Float16_b, Float32 | Bfp4_b, Bfp8_b, Float16_b, Tf32 | 32x32 | LoFi | False | False, True | SyncHalf | FP32_DEST_ACC_EN=1; MATMUL_DRAM_SHARDED=1; PACKER_L1_ACC=1 |
| llk_unpack_tilize | 2 | 61 | EmbeddingsDeviceOperation | | Float16_b, UInt32 | Float16_b, UInt32 | 32x32 | LoFi | False | False | SyncHalf | |
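The counts in the table are internally consistent, which makes it a convenient checklist when planning tests. The Python sketch below copies the (Configs, Total Invocations) pairs from the table and checks three relationships: each math LLK matches its unpack counterpart, `llk_pack` equals the sum over the four unpack paths, and the binary SFPU ops together equal `llk_math_eltwise_unary_datacopy`. Reading the last one as "each binary SFPU op runs on tiles brought into Dest by unpack_A + datacopy" is an inference from the matching numbers, not something the profiling script reports.

```python
# (Configs, Total Invocations) pairs copied from the summary table.
TABLE = {
    "llk_unpack_AB": (5, 1688),
    "llk_unpack_A": (7, 933),
    "llk_unpack_AB_matmul": (13, 4331),
    "llk_unpack_tilize": (2, 61),
    "llk_math_eltwise_binary": (5, 1688),
    "llk_math_eltwise_unary_datacopy": (7, 933),
    "llk_math_matmul": (13, 4331),
    "llk_math_eltwise_binary_sfpu_add_int": (1, 57),
    "llk_math_eltwise_binary_sfpu_gt_int32": (1, 8),
    "llk_math_eltwise_binary_sfpu_mul": (2, 844),
    "llk_math_eltwise_binary_sfpu_mul_int": (1, 12),
    "llk_math_eltwise_binary_sfpu_where": (2, 12),
    "llk_pack": (27, 7013),
}


def total(*apis):
    """Sum (configs, invocations) over the given LLK APIs."""
    return (sum(TABLE[a][0] for a in apis), sum(TABLE[a][1] for a in apis))


UNPACK_PATHS = ("llk_unpack_AB", "llk_unpack_A", "llk_unpack_AB_matmul", "llk_unpack_tilize")
# The *_init rows are deliberately left out, since they mirror the op rows.
SFPU_OPS = tuple(a for a in TABLE if a.startswith("llk_math_eltwise_binary_sfpu_"))

if __name__ == "__main__":
    # Each math LLK matches its unpack counterpart one-to-one.
    print(TABLE["llk_math_eltwise_binary"] == TABLE["llk_unpack_AB"])
    print(TABLE["llk_math_eltwise_unary_datacopy"] == TABLE["llk_unpack_A"])
    print(TABLE["llk_math_matmul"] == TABLE["llk_unpack_AB_matmul"])
    # Pack totals equal the sum over all unpack paths.
    print(total(*UNPACK_PATHS), TABLE["llk_pack"])
    # Binary SFPU ops together account for every datacopy call.
    print(total(*SFPU_OPS), TABLE["llk_math_eltwise_unary_datacopy"])
```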
@fvranicTT @nvelickovicTT @vmilicevicTT, FYI.
Quasar missing features:
- Binary SFPU instruction: Add Int
- Binary SFPU instruction: Greater than Int
- Binary SFPU instruction: MUL
- Binary SFPU instruction: Where
- Unary SFPU instruction: SiLU
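Adding tests for these ops will need golden references to compare hardware results against. Below is a minimal scalar sketch in Python. The `golden_*` function names are hypothetical and not part of the LLK test infra, and the semantics are assumptions rather than confirmed hardware behavior: 32-bit two's-complement wraparound for integer add, a 1/0 result for the integer comparison, and `where` selecting the second operand whenever the condition is non-zero. Floating-point results are computed in double precision, so a real test would still round them to the target format (for example Float16_b) and compare with a tolerance.

```python
import math


def wrap_int32(x: int) -> int:
    """Wrap an arbitrary Python int into the signed 32-bit range (assumed HW behavior)."""
    return (x + 2**31) % 2**32 - 2**31


def golden_add_int(a: int, b: int) -> int:
    return wrap_int32(a + b)


def golden_gt_int32(a: int, b: int) -> int:
    # Assumes the SFPU writes 1 for true and 0 for false.
    return 1 if a > b else 0


def golden_mul(a: float, b: float) -> float:
    # Double-precision product; rounding to Bfp8_b/Float16_b is left to the caller.
    return a * b


def golden_where(cond: float, true_val: float, false_val: float) -> float:
    # Assumes any non-zero condition selects true_val.
    return true_val if cond != 0 else false_val


def golden_silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x), evaluated without overflowing exp() for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)
```

Keeping the references scalar makes each assumption easy to check in isolation; a test harness would map them over the 32x32 tile.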
Test infra missing features:
- MXFP4 (the model above uses Bfp4_b for matmul)
- Binary SFPU testing? (unsure)
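MXFP4 support in the test infra will need a golden quantizer. The Python sketch below assumes the OCP Microscaling (MX) v1.0 definition of MXFP4: blocks of 32 FP4 (E2M1) elements sharing one power-of-two E8M0 scale, with the shared exponent chosen as floor(log2(max|v|)) minus the E2M1 maximum exponent (2), and scaled values above 6.0 saturating. Round-to-nearest-even is also an assumption here, and whether Quasar's packer and unpacker round the same way still needs to be confirmed. The helper names `round_to_e2m1`, `mxfp4_quantize`, and `mxfp4_dequantize` are hypothetical.

```python
import math

# Magnitudes representable by FP4 E2M1, in code order 0b000..0b111.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32


def round_to_e2m1(x: float) -> float:
    """Round to the nearest E2M1 value, ties to the even code, saturating at +/-6."""
    mag = min(abs(x), E2M1_VALUES[-1])
    # Sort key: distance first, then prefer the even code (mantissa bit 0) on ties.
    code = min(range(len(E2M1_VALUES)),
               key=lambda i: (abs(E2M1_VALUES[i] - mag), i % 2))
    return math.copysign(E2M1_VALUES[code], x)


def mxfp4_quantize(block):
    """Return (shared_exp, elements) for one block of up to 32 floats."""
    if len(block) > BLOCK_SIZE:
        raise ValueError("MXFP4 blocks hold at most 32 elements")
    amax = max((abs(v) for v in block), default=0.0)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1.
    _, e = math.frexp(amax)
    shared_exp = max(-127, min(127, (e - 1) - E2M1_EMAX))  # clamp to E8M0 range
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_e2m1(v / scale) for v in block]


def mxfp4_dequantize(shared_exp, elements):
    """Expand (shared_exp, elements) back to floats."""
    scale = 2.0 ** shared_exp
    return [v * scale for v in elements]
```

For example, `[12.0, 1.0]` gets a shared exponent of 1 and elements `[6.0, 0.5]`, which dequantize back to `[12.0, 1.0]`, while a lone `7.0` saturates to `6.0` because its shared exponent is 0.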
@fvranicTT, we need to add tests for everything else.
@ryanzhuTT is working on SiLU in #1295.