[Quasar] LLama test LLKs #1364

@rtawfik01

This is an offshoot of this issue: tenstorrent/tt-metal#37122

I ran the model, then ran a script that lists every LLK invocation together with the data formats, tile shapes, and other arguments we need to pass.

So for the following test:
    export HF_MODEL=/proj_sw/user_dev/llama32-data/Llama3.2-1B-Instruct
    python -m tracy --op-support-count 10000 -r -m pytest models/tt_transformers/demo/simple_text_demo.py -k "\"performance and batch-1\""

Here is the result:

LLK API Cross-Model Summary

| LLK API | Configs | Total Invocations | TTNN Ops | Op Args | Input Data Formats | Output Data Formats | Tile Dims | Math Fidelity | Math Approx | FP32 Dest Accum | Dst Sync Mode | Kernel Defines |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| llk_math_eltwise_binary | 5 | 1688 | BinaryNgDeviceOperation | BINARY_OP=add_tiles, mul_tiles, sub_tiles; BINARY_OP_TYPE=EltwiseBinaryType::ELWADD, EltwiseBinaryType::ELWMUL, EltwiseBinaryType::ELWSUB | Bfp8_b, Float16_b | Bfp8_b, Float16_b | 32x32 | LoFi | False | False | SyncHalf | SFPU_OP_UNARY_COMP_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_add_int | 1 | 57 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=add_int_tile_init();; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>` | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_add_int_init | 1 | 57 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=add_int_tile_init();; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>` | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_gt_int32 | 1 | 8 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=gt_int32_tile_init();; BINARY_SFPU_OP=gt_int32_tile | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_gt_int32_init | 1 | 8 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=gt_int32_tile_init();; BINARY_SFPU_OP=gt_int32_tile | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_mul | 2 | 844 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=mul_binary_tile_init();; BINARY_SFPU_OP=mul_binary_tile | Bfp8_b, Float16_b | Bfp8_b, Float16_b | 32x32 | LoFi | False | False | SyncHalf | BCAST_INPUT=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_mul_init | 2 | 844 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=mul_binary_tile_init();; BINARY_SFPU_OP=mul_binary_tile | Bfp8_b, Float16_b | Bfp8_b, Float16_b | 32x32 | LoFi | False | False | SyncHalf | BCAST_INPUT=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_mul_int | 1 | 12 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=`mul_int_tile_init<DataFormat::Int32>();`; BINARY_SFPU_OP=`mul_int_tile<DataFormat::Int32>` | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_mul_int_init | 1 | 12 | BinaryNgDeviceOperation | BINARY_SFPU_INIT=`mul_int_tile_init<DataFormat::Int32>();`; BINARY_SFPU_OP=`mul_int_tile<DataFormat::Int32>` | Int32 | Int32 | 32x32 | LoFi | False | True | SyncHalf | WHERE_TST=0; WHERE_TTS=0 |
| llk_math_eltwise_binary_sfpu_where | 2 | 12 | BinaryNgDeviceOperation, TernaryDeviceOperation | BINARY_SFPU_INIT=where_tile_init();; BINARY_SFPU_OP=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_FUNC=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_INIT=where_tile_init | Float16_b | Float16_b | 32x32 | LoFi | False | False | SyncHalf | BCAST_A=0; BCAST_B=0; BCAST_C=0; BCAST_INPUT=1; FILL_LLK=fill_tile; FILL_WITH_VALUE_FLOAT=1; WHERE_TST=0; WHERE_TTS=1 |
| llk_math_eltwise_binary_sfpu_where_init | 2 | 12 | BinaryNgDeviceOperation, TernaryDeviceOperation | BINARY_SFPU_INIT=where_tile_init();; BINARY_SFPU_OP=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_FUNC=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_INIT=where_tile_init | Float16_b | Float16_b | 32x32 | LoFi | False | False | SyncHalf | BCAST_A=0; BCAST_B=0; BCAST_C=0; BCAST_INPUT=1; FILL_LLK=fill_tile; FILL_WITH_VALUE_FLOAT=1; WHERE_TST=0; WHERE_TTS=1 |
| llk_math_eltwise_unary_datacopy | 7 | 933 | BinaryNgDeviceOperation, TernaryDeviceOperation | BINARY_SFPU_INIT=add_int_tile_init();, gt_int32_tile_init();, mul_binary_tile_init();, `mul_int_tile_init<DataFormat::Int32>();`, where_tile_init();; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>`, gt_int32_tile, mul_binary_tile, `mul_int_tile<DataFormat::Int32>`, `where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_FUNC=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_INIT=where_tile_init | Bfp8_b, Float16_b, Int32 | Bfp8_b, Float16_b, Int32 | 32x32 | LoFi | False | False, True | SyncHalf | BCAST_A=0; BCAST_B=0; BCAST_C=0; BCAST_INPUT=1; FILL_LLK=fill_tile; FILL_WITH_VALUE_FLOAT=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0, 1 |
| llk_math_matmul | 13 | 4331 | MatmulDeviceOperation | | Bfp4_b, Bfp8_b, Float16_b, Tf32 | Bfp8_b, Float16_b, Float32 | 32x32 | LoFi | False | False, True | SyncHalf | FP32_DEST_ACC_EN=1; MATMUL_DRAM_SHARDED=1; PACKER_L1_ACC=1 |
| llk_pack | 27 | 7013 | BinaryNgDeviceOperation, EmbeddingsDeviceOperation, MatmulDeviceOperation, TernaryDeviceOperation | BINARY_OP=add_tiles, mul_tiles, sub_tiles; BINARY_OP_TYPE=EltwiseBinaryType::ELWADD, EltwiseBinaryType::ELWMUL, EltwiseBinaryType::ELWSUB; BINARY_SFPU_INIT=add_int_tile_init();, gt_int32_tile_init();, mul_binary_tile_init();, `mul_int_tile_init<DataFormat::Int32>();`, where_tile_init();; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>`, gt_int32_tile, mul_binary_tile, `mul_int_tile<DataFormat::Int32>`, `where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_FUNC=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_INIT=where_tile_init | Bfp8_b, Float16_b, Float32, Int32, UInt32 | Bfp4_b, Bfp8_b, Float16_b, Float32, Int32, UInt32 | 32x32 | LoFi | False | False, True | SyncHalf | BCAST_A=0; BCAST_B=0; BCAST_C=0; BCAST_INPUT=1; FILL_LLK=fill_tile; FILL_WITH_VALUE_FLOAT=1; FP32_DEST_ACC_EN=1; MATMUL_DRAM_SHARDED=1; PACKER_L1_ACC=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; SFPU_OP_UNARY_COMP_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0, 1 |
| llk_pack_relu_config | 10 | 2609 | BinaryNgDeviceOperation | BINARY_OP=add_tiles, mul_tiles, sub_tiles; BINARY_OP_TYPE=EltwiseBinaryType::ELWADD, EltwiseBinaryType::ELWMUL, EltwiseBinaryType::ELWSUB; BINARY_SFPU_INIT=add_int_tile_init();, gt_int32_tile_init();, mul_binary_tile_init();, `mul_int_tile_init<DataFormat::Int32>();`; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>`, gt_int32_tile, mul_binary_tile, `mul_int_tile<DataFormat::Int32>` | Bfp8_b, Float16_b, Int32 | Bfp8_b, Float16_b, Int32 | 32x32 | LoFi | False | False, True | SyncHalf | BCAST_INPUT=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; SFPU_OP_UNARY_COMP_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0 |
| llk_unpack_A | 7 | 933 | BinaryNgDeviceOperation, TernaryDeviceOperation | BINARY_SFPU_INIT=add_int_tile_init();, gt_int32_tile_init();, mul_binary_tile_init();, `mul_int_tile_init<DataFormat::Int32>();`, where_tile_init();; BINARY_SFPU_OP=`add_int_tile<DataFormat::Int32>`, gt_int32_tile, mul_binary_tile, `mul_int_tile<DataFormat::Int32>`, `where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_FUNC=`where_tile<DataFormat::Float16_b>`; TERNARY_SFPU_OP_INIT=where_tile_init | Bfp8_b, Float16_b, Int32 | Bfp8_b, Float16_b, Int32 | 32x32 | LoFi | False | False, True | SyncHalf | BCAST_A=0; BCAST_B=0; BCAST_C=0; BCAST_INPUT=1; FILL_LLK=fill_tile; FILL_WITH_VALUE_FLOAT=1; SFPU_OP_COMPUTE_KERNEL_API_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0, 1 |
| llk_unpack_AB | 5 | 1688 | BinaryNgDeviceOperation | BINARY_OP=add_tiles, mul_tiles, sub_tiles; BINARY_OP_TYPE=EltwiseBinaryType::ELWADD, EltwiseBinaryType::ELWMUL, EltwiseBinaryType::ELWSUB | Bfp8_b, Float16_b | Bfp8_b, Float16_b | 32x32 | LoFi | False | False | SyncHalf | SFPU_OP_UNARY_COMP_INCLUDE=1; WHERE_TST=0; WHERE_TTS=0 |
| llk_unpack_AB_matmul | 13 | 4331 | MatmulDeviceOperation | | Bfp4_b, Bfp8_b, Float16_b, Float32 | Bfp4_b, Bfp8_b, Float16_b, Tf32 | 32x32 | LoFi | False | False, True | SyncHalf | FP32_DEST_ACC_EN=1; MATMUL_DRAM_SHARDED=1; PACKER_L1_ACC=1 |
| llk_unpack_tilize | 2 | 61 | EmbeddingsDeviceOperation | | Float16_b, UInt32 | Float16_b, UInt32 | 32x32 | LoFi | False | False | SyncHalf | |
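
One way to turn the summary into test coverage is to expand each row into per-format test cases. The following is only a rough sketch: `LlkCase`, `SUMMARY`, and `expand` are hypothetical names, not part of the existing test infra, and only three rows are copied in. Because the table aggregates values across configs, a plain Cartesian product over-approximates the combinations the model actually ran (for example, `llk_math_eltwise_binary_sfpu_mul` has 2 configs but expands to 4 cases here).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LlkCase:
    """One candidate test case (hypothetical structure, for illustration only)."""
    api: str
    in_fmt: str
    out_fmt: str
    fp32_dest_acc: bool


# A few rows copied from the summary table:
# api -> (input formats, output formats, FP32 dest accumulation values)
SUMMARY = {
    "llk_math_eltwise_binary_sfpu_add_int": (["Int32"], ["Int32"], [True]),
    "llk_math_eltwise_binary_sfpu_mul": (
        ["Bfp8_b", "Float16_b"], ["Bfp8_b", "Float16_b"], [False]),
    "llk_unpack_tilize": (
        ["Float16_b", "UInt32"], ["Float16_b", "UInt32"], [False]),
}


def expand(summary):
    """Expand each row into the Cartesian product of its listed values."""
    cases = []
    for api, (in_fmts, out_fmts, accs) in summary.items():
        for in_fmt, out_fmt, acc in product(in_fmts, out_fmts, accs):
            cases.append(LlkCase(api, in_fmt, out_fmt, acc))
    return cases


if __name__ == "__main__":
    for case in expand(SUMMARY):
        print(case)
```

A list like this could be fed to `pytest.mark.parametrize`; pruning it down to the exact per-config combinations would need the script's raw per-invocation output rather than this aggregated table.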

@fvranicTT @nvelickovicTT @vmilicevicTT, FYI

Quasar missing features:

  • Binary SFPU instruction: Add Int
  • Binary SFPU instruction: Greater than Int
  • Binary SFPU instruction: MUL
  • Binary SFPU instruction: Where
  • Unary SFPU instruction: SiLU
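
For testing the operations listed above, element-wise golden references could look roughly like the sketch below. It is pure Python and independent of any LLK or test-infra API; all function names are hypothetical. The hardware semantics assumed here are guesses to be checked against the actual SFPU behaviour: Int32 addition wraps in two's complement, greater-than returns 1 or 0, and `where` selects the first operand for any non-zero condition.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_int32(x):
    """Wrap a Python int into the signed 32-bit range (assumed overflow behaviour)."""
    return (x + 2**31) % 2**32 - 2**31


def add_int(a, b):
    """Element-wise Int32 add with two's-complement wraparound (assumption)."""
    return [wrap_int32(x + y) for x, y in zip(a, b)]


def gt_int32(a, b):
    """Element-wise a > b, encoded as 1 / 0 (output encoding is an assumption)."""
    return [1 if x > y else 0 for x, y in zip(a, b)]


def mul(a, b):
    """Element-wise floating-point multiply (no format rounding modelled)."""
    return [x * y for x, y in zip(a, b)]


def where(cond, if_true, if_false):
    """Element-wise select: if_true where cond is non-zero, else if_false."""
    return [t if c != 0 else f for c, t, f in zip(cond, if_true, if_false)]


def silu(x):
    """SiLU(v) = v * sigmoid(v), written to avoid overflow for large |v|."""
    out = []
    for v in x:
        if v >= 0:
            out.append(v / (1.0 + math.exp(-v)))
        else:
            e = math.exp(v)
            out.append(v * e / (1.0 + e))
    return out
```

These references work on flat lists of values; comparing against device output would still require applying the target data format's rounding (for example Bfp8_b or Float16_b) to the golden values first.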

Test infra missing features:

  • MXFP4 (the model above uses Bfp4_b for matmul)
  • Binary SFPU testing? (unsure)

@fvranicTT we need to add testing for everything else
@ryanzhuTT is working on SiLU #1295

Labels: LLK, quasar, test-infra
