Feature/txe sqr #15567
Closed
Conversation
@FR-702 @FIR-702 - llama.cpp: Sync with latest open source
This change includes the following: 1. Move to the new SDK 0.1.2. 2. Remove the requirement for libgomp in the FPGA build.
@FIR-707: Fix requirement for libgomp and move to new sdk 0.1.2
The changes include the following:
1. Enable profiling of the Tsavorite backend for TXE
2. Add -std=c++20 for compiling the profiler
The test results are as follows
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin# ./run_platform_test.sh
Check if tnApcMgr is running; if it is not, uncomment below line and execute the run_platform_test.sh script.
Running on v0.1.1.tsv30_05_24_2025
[2018-03-09 13:52:26.300409] 271:272 [ info] :: </proj/work/atrivedi/workspace/05_25_2025/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:129> TXE resource allocation request processed successfully.
[2018-03-09 13:52:27.339] [info] [llama.cpp:56] Execution time: 1019 ms
[2018-03-09 13:52:27.347638] 2909:2909 [ info] [LlamaForCausalLM_Random v. 2] TestBase.h:154: Model executed successfully. Validating result...
[2018-03-09 13:52:27.380511] 2909:2909 [ info] [LlamaForCausalLM_Random v. 2] TestBase.h:193: PASS [relative err=0.000000, relTol=1.000000e-05]
[2018-03-09 13:52:27.405665] 271:272 [ info] :: </proj/work/atrivedi/workspace/05_25_2025/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:145> TXE resource release request processed successfully.
Profiling Results (LlamaForCausalLM_Random):
------------------------------------------------------------------------------------------------------------------------
Calls Total(ms) T/call Self(ms) Function
------------------------------------------------------------------------------------------------------------------------
243 498.000 2.049 0.000 [45%] RuntimeHostShim::awaitCommandListCompletion
84 200.688 2.389 200.688 └─ [18%] [ txe_blob_1 ]
32 76.626 2.395 76.626 └─ [ 7%] [ txe_blob_6 ]
16 55.493 3.468 55.493 └─ [ 5%] [ txe_blob_12 ]
8 31.821 3.978 31.821 └─ [ 3%] [ txe_blob_10 ]
8 31.322 3.915 31.322 └─ [ 3%] [ txe_blob_7 ]
8 31.152 3.894 31.152 └─ [ 3%] [ txe_blob_8 ]
8 27.693 3.462 27.693 └─ [ 2%] [ txe_blob_9 ]
17 26.019 1.531 26.019 └─ [ 2%] [ txe_blob_2 ]
17 25.906 1.524 25.906 └─ [ 2%] [ txe_blob_5 ]
17 25.899 1.523 25.899 └─ [ 2%] [ txe_blob_3 ]
17 25.833 1.520 25.833 └─ [ 2%] [ txe_blob_4 ]
8 23.993 2.999 23.993 └─ [ 2%] [ txe_blob_11 ]
3 6.002 2.001 6.002 └─ [ 1%] [ txe_blob_0 ]
1 35.000 35.000 35.000 [ 3%] RuntimeHostShim::finalize
188 33.000 0.176 33.000 [ 3%] RuntimeHostShim::copy
1 16.000 16.000 16.000 [ 1%] RuntimeHostShim::initialize
13 1.000 0.077 1.000 [ 0%] RuntimeHostShim::loadBlob
573 0.000 0.000 0.000 [ 0%] RuntimeHostShim::allocate
573 0.000 0.000 0.000 [ 0%] RuntimeHostShim::deallocate
243 0.000 0.000 0.000 [ 0%] RuntimeHostShim::createCommandList
922 0.000 0.000 0.000 [ 0%] RuntimeHostShim::getShmemManager
243 0.000 0.000 0.000 [ 0%] RuntimeHostShim::launchBlob
243 0.000 0.000 0.000 [ 0%] RuntimeHostShim::addCommandToList
243 0.000 0.000 0.000 [ 0%] RuntimeHostShim::finalizeCommandList
13 0.000 0.000 0.000 [ 0%] RuntimeHostShim::unloadBlob
33 0.000 0.000 0.000 [ 0%] RuntimeHostShim::stridedCopy
========================================================================================================================
3532 1116.000 0.316 1116.000 [100%] TOTAL
========================================================================================================================
register_backend: registered backend Tsavorite (1 devices)
register_device: registered device Tsavorite (txe)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (CPU)
load_backend: failed to find ggml_backend_init in /usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin/tsi-ggml/libggml-tsavorite.so
load_backend: failed to find ggml_backend_init in /usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin/tsi-ggml/libggml-cpu.so
build: 5464 (194fbaa9) with gcc (GCC) 13.3.0 for x86_64-pc-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any
TXE Device MEMORY Summary total 134217728 and free 134217728
llama_model_load_from_file_impl: using device Tsavorite (txe) - 128 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 75 tensors from /tsi/anoop_feb26/tinyllama-vo-5m-para.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Vicuna Hf
llama_model_loader: - kv 3: general.size_label str = 4.6M
llama_model_loader: - kv 4: general.license str = apache-2.0
llama_model_loader: - kv 5: llama.block_count u32 = 8
llama_model_loader: - kv 6: llama.context_length u32 = 2048
llama_model_loader: - kv 7: llama.embedding_length u32 = 64
llama_model_loader: - kv 8: llama.feed_forward_length u32 = 256
llama_model_loader: - kv 9: llama.attention.head_count u32 = 16
llama_model_loader: - kv 10: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 11: general.file_type u32 = 32
llama_model_loader: - kv 12: llama.vocab_size u32 = 32000
llama_model_loader: - kv 13: llama.rope.dimension_count u32 = 4
llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
llama_model_loader: - kv 15: tokenizer.ggml.pre str = default
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 17 tensors
llama_model_loader: - type bf16: 58 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = BF16
print_info: file size = 8.82 MiB (16.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 3
load: token to piece cache size = 0.1914 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 2048
print_info: n_embd = 64
print_info: n_layer = 8
print_info: n_head = 16
print_info: n_head_kv = 16
print_info: n_rot = 4
print_info: n_swa = 0
print_info: n_swa_pattern = 1
print_info: n_embd_head_k = 4
print_info: n_embd_head_v = 4
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 64
print_info: n_embd_v_gqa = 64
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 256
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 2048
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = ?B
print_info: model params = 4.62 M
print_info: general.name = Vicuna Hf
print_info: vocab type = SPM
print_info: n_vocab = 32000
print_info: n_merges = 0
print_info: BOS token = 1 '<s>'
print_info: EOS token = 2 '</s>'
print_info: UNK token = 0 '<unk>'
print_info: PAD token = 0 '<unk>'
print_info: LF token = 13 '<0x0A>'
print_info: EOG token = 2 '</s>'
print_info: max token length = 18
load_tensors: loading model tensors, this can take a while... (mmap = true)
TXE Device MEMORY Summary total 134217728 and free 134217728
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/9 layers to GPU
load_tensors: CPU_Mapped model buffer size = 8.82 MiB
..............
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 12288
llama_context: n_ctx_per_seq = 12288
llama_context: n_batch = 1024
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (12288) > n_ctx_train (2048) -- possible training context overflow
[2018-03-09 13:52:28.706203] 271:272 [ info] :: </proj/work/atrivedi/workspace/05_25_2025/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:129> TXE resource allocation request processed successfully.
llama_context: CPU output buffer size = 0.12 MiB
llama_kv_cache_unified: CPU KV buffer size = 24.00 MiB
llama_kv_cache_unified: size = 24.00 MiB ( 12288 cells, 8 layers, 1 seqs), K (f16): 12.00 MiB, V (f16): 12.00 MiB
ggml_backend_tsavorite_buffer_type_alloc_buffer is called from llama data Loader
ANoop Allocating memory from tsi_alloc with size 266240
Allocating memory from tsi_alloc with size 266240 starting memory 0xffff93e00080
Address of Newly Created BUffer 0xffff93e00080 and size 266240
llama_context: tsavorite compute buffer size = 0.25 MiB
llama_context: CPU compute buffer size = 408.51 MiB
llama_context: graph nodes = 294
llama_context: graph splits = 67 (with bs=512), 37 (with bs=1)
common_init_from_params: setting dry_penalty_last_n to ctx_size = 12288
main: llama threadpool init, n_threads = 4
main: model was trained on only 2048 context tokens (12288 specified)
system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : NEON = 1 | ARM_FMA = 1 | LLAMAFILE = 1 | AARCH64_REPACK = 1 |
sampler seed: 177927434
sampler params:
repeat_last_n = 5, repeat_penalty = 1.500, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 12288
top_k = 50, top_p = 0.900, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.000
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 12288, n_batch = 1024, n_predict = 10, n_keep = 1
my cat's name was Tim. He loved to play with his toy
llama_perf_sampler_print: sampling time = 195.98 ms / 16 runs ( 12.25 ms per token, 81.64 tokens per second)
llama_perf_context_print: load time = 1577.27 ms
llama_perf_context_print: prompt eval time = 305.19 ms / 6 tokens ( 50.86 ms per token, 19.66 tokens per second)
llama_perf_context_print: eval time = 803.59 ms / 9 runs ( 89.29 ms per token, 11.20 tokens per second)
llama_perf_context_print: total time = 2628.44 ms / 15 tokens
TXE_ADD Operation, total tensor: 10 Number of Kernel Call: 10 Number of tensor got spilt: 0 Min Num of Elem 64 Max Num of Elem 64
TXE_SUB Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_MULT Operation, total tensor: 170 Number of Kernel Call: 245 Number of tensor got spilt: 0 Min Num of Elem 64 Max Num of Elem 384
TXE_DIV Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_SQRT Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_NEG Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_ABS Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_SIN Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_SIGMOID Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
[2018-03-09 13:52:32.222949] 271:272 [ info] :: </proj/work/atrivedi/workspace/05_25_2025/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:145> TXE resource release request processed successfully.
GGML Tsavorite Profiling Results:
------------------------------------------------------------------------------------------------------------------------
Calls Total(ms) T/call Self(ms) Function
------------------------------------------------------------------------------------------------------------------------
255 255.000 1.000 0.000 [ 7%] RuntimeHostShim::awaitCommandListCompletion
245 379.466 1.549 379.466 └─ [11%] [ txe_mult_blob ]
10 15.443 1.544 15.443 └─ [ 0%] [ txe_add_blob ]
1 35.000 35.000 35.000 [ 1%] RuntimeHostShim::finalize
1 19.000 19.000 2.000 [ 1%] GGML Tsavorite
1 17.000 17.000 17.000 └─ [ 0%] RuntimeHostShim::initialize
256 0.000 0.000 0.000 [ 0%] RuntimeHostShim::allocate
1020 0.000 0.000 0.000 [ 0%] RuntimeHostShim::getShmemManager
255 0.000 0.000 0.000 [ 0%] RuntimeHostShim::createCommandList
255 0.000 0.000 0.000 [ 0%] RuntimeHostShim::loadBlob
255 0.000 0.000 0.000 [ 0%] RuntimeHostShim::launchBlob
255 0.000 0.000 0.000 [ 0%] RuntimeHostShim::addCommandToList
255 0.000 0.000 0.000 [ 0%] RuntimeHostShim::finalizeCommandList
255 0.000 0.000 0.000 [ 0%] RuntimeHostShim::unloadBlob
255 0.000 0.000 0.000 [ 0%] RuntimeHostShim::deallocate
========================================================================================================================
3318 3529.000 1.064 3529.000 [100%] TOTAL
========================================================================================================================
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin#
FIR-709 - GGML: Adding SILU kernel
as follows:

/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: /proj/work/atrivedi/workspace/06_02_2025/llama.cpp/ggml-tsi-kernel/fpga/host/host_abs.o: in function `txe_abs_host': LLVMDialectModule:(.text+0x18): undefined reference to `tsi_alloc'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x24): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x30): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x3c): undefined reference to `tsi_create_command_list'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x58): undefined reference to `tsi_load_blob'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x64): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x70): undefined reference to `tsi_launch_blob'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x7c): undefined reference to `tsi_add_command_to_list'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x84): undefined reference to `tsi_finalize_command_list'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x8c): undefined reference to `tsi_wait'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x94): undefined reference to `tsi_unload_blob'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0xa0): undefined reference to `tsi_dealloc'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: /proj/work/atrivedi/workspace/06_02_2025/llama.cpp/ggml-tsi-kernel/fpga/host/host_add.o: in function `txe_add_host': LLVMDialectModule:(.text+0x20): undefined reference to `tsi_alloc'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x2c): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x38): undefined reference to `tsi_shmem_handle_from_ptr'
runtime/utils/lib/ path
FIR-714: Updated the SDK Release r0.1.3
FIR-722 - ggml-tsi-kernel: latest changes updated
This is the first version of the FlaskInterface tool, with the following: 1. Xterm interface in the browser via the /terminal endpoint. 2. Serial console interface in the browser via the /serial endpoint.
@FIR-715: Added FlaskInterface tool for serial port
Just testing my first git pull
Llama.cpp: Webserver & HTML pages support
@FIR-781 - llama.cpp ggml stats: Adding backend and unary op detail
@FIR-782 Llama.cpp: Partial Offloading of Tsavorite Operations
* Added ls -l so Karrar can see files
* Resolved comments
* Uncommented a print statement
* Fixed the custom method I made

Co-authored-by: Lewis Lui <[email protected]>
@FIR-783 - llama.cpp: Update to new release version
@FIR-787 - llama.cpp: Fixed AWS linking issue for tsi-ggml-aws-latest.tz
@FIR-790 - llama.cpp: Add all src tensor shapes & sizes
@FIR-827 - llama.cpp: Python script to run the model with different prompts to measure performance
@FIR-895 - llama.cpp: updating the MLIR SDK Version to 1.8
Signed-off-by: Dinesh Reddy <[email protected]>
[dreddy@wssw01 llama.cpp]$ ./build-posix/bin/simple-backend-tsi "sqr"
load_model: using TSavorite backend
Calculating mem_size 384 1 and creating ggml context
Creating input Tensor
Creating Backend Buffer
Loading Input Tensor Data to Backend Buffer
Bringing tensor data from Backend buffer and printing 32 tensor data:
[ 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00 16.00 17.00 18.00 19.00 20.00 21.00 22.00 23.00 24.00 25.00 26.00 27.00 28.00 29.00 30.00 31.00 32.00 ]
main: compute buffer size: 0.2500 KB
Under Test case for compute API creating build_graph
Compute Done
operation type: 5, num of elements 32
compute is also done
TEST CASE PASSED
GGML Tsavorite Profiling Results:
Calls Total(ms) T/call Self(ms) Function
[Thread] tsi::runtime::TsavRT::finalize (cumulative over all threads)
[Thread] tsi::runtime::TsavRTPosix::loadBlob (cumulative over all threads)
[Thread] tsi::runtime::TsavRTPosix::unloadBlob (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::awaitCommandListCompletion (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)
========================================================================================================================
- 2025.8510 0.0000 2025.8510 [100.00%] TOTAL
Counter Metrics:
Metric Min Max Avg
Queue_0_Occupancy 0.0000 1.0000 0.8333
[dreddy@wssw01 llama.cpp]$