@mmankal mmankal commented Jun 20, 2025

Make sure to read the contributing guidelines before submitting a PR

Anoop Kapoor and others added 30 commits May 23, 2025 22:13
@FIR-702 - llama.cpp: Sync with latest open source
This change has the following:
1. Move to the new SDK 0.1.2
2. Remove the requirement for libgomp in the FPGA build
@FIR-707: Fix requirement for libgomp and move to new sdk 0.1.2
The changes have the following:
1. Enable profiling for the Tsavorite backend for TXE
2. Add std=c++20 for compiling the profiler
The test results are as follows:
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin# ./run_platform_test.sh
Check if tnApcMgr is running; if it is not, uncomment below line and execute the run_platform_test.sh script.
Running on v0.1.1.tsv30_05_24_2025
[2018-03-09 13:52:26.300409] 271:272 [ info]  :: </proj/work/atrivedi/workspace/05_25_2025/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:129> TXE resource allocation request processed successfully.
[2018-03-09 13:52:27.339] [info] [llama.cpp:56] Execution time: 1019 ms
[2018-03-09 13:52:27.347638] 2909:2909 [ info] [LlamaForCausalLM_Random v. 2] TestBase.h:154: Model executed successfully. Validating result...
[2018-03-09 13:52:27.380511] 2909:2909 [ info] [LlamaForCausalLM_Random v. 2] TestBase.h:193: PASS [relative err=0.000000, relTol=1.000000e-05]
[2018-03-09 13:52:27.405665] 271:272 [ info]  :: </proj/work/atrivedi/workspace/05_25_2025/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:145> TXE resource release request processed successfully.

Profiling Results (LlamaForCausalLM_Random):
------------------------------------------------------------------------------------------------------------------------
Calls  Total(ms)    T/call  Self(ms)  Function
------------------------------------------------------------------------------------------------------------------------
  243    498.000     2.049     0.000  [45%] RuntimeHostShim::awaitCommandListCompletion
   84    200.688     2.389   200.688  └─ [18%] [ txe_blob_1 ]
   32     76.626     2.395    76.626  └─ [ 7%] [ txe_blob_6 ]
   16     55.493     3.468    55.493  └─ [ 5%] [ txe_blob_12 ]
    8     31.821     3.978    31.821  └─ [ 3%] [ txe_blob_10 ]
    8     31.322     3.915    31.322  └─ [ 3%] [ txe_blob_7 ]
    8     31.152     3.894    31.152  └─ [ 3%] [ txe_blob_8 ]
    8     27.693     3.462    27.693  └─ [ 2%] [ txe_blob_9 ]
   17     26.019     1.531    26.019  └─ [ 2%] [ txe_blob_2 ]
   17     25.906     1.524    25.906  └─ [ 2%] [ txe_blob_5 ]
   17     25.899     1.523    25.899  └─ [ 2%] [ txe_blob_3 ]
   17     25.833     1.520    25.833  └─ [ 2%] [ txe_blob_4 ]
    8     23.993     2.999    23.993  └─ [ 2%] [ txe_blob_11 ]
    3      6.002     2.001     6.002  └─ [ 1%] [ txe_blob_0 ]
    1     35.000    35.000    35.000  [ 3%] RuntimeHostShim::finalize
  188     33.000     0.176    33.000  [ 3%] RuntimeHostShim::copy
    1     16.000    16.000    16.000  [ 1%] RuntimeHostShim::initialize
   13      1.000     0.077     1.000  [ 0%] RuntimeHostShim::loadBlob
  573      0.000     0.000     0.000  [ 0%] RuntimeHostShim::allocate
  573      0.000     0.000     0.000  [ 0%] RuntimeHostShim::deallocate
  243      0.000     0.000     0.000  [ 0%] RuntimeHostShim::createCommandList
  922      0.000     0.000     0.000  [ 0%] RuntimeHostShim::getShmemManager
  243      0.000     0.000     0.000  [ 0%] RuntimeHostShim::launchBlob
  243      0.000     0.000     0.000  [ 0%] RuntimeHostShim::addCommandToList
  243      0.000     0.000     0.000  [ 0%] RuntimeHostShim::finalizeCommandList
   13      0.000     0.000     0.000  [ 0%] RuntimeHostShim::unloadBlob
   33      0.000     0.000     0.000  [ 0%] RuntimeHostShim::stridedCopy
========================================================================================================================
 3532   1116.000     0.316  1116.000  [100%] TOTAL
========================================================================================================================

register_backend: registered backend Tsavorite (1 devices)
register_device: registered device Tsavorite (txe)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (CPU)
load_backend: failed to find ggml_backend_init in /usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin/tsi-ggml/libggml-tsavorite.so
load_backend: failed to find ggml_backend_init in /usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin/tsi-ggml/libggml-cpu.so
build: 5464 (194fbaa) with gcc (GCC) 13.3.0 for x86_64-pc-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any

 TXE Device MEMORY Summary total 134217728 and free 134217728
llama_model_load_from_file_impl: using device Tsavorite (txe) - 128 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 75 tensors from /tsi/anoop_feb26/tinyllama-vo-5m-para.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Vicuna Hf
llama_model_loader: - kv   3:                         general.size_label str              = 4.6M
llama_model_loader: - kv   4:                            general.license str              = apache-2.0
llama_model_loader: - kv   5:                          llama.block_count u32              = 8
llama_model_loader: - kv   6:                       llama.context_length u32              = 2048
llama_model_loader: - kv   7:                     llama.embedding_length u32              = 64
llama_model_loader: - kv   8:                  llama.feed_forward_length u32              = 256
llama_model_loader: - kv   9:                 llama.attention.head_count u32              = 16
llama_model_loader: - kv  10:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  11:                          general.file_type u32              = 32
llama_model_loader: - kv  12:                           llama.vocab_size u32              = 32000
llama_model_loader: - kv  13:                 llama.rope.dimension_count u32              = 4
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  15:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  21:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  22:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  23:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   17 tensors
llama_model_loader: - type bf16:   58 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = BF16
print_info: file size   = 8.82 MiB (16.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 3
load: token to piece cache size = 0.1914 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: n_ctx_train      = 2048
print_info: n_embd           = 64
print_info: n_layer          = 8
print_info: n_head           = 16
print_info: n_head_kv        = 16
print_info: n_rot            = 4
print_info: n_swa            = 0
print_info: n_swa_pattern    = 1
print_info: n_embd_head_k    = 4
print_info: n_embd_head_v    = 4
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 64
print_info: n_embd_v_gqa     = 64
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 256
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 2048
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = ?B
print_info: model params     = 4.62 M
print_info: general.name     = Vicuna Hf
print_info: vocab type       = SPM
print_info: n_vocab          = 32000
print_info: n_merges         = 0
print_info: BOS token        = 1 '<s>'
print_info: EOS token        = 2 '</s>'
print_info: UNK token        = 0 '<unk>'
print_info: PAD token        = 0 '<unk>'
print_info: LF token         = 13 '<0x0A>'
print_info: EOG token        = 2 '</s>'
print_info: max token length = 18
load_tensors: loading model tensors, this can take a while... (mmap = true)

 TXE Device MEMORY Summary total 134217728 and free 134217728
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/9 layers to GPU
load_tensors:   CPU_Mapped model buffer size =     8.82 MiB
..............
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 12288
llama_context: n_ctx_per_seq = 12288
llama_context: n_batch       = 1024
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = 0
llama_context: freq_base     = 10000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_per_seq (12288) > n_ctx_train (2048) -- possible training context overflow
[2018-03-09 13:52:28.706203] 271:272 [ info]  :: </proj/work/atrivedi/workspace/05_25_2025/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:129> TXE resource allocation request processed successfully.
llama_context:        CPU  output buffer size =     0.12 MiB
llama_kv_cache_unified:        CPU KV buffer size =    24.00 MiB
llama_kv_cache_unified: size =   24.00 MiB ( 12288 cells,   8 layers,  1 seqs), K (f16):   12.00 MiB, V (f16):   12.00 MiB
ggml_backend_tsavorite_buffer_type_alloc_buffer is called from llama data Loader

 ANoop Allocating memory from tsi_alloc with size  266240

 Allocating memory from tsi_alloc with size  266240 starting memory 0xffff93e00080

Address of Newly Created BUffer 0xffff93e00080 and size 266240
llama_context:  tsavorite compute buffer size =     0.25 MiB
llama_context:        CPU compute buffer size =   408.51 MiB
llama_context: graph nodes  = 294
llama_context: graph splits = 67 (with bs=512), 37 (with bs=1)
common_init_from_params: setting dry_penalty_last_n to ctx_size = 12288
main: llama threadpool init, n_threads = 4
main: model was trained on only 2048 context tokens (12288 specified)

system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : NEON = 1 | ARM_FMA = 1 | LLAMAFILE = 1 | AARCH64_REPACK = 1 |

sampler seed: 177927434
sampler params:
	repeat_last_n = 5, repeat_penalty = 1.500, frequency_penalty = 0.000, presence_penalty = 0.000
	dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 12288
	top_k = 50, top_p = 0.900, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.000
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 12288, n_batch = 1024, n_predict = 10, n_keep = 1

 my cat's name was Tim. He loved to play with his toy

llama_perf_sampler_print:    sampling time =     195.98 ms /    16 runs   (   12.25 ms per token,    81.64 tokens per second)
llama_perf_context_print:        load time =    1577.27 ms
llama_perf_context_print: prompt eval time =     305.19 ms /     6 tokens (   50.86 ms per token,    19.66 tokens per second)
llama_perf_context_print:        eval time =     803.59 ms /     9 runs   (   89.29 ms per token,    11.20 tokens per second)
llama_perf_context_print:       total time =    2628.44 ms /    15 tokens
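As a sanity check on the perf lines above, the per-token and tokens-per-second figures follow directly from the reported totals. A small sketch (the numbers are taken from the eval line in the log above):

```python
# Cross-check the llama_perf_context_print eval line:
#   eval time = 803.59 ms / 9 runs (89.29 ms per token, 11.20 tokens per second)
eval_ms = 803.59
runs = 9

ms_per_token = eval_ms / runs           # total time divided by runs
tokens_per_sec = 1000.0 / ms_per_token  # invert and convert ms -> s

print(round(ms_per_token, 2))    # -> 89.29
print(round(tokens_per_sec, 2))  # -> 11.2
```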

 TXE_ADD Operation, total tensor: 10  Number of Kernel Call: 10  Number of tensor got spilt: 0 Min Num of Elem 64 Max Num of Elem 64

 TXE_SUB Operation, total tensor: 0  Number of Kernel Call: 0  Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

 TXE_MULT Operation, total tensor: 170  Number of Kernel Call: 245  Number of tensor got spilt: 0 Min Num of Elem 64 Max Num of Elem 384

 TXE_DIV Operation, total tensor: 0  Number of Kernel Call: 0  Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

 TXE_SQRT Operation, total tensor: 0  Number of Kernel Call: 0  Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

 TXE_NEG Operation, total tensor: 0  Number of Kernel Call: 0  Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

 TXE_ABS Operation, total tensor: 0  Number of Kernel Call: 0  Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

 TXE_SIN Operation, total tensor: 0  Number of Kernel Call: 0  Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0

 TXE_SIGMOID Operation, total tensor: 0  Number of Kernel Call: 0  Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
[2018-03-09 13:52:32.222949] 271:272 [ info]  :: </proj/work/atrivedi/workspace/05_25_2025/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:145> TXE resource release request processed successfully.

GGML Tsavorite Profiling Results:
------------------------------------------------------------------------------------------------------------------------
Calls  Total(ms)    T/call  Self(ms)  Function
------------------------------------------------------------------------------------------------------------------------
  255    255.000     1.000     0.000  [ 7%] RuntimeHostShim::awaitCommandListCompletion
  245    379.466     1.549   379.466  └─ [11%] [ txe_mult_blob ]
   10     15.443     1.544    15.443  └─ [ 0%] [ txe_add_blob ]
    1     35.000    35.000    35.000  [ 1%] RuntimeHostShim::finalize
    1     19.000    19.000     2.000  [ 1%] GGML Tsavorite
    1     17.000    17.000    17.000  └─ [ 0%] RuntimeHostShim::initialize
  256      0.000     0.000     0.000  [ 0%] RuntimeHostShim::allocate
 1020      0.000     0.000     0.000  [ 0%] RuntimeHostShim::getShmemManager
  255      0.000     0.000     0.000  [ 0%] RuntimeHostShim::createCommandList
  255      0.000     0.000     0.000  [ 0%] RuntimeHostShim::loadBlob
  255      0.000     0.000     0.000  [ 0%] RuntimeHostShim::launchBlob
  255      0.000     0.000     0.000  [ 0%] RuntimeHostShim::addCommandToList
  255      0.000     0.000     0.000  [ 0%] RuntimeHostShim::finalizeCommandList
  255      0.000     0.000     0.000  [ 0%] RuntimeHostShim::unloadBlob
  255      0.000     0.000     0.000  [ 0%] RuntimeHostShim::deallocate
========================================================================================================================
 3318   3529.000     1.064  3529.000  [100%] TOTAL
========================================================================================================================

root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv30_05_24_2025/bin#
FIR-709 - GGML: Adding SILU kernel
The linker errors were as follows

/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: /proj/work/atrivedi/workspace/06_02_2025/llama.cpp/ggml-tsi-kernel/fpga/host/host_abs.o: in function `txe_abs_host':
LLVMDialectModule:(.text+0x18): undefined reference to `tsi_alloc'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x24): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x30): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x3c): undefined reference to `tsi_create_command_list'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x58): undefined reference to `tsi_load_blob'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x64): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x70): undefined reference to `tsi_launch_blob'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x7c): undefined reference to `tsi_add_command_to_list'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x84): undefined reference to `tsi_finalize_command_list'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x8c): undefined reference to `tsi_wait'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x94): undefined reference to `tsi_unload_blob'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0xa0): undefined reference to `tsi_dealloc'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: /proj/work/atrivedi/workspace/06_02_2025/llama.cpp/ggml-tsi-kernel/fpga/host/host_add.o: in function `txe_add_host':
LLVMDialectModule:(.text+0x20): undefined reference to `tsi_alloc'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x2c): undefined reference to `tsi_shmem_handle_from_ptr'
/proj/rel/sw/arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-linux-gnu/bin/../lib/gcc/aarch64-none-linux-gnu/14.2.1/../../../../aarch64-none-linux-gnu/bin/ld: LLVMDialectModule:(.text+0x38): undefined reference to `tsi_shmem_handle_from_ptr'
FIR-714: Updated the SDK Release r0.1.3
FIR-722 - ggml-tsi-kernel: latest changes updated
This is a first version of the FlaskInterface tool with the following:
1. Xterm interface in the browser via the /terminal endpoint
2. Serial console interface in the browser via the /serial endpoint
@FIR-715: Added FlaskInterface tool for serial port
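The endpoint layout described above can be sketched as below. This is a hedged sketch, not the actual FlaskInterface code: the real tool uses Flask, but for a self-contained example this uses the stdlib `http.server`, and the route names are the only detail taken from the PR — the page bodies and class name are placeholders.

```python
# Sketch of the two-endpoint layout: /terminal for a browser xterm,
# /serial for the serial console. Bodies are placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer

ROUTES = {
    "/terminal": "<html><!-- xterm.js terminal attached here --></html>",
    "/serial":   "<html><!-- serial console output streamed here --></html>",
}

def dispatch(path: str) -> tuple:
    """Map a request path to (status, body); 404 for unknown paths."""
    body = ROUTES.get(path)
    return (200, body) if body is not None else (404, "not found")

class FlaskInterfaceSketch(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = dispatch(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    # To actually serve (port 5003 matches the URL used later in this PR):
    # HTTPServer(("", 5003), FlaskInterfaceSketch).serve_forever()
    print(dispatch("/terminal")[0], dispatch("/serial")[0])  # -> 200 200
```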
Llama.cpp: Webserver & HTML pages support
akapoor3518 and others added 19 commits June 11, 2025 21:52
@FIR-733 - llama.cpp: Webserver, add job status support for the model
@FIR-731 - serial_script.py changes to identify end of output
This commit has two changes:
1. Added another endpoint, llama-cli, to invoke run_platform_test.sh
   directly
2. Updated reading of the output to byte-by-byte, to identify the
   marker prompt and exit when the marker is seen
@FIR-737: Added another endpoint, llama-cli, to invoke run_platform_test.sh directly from the URL
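The byte-by-byte marker scan described above can be sketched like this. It is an illustrative sketch, not the actual serial_script.py: the function name, the marker value, and the fake in-memory stream are all assumptions; a real caller would pass the serial port object instead.

```python
import io

def read_until_marker(stream, marker: bytes) -> bytes:
    """Read one byte at a time, returning everything seen up to and
    including the first occurrence of `marker` (e.g. the shell prompt
    printed when run_platform_test.sh finishes)."""
    buf = bytearray()
    while True:
        b = stream.read(1)
        if not b:              # EOF before the marker appeared
            break
        buf += b
        if buf.endswith(marker):
            break
    return bytes(buf)

# Usage with a fake serial stream (a real script would pass the port):
fake = io.BytesIO(b"...token output...\nroot@agilex7# ")
out = read_until_marker(fake, b"root@agilex7# ")
print(out.endswith(b"root@agilex7# "))  # -> True
```

Reading a single byte at a time is slower than chunked reads, but it guarantees the loop stops exactly at the marker instead of consuming output past it.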

Co-authored-by: Ashish Trivedi <[email protected]>
@FIR-736 - llama.cpp: Disable all logs except the token generation log
…path (#16)

The changes are as follows:
1. Change directory to the right folder before running the commands
2. Add system-info and txe-restart functionality

Co-authored-by: Ashish Trivedi <[email protected]>
@FIR-720 - GGML: Add TMU (MAT_MUL) kernel
* @FIR-754: Added all parameter parsing for the llama-cli

The test results are as follows:
Model Response
cd /usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin/; ./run_llama_cli.sh "My cat's name"
" 50 tinyllama-vo-5m-para.gguf tSavorite 1.5 1024 50 0.9 5 12288 0.0
[2018-03-09 13:03:17.788243] 271:272 [ info]  :: </proj/work/mmankali/bld-setuptest/tsirel-31/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:129> TXE resource allocation request processed successfully.
 My cat's name was Tim. He loved to play with his toy car. He would run and jump in the park, making loud noises. Tim was very happy with his new toy car.
One day, Tim's mom said, "Tim. You

llama_perf_sampler_print:    sampling time =     999.96 ms /    56 runs   (   17.86 ms per token,    56.00 tokens per second)
llama_perf_context_print:        load time =    1713.55 ms
llama_perf_context_print: prompt eval time =     603.51 ms /     6 tokens (  100.58 ms per token,     9.94 tokens per second)
llama_perf_context_print:        eval time =    7069.36 ms /    49 runs   (  144.27 ms per token,     6.93 tokens per second)
llama_perf_context_print:       total time =   10046.17 ms /    55 tokens
[2018-03-09 13:03:28.875126] 271:272 [ info]  :: </proj/work/mmankali/bld-setuptest/tsirel-31/tsi_yocto_workspace/tsi-apc-manager/platform/rsm_mgr/rsm_process_req.c:145> TXE resource release request processed successfully.

GGML Tsavorite Profiling Results:
------------------------------------------------------------------------------------------------------------------------
Calls  Total(ms)    T/call  Self(ms)  Function
------------------------------------------------------------------------------------------------------------------------
 2715   2720.000     1.002     0.000  [25%] RuntimeHostShim::awaitCommandListCompletion
 1740   2635.984     1.515  2635.984  └─ [24%] [ txe_silu ]
  925   1379.715     1.492  1379.715  └─ [12%] [ txe_mult ]
   50     74.450     1.489    74.450  └─ [ 1%] [ txe_add ]
 2715      0.448     0.000     0.448  └─ [ 0%] TXE 0 Idle
    1     34.000    34.000    34.000  [ 0%] RuntimeHostShim::finalize
    1     16.000    16.000     1.000  [ 0%] GGML Tsavorite
    1     15.000    15.000    15.000  └─ [ 0%] RuntimeHostShim::initialize
 2716      0.000     0.000     0.000  [ 0%] RuntimeHostShim::allocate
 9120      0.000     0.000     0.000  [ 0%] RuntimeHostShim::getShmemManager
 2715      0.000     0.000     0.000  [ 0%] RuntimeHostShim::createCommandList
 2715      0.000     0.000     0.000  [ 0%] RuntimeHostShim::loadBlob
 2715      0.000     0.000     0.000  [ 0%] RuntimeHostShim::launchBlob
 2715      0.000     0.000     0.000  [ 0%] RuntimeHostShim::addCommandToList
 2715      0.000     0.000     0.000  [ 0%] RuntimeHostShim::finalizeCommandList
 2715      0.000     0.000     0.000  [ 0%] RuntimeHostShim::unloadBlob
 2715      0.000     0.000     0.000  [ 0%] RuntimeHostShim::deallocate
========================================================================================================================
33558  11098.000     0.331 11098.000  [100%] TOTAL
========================================================================================================================

⟵ Back to Form

The URL used is as follows
http://10.50.0.124:5003/llama-cli?model=tiny-llama&backend=tSavorite&tokens=10&prompt=My+cat%27s+name&repeat-penalty=1.5&batch-size=1024&top-k=50&top-p=0.9&last-n=5&context-length=12288&temp=0.0
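The query string above carries the llama-cli parameters as web parameters. A rough sketch of how such a query might be mapped onto CLI flags — the helper and the mapping table are illustrative guesses, not the endpoint's actual code, and the `backend` parameter is deliberately left unmapped since it selects the device rather than a CLI flag:

```python
from urllib.parse import urlparse, parse_qs

# Illustrative mapping from web query parameters to llama-cli flags.
PARAM_TO_FLAG = {
    "model": "--model",
    "tokens": "--n-predict",
    "prompt": "--prompt",
    "repeat-penalty": "--repeat-penalty",
    "batch-size": "--batch-size",
    "top-k": "--top-k",
    "top-p": "--top-p",
    "last-n": "--repeat-last-n",
    "context-length": "--ctx-size",
    "temp": "--temp",
}

def build_cli_args(url: str) -> list:
    """Turn a /llama-cli request URL into an argument list."""
    qs = parse_qs(urlparse(url).query)  # decodes %27 and '+' for us
    args = []
    for param, flag in PARAM_TO_FLAG.items():
        if param in qs:
            args += [flag, qs[param][0]]
    return args

url = ("http://10.50.0.124:5003/llama-cli?model=tiny-llama&backend=tSavorite"
       "&tokens=10&prompt=My+cat%27s+name&repeat-penalty=1.5&batch-size=1024"
       "&top-k=50&top-p=0.9&last-n=5&context-length=12288&temp=0.0")
print(build_cli_args(url))
```

Note how `parse_qs` already URL-decodes the prompt, so `My+cat%27s+name` arrives as `My cat's name` before it is handed to the CLI.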

* @FIR-754: Addressed review comments.

---------

Co-authored-by: Ashish Trivedi <[email protected]>
#20)

The test results with ./run_llama_cli.sh with 5 tokens are as follows:

+++
root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin# ./run_llama_cli.sh
 my cat's name is Max. He'

llama_perf_sampler_print:    sampling time =     111.70 ms /    11 runs   (   10.15 ms per token,    98.47 tokens per second)
llama_perf_context_print:        load time =  132926.48 ms
llama_perf_context_print: prompt eval time =  109957.33 ms /     6 tokens (18326.22 ms per token,     0.05 tokens per second)
llama_perf_context_print:        eval time =  195682.91 ms /     4 runs   (48920.73 ms per token,     0.02 tokens per second)
llama_perf_context_print:       total time =  328764.01 ms /    10 tokens

GGML Tsavorite Profiling Results:
------------------------------------------------------------------------------------------------------------------------
Calls  Total(ms)    T/call  Self(ms)  Function
------------------------------------------------------------------------------------------------------------------------
33160 100086.000     3.018 47907.157  [32%] RuntimeHostShim::awaitCommandListCompletion
18920  29912.952     1.581 29912.952  └─ [10%] [ txe_silu ]
14080  22010.102     1.563 22010.102  └─ [ 7%] [ txe_mult ]
  160    253.071     1.582   253.071  └─ [ 0%] [ txe_add ]
33160      1.178     0.000     1.178  └─ [ 0%] TXE 0 Idle
    1    114.000   114.000    18.000  [ 0%] GGML Tsavorite
    1     96.000    96.000    96.000  └─ [ 0%] RuntimeHostShim::initialize
    1     52.000    52.000    52.000  [ 0%] RuntimeHostShim::finalize
33160     26.000     0.001    26.000  [ 0%] RuntimeHostShim::loadBlob
33160     23.000     0.001    23.000  [ 0%] RuntimeHostShim::finalizeCommandList
33160      5.000     0.000     5.000  [ 0%] RuntimeHostShim::addCommandToList
33161      3.000     0.000     3.000  [ 0%] RuntimeHostShim::allocate
33160      3.000     0.000     3.000  [ 0%] RuntimeHostShim::createCommandList
113720      0.000     0.000     0.000  [ 0%] RuntimeHostShim::getShmemManager
33160      0.000     0.000     0.000  [ 0%] RuntimeHostShim::launchBlob
33160      0.000     0.000     0.000  [ 0%] RuntimeHostShim::unloadBlob
33160      0.000     0.000     0.000  [ 0%] RuntimeHostShim::deallocate
========================================================================================================================
412163 308849.000     0.749 308849.000  [100%] TOTAL
========================================================================================================================

root@agilex7_dk_si_agf014ea:/usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin#
+++
@mmankal mmankal closed this Jun 20, 2025
@mmankal mmankal deleted the integrate-copy2fpga-filetransfer branch June 20, 2025 15:08
@mmankal mmankal restored the integrate-copy2fpga-filetransfer branch June 20, 2025 15:11
@github-actions github-actions bot added documentation Improvements or additions to documentation build Compilation issues testing Everything test related examples python python script changes ggml changes relating to the ggml tensor library for machine learning labels Jun 20, 2025