@@ -15,7 +15,30 @@ make LLAMA_PIM=1
1515Prepare your model files as the original README.md shows. A 4-bit quantized model in GGUF format is preferred.
1616
1717```
18- ./llama-cli -m /mnt/LLM-models/chinese-alpaca-2-7b/gguf/chinese-alpaca-7b_q4_0.gguf --temp 0 -t 1 --no-warmup -p "列举5个北京经典美食。只列举名字,不要介绍。"
18+ ./llama-cli -m /mnt/LLM-models/chinese-alpaca-2-7b/gguf/chinese-alpaca-7b_q4_0.gguf \
19+ --temp 0 -t 1 --no-warmup -p "列举5个北京经典美食。只列举名字,不要介绍。"
20+ ```
21+
22+ This may output:
23+ ``` shell
24+ ...
25+ sampler seed: 4294967295
26+ sampler params:
27+ repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
28+ top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
29+ mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
30+ sampler chain: logits -> logit-bias -> penalties -> greedy
31+ generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 1
32+
33+ 列举5个北京经典美食。只列举名字,不要介绍。1. 烤鸭 2. 炸酱面 3. 豆汁 4. 羊蝎子 5. 驴打滚 [end of text]
34+
35+
36+ llama_perf_sampler_print: sampling time = 1.02 ms / 49 runs ( 0.02 ms per token, 47804.88 tokens per second)
37+ llama_perf_context_print: load time = 4097.04 ms
38+ llama_perf_context_print: prompt eval time = 2966.36 ms / 16 tokens ( 185.40 ms per token, 5.39 tokens per second)
39+ llama_perf_context_print: eval time = 12105.60 ms / 32 runs ( 378.30 ms per token, 2.64 tokens per second)
40+ llama_perf_context_print: total time = 16206.10 ms / 48 tokens
41+
1942```
2043
2144## 3. llama-ts for tensor test
@@ -40,8 +63,8 @@ There are several macros defined in `include/llama.h` that control the behavior
4063
4164```c++
4265#ifdef PIM_KERNEL
43- #define NR_DPUS 64
44- #define NR_LAYER 2
66+ #define NR_DPUS 64 //Number of DPUs to execute the kernel
67+ #define NR_LAYER 2 //Number of transformer layers to offload
4568#define DPU_BINARY "./dpu/gemv_dpu"
4669...
4770#endif // PIM_KERNEL
@@ -53,4 +76,4 @@ The PIM binary `dpu/gemv_dpu` is built from `dpu/dpu_main.c` by typing:
5376cd dpu
5477./pim_build.sh
5578```
56- So check `dpu/dpu_main.c` to find out how the kernel is implemented.
79+ Check `dpu/dpu_main.c` to find out how the kernel is implemented.
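For orientation before reading the kernel source: the binary name `gemv_dpu` suggests a matrix-vector product, y = W·x. The sketch below is a plain host-side reference in float and is not the code in `dpu/dpu_main.c`. The row-major float layout, the function `gemv_reference`, and the row-splitting helper `rows_for_worker` are all assumptions made for illustration; the real kernel may use quantized weights and a different work split across the `NR_DPUS` units.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reference GEMV: y = W * x, with W stored row-major as rows x cols floats.
// Assumption: float data; the actual DPU kernel may use quantized blocks.
std::vector<float> gemv_reference(const std::vector<float>& w,
                                  const std::vector<float>& x,
                                  std::size_t rows, std::size_t cols) {
    std::vector<float> y(rows, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            acc += w[r * cols + c] * x[c];
        }
        y[r] = acc;
    }
    return y;
}

// Half-open range [begin, end) of output rows handled by one worker.
struct RowRange {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical work split: divide rows as evenly as possible across
// `workers` units, giving the first (rows % workers) units one extra row.
RowRange rows_for_worker(std::size_t rows, std::size_t workers, std::size_t id) {
    const std::size_t base = rows / workers;
    const std::size_t extra = rows % workers;
    const std::size_t begin = id * base + std::min(id, extra);
    const std::size_t len = base + (id < extra ? 1 : 0);
    return {begin, begin + len};
}
```

Splitting by output rows keeps each worker's result independent, so no reduction across workers is needed; whether the actual kernel partitions this way should be confirmed in `dpu/dpu_main.c`.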