You should see your device listed to confirm it is connected.
Use the Llama runner to execute the model on the phone with the `adb` command:
```bash
adb shell "cd /data/local/tmp/llama && ./llama_main --model_path llama3_1B_kv_sdpa_xnn_qe_4_128_1024_embedding_4bit.pte --tokenizer_path tokenizer.model --prompt '<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>' --warmup=1 --cpu_threads=5"
```
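If you want to compare decode speed at different `--cpu_threads` values, a loop can drive the same benchmark repeatedly. The sketch below is not part of the learning path: it only prints the `adb` commands it would run (uncomment the marked line on a connected device), and the short `'Hello'` prompt is a placeholder for the full prompt above.

```shell
# Sketch: sweep --cpu_threads and print the benchmark command for each value.
# The 'Hello' prompt is a placeholder; reuse the full chat prompt for real runs.
MODEL=llama3_1B_kv_sdpa_xnn_qe_4_128_1024_embedding_4bit.pte
commands=""
for threads in 1 2 4 5; do
  cmd="./llama_main --model_path $MODEL --tokenizer_path tokenizer.model --prompt 'Hello' --warmup=1 --cpu_threads=$threads"
  commands="$commands
adb shell \"cd /data/local/tmp/llama && $cmd\""
  # On a connected device, run the command instead of printing it:
  # adb shell "cd /data/local/tmp/llama && $cmd"
done
printf '%s\n' "$commands"
```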
The output should look something like this:
```
I 00:00:00.003002 executorch:main.cpp:69] Resetting threadpool with num threads = 5
I 00:00:00.009985 executorch:runner.cpp:59] Creating LLaMa runner: model_path=instruct_llama3_1B_kv_sdpa_xnn_qe_4_128_1024_embedding_4bit.pte, tokenizer_path=tokenizer.model
I 00:00:03.587895 executorch:runner.cpp:88] Reading metadata from model
I 00:00:03.587950 executorch:runner.cpp:113] Metadata: use_sdpa_with_kv_cache = 1
I 00:00:03.587954 executorch:runner.cpp:113] Metadata: use_kv_cache = 1
I 00:00:03.587957 executorch:runner.cpp:113] Metadata: get_vocab_size = 128256
I 00:00:03.587961 executorch:runner.cpp:113] Metadata: get_bos_id = 128000
I 00:00:03.587963 executorch:runner.cpp:113] Metadata: get_max_seq_len = 1024
I 00:00:03.587966 executorch:runner.cpp:113] Metadata: enable_dynamic_shape = 1
I 00:00:03.587969 executorch:runner.cpp:120] eos_id = 128009
I 00:00:03.587970 executorch:runner.cpp:120] eos_id = 128001
I 00:00:03.587972 executorch:runner.cpp:120] eos_id = 128006
I 00:00:03.587973 executorch:runner.cpp:120] eos_id = 128007
I 00:00:03.587976 executorch:runner.cpp:168] Doing a warmup run...
I 00:00:03.887806 executorch:text_prefiller.cpp:53] Prefill token result numel(): 128256

I 00:00:04.325286 executorch:text_token_generator.h:118]
Reached to the end of generation
I 00:00:04.325299 executorch:runner.cpp:267] Warmup run finished!
I 00:00:04.325305 executorch:runner.cpp:174] RSS after loading model: 1269.320312 MiB (0 if unsupported)
<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>I 00:00:04.509909 executorch:text_prefiller.cpp:53] Prefill token result numel(): 128256
"

I 00:00:04.510943 executorch:runner.cpp:243] RSS after prompt prefill: 1269.320312 MiB (0 if unsupported)
I'm doing well, thanks! I'm always up for helping out with any question or task you'd like assistance with. I'm a large language model, so I can provide information on a wide range of topics. What can I help you with today?<|eot_id|>
I 00:00:05.882562 executorch:text_token_generator.h:118]
Reached to the end of generation

I 00:00:05.882573 executorch:runner.cpp:257] RSS after finishing text generation: 1269.320312 MiB (0 if unsupported)
```
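After a run, the key benchmark figure is the decode rate that the runner's stats lines report, for example `Rate: 25.426681 (tokens/second)` from `executorch:stats.h`. A small `sed` filter can pull that number out of a captured log. The sample line below is copied from an earlier run of this benchmark; treat the exact log format as an assumption if your ExecuTorch version differs.

```shell
# Sketch: extract the tokens/second figure from a saved llama_main log.
# Assumes the ExecuTorch stats line format "... Rate: <float> (tokens/second)".
log='I 00:00:09.624425 executorch:stats.h:127] Total inference time: 2.871000 (seconds) Rate: 25.426681 (tokens/second)'
rate=$(printf '%s\n' "$log" | sed -n 's/.*Rate: \([0-9.]*\) (tokens\/second).*/\1/p' | head -n1)
echo "Decode rate: $rate tokens/second"
```

On a device, you would capture the log first, e.g. `adb shell "..." | tee run.log`, and feed `run.log` to the same filter.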