content/learning-paths/smartphones-and-mobile/Build-Llama3-Chat-Android-App-Using-Executorch-And-XNNPACK/5-Run-Benchmark-on-Android.md
Use the Llama runner to execute the model on the phone with the `adb` command:
```bash
-adb shell "cd /data/local/tmp/llama && ./llama_main --model_path llama3_1B_kv_sdpa_xnn_qe_4_128_1024_embedding_4bit.pte --tokenizer_path tokenizer.model --prompt "<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --warmup=1 --cpu_threads=5
+adb shell "cd /data/local/tmp/llama && ./llama_main --model_path llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte --tokenizer_path tokenizer.model --prompt "<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>" --warmup=1 --cpu_threads=5
```
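The prompt contains `<`, `|`, `>` and `!`, so it has to survive two rounds of shell parsing: once on the host and once on the device. The following is a minimal sketch of one way to quote it, assuming a POSIX-style shell on the host. The names `MODEL`, `PROMPT`, `REMOTE_CMD` and `RUN_ON_DEVICE` are illustrative and do not come from the learning path; the `llama_main` flags are the ones shown in the command above.

```bash
#!/usr/bin/env bash
# Sketch only: MODEL, PROMPT, REMOTE_CMD and RUN_ON_DEVICE are illustrative
# names chosen for this example; they are not part of the learning path.

MODEL=llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte

# Single quotes on the host keep <, |, >, ! and \n literal.
PROMPT='<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>'

# Double quotes let the host expand $MODEL and $PROMPT; the embedded single
# quotes reach the device shell, which then treats the prompt as one argument.
# This works only because the prompt itself contains no single quote.
REMOTE_CMD="cd /data/local/tmp/llama && ./llama_main --model_path $MODEL --tokenizer_path tokenizer.model --prompt '$PROMPT' --warmup=1 --cpu_threads=5"

# Show the exact string the device shell will parse.
printf '%s\n' "$REMOTE_CMD"

# Assumption: set RUN_ON_DEVICE=1 only when a device is attached.
if [ "${RUN_ON_DEVICE:-0}" = "1" ]; then
    adb shell "$REMOTE_CMD"
fi
```

Printing `REMOTE_CMD` before running it makes it easy to check that the prompt arrives as a single quoted argument and that `--warmup=1 --cpu_threads=5` sit outside the prompt.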
The output should look something like this.
```
-I 00:00:00.003002 executorch:main.cpp:69] Resetting threadpool with num threads = 5
-I 00:00:00.009985 executorch:runner.cpp:59] Creating LLaMa runner: model_path=instruct_llama3_1B_kv_sdpa_xnn_qe_4_128_1024_embedding_4bit.pte, tokenizer_path=tokenizer.model
-I 00:00:03.587895 executorch:runner.cpp:88] Reading metadata from model
-I 00:00:03.587950 executorch:runner.cpp:113] Metadata: use_sdpa_with_kv_cache = 1
-I 00:00:03.587954 executorch:runner.cpp:113] Metadata: use_kv_cache = 1
-I 00:00:03.587957 executorch:runner.cpp:113] Metadata: get_vocab_size = 128256
-I 00:00:03.587961 executorch:runner.cpp:113] Metadata: get_bos_id = 128000
-I 00:00:03.587963 executorch:runner.cpp:113] Metadata: get_max_seq_len = 1024
-I 00:00:03.587966 executorch:runner.cpp:113] Metadata: enable_dynamic_shape = 1
-I 00:00:03.587969 executorch:runner.cpp:120] eos_id = 128009
-I 00:00:03.587970 executorch:runner.cpp:120] eos_id = 128001
-I 00:00:03.587972 executorch:runner.cpp:120] eos_id = 128006
-I 00:00:03.587973 executorch:runner.cpp:120] eos_id = 128007
-I 00:00:03.587976 executorch:runner.cpp:168] Doing a warmup run...
-I 00:00:03.887806 executorch:text_prefiller.cpp:53] Prefill token result numel(): 128256
+I 00:00:00.003316 executorch:main.cpp:69] Resetting threadpool with num threads = 5
+I 00:00:00.009329 executorch:runner.cpp:59] Creating LLaMa runner: model_path=llama3_1B_kv_sdpa_xnn_qe_4_64_1024_embedding_4bit.pte, tokenizer_path=tokenizer.model
+I 00:00:03.569399 executorch:runner.cpp:88] Reading metadata from model
+I 00:00:03.569451 executorch:runner.cpp:113] Metadata: use_sdpa_with_kv_cache = 1
+I 00:00:03.569455 executorch:runner.cpp:113] Metadata: use_kv_cache = 1
+I 00:00:03.569459 executorch:runner.cpp:113] Metadata: get_vocab_size = 128256
+I 00:00:03.569461 executorch:runner.cpp:113] Metadata: get_bos_id = 128000
+I 00:00:03.569464 executorch:runner.cpp:113] Metadata: get_max_seq_len = 1024
+I 00:00:03.569466 executorch:runner.cpp:113] Metadata: enable_dynamic_shape = 1
+I 00:00:03.569469 executorch:runner.cpp:120] eos_id = 128009
+I 00:00:03.569470 executorch:runner.cpp:120] eos_id = 128001
+I 00:00:03.569471 executorch:runner.cpp:120] eos_id = 128006
+I 00:00:03.569473 executorch:runner.cpp:120] eos_id = 128007
+I 00:00:03.569475 executorch:runner.cpp:168] Doing a warmup run...
+I 00:00:03.838634 executorch:text_prefiller.cpp:53] Prefill token result numel(): 128256

-I 00:00:04.325286 executorch:text_token_generator.h:118]
+I 00:00:03.892268 executorch:text_token_generator.h:118]
Reached to the end of generation
-I 00:00:04.325299 executorch:runner.cpp:267] Warmup run finished!
-I 00:00:04.325305 executorch:runner.cpp:174] RSS after loading model: 1269.320312 MiB (0 if unsupported)
-<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>I 00:00:04.509909 executorch:text_prefiller.cpp:53] Prefill token result numel(): 128256
-"
+I 00:00:03.892281 executorch:runner.cpp:267] Warmup run finished!
+I 00:00:03.892286 executorch:runner.cpp:174] RSS after loading model: 1269.445312 MiB (0 if unsupported)
+<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. you are helpful, polite, precise, concise, honest, good at writing. You always give precise and brief answers up to 32 words<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! how are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>I 00:00:04.076905 executorch:text_prefiller.cpp:53] Prefill token result numel(): 128256

-I 00:00:04.510943 executorch:runner.cpp:243] RSS after prompt prefill: 1269.320312 MiB (0 if unsupported)
-I'm doing well, thanks! I'm always up for helping out with any question or task you'd like assistance with. I'm a large language model, so I can provide information on a wide range of topics. What can I help you with today?<|eot_id|>
-I 00:00:05.882562 executorch:text_token_generator.h:118]
+
+I 00:00:04.078027 executorch:runner.cpp:243] RSS after prompt prefill: 1269.445312 MiB (0 if unsupported)
+I'm doing great, thanks! I'm always happy to help, communicate, and provide helpful responses. I'm a bit of a cookie (heh) when it comes to delivering concise and precise answers. What can I help you with today?<|eot_id|>
+I 00:00:05.399304 executorch:text_token_generator.h:118]
Reached to the end of generation
-I 00:00:05.882573 executorch:runner.cpp:257] RSS after finishing text generation: 1269.320312 MiB (0 if unsupported)