content/learning-paths/smartphones-and-mobile/Build-Llama3-Chat-Android-App-Using-Executorch-And-XNNPACK/5-Run-Benchmark-on-Android.md

Cross-compile the Llama runner to run on Android using the steps below.
Set the environment variable to point to the Android NDK.
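For example (the path below is a placeholder; point it at your own NDK installation and version):

```bash
# Placeholder path -- replace with the location and version of your NDK install.
export ANDROID_NDK="$HOME/android-ndk-r26"
echo "ANDROID_NDK is set to: $ANDROID_NDK"
```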
{{% notice Note %}}
For Llama 2, you need to convert the `tokenizer.model` into a `.bin` file.
{{% /notice %}}
### 3. Run the model
Use the Llama runner to execute the model on the phone with the `adb` command:
```bash
adb shell "cd /data/local/tmp/llama && ./llama_main --model_path llama3.2_bl256_maxlen1024.pte --tokenizer_path tokenizer.model --prompt \"Once upon a time\" --seq_len 120"
```

For a Llama 3 model exported with the chat template, you can instead pass a full system-and-user prompt and add a warmup pass:

```bash
adb shell "cd /data/local/tmp/llama && ./llama_main --model_path llama3_1B_kv_sdpa_xnn_qe_4_128_1024_embedding_4bit.pte --tokenizer_path tokenizer.model --prompt \"<|start_header_id|>system<|end_header_id|>\nYour name is Cookie. You are helpful, polite, precise, concise, honest, and good at writing. You always give precise and brief answers up to 32 words.<|eot_id|><|start_header_id|>user<|end_header_id|>\nHey Cookie! How are you today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\" --warmup=1"
```
The output should look something like this:
```
I 00:00:00.014047 executorch:cpuinfo_utils.cpp:61] Reading file /sys/devices/soc0/image_version
I 00:00:00.014534 executorch:cpuinfo_utils.cpp:77] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.014587 executorch:cpuinfo_utils.cpp:157] Number of efficient cores 4
I 00:00:00.014606 executorch:main.cpp:69] Resetting threadpool with num threads = 4
I 00:00:00.023634 executorch:runner.cpp:65] Creating LLaMa runner: model_path=llama3.2_bl256_maxlen1024.pte, tokenizer_path=tokenizer.model
I 00:00:03.949516 executorch:runner.cpp:94] Reading metadata from model
I 00:00:03.949598 executorch:runner.cpp:119] Metadata: get_vocab_size = 128256
I 00:00:03.949607 executorch:runner.cpp:119] Metadata: get_bos_id = 128000
I 00:00:03.949611 executorch:runner.cpp:119] Metadata: use_sdpa_with_kv_cache = 1
I 00:00:03.949614 executorch:runner.cpp:119] Metadata: get_n_eos = 1
I 00:00:03.949618 executorch:runner.cpp:119] Metadata: append_eos_to_prompt = 0
I 00:00:03.949621 executorch:runner.cpp:119] Metadata: get_max_seq_len = 1024
I 00:00:03.949624 executorch:runner.cpp:119] Metadata: enable_dynamic_shape = 1
I 00:00:03.949626 executorch:runner.cpp:119] Metadata: use_kv_cache = 1
I 00:00:03.949629 executorch:runner.cpp:119] Metadata: get_n_bos = 1
I 00:00:03.949632 executorch:runner.cpp:126] eos_id = 128009
I 00:00:03.949634 executorch:runner.cpp:126] eos_id = 128001
I 00:00:03.949702 executorch:runner.cpp:180] RSS after loading model: 1223.152344 MiB (0 if unsupported)
Once upon a timeI 00:00:04.050916 executorch:text_prefiller.cpp:53] Prefill token result numel(): 128256
,I 00:00:04.052155 executorch:runner.cpp:249] RSS after prompt prefill: 1223.152344 MiB (0 if unsupported)
 I was a business traveler, and I thought that a business trip was the greatest joy in the world.
I had a lovely time abroad in the States, which was, to me, the most wonderful thing in the world. And then I went back to my country to take care of my family and my business. It was the most wonderful thing in the world to me.
I 00:00:06.953141 executorch:runner.cpp:263] RSS after finishing text generation: 1223.152344 MiB (0 if unsupported)
I 00:00:06.954332 executorch:stats.h:97] Prompt Tokens: 5 Generated Tokens: 114
I 00:00:06.954380 executorch:stats.h:103] Model Load Time: 3.926000 (seconds)
I 00:00:06.954407 executorch:stats.h:113] Total inference time: 3.003000 (seconds) Rate: 37.962038 (tokens/second)
I 00:00:06.954446 executorch:stats.h:121] Prompt evaluation: 0.102000 (seconds) Rate: 49.019608 (tokens/second)
```