examples/qualcomm/oss_scripts/llama/README.md (7 additions, 0 deletions)
@@ -8,6 +8,7 @@ This file provides you the instructions to run LLM Decoder model with different
 4. QWEN2.5 0.5B
 5. QWEN3 0.6B / 1.7B
 6. Phi4-mini-instruct
+7. SMOLLM2 135M

 We offer the following modes to execute the model:

@@ -74,6 +75,12 @@ Default example using hybrid mode
 python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -s ${SERIAL_NUM} -m ${SOC_MODEL} --temperature 0 --model_mode hybrid --max_seq_len 1024 --prefill_ar_len 128 --ptq 16a8w --enable_masked_softmax --r3 --decoder_model qwen2_5 --prompt "I would like to learn python, could you teach me with a simple example?"
 ```

+#### SMOLLM2
+Default example using hybrid mode.
+```bash
+python examples/qualcomm/oss_scripts/llama/llama.py -b build-android -H mlgtw-linux -s ${SERIAL_NUM} -m ${SOC_MODEL} --ptq 16a8w --tokenizer_bin tokenizer.bin --decoder_model smollm2 --model_mode hybrid --prefill_ar_len 128 --max_seq_len 1024 --prompt "I would like to learn python, could you teach me with a simple example?"
+```
+
 ### KV Cache update mechanism
 We have two distinct mechanisms for updating the key-value (KV) cache, which can be selected at runtime. Shift Pointer and Smart Mask.
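
The two mechanism names can be illustrated with a toy sketch. It assumes that Shift Pointer slides a fixed-size window over a larger buffer, and that Smart Mask writes the new entry in place and marks it valid in a mask. Neither behavior is spelled out in the text above. The function names and the list-based cache are hypothetical and are not the runtime's API.

```python
# Toy illustration only: plain Python lists stand in for KV cache buffers.
# Both update rules below are assumptions about what the mechanism names mean.


def shift_pointer_update(buffer, start, window, new_entry):
    """Write the new entry just past the current window, then advance the start.

    The visible cache is buffer[start:start + window]; the oldest slot falls
    out of view once the start index moves forward by one.
    """
    buffer[start + window] = new_entry
    return start + 1


def smart_mask_update(cache, mask, position, new_entry):
    """Write the new entry in place and mark that slot as valid in the mask."""
    cache[position] = new_entry
    mask[position] = True
    return position + 1


# Shift Pointer: the buffer has spare room past the 4-slot window.
window = 4
buffer = ["pad", "pad", "k0", "k1", None, None]
start = shift_pointer_update(buffer, 0, window, "k2")
shift_view = buffer[start:start + window]

# Smart Mask: fixed 4-slot cache plus a validity mask.
cache = ["k0", "k1", None, None]
mask = [True, True, False, False]
position = smart_mask_update(cache, mask, 2, "k2")
mask_view = [entry for entry, valid in zip(cache, mask) if valid]

print(shift_view)
print(mask_view)
```

In this sketch the Shift Pointer variant never rewrites existing entries but needs spare buffer capacity, while the Smart Mask variant keeps the buffer fixed and relies on the mask to hide unused slots.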