## Summary
Phi-4-mini Instruct (3.8B) is a newly released version of the popular Phi-4 model developed by Microsoft.

## Instructions

Phi-4-mini uses the same example code as Llama; only the checkpoint, model params, and tokenizer differ. Please see the [Llama README page](../llama/README.md) for details.

All commands for exporting and running Llama on the various backends also apply to Phi-4-mini after swapping in the following args:
```
--model phi-4-mini
--params examples/models/phi-4-mini/config.json
--checkpoint <path-to-meta-checkpoint>
```

### Generate the Checkpoint
The original checkpoint can be obtained from Hugging Face:
```
huggingface-cli download microsoft/Phi-4-mini-instruct
```

We then convert it to Meta's checkpoint format:
```
python examples/models/phi-4-mini/convert_weights.py <path-to-checkpoint-dir> <output-path>
```

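Conversion scripts like `convert_weights.py` are, at their core, a key-remapping pass over the model's state dict: HF-style parameter names are renamed to the Meta-style names the Llama example code expects. A minimal sketch of the idea, with an illustrative mapping (the real script's key names and extra transforms may differ):

```python
# Illustrative HF -> Meta key mapping; the actual names used by
# convert_weights.py may differ.
HF_TO_META = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "model.norm.weight": "norm.weight",
    "lm_head.weight": "output.weight",
}


def remap_keys(state_dict):
    """Return a new state dict with HF keys renamed to Meta-style keys.

    Keys without a known mapping are passed through unchanged.
    """
    return {HF_TO_META.get(key, key): value for key, value in state_dict.items()}
```

In the real script the remapped dict is then saved (e.g. with `torch.save`) to the `<output-path>` used as `--checkpoint` below.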
### Example export and run
Here is a basic example of exporting and running Phi-4-mini; please refer to the [Llama README page](../llama/README.md) for more advanced usage.

Export to XNNPACK, no quantization:
```
# No quantization
# Set this path to point to the downloaded checkpoint
PHI_CHECKPOINT=path/to/checkpoint.pth

python -m examples.models.llama.export_llama \
  --model phi-4-mini \
  --checkpoint "${PHI_CHECKPOINT:?}" \
  --params examples/models/phi-4-mini/config.json \
  -kv \
  --use_sdpa_with_kv_cache \
  -d fp32 \
  -X \
  --metadata '{"get_bos_id":151643, "get_eos_ids":[151643]}' \
  --output_name="phi-4-mini.pte" \
  --verbose
```

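The `--metadata` flag embeds tokenizer constants (here the BOS id and the list of EOS ids from the export command above) into the exported model so runners can query them. Since a malformed JSON string fails at export time, a quick way to sanity-check it beforehand:

```python
import json

# The exact metadata string passed to --metadata above.
metadata = '{"get_bos_id":151643, "get_eos_ids":[151643]}'
parsed = json.loads(metadata)  # raises json.JSONDecodeError if malformed
```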
Run using the executor runner:
```
# Currently a work in progress: the HuggingFace JSON tokenizer still needs
# to be enabled in C++. In the meantime, run with the example Python runner
# via pybindings:

python -m examples.models.llama.runner.native \
  --model phi-4-mini \
  --pte <path-to-pte> \
  -kv \
  --tokenizer <path-to-tokenizer>/tokenizer.json \
  --tokenizer_config <path-to-tokenizer>/tokenizer_config.json \
  --prompt "What is in a california roll?" \
  --params examples/models/phi-4-mini/config.json \
  --max_len 64 \
  --temperature 0
```
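`--temperature 0` makes the runner decode greedily: at each step it picks the single most likely token instead of sampling, so the output is deterministic. A generic sketch of that distinction (not the runner's actual implementation):

```python
import math
import random


def sample(logits, temperature):
    """Pick a token index: greedy argmax at temperature 0, else softmax sampling."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(l - m) for l in scaled]
    return random.choices(range(len(logits)), weights=weights)[0]
```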
