- If your model shares embedding/unembedding weights (as Llama 1B and Llama 3B do), you can add `--use_shared_embedding` to take advantage of this and reduce memory. When this option is enabled, an optional third argument to `-E` controls whether the embedding is quantized with weight zeros. For example, `-E "torchao:4,32,true"` quantizes the embedding to 4 bits with group_size=32 and weight zeros (this is the default behavior if you simply use `-E "torchao:4,32"`), whereas `-E "torchao:4,32,false"` quantizes to 4 bits with group_size=32 using scales only. If `--use_shared_embedding` is specified, the unembedding (i.e., the final linear layer) is quantized in the same way, but also uses 8-bit dynamically quantized activations. See the sketch after this list.
- To do channelwise quantization, set group_size to 0. This works for both linear and embedding layers.
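
The invocation below is a minimal sketch of these flags, assuming the `export_llama` entry point and placeholder paths; only `--use_shared_embedding` and the `-E`/group_size semantics come from the notes above, so verify the remaining flags against your ExecuTorch version:

```
# Sketch: shared embedding/unembedding, 4-bit, group_size=32, with weight zeros.
# The ",true" suffix is the default and could be omitted.
python -m examples.models.llama.export_llama \
  --checkpoint <path-to-checkpoint.pth> \
  --params <path-to-params.json> \
  --use_shared_embedding \
  -E "torchao:4,32,true"

# Sketch: channelwise embedding quantization by setting group_size to 0.
python -m examples.models.llama.export_llama \
  --checkpoint <path-to-checkpoint.pth> \
  --params <path-to-params.json> \
  --use_shared_embedding \
  -E "torchao:4,0"
```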
Once the model is exported, we need to build ExecuTorch and the runner with the low-bit kernels.
The first step is to install ExecuTorch (the same as step 3.1 above):
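
The commands below are a minimal sketch, assuming the standard clone-and-install flow from the ExecuTorch repository (the script name may differ by release):

```
git clone https://github.com/pytorch/executorch.git
cd executorch
# Installs the ExecuTorch Python package and its dependencies
./install_executorch.sh
```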
## Summary

Phi-4-mini Instruct (3.8B) is a newly released version of the popular Phi-4 model developed by Microsoft.
## Instructions
Phi-4-mini uses the same example code as Llama; only the checkpoint, model params, and tokenizer differ. Please see the [Llama README page](../llama/README.md) for details.
All commands for exporting and running Llama on various backends should also apply to Phi-4-mini after swapping in the following args:
```
--model phi_4_mini
--params examples/models/phi-4-mini/config.json
--checkpoint <path-to-meta-checkpoint>
```
### Generate the Checkpoint
The original checkpoint can be obtained from HuggingFace:
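
A hedged sketch, assuming the checkpoint lives in the `microsoft/Phi-4-mini-instruct` repo on HuggingFace and that `huggingface-cli` is installed:

```
# Downloads the original Phi-4-mini Instruct checkpoint from HuggingFace
huggingface-cli download microsoft/Phi-4-mini-instruct
```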
Here is a basic example for exporting and running Phi-4-mini; please refer to the [Llama README page](../llama/README.md) for more advanced usage.
Export to XNNPACK with no quantization:
```
# No quantization
# Set these paths to point to the downloaded files
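# (Hedged continuation: the original block is truncated here. The lines below
# are a sketch of the usual Llama export flow with the Phi-4-mini args swapped
# in; the variable names and the exact flag set are assumptions to verify
# against the Llama README.)
CHECKPOINT=<path-to-meta-checkpoint>
PARAMS=examples/models/phi-4-mini/config.json

python -m examples.models.llama.export_llama \
  --model phi_4_mini \
  --checkpoint "${CHECKPOINT}" \
  --params "${PARAMS}" \
  --output_name="phi-4-mini.pte"
```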