examples/models/llama/README.md
6 additions & 36 deletions
@@ -6,7 +6,7 @@ Here are supported models:
 - Llama 3.2 1B and 3B
 - Llama 3.1 8B
 - Llama 3 8B
-- Llama 2 7B
+- [Llama 2 7B](../llama2/README.md)

 Pretrained models are not included in this repo. Users are encouraged to download them [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).
@@ -22,7 +22,7 @@ Please note that the models are subject to the [Llama 2 Acceptable Use Policy](h
 # Results

-Since the Llama 2 7B or Llama 3 8B model needs at least 4-bit quantization to fit even within some of the high-end phones, the results presented here correspond to a 4-bit groupwise post-training quantized model.
+Since the Llama 3 8B model needs at least 4-bit quantization to fit even within some of the high-end phones, the results presented here correspond to a 4-bit groupwise post-training quantized model.

 For Llama 3.2 1B/3B, we validated the models by running them in their original bf16 datatype, unquantized, on both Android and iOS phones. The 3B version required high-end phones with larger RAM to fit the model.
@@ -53,7 +53,6 @@ Below are the results for two different groupsizes, with max_seq_length 2048, an
 Note that groupsizes smaller than 128 were not enabled, since such models were still too large. This is because our current efforts have focused on enabling FP32, and support for FP16 is under way. What this implies for model size is that 1) the embedding table is in FP32 and 2) the quantized weight scales are in FP32.
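To make the size tradeoff above concrete, here is a rough back-of-envelope estimate (an illustrative sketch added here, not part of the original README; the parameter count, vocabulary size, and hidden dimension are assumptions that roughly match Llama 3 8B):

```python
# Rough size estimate for a 4-bit groupwise quantized model whose quantization
# scales and embedding table stay in FP32. All figures are assumptions.
def estimate_size_gb(num_params=8.0e9, group_size=128,
                     vocab_size=128_256, hidden_dim=4096):
    emb_params = vocab_size * hidden_dim
    packed_weights = (num_params - emb_params) * 4 / 8    # 4-bit weights, in bytes
    scales = (num_params - emb_params) / group_size * 4   # one FP32 scale per group
    embedding = emb_params * 4                            # FP32 embedding table
    return (packed_weights + scales + embedding) / 1e9

for g in (256, 128, 64, 32):
    print(f"group size {g:>3}: ~{estimate_size_gb(group_size=g):.1f} GB")
```

Halving the group size roughly doubles the FP32 scale overhead, which is consistent with the note above that group sizes below 128 left the model too large.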
@@ -80,8 +79,6 @@ SpinQuant can generate quantized weights that are [compatible with ExecuTorch](h
 For Llama 3 8B and Llama 3.1 8B, we have verified so far on iPhone 15 Pro, iPhone 15 Pro Max, Samsung Galaxy S24+ and OnePlus 12 (with 16GB RAM).

-We have verified running Llama 2 7B [mobile applications](#step-6-build-mobile-apps) efficiently on select devices including the iPhone 15 Pro, iPhone 15 Pro Max, Samsung Galaxy S22 and S24, and OnePlus 12.
-
 ## Performance

 ### Llama 3.2 1B and 3B
@@ -97,29 +94,21 @@ Llama 3.2 1B and 3B performance was measured on the OnePlus 12 device. The perfo
 ### Llama3 8B and Llama3.1 8B

 Llama 3 8B performance was measured on the Samsung Galaxy S22, S24, and OnePlus 12 devices. The performance measurement is expressed in terms of tokens per second using an [adb binary-based approach](#step-5-run-benchmark-on).

-Note that since Llama3's vocabulary size is 4x that of Llama2, we had to quantize the embedding lookup table as well. For these results the embedding lookup table was groupwise quantized with 4 bits and a group size of 32.
+Due to Llama3's vocabulary size, we had to quantize the embedding lookup table as well. For these results the embedding lookup table was groupwise quantized with 4 bits and a group size of 32.

-Llama 2 7B performance was measured on the Samsung Galaxy S22, S24, and OnePlus 12 devices. The performance measurement is expressed in terms of tokens per second using an [adb binary-based approach](#step-5-run-benchmark-on).

-### Option D: Download and export Llama 2 7B model
-
-You can export and run the original Llama 2 7B model.
-
-1. Llama 2 pretrained parameters can be downloaded from [Meta's official website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) or from [Hugging Face](https://huggingface.co/meta-llama/Llama-2-7b).
-
-2. Edit the `params.json` file. Replace `"vocab_size": -1` with `"vocab_size": 32000`. This is a short-term workaround.

-### Option E: Download models from Hugging Face and convert from safetensor format to state dict
+### Option D: Download models from Hugging Face and convert from safetensor format to state dict

 You can also download the above models from [Hugging Face](https://huggingface.co/). Since ExecuTorch starts from a PyTorch model, a script like the one below can be used to convert the Hugging Face safetensors format to PyTorch's state dict. It leverages the utils provided by [TorchTune](https://github.com/pytorch/torchtune).
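The conversion script referenced above is not included in this diff view. Purely as an illustration, a minimal sketch of such a conversion using TorchTune might look like the following; the checkpoint paths, safetensors file names, and `model_type` are placeholders, and the exact module location of `FullModelHFCheckpointer` varies across TorchTune releases:

```python
import torch
from torchtune.training import FullModelHFCheckpointer  # older releases: torchtune.utils
from torchtune.models import convert_weights

# Read the Hugging Face safetensors shards (paths and file names are placeholders).
checkpointer = FullModelHFCheckpointer(
    checkpoint_dir="/path/to/Meta-Llama-3-8B-Instruct",
    checkpoint_files=[f"model-0000{i}-of-00004.safetensors" for i in range(1, 5)],
    output_dir="/path/to/output",
    model_type="LLAMA3",
)
state_dict = checkpointer.load_checkpoint()

# Convert TorchTune's parameter names back to Meta's original format, which is
# the layout the export flow in this repo consumes, then save as a state dict.
state_dict = convert_weights.tune_to_meta(state_dict["model"])
torch.save(state_dict, "/path/to/output/checkpoint.pth")
```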
@@ -348,8 +320,6 @@ Note for Mac users: There's a known linking issue with Xcode 15.1. Refer to the
examples/models/llama2/README.md

-For Llama2, please see the [Llama README page](../llama/README.md) for details.
+For Llama enablement, please see the [Llama README page](../llama/README.md) for complete details.
+
+This page contains Llama2-specific instructions and information.
+
+## Enablement
+
+We have verified running Llama 2 7B [mobile applications](#step-6-build-mobile-apps) efficiently on select devices including the iPhone 15 Pro, iPhone 15 Pro Max, Samsung Galaxy S22 and S24, and OnePlus 12.
+
+Since Llama 2 7B needs at least 4-bit quantization to fit even within some of the high-end phones, the results presented here correspond to a 4-bit groupwise post-training quantized model.
+
+## Results
+
+### Llama2 7B
+
+Llama 2 7B performance was measured on the Samsung Galaxy S22, S24, and OnePlus 12 devices. The performance measurement is expressed in terms of tokens per second using an [adb binary-based approach](#step-5-run-benchmark-on).
+
+Below are the results for two different groupsizes, with max_seq_length 2048 and limit 1000, based on WikiText perplexity using [LM Eval](https://github.com/EleutherAI/lm-evaluation-harness).
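The exact evaluation command is not shown in this diff. As a hedged sketch only, a comparable WikiText number for the unquantized baseline could be obtained through the LM Eval harness's Python API; the `hf` backend, task name, and model id below are assumptions, and the repo's own eval flow would be needed to score the quantized ExecuTorch model:

```python
# Sketch: WikiText perplexity via lm-evaluation-harness, limited to 1000 samples
# as in the results above. The pretrained model id is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-2-7b-hf,dtype=float16",
    tasks=["wikitext"],
    limit=1000,
)
print(results["results"]["wikitext"])  # includes word_perplexity and related metrics
```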
+You can export and run the original Llama 2 7B model.
+
+1. Llama 2 pretrained parameters can be downloaded from [Meta's official website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) or from [Hugging Face](https://huggingface.co/meta-llama/Llama-2-7b).
+
+2. Edit the `params.json` file. Replace `"vocab_size": -1` with `"vocab_size": 32000`. This is a short-term workaround.
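Step 2 can be done by hand or scripted. A small sketch of the scripted version is below; the checkpoint directory path is a placeholder, and the script only applies the `vocab_size` workaround described in the step above:

```python
# Apply the short-term vocab_size workaround to Llama 2 7B's params.json.
import json
from pathlib import Path

params_path = Path("/path/to/llama-2-7b/params.json")  # placeholder path
params = json.loads(params_path.read_text())

if params.get("vocab_size", -1) == -1:
    params["vocab_size"] = 32000  # Llama 2's actual vocabulary size
    params_path.write_text(json.dumps(params, indent=2))
```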