diff --git a/examples/models/llama2/Android3_2_1B_bf16.gif b/examples/models/llama2/Android3_2_1B_bf16.gif
new file mode 100644
index 00000000000..abe2b8278f6
Binary files /dev/null and b/examples/models/llama2/Android3_2_1B_bf16.gif differ
diff --git a/examples/models/llama2/README.md b/examples/models/llama2/README.md
index 4bc315f4bf0..4cd0ed7686d 100644
--- a/examples/models/llama2/README.md
+++ b/examples/models/llama2/README.md
@@ -24,13 +24,21 @@ Please note that the models are subject to the [Llama 2 Acceptable Use Policy](h
 Since Llama 2 7B or Llama 3 8B model needs at least 4-bit quantization to fit even within some of the highend phones, results presented here correspond to 4-bit groupwise post-training quantized model.
 
-<p align="center">
-      <img src="./llama_via_xnnpack.gif" width=300>
-      <br>
-      <em>
-      Running Llama3.1 8B on Android phone
-      </em>
-</p>
+<table>
+  <tr>
+    <td>
+      <img src="./llama_via_xnnpack.gif" width="300">
+      <br>
+      <em>Llama3.1 8B, 4bit quantized on Android phone</em>
+    </td>
+    <td>
+      <img src="./Android3_2_1B_bf16.gif" width="300">
+      <br>
+      <em>Llama3.2 1B, unquantized, bf16 on Android phone.</em>
+    </td>
+  </tr>
+</table>