Commit a379d13

Updated llama LP for new breaking changes
1 parent eed291f commit a379d13

File tree

1 file changed: +0 −19 lines changed


content/learning-paths/servers-and-cloud-computing/llama-cpu/llama-chatbot.md

Lines changed: 0 additions & 19 deletions
@@ -163,25 +163,6 @@ Each quantization method has a unique approach to quantizing parameters. The dee
 
 In this guide, you will not use any other quantization methods, because Arm has not made kernel optimizations for other quantization types.
 
-## 
-
-To see improvements for Arm-optimized kernels, you need to generate a new weights file with rearranged Q4_0 weights. As of [llama.cpp commit 0f1a39f3](https://github.com/ggerganov/llama.cpp/commit/0f1a39f3), Arm has contributed code for three types of GEMV/GEMM kernels corresponding to three processor types:
-
-* AWS Graviton2, where you only have NEON support (you will see less improvement for these GEMV/GEMM kernels),
-* AWS Graviton3, where the GEMV/GEMM kernels exploit both SVE 256 and MATMUL_INT8 support, and
-* AWS Graviton4, where the GEMV/GEMM kernels exploit NEON/SVE 128 and MATMUL_INT8 support.
-
-To re-quantize optimally for Graviton3, run:
-
-```bash
-./llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf Q4_0_8_8
-```
-
-This outputs a new file, `dolphin-2.9.4-llama3.1-8b-Q4_0_8_8.gguf`, which contains reconfigured weights that allow `llama-cli` to use SVE 256 and MATMUL_INT8 support.
-
-{{% notice Note %}}
-This requantization is optimal only for Graviton3. For Graviton2, requantization should optimally be done in `Q4_0_4_4` format, and for Graviton4, `Q4_0_4_8` is the optimal requantization format.
-{{% /notice %}}
 
 ## Run the pre-quantized Llama-3.1-8B LLM model weights on your Arm-based server
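For the record, the deleted note stated that `Q4_0_8_8` was optimal only for Graviton3, with `Q4_0_4_4` for Graviton2 and `Q4_0_4_8` for Graviton4. Before the breaking change that prompted this deletion, the analogous invocations for the other two generations would have followed the same pattern as the Graviton3 command above (an illustrative sketch; the output filenames are hypothetical, and the format names come from the deleted note):

```shell
# Requantize for Graviton2 (NEON only): Q4_0_4_4 weight layout
./llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_4_4.gguf Q4_0_4_4

# Requantize for Graviton4 (NEON/SVE 128 + MATMUL_INT8): Q4_0_4_8 weight layout
./llama-quantize --allow-requantize dolphin-2.9.4-llama3.1-8b-Q4_0.gguf dolphin-2.9.4-llama3.1-8b-Q4_0_4_8.gguf Q4_0_4_8
```

These commands require a local `llama-quantize` build and the source `Q4_0` GGUF file; in llama.cpp versions after the breaking change this commit tracks, the `Q4_0_X_X` types are no longer accepted.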
