Commit 4bfdca9

Update README.md
1 parent f050eea commit 4bfdca9

File tree

1 file changed: +1 −16 lines


examples/openvino/llama/README.md

Lines changed: 1 addition & 16 deletions
````diff
@@ -25,22 +25,7 @@ python -m executorch.extension.llm.export.export_llm \
 ```
 
 ### Compress Model Weights and Export
-OpenVINO backend also offers Quantization support for llama models when exporting the model. The different quantization modes that are offered are INT4 groupwise & per-channel weights compression and INT8 per-channel weights compression. It can be achieved using the `--pt2e_quantize opevnino_4wo` flag. For modifying the group size `--group_size` can be used. By default group size 128 is used to achieve optimal performance with the NPU.
-
-```
-LLAMA_CHECKPOINT=<path/to/model/folder>/consolidated.00.pth
-LLAMA_PARAMS=<path/to/model/folder>/params.json
-LLAMA_TOKENIZER=<path/to/model/folder>/tokenizer.model
-
-python -m executorch.extension.llm.export.export_llm \
-    --config llama3_2_ov_4wo.yaml \
-    +backend.openvino.device="CPU" \
-    +base.model_class="llama3_2" \
-    +pt2e_quantize opevnino_4wo \
-    +base.checkpoint="${LLAMA_CHECKPOINT:?}" \
-    +base.params="${LLAMA_PARAMS:?}" \
-    +base.tokenizer_path="${LLAMA_TOKENIZER:?}"
-```
+OpenVINO backend also offers Quantization support for llama models when exporting the model. The different quantization modes that are offered are INT4 groupwise & per-channel weights compression and INT8 per-channel weights compression. It can be achieved by setting `pt2e_quantize` option in `llama3_2_ov_4wo.yaml` file under `quantization`. Set this parameter to `openvino_4wo` for INT4 or `openvino_8wo` for INT8 weight compression. It is set to `openvino_4wo` in `llama3_2_ov_4wo.yaml` file by default. For modifying the group size, set `group_size` option in `llama3_2_ov_4wo.yaml` file under `quantization`. By default group size 128 is used to achieve optimal performance with the NPU.
 
 ## Build OpenVINO C++ Runtime with Llama Runner:
 First, build the backend libraries by executing the script below in `<executorch_root>/backends/openvino/scripts` folder:
````
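The added paragraph describes a config-driven flow: the quantization mode now lives in the YAML file rather than on the command line. As a rough sketch of what that implies (only the option names `pt2e_quantize` and `group_size` and the values `openvino_4wo`, `openvino_8wo`, and `128` come from the README text; the surrounding YAML layout is an assumption), the relevant section of `llama3_2_ov_4wo.yaml` might look like:

```yaml
# Hypothetical sketch; the exact structure of llama3_2_ov_4wo.yaml
# is assumed, not taken from this commit.
quantization:
  pt2e_quantize: openvino_4wo   # INT4 groupwise; use openvino_8wo for INT8 per-channel
  group_size: 128               # default; chosen for optimal NPU performance
```

Keeping these knobs in the config file means the export command no longer needs per-run quantization overrides, which is consistent with the flags removed from the shell snippet in this diff.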
