
Commit 35f1d84

Update README.md
1 parent be85af8 commit 35f1d84

1 file changed (+4 lines, -49 lines)


examples/openvino/README.md

Lines changed: 4 additions & 49 deletions
@@ -9,7 +9,10 @@ Below is the layout of the `examples/openvino` directory, which includes the nec
 ```
 examples/openvino
 ├── README.md                  # Documentation for examples (this file)
-└── aot_optimize_and_infer.py  # Example script to export and execute models
+├── aot_optimize_and_infer.py  # Example script to export and execute models
+└── llama
+    ├── README.md              # Documentation for Llama example
+    └── llama3_2_ov_4wo.yaml   # Configuration file for exporting Llama3.2 with OpenVINO backend
 ```
 
 # Build Instructions for Examples
@@ -183,51 +186,3 @@ Run inference with a given model for 10 iterations:
   --model_path=model.pte \
   --num_executions=10
 ```
-
-# Export Llama with OpenVINO Backend
-
-## Download the Model
-Follow the [instructions](../../examples/models/llama#step-2-prepare-model) to download the required model files. Exporting Llama with the OpenVINO backend is only verified with Llama-3.2-1B variants at this time.
-
-## Environment Setup
-Follow the **Prerequisites** and **Setup** [instructions](../../backends/openvino/README.md) in `backends/openvino/README.md` to set up the OpenVINO backend.
-
-## Export the Model
-Execute the commands below to export the model. Update the model file paths to match the location where your model is downloaded.
-
-```
-LLAMA_CHECKPOINT=<path/to/model/folder>/consolidated.00.pth
-LLAMA_PARAMS=<path/to/model/folder>/params.json
-LLAMA_TOKENIZER=<path/to/model/folder>/tokenizer.model
-
-python -u -m examples.models.llama.export_llama \
-  --model "llama3_2" \
-  --checkpoint "${LLAMA_CHECKPOINT:?}" \
-  --params "${LLAMA_PARAMS:?}" \
-  -kv \
-  --openvino \
-  -d fp32 \
-  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
-  --output_name="llama.pte" \
-  --verbose \
-  --disable_dynamic_shape \
-  --tokenizer_path "${LLAMA_TOKENIZER:?}" \
-  --nncf_compression
-```
-
-## Build OpenVINO C++ Runtime with Llama Runner
-First, build the backend libraries by executing the script below in the `<executorch_root>/backends/openvino/scripts` folder:
-```bash
-./openvino_build.sh
-```
-Then, build the Llama runner by executing the script below (with the `--llama_runner` argument), also in the `<executorch_root>/backends/openvino/scripts` folder:
-```bash
-./openvino_build.sh --llama_runner
-```
-The executable is saved to `<executorch_root>/cmake-out/examples/models/llama/llama_main`.
-
-## Execute Inference Using Llama Runner
-Update the tokenizer file path to match the location where your model is downloaded, and replace the prompt with your own.
-```
-./cmake-out/examples/models/llama/llama_main --model_path=llama.pte --tokenizer_path=<path/to/model/folder>/tokenizer.model --prompt="Your custom prompt"
-```
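
The walkthrough removed above is superseded by the new `examples/openvino/llama/` files added to the directory tree in the first hunk. As a rough, hedged sketch only: the added `llama3_2_ov_4wo.yaml` is presumably consumed by ExecuTorch's config-driven LLM export entry point; the module path and `--config` flag below are assumptions rather than content of this commit, and the authoritative command is documented in the new `examples/openvino/llama/README.md`.

```bash
# Hypothetical invocation (an assumption, not part of this commit): export Llama 3.2
# with the OpenVINO backend via the config-driven export entry point, using the
# configuration file added in this change. See examples/openvino/llama/README.md
# for the actual, supported command.
python -m extension.llm.export.export_llm \
  --config examples/openvino/llama/llama3_2_ov_4wo.yaml
```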
