# Export Llama with OpenVINO Backend

## Download the Model
Follow the [instructions](../../examples/models/llama#step-2-prepare-model) to download the required model files. Exporting Llama with the OpenVINO backend is currently verified only with the Llama-3.2-1B variants.

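As one possible route (an illustrative sketch, not part of the official instructions), the original-format files can usually be pulled from the Hugging Face `meta-llama/Llama-3.2-1B` repository with `huggingface-cli`, assuming you have accepted the model license; the linked instructions remain the authoritative path.

```bash
# Illustrative sketch: fetch the original-format Llama-3.2-1B files from Hugging Face.
# Assumes the model license has been accepted and you are logged in.
huggingface-cli login
huggingface-cli download meta-llama/Llama-3.2-1B \
    original/consolidated.00.pth original/params.json original/tokenizer.model \
    --local-dir Llama-3.2-1B
# The files end up under Llama-3.2-1B/original/; point the paths in the export
# step below at that folder.
```
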
## Environment Setup
Follow the **Prerequisites** and **Setup** [instructions](../../backends/openvino/README.md) in `backends/openvino/README.md` to set up the OpenVINO backend.

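In rough outline (a sketch only; versions, flags, and exact steps are defined in the linked README), the setup amounts to installing OpenVINO, loading its environment into your shell, and building the ExecuTorch OpenVINO backend:

```bash
# Sketch only -- backends/openvino/README.md is the authoritative reference.
# Load the OpenVINO environment (the install path shown is an assumption).
source <openvino_install_dir>/setupvars.sh

# Build and install the OpenVINO backend from the ExecuTorch scripts folder
# (the default invocation here is an assumption; check the README for the exact flags).
cd <executorch_root>/backends/openvino/scripts
./openvino_build.sh
```
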
## Export the Model
Navigate to `<executorch_root>/examples/openvino/llama` and run the commands below to export the model. Update the model file paths to match the location where your model files were downloaded.

```bash
LLAMA_CHECKPOINT=<path/to/model/folder>/consolidated.00.pth
LLAMA_PARAMS=<path/to/model/folder>/params.json
LLAMA_TOKENIZER=<path/to/model/folder>/tokenizer.model

python -m executorch.extension.llm.export.export_llm \
    --config llama3_2_ov_4wo_config.yaml \
    +base.model_class="llama3_2" \
    +base.checkpoint="${LLAMA_CHECKPOINT:?}" \
    +base.params="${LLAMA_PARAMS:?}" \
    +base.tokenizer_path="${LLAMA_TOKENIZER:?}"
```
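When the export completes, a `.pte` program is written to the working directory. Assuming the config names it `llama3_2.pte` (the name used in the inference step below), a quick check:

```bash
# Confirm the exported program exists and note its size.
ls -lh llama3_2.pte
```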

## Build OpenVINO C++ Runtime with Llama Runner
First, build the backend libraries by running the script below from the `<executorch_root>/backends/openvino/scripts` folder:
```bash
./openvino_build.sh --cpp_runtime
```
Then, build the llama runner by running the script below (with the `--llama_runner` argument), also from the `<executorch_root>/backends/openvino/scripts` folder:
```bash
./openvino_build.sh --llama_runner
```
The resulting executable is saved at `<executorch_root>/cmake-out/examples/models/llama/llama_main`.
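As a quick sanity check that the runner built correctly, you can list its options; this assumes the standard gflags-style `--help` behavior of `llama_main`:

```bash
# Optional: verify the binary runs and list the supported flags.
./cmake-out/examples/models/llama/llama_main --help
```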

## Execute Inference Using Llama Runner
Update the tokenizer file path to match the location where your model files were downloaded, and replace the prompt with your own.
```bash
./cmake-out/examples/models/llama/llama_main --model_path=llama3_2.pte --tokenizer_path=<path/to/model/folder>/tokenizer.model --prompt="Your custom prompt"
```
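For example, a complete invocation reusing the tokenizer path set during export; `--seq_len` and `--temperature` are shown as assumed optional flags, so confirm what your build supports (for instance via `--help`) before relying on them:

```bash
# Example run reusing the path variables from the export step.
# --seq_len and --temperature are assumed optional flags; verify before use.
./cmake-out/examples/models/llama/llama_main \
    --model_path=llama3_2.pte \
    --tokenizer_path="${LLAMA_TOKENIZER:?}" \
    --prompt="Once upon a time" \
    --seq_len=128 \
    --temperature=0
```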