# Examples Build Instructions
## Environment Setup
- Follow the instructions in `backends/mediatek/README.md` to build the backend library `libneuron_backend.so`.
## Build MediaTek Runners

1. Build the MediaTek model runner by executing the script:
```bash
./mtk_build_examples.sh
```
This will generate the required runners in `executorch/cmake-android-out/examples/mediatek/`.
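If the build succeeds, the runners land in the directory named above. A minimal sanity check from the repository root (the output path is taken from the text; the exact runner names depend on what was built):

```shell
# Output directory taken from the text above; adjust if your checkout differs.
OUT_DIR="executorch/cmake-android-out/examples/mediatek"

# List whatever runners the build produced (runner names vary by build config);
# on a clean checkout this just reports that nothing has been built yet.
ls "${OUT_DIR}" 2>/dev/null || echo "no build output yet at ${OUT_DIR}"
```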

## Model Export Instructions

##### Note: Verify that localhost connection is available before running AoT Flow
1. Download Required Files
- Download the model files from the official Hugging Face website, and move them to the respective folder in `examples/mediatek/models/llm_models/weights/`, **EXCEPT** the `config.json` file.
- The `config.json` file is already included in the model folders; it may contain modifications required for model export.
- Include the calibration data (if any) under `aot_utils/llm_utils/prompts/`.
- `model_name` = Depends on model family. Check the respective `shell_scripts/export_<model_family>.sh` for info.
- `num_chunks` = 4
- `prompt_num_tokens` = 128
- `cache_size` = 512
- `calibration_data_file` = None
- `precision` = A16W4
- `platform` = DX4
- Argument Explanations/Options:
- `model_name`: See the list of 'Available model names' below.
- `num_chunks`: Number of chunks to split the model into. Each chunk contains the same number of decoder layers. Typical values are 1, 2 and 4.
- `prompt_num_tokens`: Number of tokens (> 1) consumed per forward pass during the prompt processing stage.
- `cache_size`: Cache size.
- `calibration_data_file`: Name of the calibration dataset (with extension) found inside the `aot_utils/llm_utils/prompts/` directory. Example: `alpaca.txt`. If `"None"`, dummy data will be used for calibration.
- `precision`: Quantization precision for the model. Available options are `["A16W4", "A16W8", "A16W16", "A8W4", "A8W8"]`.
- `platform`: The platform of the device. `DX4` for MediaTek Dimensity 9400 and `DX3` for MediaTek Dimensity 9300.

<sub>**Note: The export script example has only been tested on `.txt` files.**</sub>
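For orientation only, the defaults above can be collected into a single argument string. The flag names below are hypothetical; the authoritative invocation is whatever the respective `shell_scripts/export_<model_family>.sh` passes to the export entry point:

```shell
# Hypothetical flags for illustration only; check the respective
# shell_scripts/export_<model_family>.sh for the real interface.
num_chunks=4
prompt_num_tokens=128
cache_size=512
calibration_data_file=None
precision=A16W4
platform=DX4

args="--num_chunks ${num_chunks} --prompt_num_tokens ${prompt_num_tokens}"
args="${args} --cache_size ${cache_size} --calibration_data_file ${calibration_data_file}"
args="${args} --precision ${precision} --platform ${platform}"
echo "${args}"
```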
3. `.pte` files will be generated in `examples/mediatek/pte/`
- Users should expect `num_chunks` number of `.pte` files.
- An embedding bin file will be generated in the weights folder containing the `config.json`: [`examples/mediatek/models/llm_models/weights/<model_name>/embedding_<model_config_folder>_fp32.bin`]
- e.g. For `llama3-8B-instruct`, the embedding bin is generated in `examples/mediatek/models/llm_models/weights/llama3-8B-instruct/`
- The AoT flow takes around 30 minutes to 2.5 hours to complete (results vary with device/hardware configuration and model size)
## Deploying and Running on the Device
### Pushing Files to the Device
Transfer the directory containing the `.pte` model files, the `run_<model_name>_sample.sh` script, the `embedding_<model_config_folder>_fp32.bin` file, the tokenizer file, the `mtk_llama_executor_runner` binary, and the 3 `.so` files to your Android device using the following commands:
```bash
adb push mtk_llama_executor_runner <PHONE_PATH, e.g. /data/local/tmp>
adb push examples/mediatek/executor_runner/run_<model_name>_sample.sh <PHONE_PATH, e.g. /data/local/tmp>
adb push embedding_<model_config_folder>_fp32.bin <PHONE_PATH, e.g. /data/local/tmp>
adb push tokenizer.model <PHONE_PATH, e.g. /data/local/tmp>
adb push <PTE_DIR> <PHONE_PATH, e.g. /data/local/tmp>
```
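The individual pushes above can be batched. This sketch substitutes `echo` for `adb` so it dry-runs on a machine with no device attached; the concrete file names (the embedding bin, the pte directory) are hypothetical instances of the placeholders in the text and must be replaced with your actual names:

```shell
# Dry run by default: prints the adb commands instead of executing them.
# Set ADB=adb (with a device attached) to push for real.
ADB="${ADB:-echo adb}"
PHONE_PATH=/data/local/tmp
PTE_DIR=llama3-8B-instruct   # hypothetical example; use your pte directory name

out=$(
  for f in mtk_llama_executor_runner \
           examples/mediatek/executor_runner/run_llama3_sample.sh \
           embedding_llama3_fp32.bin \
           tokenizer.model \
           "${PTE_DIR}"; do
    ${ADB} push "$f" "${PHONE_PATH}"
  done
)
printf '%s\n' "$out"
```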
Make sure to replace `<PTE_DIR>` with the actual name of the directory containing your `.pte` files, and replace `<PHONE_PATH>` with the desired destination on the device.
At this point your phone directory should have the following files:
- libneuron_backend.so
- libneuronusdk_adapter.mtk.so
- libneuron_buffer_allocator.so
- mtk_llama_executor_runner
- <PTE_DIR>
- tokenizer.json / tokenizer.model (for llama3) / tokenizer.bin (for phi3 and gemma2)
- embedding_<model_config_folder>_fp32.bin
- run_<model_name>_sample.sh
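To confirm that everything landed, the expected files can be checked in one pass. This is a dry-run sketch: set `ADB=adb` with a device attached to run the real check, and extend the list with your pte directory, embedding bin, and run script names:

```shell
# Check each expected file on the device; dry-runs by printing the
# commands unless ADB=adb is set and a device is attached.
ADB="${ADB:-echo adb}"
PHONE_PATH=/data/local/tmp

checked=0
for f in libneuron_backend.so libneuronusdk_adapter.mtk.so \
         libneuron_buffer_allocator.so mtk_llama_executor_runner \
         tokenizer.model; do
  ${ADB} shell ls "${PHONE_PATH}/${f}"
  checked=$((checked + 1))
done
echo "checked ${checked} files"
```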
##### Note: For oss models, please push additional files to your Android device
```bash
for i in input*bin; do adb push "$i" <PHONE_PATH, e.g. /data/local/tmp>; done;
```
Execute the model on your Android device by running:
#### Note: The `mtk_llama_executor_runner` is applicable to the models listed in `examples/mediatek/models/llm_models/weights/`.
##### Note: For non-LLM models, please run `adb shell "/data/local/tmp/mtk_executor_runner --model_path /data/local/tmp/<MODEL_NAME>.pte --iteration <ITER_TIMES>"`, replacing `<MODEL_NAME>` with the name of your model file and `<ITER_TIMES>` with the desired number of iterations.
##### Note: For oss models, please use `mtk_oss_executor_runner`.