
Commit 0b4fe31

Support qwen phi gemma whisper (#14294)
### Summary
Updated the README and removed NeuronAdapter.h.
Parent: 0b3227f

examples/mediatek/README.md (1 file changed: 60 additions, 37 deletions)
# Examples Build Instructions

## Environment Setup
- Follow the instructions in `backends/mediatek/README.md` to build the backend library `libneuron_backend.so`.
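
As a quick sanity check before proceeding, you can confirm the backend libraries were built (a minimal sketch; the output location depends on your build setup):

```bash
# Locate the MediaTek backend libraries produced by the backend build.
find . -name "libneuron*.so" 2>/dev/null
```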

## Build MediaTek Runners
1. Build the MediaTek model runner by executing the script:
```bash
./mtk_build_examples.sh
```
This will generate the required runners in `executorch/cmake-android-out/examples/mediatek/`.
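
To verify the build output (illustrative; runner names may vary by version):

```bash
# List the generated runner binaries.
ls executorch/cmake-android-out/examples/mediatek/
```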

## Model Export Instructions
##### Note: Verify that a localhost connection is available before running the AoT flow.
1. Download Required Files
- Download the model files from the official Hugging Face website and move them to the respective folder in `examples/mediatek/models/llm_models/weights/`, **EXCEPT** the `config.json` file (a download sketch follows this list).
- The `config.json` file is already included in each model folder and may contain modifications required for model export.
- Place the calibration data (if any) under `aot_utils/llm_utils/prompts/`.
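
A minimal download sketch, assuming `huggingface-cli` is installed and using `Qwen2.5-0.5B-Instruct` purely as an illustration; substitute your model's Hugging Face repo ID and the matching weights folder:

```bash
# Hypothetical example, run from examples/mediatek/: fetch the weights,
# then copy everything EXCEPT config.json into the matching weights folder.
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir /tmp/qwen_dl
for f in /tmp/qwen_dl/*; do
  [ "$(basename "$f")" = "config.json" ] || cp -r "$f" models/llm_models/weights/Qwen2.5-0.5B-Instruct/
done
```
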
2. Exporting Models to `.pte`
- In the `examples/mediatek/` directory, run (a full sample invocation appears after the model list below):
```bash
source shell_scripts/export_<model_family>.sh <model_name> <num_chunks> <prompt_num_tokens> <cache_size> <calibration_data_file> <precision> <platform>
```
- Defaults:
- `model_name` = Depends on the model family; check the respective `shell_scripts/export_<model_family>.sh` for details.
- `num_chunks` = 4
- `prompt_num_tokens` = 128
- `cache_size` = 512
- `calibration_data_file` = None
- `precision` = A16W4
- `platform` = DX4
- Argument Explanations/Options:
- `model_name`: See the list of available model names below.
- `num_chunks`: Number of chunks to split the model into. Each chunk contains the same number of decoder layers. Typical values are 1, 2, and 4.
- `prompt_num_tokens`: Number of tokens (> 1) consumed each forward pass during the prompt-processing stage.
- `cache_size`: Size of the KV cache.
- `calibration_data_file`: Name of a calibration dataset (with extension) found inside the `aot_utils/llm_utils/prompts/` directory, e.g. `alpaca.txt`. If `"None"`, dummy data will be used for calibration.
- `precision`: Quantization precision for the model. Available options: `["A16W4", "A16W8", "A16W16", "A8W4", "A8W8"]`.
- `platform`: The platform of the device. `DX4` for MediaTek Dimensity 9400 and `DX3` for MediaTek Dimensity 9300.
<sub>**Note: The export script has only been tested with `.txt` calibration files.**</sub>

- Available model names:
  - Llama: llama3.2-3b, llama3.2-1b, llama3, llama2
  - Qwen: Qwen3-4B, Qwen3-1.7B, Qwen2-7B-Instruct, Qwen2.5-3B, Qwen2.5-0.5B-Instruct, Qwen2-1.5B-Instruct
  - Gemma: gemma2, gemma3
  - Phi: phi3.5, phi4
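
For instance, a sample invocation (illustrative; assumes the Llama family script and that `alpaca.txt` exists under `aot_utils/llm_utils/prompts/`):

```bash
# Export llama3.2-3b in 4 chunks, 128-token prompt passes, a 512 cache,
# calibrated on alpaca.txt, quantized A16W4, targeting Dimensity 9400.
source shell_scripts/export_llama.sh llama3.2-3b 4 128 512 alpaca.txt A16W4 DX4
```
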
3. `.pte` files will be generated in `examples/mediatek/pte/`.
- Users should expect `num_chunks` `.pte` files (see the check below).
- An embedding bin file will be generated in the weights folder containing the `config.json`. [`examples/mediatek/models/llm_models/weights/<model_name>/embedding_<model_config_folder>_fp32.bin`]
- e.g. for `llama3-8B-instruct`, the embedding bin is generated in `examples/mediatek/models/llm_models/weights/llama3-8B-instruct/`
- The AoT flow will take around 30 minutes to 2.5 hours to complete (results will vary depending on device/hardware configuration and model size).
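
A quick post-export check (illustrative; run from `examples/mediatek/`):

```bash
# Expect num_chunks .pte files, plus an embedding bin next to config.json.
ls pte/
ls models/llm_models/weights/*/embedding_*_fp32.bin
```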

### oss
1. Exporting Model to `.pte`
```bash
bash shell_scripts/export_oss.sh <model_name>
```
- `model_name`: deeplabv3/edsr/inceptionv3/inceptionv4/mobilenetv2/mobilenetv3/resnet18/resnet50/dcgan/wav2letter/vit_b_16/mobilebert/emformer_rnnt/bert/distilbert
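
For example, to export one of the listed models:

```bash
# Export mobilenetv2 through the oss flow.
bash shell_scripts/export_oss.sh mobilenetv2
```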

# Runtime
## Deploying and Running on the Device

### Pushing Files to the Device

Transfer the directory containing the `.pte` model files, the `run_<model_name>_sample.sh` script, the `embedding_<model_config_folder>_fp32.bin` file, the tokenizer file, the `mtk_llama_executor_runner` binary, and the three `.so` files to your Android device using the following commands:

```bash
adb push mtk_llama_executor_runner <PHONE_PATH, e.g. /data/local/tmp>
adb push examples/mediatek/executor_runner/run_<model_name>_sample.sh <PHONE_PATH, e.g. /data/local/tmp>
adb push embedding_<model_config_folder>_fp32.bin <PHONE_PATH, e.g. /data/local/tmp>
adb push tokenizer.model <PHONE_PATH, e.g. /data/local/tmp>
adb push <PTE_DIR> <PHONE_PATH, e.g. /data/local/tmp>
```

Make sure to replace `<PTE_DIR>` with the actual name of the directory containing your `.pte` files, and replace `<PHONE_PATH>` with the desired destination on the device.

At this point your phone directory should have the following files:
- libneuron_backend.so
- libneuronusdk_adapter.mtk.so
- libneuron_buffer_allocator.so
- mtk_llama_executor_runner
- <PTE_DIR>
- tokenizer.json / tokenizer.model (for llama3) / tokenizer.bin (for phi3 and gemma2)
- embedding_<model_config_folder>_fp32.bin
- run_<model_name>_sample.sh
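
Note that the push commands above do not cover the three `.so` libraries; a sketch for copying them as well (the local paths are assumptions; use wherever your build placed them):

```bash
# Assumed local locations; adjust to your build tree.
adb push libneuron_backend.so <PHONE_PATH, e.g. /data/local/tmp>
adb push libneuronusdk_adapter.mtk.so <PHONE_PATH, e.g. /data/local/tmp>
adb push libneuron_buffer_allocator.so <PHONE_PATH, e.g. /data/local/tmp>
```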

##### Note: For oss models, please push additional files to your Android device
```bash
for i in input*bin; do adb push "$i" <PHONE_PATH, e.g. /data/local/tmp>; done;
```
Execute the model on your Android device by running:

```bash
adb shell
cd <PHONE_PATH>
sh run_<model_name>_sample.sh
```
#### Note: The `mtk_llama_executor_runner` is applicable to the models listed in `examples/mediatek/models/llm_models/weights/`.

##### Note: For non-LLM models, please run `adb shell "/data/local/tmp/mtk_executor_runner --model_path /data/local/tmp/<MODEL_NAME>.pte --iteration <ITER_TIMES>"`.
##### Note: For oss models, please use `mtk_oss_executor_runner`.
```bash
adb shell "/data/local/tmp/mtk_oss_executor_runner --model_path /data/local/tmp/<MODEL_NAME>.pte --input_list /data/local/tmp/input_list.txt --output_folder /data/local/tmp/output_<MODEL_NAME>"
```
