---
title: Build the MNN Command-line ViT Demo
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this section, you will use the model to benchmark performance with and without KleidiAI kernels. You will need to compile library files to run the optimized inference.

## Prepare an example image

You will use an image to run a command-line prompt. In this Learning Path, the tiger image below is used as an example. You can save this image or provide one of your own. Rename the image to `example.png` so that it works with the commands in the following sections.

Use ADB to load the image onto your phone:

```bash
adb push example.png /data/local/tmp/
```
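
To confirm the image transferred correctly, you can list it on the device. This is an optional check:

```bash
adb shell ls -l /data/local/tmp/example.png
```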

## Build binaries for command-line inference

Navigate to the MNN project you cloned in the previous section. Create a build directory and run the build script. The first time, you will build the binaries with the `-DMNN_KLEIDIAI` flag set to `FALSE`.

```bash
cd $HOME/MNN/project/android
mkdir build_64 && cd build_64

../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=FALSE \
    -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \
    -DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
```
{{% notice Note %}}
If your NDK toolchain isn't set up correctly, you might run into issues with the script above. Make a note of where the NDK was installed; this is a directory named after the version you downloaded earlier. Try exporting the following environment variables before re-running `build_64.sh`:

```bash
export ANDROID_NDK_HOME=<path-to>/ndk/28.0.12916984

export CMAKE_TOOLCHAIN_FILE=$ANDROID_NDK_HOME/build/cmake/android.toolchain.cmake
export ANDROID_NDK=$ANDROID_NDK_HOME
```
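
You can also check that the toolchain file exists at the exported path before re-running the build (an optional sanity check):

```bash
ls "$CMAKE_TOOLCHAIN_FILE"
```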
{{% /notice %}}

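Before pushing, you can optionally confirm that the build produced the demo binary and the shared libraries. The exact set of `.so` files can vary with the MNN version and the build options you enabled:

```bash
# Run from the build_64 directory
ls llm_demo *so tools/cv/*so
```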
Push the files to your mobile device. Then enter a shell on the phone using ADB:

```bash
adb push *so llm_demo tools/cv/*so /data/local/tmp/
adb shell
```

Run the following commands in the ADB shell. Navigate to the directory you pushed the files to, make the `llm_demo` file executable, and set `LD_LIBRARY_PATH` so the binary can find the shared libraries. Then use the example image you transferred earlier to create a file containing the text content for the prompt:

```bash
cd /data/local/tmp/
chmod +x llm_demo
export LD_LIBRARY_PATH=$PWD
echo "<img>./example.png</img>Describe the content of the image." > prompt
```
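
You can print the prompt file back to confirm it contains the `<img>` tag and the question. This is an optional check:

```bash
cat prompt
```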

Finally, run an inference on the model with the following command:

```bash
./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
```

If the launch is successful, you should see output similar to the following, with the performance benchmark at the end:

```output
config path is models/Qwen-VL-2B-convert-4bit-per_channel/config.json
tokenizer_type = 3
prompt file is prompt
The image features a tiger standing in a grassy field, with its front paws raised and its eyes fixed on something or someone behind it. The tiger's stripes are clearly visible against the golden-brown background of the grass. The tiger appears to be alert and ready for action, possibly indicating a moment of tension or anticipation in the scene.

#################################
prompt tokens num = 243
decode tokens num = 70
 vision time = 5.76 s
 audio time = 0.00 s
prefill time = 1.26 s
 decode time = 2.02 s
prefill speed = 192.28 tok/s
 decode speed = 34.73 tok/s
##################################
```

## Enable KleidiAI and re-run inference

The next step is to regenerate the binaries with KleidiAI activated. This is done by updating the `-DMNN_KLEIDIAI` flag to `TRUE`. From the `build_64` directory, run:

```bash
../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=TRUE \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \
-DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \
-DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
```

Next, update the files on your phone. Start by removing the ones used in the previous step, then push the new ones with the same command as before:

```bash
adb shell "cd /data/local/tmp; rm -rf *so llm_demo tools/cv/*so"
adb push *so llm_demo tools/cv/*so /data/local/tmp/
adb shell
```

In the new ADB shell, perform the same steps as in the previous section:

```bash
cd /data/local/tmp/
chmod +x llm_demo
export LD_LIBRARY_PATH=$PWD
./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
```

The same description should be displayed, with the new benchmark printed at the end:

```output
#################################
prompt tokens num = 243
decode tokens num = 70
 vision time = 2.91 s
 audio time = 0.00 s
prefill time = 0.91 s
 decode time = 1.56 s
prefill speed = 266.13 tok/s
 decode speed = 44.96 tok/s
##################################
```

This time, you should see an improvement in the benchmark. The table below shows an example of the uplift in three relevant metrics after enabling the KleidiAI kernels.

| Benchmark | Without KleidiAI | With KleidiAI |
|---------------------|------------------|---------------|
| Vision Process Time | 5.76 s | 2.91 s |
| Prefill Speed | 192.28 tok/s | 266.13 tok/s |
| Decode Speed | 34.73 tok/s | 44.96 tok/s |

The prefill speed describes how fast the model processes the input prompt. The decode speed corresponds to the rate at which the model generates new tokens after the input is processed.
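
These speeds follow from the token counts and timings in the benchmark output: for example, 243 prompt tokens prefilled in about 1.26 s gives roughly 193 tok/s, which matches the reported prefill speed up to rounding. If you want to quantify the uplift from your own runs, a quick calculation of the ratios (shown here with the example numbers above) looks like this:

```bash
# Approximate uplift from enabling KleidiAI, using the example numbers above
awk 'BEGIN {
  printf "Vision time:   %.2fx faster\n", 5.76 / 2.91
  printf "Prefill speed: %.2fx faster\n", 266.13 / 192.28
  printf "Decode speed:  %.2fx faster\n", 44.96 / 34.73
}'
```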

This shows the advantages of using Arm-optimized kernels for your ViT use cases.