
Commit 2f12f8a

Merge pull request #1962 from annietllnd/review
Update Audiogen LP
2 parents 56ae90a + a74a167 commit 2f12f8a

6 files changed: +86 −50 lines changed

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/1-prerequisites.md

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ See the [CMake install guide](/install-guides/cmake/) for troubleshooting instru

### Install Bazel

-Bazel is an open-source build tool which we will use to build LiteRT libraries.
+Bazel is an open-source build tool which you will use to build LiteRT libraries.

{{< tabpane code=true >}}
{{< tab header="Linux">}}
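Once the installation in the tab pane above completes, a quick sanity check can confirm the tool is available. This is a minimal sketch, assuming Bazel is already on your `PATH`:

```bash
# Print the installed Bazel version to confirm the setup.
bazel --version
```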

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/2-testing-model.md

Lines changed: 4 additions & 2 deletions
@@ -26,9 +26,11 @@ Download and copy the configuration file `model_config.json` and the model itsel
ls $WORKSPACE/model_config.json $WORKSPACE/model.ckpt
```

-You can see more information about this model [here](https://huggingface.co/stabilityai/stable-audio-open-small).
+You can learn more about this model [here](https://huggingface.co/stabilityai/stable-audio-open-small).

-A good prompt for this model can include:
+### Good prompting practices
+
+A good prompt for this audio generation model can include:

* Music genre and subgenre.
* Musical elements (texture, rhythm and articulation).
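To make these guidelines concrete, here is one illustrative way to hold a prompt in a shell variable. The string itself is the exact prompt the audiogen app runs later in this Learning Path; the variable name is only an example:

```bash
# Example prompt combining genre, rhythm, and tempo (illustrative variable name).
PROMPT="warm arpeggios on house beats 120BPM with drums effect"
```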

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/3-converting-model.md

Lines changed: 19 additions & 14 deletions
@@ -5,29 +5,33 @@ weight: 4
### FIXED, DO NOT MODIFY
layout: learningpathall
---
+In this section, you will learn about the audio generation model. You will then clone a repository and run the conversion steps needed to generate the inference application.

## Stable Audio Open Small

+The open-source model includes three main parts. They are described in the table below and come together through the pipeline shown in the image.

|Submodule|Description|
|------|------|
|Conditioners| Includes a T5-based text encoder for the input prompt and a numerical duration encoder. These components convert the inputs into embeddings passed to the DiT model. |
|Diffusion Transformer (DiT)| Denoises random noise over multiple steps to produce structured latent audio, guided by conditioner embeddings. |
|AutoEncoder| Compresses audio waveforms into a latent representation for processing by the DiT model, and decompresses the output back into audio. |

-The submodules work together to provide the pipeline as shown below:

![Model structure#center](./model.png)

-As part of this section, we will explore two different conversion routes, to convert the submodules to [LiteRT](https://ai.google.dev/edge/litert) format.
+In this section, you will explore two conversion routes for converting the submodules to [LiteRT](https://ai.google.dev/edge/litert) format. Both methods are run using Python wrapper scripts from the examples repository.

-1. ONNX --> LiteRT using the onnx2tf tool. This is the traditional two-step approach (PyTorch --> ONNX --> LiteRT). We will use it to convert the Conditioners submodule.
+1. **ONNX to LiteRT**: using the `onnx2tf` tool. This is the traditional two-step approach (PyTorch -> ONNX -> LiteRT). You will use it to convert the Conditioners submodule.

-2. PyTorch --> LiteRT using the Google AI Edge Torch tool. We will use this tool to convert the DiT and AutoEncoder submodules.
+2. **PyTorch to LiteRT**: using the Google AI Edge Torch tool. You will use this tool to convert the DiT and AutoEncoder submodules.

-### Create virtual environment and install dependencies
+
+## Download the sample code

The Conditioners submodule is made of the T5Encoder model. You will use the ONNX to TFLite conversion for this submodule.

-To avoid dependency issues, create a virtual environment. In this guide, we will use `virtualenv`:
+To avoid dependency issues, create a virtual environment. For example, you can use the following command:

```bash
cd $WORKSPACE
@@ -43,7 +47,7 @@ git clone https://github.com/ARM-software/ML-examples.git
cd ML-examples/kleidiai-examples/audiogen/
```
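The virtual environment commands themselves fall between the hunks shown here. For reference only, a minimal sketch of that step, assuming `virtualenv` is installed and using `env` as a purely illustrative name:

```bash
# Illustrative only: create and activate a virtual environment in $WORKSPACE.
virtualenv env
source env/bin/activate
```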

-We now install the needed python packages for this, including *onnx2tf* and *ai_edge_litert*
+Install the needed Python packages for this, including *onnx2tf* and *ai_edge_litert*:

```bash
bash install_requirements.sh
@@ -72,9 +76,9 @@ pip install triton==3.2.0

### Convert Conditioners Submodule

-The Conditioners submodule is based on the T5Encoder model. We convert it first to ONNX, then to LiteRT.
+The Conditioners submodule is based on the T5Encoder model. First, convert it to ONNX, then to LiteRT.

-For this conversion we include the following steps:
+For this conversion, the following steps are needed (a sketch of the final step follows this list):
1. Load the Conditioners submodule from the Stable Audio Open Small model configuration and checkpoint.
2. Export the Conditioners submodule to ONNX via *torch.onnx.export()*.
3. Convert the resulting ONNX file to LiteRT using *onnx2tf*.
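As a minimal sketch of step 3 only, with hypothetical file names (the `export_conditioners.py` script below wraps all three steps for you):

```bash
# Hypothetical sketch: convert an ONNX export to LiteRT with the onnx2tf CLI.
# conditioners.onnx stands in for the file produced by torch.onnx.export().
onnx2tf -i conditioners.onnx -o tflite_conditioners
```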
@@ -87,26 +91,27 @@ python3 ./scripts/export_conditioners.py --model_config "$WORKSPACE/model_config

After successful conversion, you now have a `tflite_conditioners` directory containing models with different precisions (e.g., float16, float32).

-We will be using the float32.tflite model for on-device inference.
+You will be using the float32.tflite model for on-device inference.

### Convert DiT and AutoEncoder

-To convert the DiT and AutoEncoder submodules, use the [Generative API](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/) provided by the ai-edge-torch tools. This enables you to export a generative PyTorch model directly to tflite using three main steps:
+To convert the DiT and AutoEncoder submodules, use the [Generative API](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/) provided by the ai-edge-torch tools. This enables you to export a generative PyTorch model directly to `.tflite` using three main steps:

1. Model re-authoring.
2. Quantization.
3. Conversion.

-Convert the DiT and AutoEncoder submodules using the provided python script:
+Convert the DiT and AutoEncoder submodules using the provided Python script:

```bash
python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
```

After successful conversion, you now have `dit_model.tflite` and `autoencoder_model.tflite` models in your current directory.
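As an optional check, assuming your shell is still in the directory where the script ran, you can confirm both files exist before moving on:

```bash
# Confirm the two converted submodules are present.
ls dit_model.tflite autoencoder_model.tflite
```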

-More detailed explanation of the above scripts is available [here](https://github.com/ARM-software/ML-examples/blob/main/kleidiai-examples/audiogen/scripts/README.md)
+A more detailed explanation of the above scripts is available [here](https://github.com/ARM-software/ML-examples/blob/main/kleidiai-examples/audiogen/scripts/README.md).

-For easier access, we add all needed models to one directory:
+For easy access, add all needed models to one directory:

```bash
export LITERT_MODELS_PATH=$WORKSPACE/litert-models

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/4-building-litert.md

Lines changed: 27 additions & 24 deletions
@@ -8,7 +8,7 @@ layout: learningpathall

## LiteRT

-LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI.
+LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI. Designed for low-latency, resource-efficient execution, LiteRT is optimized for mobile and embedded environments, making it a natural fit for Arm CPUs running models like Stable Audio Open Small. You will build the runtime with the Bazel build tool.

## Build LiteRT libraries

@@ -20,16 +20,15 @@ git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
cd tensorflow_src
```

-We will use a specific commit of tensorflow for build so you can checkout and set the `TF_SRC_PATH`:
+Check out the specified commit of TensorFlow, and set the `TF_SRC_PATH`:
```bash
git checkout 84dd28bbc29d75e6a6d917eb2998e4e8ea90ec56
export TF_SRC_PATH=$(pwd)
```

-We can use `bazel` to build LiteRT libraries, first we use configure script to create a custom configuration for this:
-
-You can now create a custom TFLite build for android:
+A script is available to configure the `bazel` build environment. Run it to create a custom TFLite build for Android:

+{{% notice Reminder %}}
Ensure the `NDK_PATH` variable is set to your previously installed Android NDK:
{{< tabpane code=true >}}
{{< tab header="Linux">}}
4140
export PATH=$PATH:$NDK_PATH/toolchains/llvm/prebuilt/darwin-x86_64/bin
4241
{{< /tab >}}
4342
{{< /tabpane >}}
44-
Now you can configure TensorFlow. Here you can set the custom build parameters needed as follows:
43+
{{% /notice %}}
44+
45+
The configuration script is interactive. Run it using the command below, and use the table to set the parameters for this Learning Path use-case.
4546

46-
```bash { output_lines = "2-17" }
47+
```bash
4748
python3 ./configure.py
48-
Please specify the location of python. [Default is $WORKSPACE/bin/python3]:
49-
Please input the desired Python library path to use. Default is [$WORKSPACE/lib/python3.10/site-packages]
50-
Do you wish to build TensorFlow with ROCm support? [y/N]: n
51-
Do you wish to build TensorFlow with CUDA support? [y/N]: n
52-
Do you want to use Clang to build TensorFlow? [Y/n]: n
53-
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: y
54-
Please specify the home path of the Android NDK to use. [Default is /home/user/Android/Sdk/ndk-bundle]:
55-
Please specify the (min) Android NDK API level to use. [Available levels: [16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33]] [Default is 21]: 30
56-
Please specify the home path of the Android SDK to use. [Default is /home/user/Android/Sdk]:
57-
Please specify the Android SDK API level to use. [Available levels: ['31', '33', '34', '35']] [Default is 35]:
58-
Please specify an Android build tools version to use. [Available versions: ['30.0.3', '34.0.0', '35.0.0']] [Default is 35.0.0]:
59-
Do you wish to build TensorFlow with iOS support? [y/N]: n
60-
61-
Configuration finished
6249
```
6350

64-
Once the bazel configuration is complete, you can build TFLite as follows:
51+
|Question|Input|
52+
|---|---|
53+
|Please specify the location of python. [Default is $WORKSPACE/bin/python3]:| Enter (default) |
54+
|Please input the desired Python library path to use[$WORKSPACE/lib/python3.10/site-packages] | Enter |
55+
|Do you wish to build TensorFlow with ROCm support? [y/N]|N (No)|
56+
|Do you wish to build TensorFlow with CUDA support?|N|
57+
|Do you want to use Clang to build TensorFlow? [Y/n]|N|
58+
|Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]|y (Yes) |
59+
|Please specify the home path of the Android NDK to use. [Default is /home/user/Android/Sdk/ndk-bundle]| Enter |
60+
|Please specify the (min) Android NDK API level to use. [Default is 21] | 27 |
61+
|Please specify the home path of the Android SDK to use. [Default is /home/user/Android/Sdk]| Enter |
62+
|Please specify the Android SDK API level to use. [Default is 35]| Enter |
63+
|Please specify an Android build tools version to use. [Default is 35.0.0]| Enter |
64+
|Do you wish to build TensorFlow with iOS support? [y/N]:| n |
65+
66+
Once the Bazel configuration is complete, you can build TFLite as follows:
67+
```console
bazel build -c opt --config android_arm64 //tensorflow/lite:libtensorflowlite.so \
--define tflite_with_xnnpack=true \
@@ -70,15 +73,15 @@ bazel build -c opt --config android_arm64 //tensorflow/lite:libtensorflowlite.so
--define tflite_with_xnnpack_qu8=true
```
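Optionally, verify that the shared library was produced; this is the same path that gets pushed to the device later in this Learning Path:

```bash
# The Android arm64 build output lands under bazel-bin.
ls ${TF_SRC_PATH}/bazel-bin/tensorflow/lite/libtensorflowlite.so
```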

-We also build flatbuffers used by the application in the next steps:
+The final step is to build flatbuffers used by the application:
```
cd $WORKSPACE/tensorflow_src
mkdir flatc-native-build && cd flatc-native-build
cmake ../tensorflow/lite/tools/cmake/native_tools/flatbuffers
cmake --build .
```

-With flatbuffers and LiteRT built, we can now build our application for Android device.
+With flatbuffers and LiteRT built, you can now build the application for Android devices.


content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/5-creating-simple-program.md

Lines changed: 34 additions & 8 deletions
@@ -8,15 +8,17 @@ layout: learningpathall

## Create and build a simple program

-You'll now build a simple program that runs inference on all three submodules directly on an Android device.
+As a final step, you'll now build a simple program that runs inference on all three submodules directly on an Android device.

The program takes a text prompt as input and generates an audio file as output.

```bash
cd $WORKSPACE/ML-examples/kleidiai-examples/audiogen/app
mkdir build && cd build
```

-Ensure the NDK path is set correctly and build with cmake:
+Ensure the NDK path is set correctly and build with `cmake`:

```bash
cmake -DCMAKE_TOOLCHAIN_FILE=$NDK_PATH/build/cmake/android.toolchain.cmake \
-DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
@@ -31,15 +33,30 @@ make -j
After the example application builds successfully, a binary file named `audiogen` is created.

A SentencePiece model is a type of subword tokenizer used by the audiogen application. You'll need to download the *spiece.model* file:

```bash
-https://huggingface.co/google-t5/t5-base/tree/main
+cd $WORKSPACE
+wget https://huggingface.co/google-t5/t5-base/resolve/main/spiece.model
```

-we will save this model in `WORKSPACE` for ease of access
+
+Verify this model was downloaded to your `WORKSPACE`:

```text
-cp spiece.model $WORKSPACE
+ls $WORKSPACE/spiece.model
```

-Now use adb (Android Debug Bridge) to push all necessary files into the `audiogen` folder on Android device:
+Connect your Android device to your development machine using a cable. adb (Android Debug Bridge) is available as part of the Android SDK. You should see your device listed when you run the following command:

+```bash
+adb devices
+```

+```output
+<DEVICE ID> device
+```

+Note that you may have to approve the connection on your phone for this to work. Now, use `adb` to push all necessary files into the `audiogen` folder on the Android device:

```bash
cd $WORKSPACE/ML-examples/kleidiai-examples/audiogen/app/build
adb shell mkdir -p /data/local/tmp/app
@@ -51,15 +68,24 @@ adb push $WORKSPACE/spiece.model /data/local/tmp/app
adb push ${TF_SRC_PATH}/bazel-bin/tensorflow/lite/libtensorflowlite.so /data/local/tmp/app
```
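Optionally, confirm the files landed on the device before running the app:

```bash
# List the pushed models, tokenizer, binary, and shared library on the device.
adb shell ls /data/local/tmp/app
```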

-Finally, run the program on your Android device:
-```
+Start a new shell to access the device's system from your development machine:

+```bash
adb shell
+```

+Finally, run the program on your Android device. Experiment with different prompts, following the advice in the [Download the model](../2-testing-model) section.

+```bash
cd /data/local/tmp/app
LD_LIBRARY_PATH=. ./audiogen . "warm arpeggios on house beats 120BPM with drums effect" 4
exit
```

Successful execution of the app creates an `output.wav` file containing the audio described by your prompt. You can pull it back to your host machine and enjoy!

```bash
adb pull /data/local/tmp/app/output.wav
```
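If your host machine has a command-line audio player, you can listen to the result right away. This is illustrative rather than part of the Learning Path, and player availability varies by OS:

```bash
# macOS ships afplay; on Linux, aplay or ffplay are common alternatives.
afplay output.wav
```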

+You should now have gained hands-on experience running the Stable Audio Open Small model with LiteRT on Arm-based devices. This includes setting up the environment, optimizing the model for on-device inference, and understanding how efficient runtimes like LiteRT make low-latency generative AI possible at the edge. You're now better equipped to explore and deploy AI-powered audio applications on mobile and embedded platforms.

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/_index.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ minutes_to_complete: 30
who_is_this_for: This is an introductory topic for developers looking to deploy the Stable Audio Open Small text-to-audio model using LiteRT on an Android device.

learning_objectives:
-- Deploy the Stable Audio Open Small model on Android using LiteRT.
+- Download and learn about the Stable Audio Open Small model.
- Create a simple application to generate audio.
- Compile the application for an Arm CPU.
- Run the application on an Android smartphone and generate an audio snippet.
