Commit adaf33b

Merge pull request #1957 from annietllnd/review
Add Stable Audio Open Small LP
2 parents eccf114 + 8cacaac commit adaf33b

File tree: 10 files changed, +529 −0 lines changed


assets/contributors.csv

Lines changed: 1 addition & 0 deletions

@@ -85,4 +85,5 @@ Yiyang Fan,Arm,,,,
 Julien Jayat,Arm,,,,
 Geremy Cohen,Arm,geremyCohen,geremyinanutshell,,
 Barbara Corriero,Arm,,,,
+Nina Drozd,Arm,,ninadrozd,,
 Jun He,Arm,JunHe77,jun-he-91969822,,
Lines changed: 119 additions & 0 deletions
---
title: Set up your development environment
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Identify software requirements

In this Learning Path, you'll learn how to convert the Stable Audio Open Small model to the LiteRT (.tflite) format, then build a simple test program to generate audio on a mobile device.

Your first task is to prepare a development environment with the required software:

- Android NDK: version r25b or newer.
- Python: version 3.10 or newer (tested with 3.10).
- CMake: version 3.16.0 or newer (tested with 3.28.1).
- [Arm GNU Toolchain](/install-guides/gcc/arm-gnu).

### Create workspace directory

Create a separate directory for all the dependencies and repositories that this Learning Path uses.

Export the `WORKSPACE` variable to point to this directory; you will use it in the following steps:

```bash
mkdir my-workspace
export WORKSPACE=$PWD/my-workspace
```

### Install Python 3.10

Download and install [Python version 3.10](https://www.python.org/downloads/release/python-3100/) using the following commands:

{{< tabpane code=true >}}
{{< tab header="Linux">}}
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.10 python3.10-venv
{{< /tab >}}
{{< tab header="MacOS">}}
brew install python@3.10
brew link python@3.10 --force
{{< /tab >}}
{{< /tabpane >}}

You can verify the installation and check the version with:

```console
python3.10 --version
```

### Install CMake

CMake is an open-source tool that automates the build process for software projects, helping you generate platform-specific build configurations.

{{< tabpane code=true >}}
{{< tab header="Linux">}}
sudo apt update
sudo apt install cmake
{{< /tab >}}
{{< tab header="MacOS">}}
brew install cmake
{{< /tab >}}
{{< /tabpane >}}

You can verify the installation and check the version with:

```console
cmake --version
```

See the [CMake install guide](/install-guides/cmake/) for troubleshooting instructions.

### Install Bazel

Bazel is an open-source build tool that you will use to build the LiteRT libraries.

{{< tabpane code=true >}}
{{< tab header="Linux">}}
cd $WORKSPACE
wget https://github.com/bazelbuild/bazel/releases/download/7.4.1/bazel-7.4.1-installer-linux-x86_64.sh
sudo bash bazel-7.4.1-installer-linux-x86_64.sh
{{< /tab >}}
{{< tab header="MacOS">}}
brew install bazel@7
{{< /tab >}}
{{< /tabpane >}}

### Install Android NDK

To run the model on Android, install the Android Native Development Kit (Android NDK):

{{< tabpane code=true >}}
{{< tab header="Linux">}}
cd $WORKSPACE
wget https://dl.google.com/android/repository/android-ndk-r25b-linux.zip
unzip android-ndk-r25b-linux.zip
{{< /tab >}}
{{< tab header="MacOS">}}
brew install --cask android-studio temurin
{{< /tab >}}
{{< /tabpane >}}

For easier access to the Android NDK tools, add them to your `PATH` and set the `ANDROID_NDK` variable:

{{< tabpane code=true >}}
{{< tab header="Linux">}}
export ANDROID_NDK=$WORKSPACE/android-ndk-r25b/
export PATH=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
{{< /tab >}}
{{< tab header="MacOS">}}
export ANDROID_NDK=~/Library/Android/sdk/ndk/27.0.12077973/
export PATH=$PATH:$ANDROID_NDK/toolchains/llvm/prebuilt/darwin-x86_64/bin
export PATH=$PATH:~/Library/Android/sdk/cmdline-tools/latest/bin
{{< /tab >}}
{{< /tabpane >}}

Now that your development environment is ready and all prerequisites are installed, you can test the Stable Audio Open model.
Lines changed: 49 additions & 0 deletions
---
title: Download and test the model
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Download the model

Stable Audio Open is an open-source model optimized for generating short audio samples, sound effects, and production elements from text prompts.

[Log in](https://huggingface.co/login) to Hugging Face and navigate to the model landing page:

```bash
https://huggingface.co/stabilityai/stable-audio-open-small
```

You may need to fill out a form with your contact information to use the model:

![Agree to share contact information#center](./contact-information.png)

Download the configuration file `model_config.json` and the model checkpoint `model.ckpt`, copy them to your workspace directory, and verify that they exist:

```bash
ls $WORKSPACE/model_config.json $WORKSPACE/model.ckpt
```
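If you want a quick sanity check of the downloaded configuration before converting anything, a small Python sketch like the one below can list its top-level keys. The helper name `summarize_config` is illustrative only and is not part of the Learning Path scripts:

```python
import json

def summarize_config(path):
    """Return the sorted top-level keys of a model config JSON file."""
    with open(path) as f:
        cfg = json.load(f)
    return sorted(cfg.keys())

# Example usage (assumes WORKSPACE is set as in the previous section):
# import os
# print(summarize_config(os.environ["WORKSPACE"] + "/model_config.json"))
```

If the file is missing or truncated, the `json.load` call raises an error, which is an early signal that the download did not complete.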

## Test the model

To test the model, use the Stable Audio demo site, which lets you experiment directly through a web-based interface:

```bash
https://stableaudio.com/
```

Use the UI to enter a prompt. A good prompt can include:

* Music genre and subgenre.
* Musical elements (texture, rhythm, and articulation).
* Musical atmosphere (mood and emotion).
* Tempo, in beats per minute (BPM).

The order of the prompt parameters matters. For more information, see the [Prompt structure user guide](https://stableaudio.com/user-guide/prompt-structure).
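For example, a prompt following this structure (a made-up illustration, not taken from the Stable Audio documentation) might look like:

```text
Lo-fi hip hop, mellow electric piano, soft vinyl crackle, relaxed late-night mood, 80 BPM
```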

You can explore training and inference code for audio generation models in the [Stable Audio Tools repository](https://github.com/Stability-AI/stable-audio-tools).

Now that you've downloaded and tested the model, continue to the next section to convert it to LiteRT.
Lines changed: 127 additions & 0 deletions
---
title: Convert the Stable Audio Open Small model to LiteRT
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Stable Audio Open Small Model

|Submodule|Description|
|------|------|
|Conditioners| Includes a T5-based text encoder for the input prompt and a numerical duration encoder. These components convert the inputs into embeddings passed to the DiT model. |
|Diffusion Transformer (DiT)| Denoises random noise over multiple steps to produce structured latent audio, guided by the conditioner embeddings. |
|AutoEncoder| Compresses audio waveforms into a latent representation for processing by the DiT model, and decompresses the output back into audio. |

The submodules work together to form the pipeline shown below:

![Model structure#center](./model.png)

In this section, you will convert each of the three submodules to the [LiteRT](https://ai.google.dev/edge/litert) format, using two separate conversion routes:

1. Conditioners submodule: ONNX to LiteRT, using the [onnx2tf](https://github.com/PINTO0309/onnx2tf) tool.
2. DiT and AutoEncoder submodules: PyTorch to LiteRT, using the Google AI Edge Torch tool.

### Create virtual environment and install dependencies

The Conditioners submodule is based on the T5Encoder model, so you will use the ONNX-to-TFLite conversion route for it.

To avoid dependency issues, create a virtual environment. This guide uses Python's built-in `venv` module:

```bash
cd $WORKSPACE
python3.10 -m venv env
source env/bin/activate
```

Clone the examples repository and change to the audio generation example directory:

```bash
cd $WORKSPACE
git clone https://github.com/ARM-software/ML-examples.git
cd ML-examples/kleidiai-examples/audiogen
```

Install the required Python packages, including *onnx2tf* and *ai_edge_litert*:

```bash
bash install_requirements.sh
```

{{% notice %}}
If your machine has a GPU, you might see the following error:

```text
Traceback (most recent call last):
File "$WORKSPACE/env/lib/python3.10/site-packages/torch/_inductor/runtime/hints.py",
line 46, in <module> from triton.backends.compiler import AttrsDescriptor
ImportError: cannot import name 'AttrsDescriptor' from 'triton.backends.compiler'
($WORKSPACE/env/lib/python3.10/site-packages/triton/backends/compiler.py)
.
ImportError: cannot import name 'AttrsDescriptor' from 'triton.compiler.compiler'
($WORKSPACE/env/lib/python3.10/site-packages/triton/compiler/compiler.py)
```

Install the following dependency and rerun the script:

```bash
pip install triton==3.2.0
bash install_requirements.sh
```
{{% /notice %}}

### Convert the Conditioners submodule

The Conditioners submodule is based on the T5Encoder model. You convert it first to ONNX, and then to LiteRT.

The conversion involves the following steps:

1. Load the Conditioners submodule from the Stable Audio Open model configuration and checkpoint.
2. Export the Conditioners submodule to ONNX using *torch.onnx.export()*.
3. Convert the resulting ONNX file to LiteRT using *onnx2tf*.

You can use the provided script to convert the Conditioners submodule:

```bash
python3 ./scripts/export_conditioners.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
```

After a successful conversion, you have a `conditioners.onnx` model in your current directory.

### Convert DiT and AutoEncoder

To convert the DiT and AutoEncoder submodules, use the [Generative API](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/) provided by the ai-edge-torch tools. It enables you to export a generative PyTorch model directly to TFLite using three main steps:

1. Model re-authoring.
2. Quantization.
3. Conversion.

Convert the DiT and AutoEncoder submodules using the provided Python script:

```bash
CUDA_VISIBLE_DEVICES="" python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
```

After a successful conversion, you have `dit_model.tflite` and `autoencoder_model.tflite` models in your current directory, and you can deactivate the virtual environment:

```bash
deactivate
```

For easier access, copy all the models into a single directory:

```bash
export LITERT_MODELS_PATH=$WORKSPACE/litert-models
mkdir $LITERT_MODELS_PATH
cp conditioners_tflite/conditioners_float32.tflite $LITERT_MODELS_PATH
cp dit_model.tflite $LITERT_MODELS_PATH
cp autoencoder_model.tflite $LITERT_MODELS_PATH
```
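To confirm the copies landed, a small Python helper like the one below can report which expected files are missing. It is illustrative only, and a plain `ls $LITERT_MODELS_PATH` works just as well:

```python
import os

# The three LiteRT artifacts produced in the steps above
EXPECTED_MODELS = [
    "conditioners_float32.tflite",
    "dit_model.tflite",
    "autoencoder_model.tflite",
]

def missing_models(models_dir):
    """Return the expected model files that are absent from models_dir."""
    return [m for m in EXPECTED_MODELS
            if not os.path.isfile(os.path.join(models_dir, m))]

# Example usage (assumes LITERT_MODELS_PATH is set as above):
# print(missing_models(os.environ["LITERT_MODELS_PATH"]))
```

An empty list means all three submodules are in place and you can move on to the build step.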

With all three submodules converted to LiteRT format, you're ready to build LiteRT and run the model on a mobile device in the next step.
Lines changed: 88 additions & 0 deletions
---
title: Build LiteRT
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## LiteRT

LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI.

## Build LiteRT libraries

Clone the TensorFlow repository:

```console
cd $WORKSPACE
git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
cd tensorflow_src
```

This Learning Path uses a specific TensorFlow commit for the build, so check it out and set `TF_SRC_PATH`:

```bash
git checkout 84dd28bbc29d75e6a6d917eb2998e4e8ea90ec56
export TF_SRC_PATH=$(pwd)
```

You'll use `bazel` to build the LiteRT libraries. First, run the configure script to create a custom build configuration for Android.

Ensure the `ANDROID_NDK` variable is set to the Android NDK you installed previously:

{{< tabpane code=true >}}
{{< tab header="Linux">}}
export ANDROID_NDK=$WORKSPACE/android-ndk-r25b/
export PATH=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
{{< /tab >}}
{{< tab header="MacOS">}}
export TF_CXX_FLAGS="-DTF_MAJOR_VERSION=0 -DTF_MINOR_VERSION=0 -DTF_PATCH_VERSION=0 -DTF_VERSION_SUFFIX=''"
export ANDROID_NDK=~/Library/Android/sdk/ndk/27.0.12077973/
export PATH=$PATH:$ANDROID_NDK/toolchains/llvm/prebuilt/darwin-x86_64/bin
export PATH=$PATH:~/Library/Android/sdk/cmdline-tools/latest/bin
{{< /tab >}}
{{< /tabpane >}}

Set the TensorFlow version:

```bash
export TF_CXX_FLAGS="-DTF_MAJOR_VERSION=0 -DTF_MINOR_VERSION=0 -DTF_PATCH_VERSION=0 -DTF_VERSION_SUFFIX=''"
```

Now configure TensorFlow, setting the custom build parameters as follows:

```bash { output_lines = "2-14" }
python3 ./configure.py
Please specify the location of python. [Default is $WORKSPACE/bin/python3]:
Please input the desired Python library path to use. Default is [$WORKSPACE/lib/python3.10/site-packages]
Do you wish to build TensorFlow with ROCm support? [y/N]: n
Do you wish to build TensorFlow with CUDA support? [y/N]: n
Do you want to use Clang to build TensorFlow? [Y/n]: n
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: y
Please specify the home path of the Android NDK to use. [Default is /home/user/Android/Sdk/ndk-bundle]: /home/user/Workspace/tools/ndk/android-ndk-r25b
Please specify the (min) Android NDK API level to use. [Available levels: [16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33]] [Default is 21]: 30
Please specify the home path of the Android SDK to use. [Default is /home/user/Android/Sdk]:
Please specify the Android SDK API level to use. [Available levels: ['31', '33', '34', '35']] [Default is 35]:
Please specify an Android build tools version to use. [Available versions: ['30.0.3', '34.0.0', '35.0.0']] [Default is 35.0.0]:
```

Once the Bazel configuration is complete, you can build TFLite as follows:

```console
bazel build -c opt --config android_arm64 //tensorflow/lite:libtensorflowlite.so \
  --define tflite_with_xnnpack=true \
  --define=xnn_enable_arm_i8mm=true \
  --define tflite_with_xnnpack_qs8=true \
  --define tflite_with_xnnpack_qu8=true
```

This produces a `libtensorflowlite.so` shared library for Android with XNNPACK enabled, which you will use to build the example in the next section.