
Commit a55861e

Merge pull request #1964 from ArmDeveloperEcosystem/main
Prod update for stable audio
2 parents c0bcf6d + a1827f5 commit a55861e

9 files changed: +185 −202 lines changed

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/1-prerequisites.md

Lines changed: 11 additions & 9 deletions
````diff
@@ -19,7 +19,7 @@ Your first task is to prepare a development environment with the required software
 
 ### Create workspace directory
 
-Create a separate directory for all dependencies and repositories that this Learning Path uses.
+Create a separate directory for all the dependencies and repositories that this Learning Path uses.
 
 Export the `WORKSPACE` variable to point to this directory, which you will use in the following steps:
 
````
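The `export` command itself sits outside this hunk; a typical form, with an illustrative directory name, looks like this:

```bash
mkdir -p $HOME/audiogen-workspace          # illustrative location
export WORKSPACE=$HOME/audiogen-workspace
```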

````diff
@@ -74,7 +74,7 @@ See the [CMake install guide](/install-guides/cmake/) for troubleshooting instructions
 
 ### Install Bazel
 
-Bazel is an open-source build tool which we will use to build LiteRT libraries.
+Bazel is an open-source build tool which you will use to build LiteRT libraries.
 
 {{< tabpane code=true >}}
 {{< tab header="Linux">}}
````
````diff
@@ -98,22 +98,24 @@ wget https://dl.google.com/android/repository/android-ndk-r25b-linux.zip
 unzip android-ndk-r25b-linux.zip
 {{< /tab >}}
 {{< tab header="MacOS">}}
-brew install --cask android-studio temurin
+wget https://dl.google.com/android/repository/android-ndk-r25b-darwin.zip
+unzip android-ndk-r25b-darwin.zip
+mv android-ndk-r25b ~/Library/Android/android-ndk-r25b
 {{< /tab >}}
 {{< /tabpane >}}
 
-For easier access and execution of Android NDK tools, add these to the `PATH` and set the `ANDROID_NDK` variable:
+For easier access and execution of Android NDK tools, add these to the `PATH` and set the `NDK_PATH` variable:
 
 {{< tabpane code=true >}}
 {{< tab header="Linux">}}
-export ANDROID_NDK=$WORKSPACE/android-ndk-r25b/
-export PATH=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
+export NDK_PATH=$WORKSPACE/android-ndk-r25b/
+export PATH=$NDK_PATH/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
 {{< /tab >}}
 {{< tab header="MacOS">}}
-export ANDROID_NDK=~/Library/Android/sdk/ndk/27.0.12077973/
-export PATH=$PATH:$ANDROID_NDK/toolchains/llvm/prebuilt/darwin-x86_64/bin
+export NDK_PATH=~/Library/Android/android-ndk-r25b
+export PATH=$PATH:$NDK_PATH/toolchains/llvm/prebuilt/darwin-x86_64/bin
 export PATH=$PATH:~/Library/Android/sdk/cmdline-tools/latest/bin
 {{< /tab >}}
 {{< /tabpane >}}
 
-Now that your development environment is ready and all pre-requisites installed, you can test the Audio Stable Open model.
+Now that your development environment is ready and all the prerequisites are installed, you can move on to test the Stable Audio Open Small model.
````
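Once `PATH` is updated, the NDK cross-compilers should resolve directly from the shell. A quick, illustrative check (NDK r25b ships per-API-level clang wrappers; API level 30 is used here as an example):

```bash
# Should print a path inside the NDK and a clang version banner
which aarch64-linux-android30-clang
aarch64-linux-android30-clang --version
```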

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/2-testing-model.md

Lines changed: 6 additions & 10 deletions
````diff
@@ -8,12 +8,12 @@ layout: learningpathall
 
 ## Download the model
 
-Stable Audio Open is an open-source model optimized for generating short audio samples, sound effects, and production elements using text prompts.
+Stable Audio Open Small is an open-source model optimized for generating short audio samples, sound effects, and production elements using text prompts.
 
 [Log in](https://huggingface.co/login) to HuggingFace and navigate to the model landing page:
 
 ```bash
-https://huggingface.co/stabilityai/stable-audio-open-small
+https://huggingface.co/stabilityai/stable-audio-open-small/tree/main
 ```
 
 You may need to fill out a form with your contact information to use the model:
````
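As an alternative to downloading the files through the browser, the Hugging Face CLI can fetch both files once you have accepted the model license; a hedged sketch (assumes the `huggingface_hub` CLI and a read-scoped access token):

```bash
pip install "huggingface_hub[cli]"
huggingface-cli login    # paste your access token when prompted
huggingface-cli download stabilityai/stable-audio-open-small \
  model_config.json model.ckpt --local-dir "$WORKSPACE"
```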
````diff
@@ -26,15 +26,11 @@ Download and copy the configuration file `model_config.json` and the model itself
 ls $WORKSPACE/model_config.json $WORKSPACE/model.ckpt
 ```
 
-## Test the model
+You can learn more about this model [here](https://huggingface.co/stabilityai/stable-audio-open-small).
 
-To test the model, use the Stable Audio demo site, which lets you experiment directly through a web-based interface:
+### Good prompting practices
 
-```bash
-https://stableaudio.com/
-```
-
-Use the UI to enter a prompt. A good prompt can include:
+A good prompt for the Stable Audio Open Small model can include the following elements:
 
 * Music genre and subgenre.
 * Musical elements (texture, rhythm and articulation).
````
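An illustrative prompt combining these elements (not taken from the page itself) might be:

```text
Melodic techno, warm analog synth arpeggio, driving four-on-the-floor kick, crisp hi-hats, 126 BPM
```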
````diff
@@ -45,5 +41,5 @@ The order of prompt parameters matters. For more information, see the [Prompt st
 
 You can explore training and inference code for audio generation models in the [Stable Audio Tools repository](https://github.com/Stability-AI/stable-audio-tools).
 
-Now that you've downloaded and tested the model, continue to the next section to convert the model to LiteRT.
+Now that you've downloaded the model, you're ready to convert it to LiteRT format in the next step.
 
````
content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/3-converting-model.md

Lines changed: 32 additions & 26 deletions
````diff
@@ -1,31 +1,37 @@
 ---
-title: Convert Open Stable Audio Small model to LiteRT
+title: Convert Stable Audio Open Small model to LiteRT
 weight: 4
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
+In this section, you will learn about the audio generation model. You will then clone a repository that contains the scripts required to convert the model submodules into LiteRT format and generate the inference application.
 
-## Stable Audio Open Small Model
+## Stable Audio Open Small
+
+The open-source model consists of three main submodules. They are described in the table below, and come together through the pipeline shown in the image.
 
 |Submodule|Description|
 |------|------|
 |Conditioners| Includes a T5-based text encoder for the input prompt and a numerical duration encoder. These components convert the inputs into embeddings passed to the DiT model. |
 |Diffusion Transformer (DiT)| Denoises random noise over multiple steps to produce structured latent audio, guided by conditioner embeddings. |
 |AutoEncoder| Compresses audio waveforms into a latent representation for processing by the DiT model, and decompresses the output back into audio. |
 
-The submodules work together to provide the pipeline as shown below:
+
 ![Model structure#center](./model.png)
 
-As part of this section, you will covert each of the three submodules into [LiteRT](https://ai.google.dev/edge/litert) format, using two separate conversion routes:
-1. Conditioners submodule - ONNX to LiteRT using [onnx2tf](https://github.com/PINTO0309/onnx2tf) tool.
-2. DiT and AutoEncoder submodules - PyTorch to LiteRT using Google AI Edge Torch tool.
+In this section, you will explore two different conversion routes to convert the submodules to [LiteRT](https://ai.google.dev/edge/litert) format. Both methods are run using Python wrapper scripts from the examples repository.
+
+1. **ONNX to LiteRT**: using the `onnx2tf` tool. This is the traditional two-step approach (PyTorch -> ONNX -> LiteRT). You will use it to convert the Conditioners submodule.
+
+2. **PyTorch to LiteRT**: using the Google AI Edge Torch tool. You will use this tool to convert the DiT and AutoEncoder submodules.
+
 
-### Create virtual environment and install dependencies
+## Download the sample code
 
 The Conditioners submodule is made of the T5Encoder model. You will use the ONNX to TFLite conversion for this submodule.
 
-To avoid dependency issues, create a virtual environment. In this guide, we will use `virtualenv`:
+To avoid dependency issues, create a virtual environment. For example, you can use the following command:
 
 ```bash
 cd $WORKSPACE
````
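The hunk cuts off inside the code block; the remaining setup commands are unchanged in the file. A typical `virtualenv` sequence of this kind (illustrative; the `env` name matches the troubleshooting path shown further down) is:

```bash
pip install virtualenv
virtualenv env
source env/bin/activate
```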
````diff
@@ -37,11 +43,11 @@ Clone the examples repository:
 
 ```bash
 cd $WORKSPACE
-git clone https://github.com/ARM-software/ML-examples/tree/main/kleidiai-examples/audiogen
-cd audio-stale-open-litert
+git clone https://github.com/ARM-software/ML-examples.git
+cd ML-examples/kleidiai-examples/audiogen/
 ```
 
-We now install the needed python packages for this, including *onnx2tf* and *ai_edge_litert*
+Install the required Python packages for this, including *onnx2tf* and *ai_edge_litert*:
 
 ```bash
 bash install_requirements.sh
````
````diff
@@ -61,20 +67,19 @@ ImportError: cannot import name 'AttrsDescriptor' from 'triton.compiler.compiler'
 ($WORKSPACE/env/lib/python3.10/site-packages/triton/compiler/compiler.py)
 ```
 
-Install the following dependency and rerun the script:
+Reinstall the following dependency:
 ```bash
 pip install triton==3.2.0
-bash install_requirements.sh
 ```
 
 {{% /notice %}}
 
 ### Convert Conditioners Submodule
 
-The Conditioners submodule is based on the T5Encoder model. We convert it first to ONNX, then to LiteRT.
+The Conditioners submodule is based on the T5Encoder model. First, convert it to ONNX, then to LiteRT.
 
-For this conversion we include the following steps:
-1. Load the Conditioners submodule from the Stable Audio Open model configuration and checkpoint.
+For this conversion, the following steps are required:
+1. Load the Conditioners submodule from the Stable Audio Open Small model configuration and checkpoint.
 2. Export the Conditioners submodule to ONNX via *torch.onnx.export()*.
 3. Convert the resulting ONNX file to LiteRT using *onnx2tf*.
 
````
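For orientation, here is a minimal sketch of what steps 1 to 3 involve; the loading helper, input placeholder, and opset version are assumptions, not the repository's actual `export_conditioners.py` code:

```python
import torch
from stable_audio_tools.models.factory import create_model_from_config  # assumed helper

# Step 1: build the model from model_config.json and load the checkpoint (illustrative)
model = create_model_from_config(model_config)
model.load_state_dict(torch.load("model.ckpt", map_location="cpu")["state_dict"])
conditioners = model.conditioner.eval()

# Step 2: export to ONNX; example_inputs stands in for the tokenized prompt and duration
torch.onnx.export(conditioners, example_inputs, "conditioners.onnx", opset_version=17)
```

Step 3 then runs on the command line, for example:

```bash
onnx2tf -i conditioners.onnx -o tflite_conditioners   # emits float32/float16 .tflite variants
```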

````diff
@@ -84,28 +89,29 @@ You can use the provided script to convert the Conditioners submodule:
 python3 ./scripts/export_conditioners.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
 ```
 
-After successful conversion, you now have a `conditioners.onnx` model in your current directory.
+After successful conversion, you now have a `tflite_conditioners` directory containing models with different precisions (for example, float16 and float32).
+
+You will use the float32 `.tflite` model for on-device inference.
 
 ### Convert DiT and AutoEncoder
 
-To convert the DiT and AutoEncoder submodules, use the [Generative API](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/) provided by the ai-edge-torch tools. This enables you to export a generative PyTorch model directly to tflite using three main steps:
+To convert the DiT and AutoEncoder submodules, use the [Generative API](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/) provided by the ai-edge-torch tools. This enables you to export a generative PyTorch model directly to `.tflite` using three main steps:
 
 1. Model re-authoring.
 2. Quantization.
 3. Conversion.
 
-Convert the DiT and AutoEncoder submodules using the provided python script:
+Convert the DiT and AutoEncoder submodules using the provided Python script:
+
 ```bash
-CUDA_VISIBLE_DEVICES="" python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
+python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
 ```
 
-After successful conversion, you now have `dit_model.tflite` and `autoencoder_model.tflite` models in your current directory and can deactivate the virtual environment:
+After successful conversion, you now have `dit_model.tflite` and `autoencoder_model.tflite` models in your current directory.
 
-```bash
-deactivate
-```
+A more detailed explanation of the above scripts is available [here](https://github.com/ARM-software/ML-examples/blob/main/kleidiai-examples/audiogen/scripts/README.md).
 
-For easier access, we add all needed models to one directory:
+For easy access, add all the required models to one directory:
 
 ```bash
 export LITERT_MODELS_PATH=$WORKSPACE/litert-models
````
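To make the direct PyTorch route concrete, here is a hedged sketch of the Generative API conversion; the module variable and sample input shape are assumptions, not the script's actual code:

```python
import ai_edge_torch
import torch

# Representative inputs trace the re-authored module during conversion
sample_inputs = (torch.randn(1, 64, 256),)  # hypothetical latent shape

# Convert the PyTorch module and serialize it as a .tflite flatbuffer
edge_model = ai_edge_torch.convert(dit_module.eval(), sample_inputs)
edge_model.export("dit_model.tflite")
```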
````diff
@@ -115,7 +121,7 @@ cp dit_model.tflite $LITERT_MODELS_PATH
 cp autoencoder_model.tflite $LITERT_MODELS_PATH
 ```
 
-With all three submodules converted to LiteRT format, you're ready to build LiteRT and run the model on a mobile device in the next step.
+With all three submodules now converted to LiteRT format, you're ready to build the runtime and run Stable Audio Open Small directly on an Android device in the next step.
 
````
content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/4-building-litert.md

Lines changed: 37 additions & 33 deletions
````diff
@@ -8,7 +8,7 @@ layout: learningpathall
 
 ## LiteRT
 
-LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI.
+LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI. Designed for low-latency, resource-efficient execution, LiteRT is optimized for mobile and embedded environments, making it a natural fit for Arm CPUs running models like Stable Audio Open Small. You'll build the runtime using the Bazel build tool.
 
 ## Build LiteRT libraries
 
````
````diff
@@ -20,55 +20,51 @@ git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
 cd tensorflow_src
 ```
 
-We will use a specific commit of tensorflow for build so you can checkout and set the `TF_SRC_PATH`:
+Check out the specified commit of TensorFlow, and set the `TF_SRC_PATH`:
 ```bash
 git checkout 84dd28bbc29d75e6a6d917eb2998e4e8ea90ec56
 export TF_SRC_PATH=$(pwd)
 ```
 
-We can use `bazel` to build LiteRT libraries, first we use configure script to create a custom configuration for this:
+A script is available to configure the `bazel` build environment. Run it to create a custom TFLite build for Android:
 
-You can now create a custom TFLite build for android:
-
-Ensure the `ANDROID_NDK` variable is set to your previously installed Android NDK:
+{{% notice Reminder %}}
+Ensure the `NDK_PATH` variable is set to your previously installed Android NDK:
 {{< tabpane code=true >}}
 {{< tab header="Linux">}}
-export ANDROID_NDK=$WORKSPACE/android-ndk-r25b/
-export PATH=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
+export NDK_PATH=$WORKSPACE/android-ndk-r25b/
+export PATH=$NDK_PATH/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
 {{< /tab >}}
 {{< tab header="MacOS">}}
-export TF_CXX_FLAGS="-DTF_MAJOR_VERSION=0 -DTF_MINOR_VERSION=0 -DTF_PATCH_VERSION=0 -DTF_VERSION_SUFFIX=''"
-export ANDROID_NDK=~/Library/Android/sdk/ndk/27.0.12077973/
-export PATH=$PATH:$ANDROID_NDK/toolchains/llvm/prebuilt/darwin-x86_64/bin
-export PATH=$PATH:~/Library/Android/sdk/cmdline-tools/latest/bin
+export NDK_PATH=~/Library/Android/android-ndk-r25b
+export PATH=$PATH:$NDK_PATH/toolchains/llvm/prebuilt/darwin-x86_64/bin
 {{< /tab >}}
 {{< /tabpane >}}
+{{% /notice %}}
 
-Set the TensorFlow version
+The configuration script is interactive. Run it using the command below, and use the table to set the parameters for this Learning Path use-case.
 
 ```bash
-export TF_CXX_FLAGS="-DTF_MAJOR_VERSION=0 -DTF_MINOR_VERSION=0 -DTF_PATCH_VERSION=0 -DTF_VERSION_SUFFIX=''"
-```
-
-
-Now you can configure TensorFlow. Here you can set the custom build parameters needed as follows:
-
-```bash { output_lines = "2-14" }
 python3 ./configure.py
-Please specify the location of python. [Default is $WORKSPACE/bin/python3]:
-Please input the desired Python library path to use. Default is [$WORKSPACE/lib/python3.10/site-packages]
-Do you wish to build TensorFlow with ROCm support? [y/N]: n
-Do you wish to build TensorFlow with CUDA support? [y/N]: n
-Do you want to use Clang to build TensorFlow? [Y/n]: n
-Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: y
-Please specify the home path of the Android NDK to use. [Default is /home/user/Android/Sdk/ndk-bundle]: /home/user/Workspace/tools/ndk/android-ndk-r25b
-Please specify the (min) Android NDK API level to use. [Available levels: [16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33]] [Default is 21]: 30
-Please specify the home path of the Android SDK to use. [Default is /home/user/Android/Sdk]:
-Please specify the Android SDK API level to use. [Available levels: ['31', '33', '34', '35']] [Default is 35]:
-Please specify an Android build tools version to use. [Available versions: ['30.0.3', '34.0.0', '35.0.0']] [Default is 35.0.0]:
 ```
 
-Once the bazel configuration is complete, you can build TFLite as follows:
+|Question|Input|
+|---|---|
+|Please specify the location of python. [Default is $WORKSPACE/bin/python3]| Enter (default) |
+|Please input the desired Python library path to use. [Default is $WORKSPACE/lib/python3.10/site-packages]| Enter |
+|Do you wish to build TensorFlow with ROCm support? [y/N]| N (No) |
+|Do you wish to build TensorFlow with CUDA support? [y/N]| N |
+|Do you want to use Clang to build TensorFlow? [Y/n]| N |
+|Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]| y (Yes) |
+|Please specify the home path of the Android NDK to use. [Default is /home/user/Android/Sdk/ndk-bundle]| Enter |
+|Please specify the (min) Android NDK API level to use. [Default is 21]| 27 |
+|Please specify the home path of the Android SDK to use. [Default is /home/user/Android/Sdk]| Enter |
+|Please specify the Android SDK API level to use. [Default is 35]| Enter |
+|Please specify an Android build tools version to use. [Default is 35.0.0]| Enter |
+|Do you wish to build TensorFlow with iOS support? [y/N]| n |
+
+Once the Bazel configuration is complete, you can build TFLite as follows:
+
 ```console
 bazel build -c opt --config android_arm64 //tensorflow/lite:libtensorflowlite.so \
 --define tflite_with_xnnpack=true \
````
````diff
@@ -77,7 +73,15 @@ bazel build -c opt --config android_arm64 //tensorflow/lite:libtensorflowlite.so
 --define tflite_with_xnnpack_qu8=true
 ```
 
-This will produce a `libtensorflowlite.so` shared library for android with XNNPack enabled, which we will use to build the example next.
+The final step is to build the FlatBuffers compiler (`flatc`) used by the application:
+```
+cd $WORKSPACE/tensorflow_src
+mkdir flatc-native-build && cd flatc-native-build
+cmake ../tensorflow/lite/tools/cmake/native_tools/flatbuffers
+cmake --build .
+```
+
+Now that LiteRT and FlatBuffers are built, you're ready to compile and deploy the Stable Audio Open Small inference application on your Android device.
 
 
 
````
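Before moving on, it's worth confirming that the build artifacts exist. An illustrative check (the exact `flatc` location under the CMake build tree may vary):

```bash
ls $TF_SRC_PATH/bazel-bin/tensorflow/lite/libtensorflowlite.so
find $TF_SRC_PATH/flatc-native-build -name flatc -type f
```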
