
Commit 56ae90a

Merge pull request #1960 from NinaARM/dev/audiogen-updates
Small updates to align with public codebase structure
2 parents adaf33b + 32ee034 commit 56ae90a

File tree

6 files changed: +63 -77 lines changed

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/1-prerequisites.md

Lines changed: 9 additions & 7 deletions
@@ -98,22 +98,24 @@ wget https://dl.google.com/android/repository/android-ndk-r25b-linux.zip
 unzip android-ndk-r25b-linux.zip
 {{< /tab >}}
 {{< tab header="MacOS">}}
-brew install --cask android-studio temurin
+wget https://dl.google.com/android/repository/android-ndk-r25b-darwin.zip
+unzip android-ndk-r25b-darwin
+mv android-ndk-r25b-darwin ~/Library/Android/android-ndk-r25b
 {{< /tab >}}
 {{< /tabpane >}}
 
-For easier access and execution of Android NDK tools, add these to the `PATH` and set the `ANDROID_NDK` variable:
+For easier access and execution of Android NDK tools, add these to the `PATH` and set the `NDK_PATH` variable:
 
 {{< tabpane code=true >}}
 {{< tab header="Linux">}}
-export ANDROID_NDK=$WORKSPACE/android-ndk-r25b/
-export PATH=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
+export NDK_PATH=$WORKSPACE/android-ndk-r25b/
+export PATH=$NDK_PATH/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
 {{< /tab >}}
 {{< tab header="MacOS">}}
-export ANDROID_NDK=~/Library/Android/sdk/ndk/27.0.12077973/
-export PATH=$PATH:$ANDROID_NDK/toolchains/llvm/prebuilt/darwin-x86_64/bin
+export NDK_PATH=~/Library/Android/android-ndk-r25b
+export PATH=$PATH:$NDK_PATH/toolchains/llvm/prebuilt/darwin-x86_64/bin
 export PATH=$PATH:~/Library/Android/sdk/cmdline-tools/latest/bin
 {{< /tab >}}
 {{< /tabpane >}}
 
-Now that your development environment is ready and all pre-requisites installed, you can test the Audio Stable Open model.
+Now that your development environment is ready and all pre-requisites installed, you can test the Audio Stable Open Small model.
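
Note: once the `NDK_PATH` exports above are in place, a short host-side check can confirm the toolchain is reachable. This sketch is not part of the commit; the compiler wrapper name assumes NDK r25b targeting the API level 30 used later in this learning path:

```python
# Sanity check for the NDK setup above (assumes NDK_PATH was exported and
# that the r25b toolchain ships an aarch64-linux-android30-clang wrapper).
import os
import shutil

ndk = os.environ.get("NDK_PATH")
if not ndk or not os.path.isdir(ndk):
    raise SystemExit("NDK_PATH is unset or does not point at a directory")

clang = shutil.which("aarch64-linux-android30-clang")
print("NDK:", ndk)
print("clang:", clang or "not found on PATH yet")
```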

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/2-testing-model.md

Lines changed: 5 additions & 11 deletions
@@ -8,12 +8,12 @@ layout: learningpathall
 
 ## Download the model
 
-Stable Audio Open is an open-source model optimized for generating short audio samples, sound effects, and production elements using text prompts.
+Stable Audio Open Small is an open-source model optimized for generating short audio samples, sound effects, and production elements using text prompts.
 
 [Log in](https://huggingface.co/login) to HuggingFace and navigate to the model landing page:
 
 ```bash
-https://huggingface.co/stabilityai/stable-audio-open-small
+https://huggingface.co/stabilityai/stable-audio-open-small/tree/main
 ```
 
 You may need to fill out a form with your contact information to use the model:
@@ -26,15 +26,9 @@ Download and copy the configuration file `model_config.json` and the model itsel
 ls $WORKSPACE/model_config.json $WORKSPACE/model.ckpt
 ```
 
-## Test the model
+You can see more information about this model [here](https://huggingface.co/stabilityai/stable-audio-open-small).
 
-To test the model, use the Stable Audio demo site, which lets you experiment directly through a web-based interface:
-
-```bash
-https://stableaudio.com/
-```
-
-Use the UI to enter a prompt. A good prompt can include:
+A good prompt for this model can include:
 
 * Music genre and subgenre.
 * Musical elements (texture, rhythm and articulation).
@@ -45,5 +39,5 @@ The order of prompt parameters matters. For more information, see the [Prompt st
 
 You can explore training and inference code for audio generation models in the [Stable Audio Tools repository](https://github.com/Stability-AI/stable-audio-tools).
 
-Now that you've downloaded and tested the model, continue to the next section to convert the model to LiteRT.
+Now that you've downloaded the model, continue to the next section to convert the model to LiteRT.
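
Note: if you prefer scripting the download described above, the `huggingface_hub` package can fetch the same two files. This is a hedged alternative, not part of the commit; it assumes you have accepted the model terms and authenticated with a token:

```python
# Alternative to the manual browser download above
# (requires `pip install huggingface_hub` and prior acceptance of the
# model terms on the Hugging Face page).
from huggingface_hub import hf_hub_download

for filename in ("model_config.json", "model.ckpt"):
    path = hf_hub_download(
        repo_id="stabilityai/stable-audio-open-small",
        filename=filename,
        local_dir=".",  # place files where $WORKSPACE expects them
    )
    print("downloaded:", path)
```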

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/3-converting-model.md

Lines changed: 16 additions & 15 deletions
@@ -6,7 +6,7 @@ weight: 4
 layout: learningpathall
 ---
 
-## Stable Audio Open Small Model
+## Stable Audio Open Small
 
 |Submodule|Description|
 |------|------|
@@ -17,9 +17,11 @@
 The submodules work together to provide the pipeline as shown below:
 ![Model structure#center](./model.png)
 
-As part of this section, you will covert each of the three submodules into [LiteRT](https://ai.google.dev/edge/litert) format, using two separate conversion routes:
-1. Conditioners submodule - ONNX to LiteRT using [onnx2tf](https://github.com/PINTO0309/onnx2tf) tool.
-2. DiT and AutoEncoder submodules - PyTorch to LiteRT using Google AI Edge Torch tool.
+As part of this section, we will explore two different conversion routes, to convert the submodules to [LiteRT](https://ai.google.dev/edge/litert) format.
+
+1. ONNX --> LiteRT using the onnx2tf tool. This is the traditional two-step approach (PyTorch --> ONNX--> LiteRT). We will use it to convert the Conditioners submodule.
+
+2. PyTorch --> LiteRT using the Google AI Edge Torch tool. We will use this tool to convert the DiT and AutoEncoder submodules.
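
Note: for orientation, route 2 boils down to a couple of calls from the `ai_edge_torch` package installed below. A minimal sketch follows; the toy module is illustrative only, not the actual DiT or AutoEncoder:

```python
# Minimal sketch of the PyTorch --> LiteRT route via Google AI Edge Torch.
# TinyBlock stands in for a real submodule such as the DiT.
import torch
import ai_edge_torch

class TinyBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(64, 64)

    def forward(self, x):
        return self.net(x)

model = TinyBlock().eval()
sample_inputs = (torch.randn(1, 64),)

edge_model = ai_edge_torch.convert(model, sample_inputs)  # trace + convert
edge_model.export("tiny_block.tflite")                    # LiteRT flatbuffer
```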
 
 ### Create virtual environment and install dependencies
 
@@ -37,8 +39,8 @@ Clone the examples repository:
 
 ```bash
 cd $WORKSPACE
-git clone https://github.com/ARM-software/ML-examples/tree/main/kleidiai-examples/audiogen
-cd audio-stale-open-litert
+git clone https://github.com/ARM-software/ML-examples.git
+cd ML-examples/kleidiai-examples/audiogen/
 ```
 
 We now install the needed python packages for this, including *onnx2tf* and *ai_edge_litert*
@@ -61,10 +63,9 @@ ImportError: cannot import name 'AttrsDescriptor' from 'triton.compiler.compiler
 ($WORKSPACE/env/lib/python3.10/site-packages/triton/compiler/compiler.py)
 ```
 
-Install the following dependency and rerun the script:
+Reinstall the following dependency:
 ```bash
 pip install triton==3.2.0
-bash install_requirements.sh
 ```
 
 {{% /notice %}}
@@ -74,7 +75,7 @@ bash install_requirements.sh
 The Conditioners submodule is based on the T5Encoder model. We convert it first to ONNX, then to LiteRT.
 
 For this conversion we include the following steps:
-1. Load the Conditioners submodule from the Stable Audio Open model configuration and checkpoint.
+1. Load the Conditioners submodule from the Stable Audio Open Small model configuration and checkpoint.
 2. Export the Conditioners submodule to ONNX via *torch.onnx.export()*.
 3. Convert the resulting ONNX file to LiteRT using *onnx2tf*.
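
Note: steps 2 and 3 above map onto two library calls. A hedged sketch with a stand-in module (the real script loads the T5-based Conditioners from the checkpoint instead):

```python
# Two-step route sketched with a toy encoder: torch.onnx.export() for
# step 2, then onnx2tf for step 3 (pip install onnx onnx2tf).
import torch
import onnx2tf

class TinyEncoder(torch.nn.Module):
    """Stand-in for the T5-based Conditioners submodule."""
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(32128, 64)  # T5-sized vocabulary
        self.proj = torch.nn.Linear(64, 64)

    def forward(self, token_ids):
        return self.proj(self.embed(token_ids))

model = TinyEncoder().eval()
sample = torch.zeros(1, 120, dtype=torch.long)  # (batch, sequence)

# Step 2: PyTorch -> ONNX
torch.onnx.export(model, (sample,), "conditioners.onnx",
                  input_names=["token_ids"], output_names=["embeddings"])

# Step 3: ONNX -> LiteRT flatbuffers written into an output folder
onnx2tf.convert(input_onnx_file_path="conditioners.onnx",
                output_folder_path="tflite_conditioners")
```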
 
@@ -84,7 +85,9 @@ You can use the provided script to convert the Conditioners submodule:
 python3 ./scripts/export_conditioners.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
 ```
 
-After successful conversion, you now have a `conditioners.onnx` model in your current directory.
+After successful conversion, you now have a `tflite_conditioners` directory containing models with different precisions (e.g., float16, float32).
+
+We will be using the float32.tflite model for on-device inference.

8992
### Convert DiT and AutoEncoder
9093

@@ -96,14 +99,12 @@ To convert the DiT and AutoEncoder submodules, use the [Generative API](https://
9699

97100
Convert the DiT and AutoEncoder submodules using the provided python script:
98101
```bash
99-
CUDA_VISIBLE_DEVICES="" python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
102+
python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
100103
```
101104

102-
After successful conversion, you now have `dit_model.tflite` and `autoencoder_model.tflite` models in your current directory and can deactivate the virtual environment:
105+
After successful conversion, you now have `dit_model.tflite` and `autoencoder_model.tflite` models in your current directory.
103106

104-
```bash
105-
deactivate
106-
```
107+
More detailed explanation of the above scripts is available [here](https://github.com/ARM-software/ML-examples/blob/main/kleidiai-examples/audiogen/scripts/README.md)
107108

108109
For easier access, we add all needed models to one directory:
109110

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/4-building-litert.md

Lines changed: 19 additions & 18 deletions
@@ -30,42 +30,35 @@ We can use `bazel` to build LiteRT libraries, first we use configure script to c
 
 You can now create a custom TFLite build for android:
 
-Ensure the `ANDROID_NDK` variable is set to your previously installed Android NDK:
+Ensure the `NDK_PATH` variable is set to your previously installed Android NDK:
 {{< tabpane code=true >}}
 {{< tab header="Linux">}}
-export ANDROID_NDK=$WORKSPACE/android-ndk-r25b/
-export PATH=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
+export NDK_PATH=$WORKSPACE/android-ndk-r25b/
+export PATH=$NDK_PATH/toolchains/llvm/prebuilt/linux-x86_64/bin/:$PATH
 {{< /tab >}}
 {{< tab header="MacOS">}}
-export TF_CXX_FLAGS="-DTF_MAJOR_VERSION=0 -DTF_MINOR_VERSION=0 -DTF_PATCH_VERSION=0 -DTF_VERSION_SUFFIX=''"
-export ANDROID_NDK=~/Library/Android/sdk/ndk/27.0.12077973/
-export PATH=$PATH:$ANDROID_NDK/toolchains/llvm/prebuilt/darwin-x86_64/bin
-export PATH=$PATH:~/Library/Android/sdk/cmdline-tools/latest/bin
+export NDK_PATH=~/Library/Android/android-ndk-r25b
+export PATH=$PATH:$NDK_PATH/toolchains/llvm/prebuilt/darwin-x86_64/bin
 {{< /tab >}}
 {{< /tabpane >}}
-
-Set the TensorFlow version
-
-```bash
-export TF_CXX_FLAGS="-DTF_MAJOR_VERSION=0 -DTF_MINOR_VERSION=0 -DTF_PATCH_VERSION=0 -DTF_VERSION_SUFFIX=''"
-```
-
-
 Now you can configure TensorFlow. Here you can set the custom build parameters needed as follows:
 
-```bash { output_lines = "2-14" }
+```bash { output_lines = "2-17" }
 python3 ./configure.py
 Please specify the location of python. [Default is $WORKSPACE/bin/python3]:
 Please input the desired Python library path to use. Default is [$WORKSPACE/lib/python3.10/site-packages]
 Do you wish to build TensorFlow with ROCm support? [y/N]: n
 Do you wish to build TensorFlow with CUDA support? [y/N]: n
 Do you want to use Clang to build TensorFlow? [Y/n]: n
 Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: y
-Please specify the home path of the Android NDK to use. [Default is /home/user/Android/Sdk/ndk-bundle]: /home/user/Workspace/tools/ndk/android-ndk-r25b
+Please specify the home path of the Android NDK to use. [Default is /home/user/Android/Sdk/ndk-bundle]:
 Please specify the (min) Android NDK API level to use. [Available levels: [16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33]] [Default is 21]: 30
 Please specify the home path of the Android SDK to use. [Default is /home/user/Android/Sdk]:
 Please specify the Android SDK API level to use. [Available levels: ['31', '33', '34', '35']] [Default is 35]:
 Please specify an Android build tools version to use. [Available versions: ['30.0.3', '34.0.0', '35.0.0']] [Default is 35.0.0]:
+Do you wish to build TensorFlow with iOS support? [y/N]: n
+
+Configuration finished
 ```
 
 Once the bazel configuration is complete, you can build TFLite as follows:
@@ -77,7 +70,15 @@ bazel build -c opt --config android_arm64 //tensorflow/lite:libtensorflowlite.so
 --define tflite_with_xnnpack_qu8=true
 ```
 
-This will produce a `libtensorflowlite.so` shared library for android with XNNPack enabled, which we will use to build the example next.
+We also build flatbuffers used by the application in the next steps:
+```
+cd $WORKSPACE/tensorflow_src
+mkdir flatc-native-build && cd flatc-native-build
+cmake ../tensorflow/lite/tools/cmake/native_tools/flatbuffers
+cmake --build .
+```
+
+With flatbuffers and LiteRT built, we can now build our application for Android device.
 
 
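
Note: one quick, hedged way to confirm the bazel output above is the expected AArch64 shared object; the path assumes bazel's default bazel-bin layout for that target:

```python
# Check that bazel produced an AArch64 ELF shared object; EM_AARCH64 == 183
# per the ELF specification. The path is an assumption (default bazel-bin).
from pathlib import Path

header = Path("bazel-bin/tensorflow/lite/libtensorflowlite.so").read_bytes()[:20]
assert header[:4] == b"\x7fELF", "not an ELF file"

e_machine = int.from_bytes(header[18:20], "little")
print("AArch64" if e_machine == 183 else f"unexpected e_machine={e_machine}")
```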

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/5-creating-simple-program.md

Lines changed: 9 additions & 20 deletions
@@ -12,7 +12,7 @@ You'll now build a simple program that runs inference on all three submodules di
 
 The program takes a text prompt as input and generates an audio file as output.
 ```bash
-cd $WORKSPACE/audio-stale-open-litert/app
+cd $WORKSPACE/ML-examples/kleidiai-examples/audiogen/app
 mkdir build && cd build
 ```
 
@@ -26,34 +26,22 @@ cmake -DCMAKE_TOOLCHAIN_FILE=$NDK_PATH/build/cmake/android.toolchain.cmake \
 -DFLATBUFFER_INCLUDE_PATH=$TF_SRC_PATH/flatc-native-build/flatbuffers/include \
 ..
 
-cmake --build . -j1
+make -j
 ```
+After the example application builds successfully, a binary file named `audiogen` is created.
 
-Since the tokenizer used in the audiogen application is based on SentencePiece, you’ll need to download the spiece.model file from:
+A SentencePiece model is a type of subword tokenizer which is used by the audiogen application, you’ll need to download the *spiece.model* file from:
 ```bash
 https://huggingface.co/google-t5/t5-base/tree/main
 ```
-we will save this model in `WORKSPACE` for ease of access.
+we will save this model in `WORKSPACE` for ease of access
 ```text
-cp spiece.moel $WORKSPACE
+cp spiece.model $WORKSPACE
 ```
-After the SAO example builds successfully, a binary file named `audiogen_main` is created.
 
-Now use adb (Android Debug Bridge) to push the necessary files to the device:
-
-```bash
-adb shell
-```
-
-Create a directory for all the required resources:
+Now use adb (Android Debug Bridge) to push all necessary files into the `audiogen` folder on Android device:
 ```bash
-cd /data/local/tmp
-mkdir audiogen
-exit
-```
-Push all necessary files into the `audiogen` folder on Android:
-```bash
-cd $WORKSPACE/audio-stale-open-litert/app/build
+cd $WORKSPACE/ML-examples/kleidiai-examples/audiogen/app/build
 adb shell mkdir -p /data/local/tmp/app
 adb push audiogen /data/local/tmp/app
 adb push $LITERT_MODELS_PATH/conditioners_float32.tflite /data/local/tmp/app
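
Note: before pushing spiece.model, you can verify on the host that it tokenizes a prompt as expected. A small illustrative check (requires `pip install sentencepiece`):

```python
# Host-side check of the SentencePiece tokenizer model downloaded above.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="spiece.model")
ids = sp.encode("warm arpeggios on house beats 120BPM with drums effect")
print(len(ids), "tokens:", ids[:8], "...")
```
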
@@ -68,6 +56,7 @@ Finally, run the program on your Android device:
 adb shell
 cd /data/local/tmp/app
 LD_LIBRARY_PATH=. ./audiogen . "warm arpeggios on house beats 120BPM with drums effect" 4
+exit
 ```
 
 The successful execution of the app will create `output.wav` of your chosen audio defined by the prompt, you can pull it back to your host machine and enjoy!
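
Note: one hedged way to script that last pull-back step from the host; it assumes adb is on PATH and that the run above wrote output.wav in the app directory:

```python
# Pull the generated audio back to the host and report its duration.
import subprocess
import wave

subprocess.run(
    ["adb", "pull", "/data/local/tmp/app/output.wav", "output.wav"],
    check=True,
)

with wave.open("output.wav", "rb") as w:
    seconds = w.getnframes() / w.getframerate()
    print(f"{seconds:.2f}s at {w.getframerate()} Hz, {w.getnchannels()} channel(s)")
```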

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/_index.md

Lines changed: 5 additions & 6 deletions
@@ -7,11 +7,10 @@ who_is_this_for: This is an introductory topic for developers looking to deploy
 
 learning_objectives:
 - Deploy the Stable Audio Open Small model on Android using LiteRT.
-- Build a simple program to generate audio.
-- Compile the application and for an Arm CPU.
+- Create a simple application to generate audio.
+- Compile the application for an Arm CPU.
 - Run the application on an Android smartphone and generate an audio snippet.
 
-
 prerequisites:
 - A Linux-based x86 or macOS development machine with at least 8 GB of RAM (tested on Ubuntu 20.04.4 LTS with x86_64).
 - A [HuggingFace](https://huggingface.co/) account.
@@ -38,9 +37,9 @@ operatingsystems:
 
 further_reading:
 - resource:
-    title: Introducing Stable Audio 2.0
-    link: https://stability.ai/news/stable-audio-2-0
-    type: documentation
+    title: Stability AI and Arm Collaborate to Release Stable Audio Open Small, Enabling Real-World Deployment for On-Device Audio Generation
+    link: https://stability.ai/news/stability-ai-and-arm-release-stable-audio-open-small-enabling-real-world-deployment-for-on-device-audio-control
+    type: blog
 - resource:
     title: Stability AI optimized its audio generation model to run on Arm chips
     link: https://techcrunch.com/2025/03/03/stability-ai-optimized-its-audio-generation-model-to-run-on-arm-chips/
