
Commit 2f12f8a

Merge pull request #1962 from annietllnd/review
Update Audiogen LP
2 parents 56ae90a + a74a167 commit 2f12f8a

6 files changed: +86 −50 lines changed

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/1-prerequisites.md

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ See the [CMake install guide](/install-guides/cmake/) for troubleshooting instru

### Install Bazel

-Bazel is an open-source build tool which we will use to build LiteRT libraries.
+Bazel is an open-source build tool which you will use to build LiteRT libraries.

{{< tabpane code=true >}}
{{< tab header="Linux">}}
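Once the installation in the tab pane above completes, a quick sanity check can confirm the tool is available. This is a minimal sketch, assuming Bazel is already on your `PATH`:

```bash
# Print the installed Bazel version to confirm the setup.
bazel --version
```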

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/2-testing-model.md

Lines changed: 4 additions & 2 deletions
@@ -26,9 +26,11 @@ Download and copy the configuration file `model_config.json` and the model itsel
ls $WORKSPACE/model_config.json $WORKSPACE/model.ckpt
```

-You can see more information about this model [here](https://huggingface.co/stabilityai/stable-audio-open-small).
+You can learn more about this model [here](https://huggingface.co/stabilityai/stable-audio-open-small).

-A good prompt for this model can include:
+### Good prompting practices
+
+A good prompt for this audio generation model can include:

* Music genre and subgenre.
* Musical elements (texture, rhythm and articulation).
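To make these guidelines concrete, here is one illustrative way to hold a prompt in a shell variable. The string itself is the exact prompt the audiogen app runs later in this Learning Path; the variable name is only an example:

```bash
# Example prompt combining genre, rhythm, and tempo (illustrative variable name).
PROMPT="warm arpeggios on house beats 120BPM with drums effect"
```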

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/3-converting-model.md

Lines changed: 19 additions & 14 deletions
@@ -5,29 +5,33 @@ weight: 4
### FIXED, DO NOT MODIFY
layout: learningpathall
---
+In this section, you will learn about the audio generation model. You will then clone a repository and run the conversion steps needed to generate the inference application.

## Stable Audio Open Small

+The open-source model includes three main parts. They are described in the table below and come together through the pipeline shown in the image.

|Submodule|Description|
|------|------|
|Conditioners| Includes a T5-based text encoder for the input prompt and a numerical duration encoder. These components convert the inputs into embeddings passed to the DiT model. |
|Diffusion Transformer (DiT)| Denoises random noise over multiple steps to produce structured latent audio, guided by conditioner embeddings. |
|AutoEncoder| Compresses audio waveforms into a latent representation for processing by the DiT model, and decompresses the output back into audio. |

-The submodules work together to provide the pipeline as shown below:

![Model structure#center](./model.png)

-As part of this section, we will explore two different conversion routes, to convert the submodules to [LiteRT](https://ai.google.dev/edge/litert) format.
+In this section, you will explore two conversion routes for converting the submodules to [LiteRT](https://ai.google.dev/edge/litert) format. Both methods are run using Python wrapper scripts from the examples repository.

-1. ONNX --> LiteRT using the onnx2tf tool. This is the traditional two-step approach (PyTorch --> ONNX --> LiteRT). We will use it to convert the Conditioners submodule.
+1. **ONNX to LiteRT**: using the `onnx2tf` tool. This is the traditional two-step approach (PyTorch -> ONNX -> LiteRT). You will use it to convert the Conditioners submodule.

-2. PyTorch --> LiteRT using the Google AI Edge Torch tool. We will use this tool to convert the DiT and AutoEncoder submodules.
+2. **PyTorch to LiteRT**: using the Google AI Edge Torch tool. You will use this tool to convert the DiT and AutoEncoder submodules.

-### Create virtual environment and install dependencies
+
+## Download the sample code

The Conditioners submodule is made of the T5Encoder model. You will use the ONNX to TFLite conversion for this submodule.

-To avoid dependency issues, create a virtual environment. In this guide, we will use `virtualenv`:
+To avoid dependency issues, create a virtual environment. For example, you can use the following command:

```bash
cd $WORKSPACE
@@ -43,7 +47,7 @@ git clone https://github.com/ARM-software/ML-examples.git
cd ML-examples/kleidiai-examples/audiogen/
```
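The virtual environment commands themselves fall between the hunks shown here. For reference only, a minimal sketch of that step, assuming `virtualenv` is installed and using `env` as a purely illustrative name:

```bash
# Illustrative only: create and activate a virtual environment in $WORKSPACE.
virtualenv env
source env/bin/activate
```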

-We now install the needed python packages for this, including *onnx2tf* and *ai_edge_litert*
+Install the needed Python packages for this, including *onnx2tf* and *ai_edge_litert*:

```bash
bash install_requirements.sh
@@ -72,9 +76,9 @@ pip install triton==3.2.0

### Convert Conditioners Submodule

-The Conditioners submodule is based on the T5Encoder model. We convert it first to ONNX, then to LiteRT.
+The Conditioners submodule is based on the T5Encoder model. First, convert it to ONNX, then to LiteRT.

-For this conversion we include the following steps:
+For this conversion, the following steps are needed (a sketch of the final step follows this list):
1. Load the Conditioners submodule from the Stable Audio Open Small model configuration and checkpoint.
2. Export the Conditioners submodule to ONNX via *torch.onnx.export()*.
3. Convert the resulting ONNX file to LiteRT using *onnx2tf*.
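As a minimal sketch of step 3 only, with hypothetical file names (the `export_conditioners.py` script below wraps all three steps for you):

```bash
# Hypothetical sketch: convert an ONNX export to LiteRT with the onnx2tf CLI.
# conditioners.onnx stands in for the file produced by torch.onnx.export().
onnx2tf -i conditioners.onnx -o tflite_conditioners
```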
@@ -87,26 +91,27 @@ python3 ./scripts/export_conditioners.py --model_config "$WORKSPACE/model_config

After successful conversion, you now have a `tflite_conditioners` directory containing models with different precisions (e.g., float16, float32).

-We will be using the float32.tflite model for on-device inference.
+You will be using the float32.tflite model for on-device inference.

### Convert DiT and AutoEncoder

-To convert the DiT and AutoEncoder submodules, use the [Generative API](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/) provided by the ai-edge-torch tools. This enables you to export a generative PyTorch model directly to tflite using three main steps:
+To convert the DiT and AutoEncoder submodules, use the [Generative API](https://github.com/google-ai-edge/ai-edge-torch/tree/main/ai_edge_torch/generative/) provided by the ai-edge-torch tools. This enables you to export a generative PyTorch model directly to `.tflite` using three main steps:

1. Model re-authoring.
2. Quantization.
3. Conversion.

-Convert the DiT and AutoEncoder submodules using the provided python script:
+Convert the DiT and AutoEncoder submodules using the provided Python script:

```bash
python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
```

After successful conversion, you now have `dit_model.tflite` and `autoencoder_model.tflite` models in your current directory.
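As an optional check, assuming your shell is still in the directory where the script ran, you can confirm both files exist before moving on:

```bash
# Confirm the two converted submodules are present.
ls dit_model.tflite autoencoder_model.tflite
```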

-More detailed explanation of the above scripts is available [here](https://github.com/ARM-software/ML-examples/blob/main/kleidiai-examples/audiogen/scripts/README.md)
+A more detailed explanation of the above scripts is available [here](https://github.com/ARM-software/ML-examples/blob/main/kleidiai-examples/audiogen/scripts/README.md).

-For easier access, we add all needed models to one directory:
+For easy access, add all needed models to one directory:

```bash
export LITERT_MODELS_PATH=$WORKSPACE/litert-models

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/4-building-litert.md

Lines changed: 27 additions & 24 deletions
@@ -8,7 +8,7 @@ layout: learningpathall

## LiteRT

-LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI.
+LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI. Designed for low-latency, resource-efficient execution, LiteRT is optimized for mobile and embedded environments, making it a natural fit for Arm CPUs running models like Stable Audio Open Small. You will build the runtime with the Bazel build tool.

## Build LiteRT libraries

@@ -20,16 +20,15 @@ git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
cd tensorflow_src
```

-We will use a specific commit of tensorflow for build so you can checkout and set the `TF_SRC_PATH`:
+Check out the specified commit of TensorFlow, and set the `TF_SRC_PATH`:
```bash
git checkout 84dd28bbc29d75e6a6d917eb2998e4e8ea90ec56
export TF_SRC_PATH=$(pwd)
```

-We can use `bazel` to build LiteRT libraries, first we use configure script to create a custom configuration for this:
-
-You can now create a custom TFLite build for android:
+A script is available to configure the `bazel` build environment. Run it to create a custom TFLite build for Android:

+{{% notice Reminder %}}
Ensure the `NDK_PATH` variable is set to your previously installed Android NDK:
{{< tabpane code=true >}}
{{< tab header="Linux">}}
4140
export PATH=$PATH:$NDK_PATH/toolchains/llvm/prebuilt/darwin-x86_64/bin
4241
{{< /tab >}}
4342
{{< /tabpane >}}
44-
Now you can configure TensorFlow. Here you can set the custom build parameters needed as follows:
43+
{{% /notice %}}
44+
45+
The configuration script is interactive. Run it using the command below, and use the table to set the parameters for this Learning Path use-case.
4546

46-
```bash { output_lines = "2-17" }
47+
```bash
4748
python3 ./configure.py
48-
Please specify the location of python. [Default is $WORKSPACE/bin/python3]:
49-
Please input the desired Python library path to use. Default is [$WORKSPACE/lib/python3.10/site-packages]
50-
Do you wish to build TensorFlow with ROCm support? [y/N]: n
51-
Do you wish to build TensorFlow with CUDA support? [y/N]: n
52-
Do you want to use Clang to build TensorFlow? [Y/n]: n
53-
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: y
54-
Please specify the home path of the Android NDK to use. [Default is /home/user/Android/Sdk/ndk-bundle]:
55-
Please specify the (min) Android NDK API level to use. [Available levels: [16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33]] [Default is 21]: 30
56-
Please specify the home path of the Android SDK to use. [Default is /home/user/Android/Sdk]:
57-
Please specify the Android SDK API level to use. [Available levels: ['31', '33', '34', '35']] [Default is 35]:
58-
Please specify an Android build tools version to use. [Available versions: ['30.0.3', '34.0.0', '35.0.0']] [Default is 35.0.0]:
59-
Do you wish to build TensorFlow with iOS support? [y/N]: n
60-
61-
Configuration finished
6249
```
6350

64-
Once the bazel configuration is complete, you can build TFLite as follows:
51+
|Question|Input|
52+
|---|---|
53+
|Please specify the location of python. [Default is $WORKSPACE/bin/python3]:| Enter (default) |
54+
|Please input the desired Python library path to use[$WORKSPACE/lib/python3.10/site-packages] | Enter |
55+
|Do you wish to build TensorFlow with ROCm support? [y/N]|N (No)|
56+
|Do you wish to build TensorFlow with CUDA support?|N|
57+
|Do you want to use Clang to build TensorFlow? [Y/n]|N|
58+
|Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]|y (Yes) |
59+
|Please specify the home path of the Android NDK to use. [Default is /home/user/Android/Sdk/ndk-bundle]| Enter |
60+
|Please specify the (min) Android NDK API level to use. [Default is 21] | 27 |
61+
|Please specify the home path of the Android SDK to use. [Default is /home/user/Android/Sdk]| Enter |
62+
|Please specify the Android SDK API level to use. [Default is 35]| Enter |
63+
|Please specify an Android build tools version to use. [Default is 35.0.0]| Enter |
64+
|Do you wish to build TensorFlow with iOS support? [y/N]:| n |
65+
66+
Once the Bazel configuration is complete, you can build TFLite as follows:
67+
```console
bazel build -c opt --config android_arm64 //tensorflow/lite:libtensorflowlite.so \
--define tflite_with_xnnpack=true \
@@ -70,15 +73,15 @@ bazel build -c opt --config android_arm64 //tensorflow/lite:libtensorflowlite.so
--define tflite_with_xnnpack_qu8=true
```
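Optionally, verify that the shared library was produced; this is the same path that gets pushed to the device later in this Learning Path:

```bash
# The Android arm64 build output lands under bazel-bin.
ls ${TF_SRC_PATH}/bazel-bin/tensorflow/lite/libtensorflowlite.so
```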

-We also build flatbuffers used by the application in the next steps:
+The final step is to build flatbuffers used by the application:
```
cd $WORKSPACE/tensorflow_src
mkdir flatc-native-build && cd flatc-native-build
cmake ../tensorflow/lite/tools/cmake/native_tools/flatbuffers
cmake --build .
```

-With flatbuffers and LiteRT built, we can now build our application for Android device.
+With flatbuffers and LiteRT built, you can now build the application for Android devices.


content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/5-creating-simple-program.md

Lines changed: 34 additions & 8 deletions
@@ -8,15 +8,17 @@ layout: learningpathall

## Create and build a simple program

-You'll now build a simple program that runs inference on all three submodules directly on an Android device.
+As a final step, you'll now build a simple program that runs inference on all three submodules directly on an Android device.

The program takes a text prompt as input and generates an audio file as output.

```bash
cd $WORKSPACE/ML-examples/kleidiai-examples/audiogen/app
mkdir build && cd build
```

-Ensure the NDK path is set correctly and build with cmake:
+Ensure the NDK path is set correctly and build with `cmake`:

```bash
cmake -DCMAKE_TOOLCHAIN_FILE=$NDK_PATH/build/cmake/android.toolchain.cmake \
-DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
@@ -31,15 +33,30 @@ make -j
After the example application builds successfully, a binary file named `audiogen` is created.

A SentencePiece model is a type of subword tokenizer used by the audiogen application. You'll need to download the *spiece.model* file:

```bash
-https://huggingface.co/google-t5/t5-base/tree/main
+cd $WORKSPACE
+wget https://huggingface.co/google-t5/t5-base/resolve/main/spiece.model
```

-we will save this model in `WORKSPACE` for ease of access
+
+Verify this model was downloaded to your `WORKSPACE`:

```text
-cp spiece.model $WORKSPACE
+ls $WORKSPACE/spiece.model
```

-Now use adb (Android Debug Bridge) to push all necessary files into the `audiogen` folder on Android device:
+Connect your Android device to your development machine using a cable. adb (Android Debug Bridge) is available as part of the Android SDK. You should see your device listed when you run the following command:

+```bash
+adb devices
+```

+```output
+<DEVICE ID> device
+```

+Note that you may have to approve the connection on your phone for this to work. Now, use `adb` to push all necessary files into the `audiogen` folder on the Android device:

```bash
cd $WORKSPACE/ML-examples/kleidiai-examples/audiogen/app/build
adb shell mkdir -p /data/local/tmp/app
@@ -51,15 +68,24 @@ adb push $WORKSPACE/spiece.model /data/local/tmp/app
adb push ${TF_SRC_PATH}/bazel-bin/tensorflow/lite/libtensorflowlite.so /data/local/tmp/app
```
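Optionally, confirm the files landed on the device before running the app:

```bash
# List the pushed models, tokenizer, binary, and shared library on the device.
adb shell ls /data/local/tmp/app
```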

-Finally, run the program on your Android device:
-```
+Start a new shell to access the device's system from your development machine:

+```bash
adb shell
+```

+Finally, run the program on your Android device. Experiment with different prompts, following the advice in the [Download the model](../2-testing-model) section.

+```bash
cd /data/local/tmp/app
LD_LIBRARY_PATH=. ./audiogen . "warm arpeggios on house beats 120BPM with drums effect" 4
exit
```

Successful execution of the app creates an `output.wav` file containing the audio described by your prompt. You can pull it back to your host machine and enjoy!

```bash
adb pull /data/local/tmp/app/output.wav
```
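If your host machine has a command-line audio player, you can listen to the result right away. This is illustrative rather than part of the Learning Path, and player availability varies by OS:

```bash
# macOS ships afplay; on Linux, aplay or ffplay are common alternatives.
afplay output.wav
```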

+You should now have gained hands-on experience running the Stable Audio Open Small model with LiteRT on Arm-based devices. This includes setting up the environment, optimizing the model for on-device inference, and understanding how efficient runtimes like LiteRT make low-latency generative AI possible at the edge. You're now better equipped to explore and deploy AI-powered audio applications on mobile and embedded platforms.

content/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt/_index.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ minutes_to_complete: 30
who_is_this_for: This is an introductory topic for developers looking to deploy the Stable Audio Open Small text-to-audio model using LiteRT on an Android device.

learning_objectives:
-- Deploy the Stable Audio Open Small model on Android using LiteRT.
+- Download and learn about the Stable Audio Open Small model.
- Create a simple application to generate audio.
- Compile the application for an Arm CPU.
- Run the application on an Android smartphone and generate an audio snippet.
