
AudioAI-ModelZoo

A collection of optimized Deep Neural Network (DNN) models for audio tasks on TI EdgeAI processors. The models are converted from PyTorch and TensorFlow into embedded-friendly formats optimized for TI SoCs.

Notice: The models in this repository are made available for experimentation and development; they are not intended for deployment in production.

System Requirements

  • Processors: AM62A
  • TIDL Version: 11_01_06_00

Quick Start

Clone the repository

On the Linux command line of the target (AM62A):

mkdir -p ~/tidl && cd ~/tidl
git clone https://github.com/TexasInstruments-Sandbox/audioai-modelzoo.git
cd audioai-modelzoo

Download Models and Model Artifacts

./download_models.sh -y
./download_artifacts.sh -y

Both scripts provide interactive menus for selecting which models to download; the -y flag accepts the prompts automatically.

Docker Image Setup

This repository uses a two-stage Docker build process (see the docker folder). The base image contains all dependencies and is pre-built and available from the GitHub Container Registry; the TI-specific image adds processor-specific libraries on top of it.

Pull the pre-built base image and build the TI image:

docker pull ghcr.io/texasinstruments-sandbox/audioai-base:11.1.0
docker tag ghcr.io/texasinstruments-sandbox/audioai-base:11.1.0 audioai-base:11.1.0
cd docker
./docker_build_ti.sh

To build the base image from scratch instead of pulling it, run ./docker_build_base.sh before building the TI image.

Start Jupyter Server

Launch the container:

~/tidl/audioai-modelzoo/docker/docker_run.sh

Inside the container, start Jupyter Lab:

./jupyter_lab.sh

The script will display a highlighted access URL. Open it in your browser to access Jupyter Lab with three inference notebooks pre-loaded in tabs.

JupyterLab running inside the Docker container

Pre-Trained Models

Models are located in the models folder.

Speech Enhancement (Audio-to-Audio)

GTCRN

Inference in Jupyter Notebook: inference/gtcrn_se/gtcrn_inference.ipynb
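
For a quick sanity check outside the notebook, the exported ONNX graph can be inspected with onnxruntime. The sketch below is illustrative only; the model path is an assumption (adjust it to wherever download_models.sh placed the model), and the notebook remains the reference workflow.

# Minimal sketch: list the GTCRN model's inputs and outputs.
# The model path is an assumption; adjust to the downloaded location.
import onnxruntime as ort

sess = ort.InferenceSession("models/gtcrn_se/gtcrn.onnx")
for inp in sess.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print("output:", out.name, out.shape, out.type)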

Sound Classification (Audio-to-Class)

VGGish11

Inference in Jupyter Notebook: inference/vggish11_sc/vggish_inference.ipynb

Python script version (run inside the Docker container):

cd ~/tidl/audioai-modelzoo/inference/vggish11_sc
python3 vggish_infer_audio.py --audio-file sample_wav/139951-9-0-9.wav
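
The script handles preprocessing internally. For reference, the standard VGGish front end turns 16 kHz audio into log-mel spectrogram patches of 96 frames x 64 mel bands (25 ms window, 10 ms hop); the sketch below reproduces that front end under the assumption that librosa is available, and is illustrative only.

# Sketch of the standard VGGish front end (librosa availability is an
# assumption; the repository's script performs the equivalent steps itself).
import numpy as np
import librosa

y, sr = librosa.load("sample_wav/139951-9-0-9.wav", sr=16000)
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=400, hop_length=160, n_mels=64, power=1.0)
log_mel = np.log(mel.T + 0.01)                  # (frames, 64), stabilized log
n = log_mel.shape[0] // 96                      # whole 0.96 s patches
patches = log_mel[: n * 96].reshape(n, 96, 64)
print(patches.shape)                            # (n, 96, 64)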

YAMNet

Inference in Jupyter Notebook: inference/yamnet_sc/yamnet_inference.ipynb

Python script version (run inside the Docker container):

cd ~/tidl/audioai-modelzoo/inference/yamnet_sc
python3 yamnet_infer_audio.py --audio-file samples/miaow_16k.wav
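
YAMNet classifies fixed-length patches of the waveform, and the script aggregates the per-patch scores. The 7-patch figure in the benchmark table below is consistent with non-overlapping 0.96 s patches; the sketch shows only that bookkeeping (the framing is an assumption, the script is authoritative).

# Patch count for a clip, assuming non-overlapping 0.96 s patches.
def num_patches(duration_s: float, patch_s: float = 0.96) -> int:
    return int(duration_s // patch_s)

print(num_patches(6.73))  # 7, matching the benchmark table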

Performance Benchmarks

Model             Input Audio (s)     Inference Time (ms)   Real-Time Factor
GTCRN (FP32)      9.77                679.90                0.070
VGGish11 (INT8)   4.00                8.88                  0.002
YAMNet (INT8)     6.73 (7 patches)    17.53 (total)         0.003

Note: Real-Time Factor (RTF) = Processing Time / Audio Duration. RTF < 1.0 means faster than real-time. Performance metrics may vary depending on system conditions.
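
As a worked check against the table, RTF can be recomputed directly from the two measured quantities:

# Real-Time Factor: processing time divided by audio duration (dimensionless).
def rtf(inference_ms: float, audio_s: float) -> float:
    return (inference_ms / 1000.0) / audio_s

print(f"{rtf(679.90, 9.77):.3f}")  # 0.070 (GTCRN FP32)
print(f"{rtf(8.88, 4.00):.3f}")    # 0.002 (VGGish11 INT8)
print(f"{rtf(17.53, 6.73):.3f}")   # 0.003 (YAMNet INT8, total over 7 patches)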
