birdnet-stm32

This repository contains code and resources for training a tiny audio classification model for bioacoustics. The model is designed to run on the STM32N6570-DK development board.

STM32N6570-DK board

(Image source: EBV Elektronik)

STM32N6570-DK user manual: DM00570145

Setup (Ubuntu)

Clone this repository and navigate to the project directory:

git clone https://github.com/birdnet-team/birdnet-stm32.git
cd birdnet-stm32

We assume you have Python 3.12 installed. If not, you can install it using:

sudo apt install python3.12 python3.12-venv python3.12-dev

Then, create a virtual environment and activate it:

python3.12 -m venv venv
source venv/bin/activate

Install the required packages:

pip install -r requirements.txt

Training

Deploying a model to the STM32N6570-DK is quite involved and requires us to:

  • download and prepare the dataset
  • train a model
  • convert the model
  • deploy the model to the STM32N6570-DK

Download and Prepare the Dataset

We'll use a subset of the iNatSounds dataset, which is available here: iNatSounds on GitHub

After downloading, sort the files into folders named after their labels (i.e., species names) based on the train and test annotations. This repo assumes your data is structured as follows:

data/
├── train/
│   ├── species1/
│   │   ├── file1.wav
│   │   ├── file2.wav
│   │   └── ...
│   ├── species2/
│   └── ...
└── test/
    ├── species1/
    ├── species2/
    └── ...

Each folder contains .wav files of the respective species. Since we're training a tiny model, we won't be able to fit all iNatSounds classes, so we'll use a subset of species; which species to include is up to you.
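
As a starting point, here is a minimal sketch of the sorting step. The annotation format is an assumption here (a hypothetical CSV with filename, species, and split columns); adapt it to the actual iNatSounds annotation files:

import csv
import shutil
from pathlib import Path

# Hypothetical sketch: sort downloaded files into data/{train,test}/<species>/
# based on an annotation table. Column names and paths are assumptions; adapt
# them to the actual iNatSounds annotations.
SRC = Path("inatsounds_raw")   # where the downloaded .wav files live
DST = Path("data")

with open("annotations.csv", newline="") as f:
    for row in csv.DictReader(f):  # assumed columns: filename, species, split
        src_file = SRC / row["filename"]
        if not src_file.exists():
            continue
        target_dir = DST / row["split"] / row["species"]  # e.g., data/train/species1
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, target_dir / src_file.name)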

Train the Model

This repo comes with a pre-trained and already quantized checkpoint (checkpoints/birdnet_stm32n6_100.tflite) that you can use to test the model conversion and deployment process. To train a custom model, run train.py with the desired arguments.

The script will:

  • Split your training data into train and validation sets (--val_split).
  • Split audio files into fixed-length chunks (--chunk_duration) up to a max duration (--max_duration).
  • Generate spectrograms using a selectable frontend:
    • precomputed/librosa: mel spectrograms (magnitude, power=1.0) computed in utils/audio.py (mag_scale applied there).
    • hybrid: linear magnitude spectrogram (|STFT|) provided to the model; model applies fixed mel and optional magnitude scaling.
    • raw/tf: raw audio to model; model does windowed DFT -> linear magnitude -> fixed mel -> optional magnitude scaling.
  • Build a compact DS-CNN model with width scaling (--alpha) and depth multiplier (--depth_multiplier).
  • Optionally apply mixup augmentation (--mixup_alpha, --mixup_probability).
  • Train with cosine LR and early stopping; save the best model to .keras (--checkpoint_path).
  • Save a companion _model_config.json and _labels.txt.

Example:

python train.py \
  --data_path_train data/train \
  --val_split 0.2 \
  --audio_frontend hybrid \
  --mag_scale pcen \
  --checkpoint_path checkpoints/my_stm32_model.keras

Arguments:

  • --data_path_train: Path to your training dataset (required)
  • --max_samples: Max files per class for training (default: None = all)
  • --sample_rate: Audio sample rate (default: 22050)
  • --num_mels: Number of mel bins (default: 64)
  • --spec_width: Spectrogram width (frames) (default: 256)
  • --fft_length: FFT length for STFT/linear spec (default: 512)
  • --chunk_duration: Chunk duration in seconds (default: 3)
  • --max_duration: Max seconds to load per file (default: 60)
  • --audio_frontend: precomputed, hybrid, raw, librosa, or tf (default: hybrid)
  • --mag_scale: Magnitude compression: pcen | pwl | db | none (default: pwl)
  • --embeddings_size: Embedding channels before head (default: 256)
  • --alpha: Model width scaling (default: 1.0)
  • --depth_multiplier: Repeats per stage (default: 1)
  • --frontend_trainable: If set, make audio frontend trainable (mel_mixer/raw mixer/PCEN/PWL) (default: True)
  • --mixup_alpha: Mixup alpha (0 disables, default: 0.2)
  • --mixup_probability: Fraction of batch to mix (default: 0.25)
  • --batch_size: Batch size (default: 64)
  • --epochs: Number of epochs (default: 50)
  • --learning_rate: Initial LR (default: 0.001)
  • --val_split: Validation split (default: 0.2)
  • --checkpoint_path: Path to save best model (.keras)

Notes:

  • The model saves:
    • checkpoints/my_stm32_model.keras
    • checkpoints/my_stm32_model_model_config.json (conversion metadata)
    • checkpoints/my_stm32_model_labels.txt (class names)
  • Noise classes can be added by placing files under folders named 'noise', 'silence', 'background', or 'other'; these are treated as negatives.

Model conversion & validation

Run convert.py to convert the trained .keras model to a quantized TFLite model (float32 IO).

The script will:

  • Load the trained .keras model (with AudioFrontendLayer).
  • Read _model_config.json to reconstruct shapes/modes.
  • Build a representative dataset from training data (no SNR filtering this time).
  • Convert to TFLite with PTQ and save _quantized.tflite.
  • Optionally validate TFLite vs. Keras outputs on representative samples.

Example:

python convert.py \
  --checkpoint_path checkpoints/my_stm32_model.keras \
  --model_config  checkpoints/my_stm32_model_model_config.json \
  --data_path_train data/train \
  --validate \
  --validate_samples 50

Arguments:

  • --checkpoint_path: Path to the trained Keras model (.keras, required)
  • --model_config: Path to _model_config.json (optional; inferred if omitted)
  • --output_path: Path to save the .tflite (optional; inferred as _quantized.tflite)
  • --data_path_train: Path to training data for representative dataset (optional; random data used if omitted)
  • --reps_per_file: Representative samples to draw per file (default: 4)
  • --num_samples: Number of representative samples (default: 1024)
  • --validate: If set, runs a Keras vs. TFLite validation pass after conversion
  • --validate_samples: Max samples to use for validation (default: 128)

Notes:

  • Validation compares float32 Keras outputs to float32 TFLite outputs on the same inputs using cosine similarity, MSE, MAE, and Pearson r.
  • For meaningful validation, pass --data_path_train so samples come from real audio; otherwise random inputs will be used.
  • Validation samples are saved as .npz and can be used to validate the model on the STM32N6570-DK later with stedgeai validate --valinput <file>.npz.

Model testing

Use test.py to evaluate a trained model on a folder of test audio files. It loads a Keras (.keras) or quantized TFLite (.tflite) model, prepares chunks per file using the saved model_config, runs batched inference, pools chunk scores to file-level, and reports metrics.

Example (TFLite):

python test.py \
  --model_path checkpoints/my_stm32_model_quantized.tflite \
  --data_path_test data/test \
  --pooling avg \
  --batch_size 16

The script reads _model_config.json to reconstruct the audio frontend and chunking parameters, runs inference on non-overlapping chunks (up to a max duration of 60 s per file by default), and produces file-level scores.

Pooling methods (per-file)

  • avg: arithmetic mean over chunk scores.
  • max: maximum over chunk scores.
  • lme: log-mean-exponential pooling, log(mean(exp(beta*s))) / beta with beta=10 (fixed).
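
For reference, here is a minimal numpy sketch of the three pooling functions (not the test.py implementation itself), applied to per-chunk scores of shape (num_chunks, num_classes):

import numpy as np

def pool_scores(chunk_scores, method="avg", beta=10.0):
    # Pool per-chunk scores (num_chunks, num_classes) to one file-level score per class.
    if method == "avg":
        return chunk_scores.mean(axis=0)
    if method == "max":
        return chunk_scores.max(axis=0)
    if method == "lme":  # log-mean-exp, beta fixed to 10
        return np.log(np.mean(np.exp(beta * chunk_scores), axis=0)) / beta
    raise ValueError(f"Unknown pooling method: {method}")

chunk_scores = np.array([[0.1, 0.9], [0.3, 0.2], [0.05, 0.4]])
print(pool_scores(chunk_scores, "lme"))  # lies between avg and max pooling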

Metrics

  • roc-auc: micro ROC-AUC over all classes.
  • cmAP: class-macro Average Precision (mean AP over classes with positives).
  • mAP: micro Average Precision (AP over all decisions).
  • precision, recall, f1: computed at a 0.5 threshold on file-level scores.
  • The script also prints top-10 and bottom-10 classes by AP and can save per-file predictions to CSV.
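
A minimal sketch of how these metrics relate to each other, assuming scikit-learn, one-hot file-level labels y_true, and pooled scores y_score of shape (files, classes):

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Toy example (not the test.py code itself): 3 files, 2 classes.
y_true = np.array([[1, 0], [0, 1], [1, 0]])
y_score = np.array([[0.8, 0.1], [0.2, 0.7], [0.6, 0.4]])

print(roc_auc_score(y_true, y_score, average="micro"))            # roc-auc
print(average_precision_score(y_true, y_score, average="macro"))  # cmAP
print(average_precision_score(y_true, y_score, average="micro"))  # mAP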

Arguments:

  • --model_path: Path to .keras or .tflite (required).
  • --model_config: Path to _model_config.json (optional; inferred from --model_path).
  • --data_path_test: Test data root with class subfolders (required).
  • --max_files: Optional cap on files per class (default: -1 = all).
  • --batch_size: Chunk inference batch size (default: 16).
  • --pooling: avg | max | lme (default: avg).
  • --overlap: Chunk overlap in seconds (max. chunk_duration - 0.1)
  • --save_csv: Optional CSV path for per-file predictions.

The STM deployment process

In order to deploy the model to the STM32N6570-DK, we will use STM's X-CUBE-AI framework, which provides tools for converting and deploying machine learning models on STM32 microcontrollers. The workflow involves several steps:

  1. Generate the model files using the STM32Cube.AI CLI tool.
  2. Load the model onto the board using the N6 loader script.
  3. Validate the model on the STM32N6570-DK to ensure it works as expected.

STM32 model validation workflow

(Image source: STM32ai)

Generate the Model Files

First, we'll use the STM32Cube.AI CLI to convert the trained model to a format suitable for deployment on the STM32N6570-DK. You can download the STM32Cube.AI CLI from the STMicroelectronics website.

After downloading, you should have x-cube-ai-linux-v10.2.0.zip. Unzip it and locate the CLI tool, which is typically found in the Utilities/linux directory.

unzip x-cube-ai-linux-v10.2.0.zip -d X-CUBE-AI.10.2.0
cd X-CUBE-AI.10.2.0
unzip stedgeai-linux-10.2.0.zip

This should be your directory structure after unzipping both zips:

X-CUBE-AI.10.2.0/
├── STMicroelectronics.X-CUBE-AI.10.2.0.pack
├── Utilities/
│   ├── linux/
│   │   ├── stedgeai  <-- CLI tool
│   │   └── ...
│   ├── windows/
│   └── ...
├── Middlewares/
└── ...

Now, we need to run model conversion using the CLI tool. Make sure you have your trained and converted model saved as a .tflite file.

Navigate to /path/to/X-CUBE-AI.10.2.0/Utilities/linux and run the stedgeai command to generate the model files for STM32N6570-DK:

cd Utilities/linux
./stedgeai generate \
  --model /path/to/birdnet-stm32/checkpoints/my_stm32_model_quantized.tflite \
  --target stm32n6 \
  --st-neural-art \
  --output /path/to/birdnet-stm32/validation/st_ai_output \
  --workspace /path/to/birdnet-stm32/validation/st_ai_ws \
  --verbose

If you encounter the error arm-none-eabi-gcc: error: unrecognized -mcpu target: cortex-m55, your installed toolchain is too old to support the Cortex-M55 and you need a more recent Arm GNU toolchain. You can download it from Arm Developer, for example:

wget https://developer.arm.com/-/media/Files/downloads/gnu/14.3.rel1/binrel/arm-gnu-toolchain-14.3.rel1-x86_64-arm-none-eabi.tar.xz
tar xf arm-gnu-toolchain-14.3.rel1-x86_64-arm-none-eabi.tar.xz
export PATH=$PWD/arm-gnu-toolchain-14.3.rel1-x86_64-arm-none-eabi/bin:$PATH

Make sure you have the correct arm-none-eabi-gcc compiler installed and available in your PATH. You can check this by running:

arm-none-eabi-gcc --version

Note: After conversion, the tool will generate a network_generate_report.txt in the output folder, which you can consult for basic metrics on the model's compute and memory requirements. If you run the command above with analyze instead of generate, it will analyze the model and provide more detailed information about its size, memory usage, and performance.
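
For example, reusing the paths from the generate call above (flags shown as an illustration; see stedgeai --help for the full option list):

./stedgeai analyze \
  --model /path/to/birdnet-stm32/checkpoints/my_stm32_model_quantized.tflite \
  --target stm32n6 \
  --st-neural-art \
  --output /path/to/birdnet-stm32/validation/st_ai_output \
  --workspace /path/to/birdnet-stm32/validation/st_ai_ws \
  --verbose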

Load the Model onto the STM32N6570-DK

Connect your STM32N6570-DK board to your computer and ensure it is recognized by running:

ls /dev/ttyACM*

If you see a device like /dev/ttyACM0, you can proceed with the validation.

Install STM32CubeProgrammer

Next, install STM32CubeProgrammer on your computer. You can download it from the STMicroelectronics website. Unzip and run ./SetupSTM32CubeProgrammer-2.20.0.linux to install it. This will launch a GUI installer. Follow the instructions to complete the installation.

Verify the installation by navigating to the installation directory and running the command:

<path-to-install-dir>/STM32Cube/STM32CubeProgrammer/bin/STM32_Programmer_CLI --version

Now, add the STM32CubeProgrammer CLI to your PATH:

export PATH=$PATH:/<path-to-install-dir>/STM32Cube/STM32CubeProgrammer/bin

Add your user to the plugdev and dialout groups:

sudo usermod -aG plugdev $USER
sudo usermod -aG dialout $USER

Install STMicroelectronics udev rules: If you haven't already, copy the rules file:

cd <path-to-install-dir>/STM32Cube/STM32CubeProgrammer/Drivers/rules/
sudo cp *.* /etc/udev/rules.d
sudo udevadm control --reload-rules
sudo udevadm trigger

Unplug and replug your STM32N6570-DK board to apply the new rules; for the group changes to take effect, log out and log back in or reboot your computer.

Check if the board is connected and recognized by the STM32CubeProgrammer CLI:

STM32_Programmer_CLI --list

If everything is set up correctly, you should see your STM32N6570-DK board listed.

Install STM32CubeIDE

If you haven't installed STM32CubeIDE yet, you can download it from the STMicroelectronics website. Unzip and run the installer with ./st-stm32cubeide_1.19.0_25607_20250703_0907_amd64.sh. Follow the installation instructions for your platform.

Setting paths for N6 loader script

Create a config_n6l.json file and copy the lines below; change the paths to point to your generated network.c and to the NPU_Validation project in the X-CUBE-AI.10.2.0/Projects/STM32N6570-DK/Applications directory.

{
  "network.c": "/path/to/birdnet-stm32/validation/st_ai_output/network.c",
  "project_path": "/path/to/Code/X-CUBE-AI.10.2.0/Projects/STM32N6570-DK/Applications/NPU_Validation",
  "project_build_conf": "N6-DK",
  "skip_external_flash_programming": false,
  "skip_ram_data_programming": false,
  "objcopy_binary_path": "/usr/bin/arm-none-eabi-objcopy"
}

Update the config.json in the X-CUBE-AI.10.2.0/scripts/N6_scripts directory to point to your STM32CubeIDE installation path:

{
  "compiler_type": "gcc",
  "cubeide_path":"/path/to/stm32cubeide"
}

Set board to DEV mode

  • disconnect the board from USB
  • set BOOT0 to right
  • set BOOT1 to left
  • set JP2 to position 1-2
  • reconnect the board to USB

See the image below for reference:

Set STM32N6570-DK to dev mode

(Image source: ST Community)

Running n6_loader.py

Navigate to the validation directory in this repo, then run the n6_loader.py script from the X-CUBE-AI scripts directory, passing the config_n6l.json file as an argument:

python <path-to-install-dir>/X-CUBE-AI.10.2.0/scripts/N6_scripts/n6_loader.py --n6-loader-config /path/to/birdnet-stm32/config_n6l.json

If the build fails, check the n6_loader.log and compile.log files in the validation directory for errors. If you encounter issues, ensure that the paths in config.json and config_n6l.json are correct and that the necessary tools are installed.

If successful, the output should look like this:

XXX  __main__ -- Preparing compiler GCC
XXX  __main__ -- Setting a breakpoint in main.c at line 137 (before the infinite loop)
XXX  __main__ -- Copying network.c to project: -> /path/to/X-CUBE-AI.10.2.0/Projects/STM32N6570-DK/Applications/NPU_Validation/X-CUBE-AI/App/network.c
XXX  __main__ -- Extracting information from the c-file
XXX  __main__ -- Converting memory files in results/<model>/generation/ to Intel-hex with proper offsets
XXX  __main__ -- arm-none-eabi-objcopy --change-addresses 0x71000000 -Ibinary -Oihex network_atonbuf.xSPI2.raw network_atonbuf.xSPI2.hex
XXX  __main__ -- Resetting the board...
XXX  __main__ -- Flashing memory xSPI2 -- 1 659.665 kB
XXX  __main__ -- Building project (conf= N6-DK)
XXX  __main__ -- Loading internal memories & Running the program
XXX  __main__ -- Start operation achieved successfully

Validate the Model on STM32N6570-DK

Now we can finally validate the model on the STM32N6570-DK. Navigate to the X-CUBE-AI.10.2.0/Utilities/linux directory and run the validate command:

./stedgeai validate \
  --model /path/to/birdnet-stm32/checkpoints/my_stm32_model_quantized.tflite \
  --target stm32n6 \
  --mode target \
  --desc serial:921600 \
  --output /path/to/birdnet-stm32/validation/st_ai_output \
  --workspace /path/to/birdnet-stm32/validation/st_ai_ws \
  --valinput /path/to/birdnet-stm32/checkpoints/my_stm32_model_quantized_validation_data.npz \
  --classifier \
  --verbose

Make sure to pass the quantized TFLite file you converted earlier; the validate command compares the on-device outputs against the reference model.

You might have to run sudo chmod a+rw /dev/ttyACM0 to give your user permission to access the serial port.

Note: STM provides a "Getting Started" guide for the STM32N6, which you can find here in case you need more detailed instructions on setting up the board and running the validation.

If everything is set up correctly, the validate command will run inference on the STM32N6570-DK and print the results to the console. After the validation is complete, you should see a network_validate_report.txt file in the validation/st_ai_output directory with the validation results.

For more command line options, visit the ST Edge AI documentation.

Build and deploy demo application

This repo comes with a pre-trained model (birdnet_stm32n6_100.tflite) that was trained on the 100 most common species of the North-Eastern U.S., Central Europe, and Brazil and achieves an ROC-AUC of 0.84 on iNatSounds data. The model uses a hybrid audio frontend and expects 257x256 power spectrograms (frequency bins x frames) computed with an FFT length of 512 and a hop length of 258 from 3-second audio chunks at a 22050 Hz sample rate (more details in the birdnet_stm32n6_100_model_config.json file).
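
A quick back-of-the-envelope check of those numbers:

sample_rate = 22050     # Hz
chunk_duration = 3      # seconds
spec_width = 256        # frames expected by the model
fft_length = 512

samples = sample_rate * chunk_duration  # 66150 samples per chunk
print(samples / spec_width)             # ~258.4 -> hop length of 258
print(fft_length // 2 + 1)              # 257 frequency bins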

The model is already quantized and can be flashed with ./stedgeai as described above. Inference for a single chunk takes about 3.3 ms on the STM32N6570-DK (which is ~900x real-time).

The demo application is still TODO.

Here is a rough outline of the steps you would typically follow to build a demo application (a host-side sketch of the buffering logic follows the list):

  • record audio using the on-board microphone
  • run the fft on a 512-sample frame and accumulate into a ring buffer
  • run inference every second on the last 3 seconds of audio drawn from the ring buffer
  • map prediction scores to labels using the labels.txt file
  • log the top-5 predictions to the serial console
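
As announced above, here is a host-side Python sketch of that buffering/inference cadence. Everything named here is illustrative (the microphone stub, the fake model, the class count); the on-device version would use the microphone driver and the generated X-CUBE-AI network API instead:

import numpy as np

SAMPLE_RATE = 22050
FFT_LENGTH = 512
HOP = 258                        # matches the pre-trained model's hop length
NUM_BINS = FFT_LENGTH // 2 + 1   # 257 frequency bins
SPEC_WIDTH = 256                 # ~3 s of frames -> 257x256 model input

def get_audio_samples(n):
    # Hypothetical stand-in for reading n samples from the on-board microphone.
    return np.random.randn(n).astype(np.float32)

def run_inference(spectrogram):
    # Hypothetical stand-in for the quantized model; returns fake per-class scores.
    return np.random.rand(100)

window = np.hanning(FFT_LENGTH).astype(np.float32)
audio = np.zeros(FFT_LENGTH, dtype=np.float32)               # sliding raw-audio frame
spec_ring = np.zeros((NUM_BINS, SPEC_WIDTH), dtype=np.float32)
samples_since_inference = 0

for _ in range(int(10 * SAMPLE_RATE / HOP)):                  # simulate ~10 s of audio
    audio = np.concatenate([audio[HOP:], get_audio_samples(HOP)])  # slide by one hop
    column = np.abs(np.fft.rfft(audio * window))              # linear magnitude, 257 bins
    spec_ring = np.roll(spec_ring, -1, axis=1)                # drop the oldest frame...
    spec_ring[:, -1] = column                                 # ...append the newest one
    samples_since_inference += HOP
    if samples_since_inference >= SAMPLE_RATE:                # roughly once per second
        samples_since_inference = 0
        scores = run_inference(spec_ring)                     # last ~3 s of spectrogram
        top5 = np.argsort(scores)[-5:][::-1]                  # map indices via labels.txt
        print("top-5 class indices:", top5)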

License

  • Source Code & models: The source code and models for this project are licensed under the MIT License.
  • STM tools and scripts: The STM tools and scripts used in this project are licensed under varying licenses, please refer to the respective documentation for details.
  • Citation: Feel free to use the code or models in your research. If you do, please cite as:
@article{kahl2021birdnet,
  title={BirdNET: A deep learning solution for avian diversity monitoring},
  author={Kahl, Stefan and Wood, Connor M and Eibl, Maximilian and Klinck, Holger},
  journal={Ecological Informatics},
  volume={61},
  pages={101236},
  year={2021},
  publisher={Elsevier}
}

Funding

Our work in the K. Lisa Yang Center for Conservation Bioacoustics is made possible by the generosity of K. Lisa Yang to advance innovative conservation technologies to inspire and inform the conservation of wildlife and habitats.

The development of BirdNET is supported by the German Federal Ministry of Research, Technology and Space (FKZ 01|S22072), the German Federal Ministry for the Environment, Climate Action, Nature Conservation and Nuclear Safety (FKZ 67KI31040E), the German Federal Ministry of Economic Affairs and Energy (FKZ 16KN095550), the Deutsche Bundesstiftung Umwelt (project 39263/01) and the European Social Fund.

Partners

BirdNET is a joint effort of partners from academia and industry. Without these partnerships, this project would not have been possible. Thank you!

Logos of all partners
