Using UNet1D to learn how to remove non-snore noises from snoring audio signals
This repository contains all the necessary scripts to prepare, preprocess, and optionally train a UNet1D model for snore source separation or denoising. The approach involves:
- Splitting and organizing raw data into `Dataset/Raw` (via `prepareDataset.py`).
- Downsampling, normalizing, and augmenting the data for training (via `preprocessDataset.py`).
- Training a UNet1D to remove noise from snoring signals (denoising).
We used Kaggle data as a starting point and applied minimal preprocessing:
- normalization
- downsampling to 16kHz
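The two preprocessing steps above can be sketched as follows. This is a minimal, dependency-free illustration and not the code in `preprocessDataset.py`: the function names `peak_normalize` and `downsample_linear` are hypothetical, peak normalization is an assumed choice (the README does not say which normalization is used), and the linear-interpolation resampler has no anti-aliasing filter, so a real pipeline would more likely rely on a proper resampler from an audio library.

```python
import math


def peak_normalize(samples, peak=1.0):
    """Scale samples so the largest absolute value equals `peak` (assumed scheme)."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        return list(samples)  # silent clip: nothing to scale
    scale = peak / max_abs
    return [s * scale for s in samples]


def downsample_linear(samples, src_rate, dst_rate=16_000):
    """Resample to `dst_rate` by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = len(samples) * dst_rate // src_rate  # output length in samples
    step = src_rate / dst_rate                   # input samples per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# Example: one second of a quiet 440 Hz tone recorded at 44.1 kHz.
tone = [0.25 * math.sin(2 * math.pi * 440 * t / 44_100) for t in range(44_100)]
processed = downsample_linear(peak_normalize(tone), src_rate=44_100)
```

After this call, `processed` holds one second of audio at 16 kHz, scaled so that the original clip peaked at 1.0.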
We chose two well-known architectures for this task:
- UNet1D [1]
- CNNAutoEncoder [2]
No significant changes were applied to the original architectures.
To be documented later.
Our strategy yields the following results:
| Model | Similarity of Denoised | Binary Classification |
|---|---|---|
| UNet1D [1] | 76.88 | 100 |
| CNNAutoEncoder [2] | 67.50+- | 100 |
| WaveUNet1D [3] | 83.17 | 100 |
| ResUNet1D [4] | xx.xx+-xx | 100 |
| AttentionUNet1D [5] | xx.xx+-xx | 100 |
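
The README does not state how the "Similarity of Denoised" score is computed. Purely as an assumption, the sketch below shows one common way to score a denoised signal against a clean reference: cosine similarity expressed as a percentage. The function name `cosine_similarity_percent` is hypothetical, and the numbers in the table may well come from a different metric.

```python
import math


def cosine_similarity_percent(reference, estimate):
    """Cosine similarity between two signals, scaled to the range [-100, 100].

    Assumed metric for illustration only; signals are compared over
    their common length.
    """
    pairs = list(zip(reference, estimate))  # zip truncates to the shorter signal
    dot = sum(r * e for r, e in pairs)
    norm_ref = math.sqrt(sum(r * r for r, _ in pairs))
    norm_est = math.sqrt(sum(e * e for _, e in pairs))
    if norm_ref == 0.0 or norm_est == 0.0:
        return 0.0  # undefined for silent signals; report 0 by convention
    return 100.0 * dot / (norm_ref * norm_est)
```

A score near 100 means the denoised waveform points in almost the same direction as the reference, regardless of overall gain.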
- Download the dataset from the Snoring Kaggle page.
- Place your dataset where you prefer.
- Create a virtual environment with `python3 -m venv venv`, then `source venv/bin/activate`.
- Install dependencies with `pip install -r requirements.txt`.
- Optionally, create a wandb account and change the `wandb_entity` key in the `config.json` file accordingly.
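
If you prefer not to edit `config.json` by hand, the last step can be scripted. The helper below is a hypothetical sketch: it assumes `config.json` is a flat JSON object with a top-level `wandb_entity` key, which the README implies but does not show, and it leaves every other key unchanged.

```python
import json
from pathlib import Path


def set_wandb_entity(config_path, entity):
    """Set the `wandb_entity` key in a JSON config file and return the new config."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["wandb_entity"] = entity  # assumed to be a top-level key
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `set_wandb_entity("config.json", "my-team")` would point logging at a wandb entity named `my-team` (a placeholder name).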
All results were obtained on a single NVIDIA RTX A5000 GPU.
python prepareDataset.py
python preprocessDataset.py
python train_all.py
python inference_all.py
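
The four commands above are meant to run in order, and each stage depends on the output of the previous one. A small wrapper like the following can chain them and stop at the first failure. It is a sketch, not part of the repository: `build_commands` and `run_pipeline` are hypothetical names, and it assumes the scripts take no required arguments, as the commands above suggest.

```python
import subprocess
import sys

# Script names taken from the commands listed in this README, in run order.
PIPELINE = [
    "prepareDataset.py",
    "preprocessDataset.py",
    "train_all.py",
    "inference_all.py",
]


def build_commands(python=sys.executable, scripts=PIPELINE):
    """Return one argv list per pipeline stage."""
    return [[python, script] for script in scripts]


def run_pipeline(commands, dry_run=False):
    """Run each stage in order; abort on the first non-zero exit code."""
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the stage fails.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_commands())` from the repository root runs the full pipeline inside the active virtual environment; passing `dry_run=True` only prints the commands.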
[1]
[2]