A GPT/Whisper-based system for identification and transcription of non-vocal sounds, such as sirens, falling objects, collisions, vehicle engines, etc.
This project is part of a research work aimed at developing a system capable of transforming non-vocal sounds into textual descriptions. We use a model based on OpenAI's Whisper, fine-tuned to identify different categories of environmental sounds.
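At a high level, the approach reuses Whisper's audio encoder as a feature extractor and attaches a classification head for sound categories. The sketch below is illustrative only; the class name, head architecture, and label count are assumptions, not the project's actual code.

```python
# Illustrative sketch (not the project's actual code): Whisper's audio
# encoder as a feature extractor with a linear classification head.
import torch
import torch.nn as nn
import whisper

class SoundClassifier(nn.Module):
    def __init__(self, num_classes: int, model_size: str = "small"):
        super().__init__()
        self.encoder = whisper.load_model(model_size).encoder
        d_model = self.encoder.ln_post.normalized_shape[0]
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        features = self.encoder(mel)      # (batch, frames, d_model)
        pooled = features.mean(dim=1)     # average over time
        return self.head(pooled)          # (batch, num_classes) logits

# Example: classify one clip (file name is hypothetical)
model = SoundClassifier(num_classes=5)
audio = whisper.pad_or_trim(whisper.load_audio("siren.wav"))  # 16 kHz, 30 s
mel = whisper.log_mel_spectrogram(audio).unsqueeze(0)
print(model(mel).softmax(dim=-1))
```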
```
audio-classifier/
├── src/
│   ├── ml/                # Machine learning modules
│   │   └── trained_model/ # Whisper model
│   ├── backend/           # FastAPI API
│   └── frontend/          # Web interface (Flask)
└── data/
    ├── reports/           # Project documentation
    └── sounds/            # Training data
```
- Web interface for uploading or recording audio
- REST API for processing and classifying audio
- Support for `.wav` files
- Automatic resampling to 16 kHz (see the preprocessing sketch below)
- 30-second limit per clip
- Model trained to identify various categories of non-vocal sounds
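For reference, the 16 kHz / 30-second preprocessing could be implemented as in this minimal sketch using torchaudio; the function name is an assumption, not the project's actual code.

```python
# Minimal preprocessing sketch: load a .wav file, downmix to mono,
# resample to 16 kHz, and pad/trim to Whisper's 30-second window.
import torchaudio
import whisper

def preprocess(path: str):
    waveform, sr = torchaudio.load(path)   # (channels, samples)
    waveform = waveform.mean(dim=0)        # downmix to mono
    if sr != 16000:
        waveform = torchaudio.functional.resample(waveform, sr, 16000)
    return whisper.pad_or_trim(waveform)   # exactly 30 s at 16 kHz
```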
The project requires:

- Python 3.8+
- PyTorch
- Whisper
- FastAPI
- Flask
- Other dependencies listed in `requirements.txt`
- Clone the repository:

```
git clone https://github.com/rubemalmeida/audio-classifier.git
cd audio-classifier
```
- Install dependencies:

```
pip install -r requirements.txt
```
The trained model files are not included in the repository due to their size. Download them from Google Drive:
- Download the trained model: Google Drive Link
- Extract the downloaded zip file
- Place the model files in the `src/ml/` directory
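Once the files are extracted, a quick sanity check confirms the checkpoint is where the backend can find it; the filename `model.pt` is an assumption.

```python
# Verify the downloaded checkpoint is in the expected location.
from pathlib import Path
import torch

ckpt = Path("src/ml/trained_model/model.pt")   # assumed filename
state = torch.load(ckpt, map_location="cpu")
print(type(state))
```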
To download our research report:
- Access the PDF directly: relatorio.pdf
- Or find it in the `data/reports/` directory after cloning the repository
Start the FastAPI backend server:
```
python -m src.backend.main
```
The API will be available at http://localhost:8000.
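With the backend running, the API can be exercised directly. The endpoint path and response fields below are assumptions for illustration; check FastAPI's interactive docs at http://localhost:8000/docs for the actual schema.

```python
# Illustrative client call; the /classify endpoint name and the response
# fields ("label", "confidence") are assumptions, not a documented contract.
import requests

with open("siren.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/classify",
        files={"file": ("siren.wav", f, "audio/wav")},
    )
resp.raise_for_status()
print(resp.json())   # e.g. {"label": "siren", "confidence": 0.93}
```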
In a separate terminal, start the Flask frontend server:
```
python -m src.frontend.app
```

The web interface will be available at http://localhost:5000.
- Ensure both backend and frontend servers are running
- Open your web browser and navigate to http://localhost:5000
- Upload an audio file (.wav format) or record a new one using your microphone
- Click on "Classify Sound"
- View the classification results showing the detected sound type and confidence level
To train a new model:
```
# Method 1: Using the Jupyter notebook
jupyter notebook src/ml/train.ipynb

# Method 2: Using the training script
python src/ml/train.py --audio_dir "data/sounds" --model_size "small" --epochs 10
```
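For orientation, a fine-tuning loop consistent with these flags might look like the sketch below; the dataset interface and hyperparameters are assumptions, not a transcript of `src/ml/train.py`.

```python
# Hypothetical fine-tuning loop: cross-entropy over sound categories.
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs: int = 10, lr: float = 1e-4):
    # dataset is assumed to yield (mel_spectrogram, class_index) pairs
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(epochs):
        total = 0.0
        for mel, label in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(mel), label)
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch + 1}: mean loss {total / len(loader):.4f}")
```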
Some of the results obtained from the training process are shown below:
Figure 1: Accuracy per epoch

Figure 2: Loss per epoch