Audio to Text Classifier

A deep learning model based on Whisper that converts non-speech audio into textual descriptions.

Overview

This project fine-tunes OpenAI's Whisper model to classify non-speech audio and produce short text descriptions. It can be applied to tasks such as environmental sound classification, machine sound analysis, and audio event detection.
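
Conceptually, fine-tuning amounts to teaching Whisper's decoder to emit a short sound description instead of a speech transcript. The sketch below illustrates that training objective only; it assumes the Hugging Face transformers and librosa packages, and the checkpoint name, audio path, and label text are placeholders rather than values taken from this repository:

import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Illustrative only: checkpoint and file path are assumptions, not this repo's code.
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Load a 16 kHz mono waveform and convert it to log-mel input features.
audio, _ = librosa.load("sounds/category1/sound1.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Use the category description as the decoder's target text.
labels = processor.tokenizer("dog barking", return_tensors="pt").input_ids

# One supervised step: the model learns to generate the description from the audio.
outputs = model(input_features=inputs.input_features, labels=labels)
outputs.loss.backward()

The project's src/train.py presumably wraps this idea in a full training loop (batching, optimization, checkpointing); the snippet only shows the core supervised step.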

Installation

git clone https://github.com/rubemalmeida/audio-classifier
cd audio-classifier
pip install -r requirements.txt

Usage

Training

Place your audio files in folders named after their categories under the sounds directory:

sounds/
├── category1/
│   ├── sound1.wav
│   └── sound2.wav
└── category2/
    ├── sound3.wav
    └── sound4.wav

Then run:

python src/train.py
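
The label for each clip is expected to come from its parent folder name. The following is a minimal sketch of how (audio path, label) pairs could be collected from that layout; it illustrates the convention above and is not the repository's actual loader code:

from pathlib import Path

def collect_examples(root="sounds"):
    # Each subdirectory name is treated as the text label for the clips inside it.
    examples = []
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue
        for wav in sorted(category_dir.glob("*.wav")):
            examples.append((str(wav), category_dir.name))
    return examples

# e.g. [("sounds/category1/sound1.wav", "category1"), ...]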

Inference

python src/infer.py
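
If you prefer to call the model from Python rather than through the script, a fine-tuned checkpoint can be loaded and used for generation directly. The sketch below is a hypothetical example: the checkpoint directory and audio path are placeholders, and it assumes transformers and librosa are installed.

import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Placeholder paths: point these at your fine-tuned checkpoint and an audio file.
processor = WhisperProcessor.from_pretrained("checkpoints/whisper-sounds")
model = WhisperForConditionalGeneration.from_pretrained("checkpoints/whisper-sounds")

audio, _ = librosa.load("sounds/category1/sound1.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate the text description for the audio clip.
predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])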

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • See requirements.txt for full list

License

MIT

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
