Speech-recognition

This project is closely related to automatic speech recognition (ASR). Instead of recognizing continuous speech made up of many different words and sentences, it recognizes a small set of words and labels everything else as unknown or silence. The training and test data consist of WAV files, each about one second long, containing a spoken word, noise, or just silence. The datasets used to train this project were provided by Google as part of the Kaggle TensorFlow Speech Recognition Challenge.
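The labelling scheme described above can be sketched as a small mapping from the sub-folder name under data/train/audio to the class to predict. The ten target words below are taken from the Kaggle challenge description, not from this README, and `TARGET_WORDS` and `to_label` are hypothetical names used only for illustration.

```python
# Target words as listed in the Kaggle challenge description
# (assumption: this README does not name them explicitly).
TARGET_WORDS = ("yes", "no", "up", "down", "left", "right",
                "on", "off", "stop", "go")


def to_label(folder_name):
    """Map a data/train/audio sub-folder name to the label to predict."""
    if folder_name == "silence":
        return "silence"
    if folder_name in TARGET_WORDS:
        return folder_name
    # Every other spoken word is collapsed into a single class.
    return "unknown"
```

For example, `to_label("bed")` gives `"unknown"`, while `to_label("stop")` keeps its own label.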

Usage:

Download the train and test data from https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/data

Keep the folder structure that the IPython notebooks expect (data/train, etc.).

Create a folder called silence in data/train/audio and run DataExploration.ipynb to generate a few silence samples from the background_noise WAV files.
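As a rough idea of what this step involves, the sketch below cuts one long background-noise recording into consecutive one-second clips using only the Python standard library. It is not the code from DataExploration.ipynb; the function name `split_wav_into_clips`, the output file naming, and the choice to drop a trailing partial clip are all assumptions.

```python
import os
import wave


def split_wav_into_clips(src_path, out_dir, clip_seconds=1.0):
    """Cut a long WAV file into consecutive fixed-length clips.

    Returns the list of clip paths written. A trailing remainder
    shorter than clip_seconds is dropped.
    """
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(src_path))[0]
    written = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_clip = int(params.framerate * clip_seconds)
        n_clips = params.nframes // frames_per_clip
        for i in range(n_clips):
            # Each read advances the file position by one clip.
            frames = src.readframes(frames_per_clip)
            out_path = os.path.join(out_dir, f"{stem}_{i:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                # Copy the source format so clips match the other samples.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

Calling it once per file in the background-noise folder, with `out_dir` set to data/train/audio/silence, would fill the silence folder with clips of the same length as the word samples.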

Remove the background_noise folder from data/train/audio and run the Speech_recognition_Submit.ipynb notebook.

Improvement:

The feature dimensionality is huge; try PCA to drop dimensions of negligible variance before passing the features to the model for training.
Although the literature treats MFCC features as state of the art in speech recognition, we should try other features such as the raw waveform, FFT, and log-mel features, as well as combinations of different features.
We should try a few other neural network architectures on the MFCC features and see how the model performs. Candidates include: a. Convolutional LSTM. b. VGGbn19. c. InceptionResnetV2. d. Densenet201.
I should try other models, such as gradient-boosted trees (for example LightGBM, which is computationally efficient, or XGBoost) and ensemble models such as random forests, and see how they perform on this multi-class classification problem.
I should either set up or rent a GPU to run these models, as they would take days on an ordinary CPU. In future I need to make use of a service such as FloydHub, Paperspace, or Vectordash.
I believe that experimenting with all these options, given adequate computational power, should improve on my final benchmark model.
Listen to the mislabeled samples in the validation set.
Data augmentation: add heavy noise augmentation while keeping the noise-to-signal ratio below 2.
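The noise augmentation idea in the last item could look roughly like the sketch below. It assumes the "noise vs signal ratio" means the ratio of RMS amplitudes (noise over signal), which is one possible reading, and the names `rms` and `add_noise` are hypothetical. It uses plain Python lists to stay dependency-free; in the notebooks the same arithmetic would more naturally run on NumPy arrays.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def add_noise(signal, noise, max_ratio=2.0, rng=random):
    """Mix noise into signal at a random noise-to-signal RMS ratio.

    The ratio is drawn from [0, max_ratio), so the scaled noise stays
    below max_ratio times the RMS of the signal (assumed definition).
    """
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as signal")
    noise = noise[:len(signal)]
    signal_rms = rms(signal)
    noise_rms = rms(noise)
    if signal_rms == 0.0 or noise_rms == 0.0:
        return list(signal)  # nothing to scale against
    # rng.random() is in [0, 1), so the ratio never reaches max_ratio.
    ratio = rng.random() * max_ratio
    gain = ratio * signal_rms / noise_rms
    return [s + gain * n for s, n in zip(signal, noise)]
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible between runs.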

References:

Boilerplate : https://www.kaggle.com/davids1992/speech-representation-and-data-exploration

MFCC : https://librosa.github.io/librosa/generated/librosa.feature.mfcc.html

Keras LSTM : https://keras.io/layers/recurrent/#lstm

Automatic speech recognition : https://ideasforeversite.wordpress.com/2018/03/18/automatic-speech-recognition/

MFCC guide : http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/

Voice Activity Detection : https://pypi.python.org/pypi/webrtcvad, https://github.com/wiseman/py-webrtcvad/blob/master/example.py, https://github.com/marsbroshok/VAD-python, https://github.com/marsbroshok/VAD-python/blob/master/detectVoiceInWave.py
