
Automatic Speaker Recognition

Description

This project applies basic signal-processing techniques to the task of speaker recognition.

The following is the project workflow:

  1. Pre-processing of the input audio signal
  2. Feature Extraction with LPC (linear predictive coding) or MFCC (mel-frequency cepstral coefficients)
  3. Feature Matching with LBG (Linde-Buzo-Gray) vector quantization
  4. Dataset Training
  5. User Matching
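The matching stages (steps 3–5) can be sketched as below. This is an illustrative reconstruction, not the repository's code: it assumes feature vectors are rows of a NumPy array, and the function names (`lbg`, `distortion`) are ours.

```python
import numpy as np

def lbg(features, n_codewords, eps=0.01, n_iter=20):
    """Train an LBG codebook: start from the global mean of the
    feature vectors, split each codeword in two, refine with k-means."""
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < n_codewords:
        # Split every codeword into a (1+eps) and a (1-eps) copy.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):
            # Assign each feature vector to its nearest codeword.
            d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
            assign = d.argmin(axis=1)
            # Move each codeword to the centroid of its assigned vectors.
            for k in range(len(codebook)):
                pts = features[assign == k]
                if len(pts):
                    codebook[k] = pts.mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average distance from each feature vector to its nearest codeword;
    the matched user is the speaker whose codebook minimizes this."""
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()
```

At test time, one codebook is trained per speaker, and an unknown utterance is matched to the speaker whose codebook yields the lowest distortion.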

Data

  • The speaker data was taken from the CSTR VCTK Corpus
  • 8 speakers were selected from the full dataset of 110 speakers
  • Each speaker says the sentence "Please Call Stella" (the first sentence in the corpus)
  • For each speaker, the mic1 recordings were used as training data and the mic2 recordings as test data
  • The .flac files were pre-processed into single-channel .wav files resampled to 11025 Hz
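The core of that pre-processing (down-mix to mono, resample to 11025 Hz) can be sketched as follows. This is a minimal sketch assuming SciPy is available; `to_mono_11025` is an illustrative name, and the actual .flac reading / .wav writing is omitted.

```python
import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 11025  # target sampling rate used by the project

def to_mono_11025(x, sr):
    """Down-mix an (n_samples, n_channels) signal to mono and
    resample it from sr to 11025 Hz with polyphase filtering."""
    if x.ndim == 2:
        x = x.mean(axis=1)            # average channels to mono
    g = np.gcd(TARGET_SR, sr)         # reduce the rational ratio
    return resample_poly(x, TARGET_SR // g, sr // g)
```

For example, one second of 48 kHz stereo audio (48000 × 2 samples) comes out as 11025 mono samples.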

Compatibility

This code has been developed and tested on Windows, using Jupyter Notebook (Python) in Visual Studio Code.

Special Notes

  • To fix the warning about a singular matrix (one whose determinant is zero), the solver was modified to use the pseudo-inverse (pinv). Additionally, r[k][0] was set to 0.0001 to avoid NaN issues associated with that warning. This patch may cause a slight loss of accuracy.
  • Conversion from LPC to LPCC has been implemented, but the results are not as expected. Feel free to modify the code for testing purposes and let us know your results!
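The two notes above can be sketched as follows. This is an illustrative reconstruction, not the repository's code: LPC coefficients are solved from the autocorrelation normal equations with `np.linalg.pinv` (with a small-value guard on the zeroth autocorrelation lag, analogous to the r[k][0] patch), followed by the standard LPC-to-cepstrum recursion c_n = a_n + Σ_{k=1}^{n-1} (k/n) c_k a_{n-k}. The function names are ours, and sign conventions for the predictor coefficients vary between texts.

```python
import numpy as np

def lpc_pinv(frame, order):
    """LPC via the autocorrelation normal equations R a = r,
    using a pseudo-inverse so a singular R does not raise."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if r[0] == 0.0:
        r[0] = 1e-4               # silent-frame guard against NaNs
    # Toeplitz autocorrelation matrix: R[i, j] = r[|i - j|]
    idx = np.abs(np.arange(order)[:, None] - np.arange(order))
    R = r[idx]
    return np.linalg.pinv(R) @ r[1:order + 1]

def lpc_to_lpcc(a, n_ceps):
    """LPC-to-cepstrum recursion:
    c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    dropping the a terms once n exceeds the LPC order p."""
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```

Because `pinv` returns the minimum-norm least-squares solution, a singular autocorrelation matrix (e.g. from a silent frame) yields finite coefficients instead of a warning followed by NaNs.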

Results

On a database of 8 speakers, both the LPC and MFCC pipelines achieved 100% accuracy.

References