Description
This project aims to apply some basic techniques in signal processing for speaker recognition.
The following is the project workflow:
- Pre-processing of input audio signal
- Feature Extraction with LPC/MFCC
- Feature Matching with LBG
- Dataset Training
- User Matching
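The feature matching, training and user matching steps above can be sketched roughly as follows. This is a minimal illustration with NumPy, not the project's actual code. The function names (`lbg_codebook`, `distortion`, `identify`) and the parameters `size`, `eps` and `n_iter` are assumptions made for this example. It expects each speaker's features as an array of shape (frames, coefficients), and `size` is assumed to be a power of two.

```python
import numpy as np

def lbg_codebook(features, size=8, eps=0.01, n_iter=20):
    """Train a vector-quantization codebook with LBG centroid splitting."""
    features = np.asarray(features, dtype=float)
    # Start from a single centroid: the mean of all feature vectors.
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every centroid into a slightly perturbed pair.
        # (A centroid at exactly zero would not separate with this
        # multiplicative split; an additive offset avoids that.)
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):
            # Distance from every frame to every codeword.
            d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            # Move each codeword to the mean of the frames assigned to it.
            for k in range(len(codebook)):
                members = features[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average distance from each frame to its nearest codeword."""
    features = np.asarray(features, dtype=float)
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def identify(features, codebooks):
    """Return the speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda name: distortion(features, codebooks[name]))
```

In this sketch, training means building one codebook per speaker from the train recordings, and matching means picking the codebook with the lowest average distortion for the test features.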
Data
- The speaker data was taken from the CSTR VCTK Corpus
- 8 speakers were taken from the entire dataset of 110 speakers
- Each speaker says the sentence "Please Call Stella" (first sentence from the corpus)
- For each speaker, the mic1 recordings were taken as train data and mic2 recordings were taken as test data
- The .flac files were pre-processed to single-channel .wav files sampled at a frequency of 11025 Hz
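The down-mixing and resampling step could look something like the sketch below. It is an assumption about how such pre-processing might be done, not the script used for this project. The helper name `to_mono_11025` is hypothetical, and the input is assumed to be an array of shape (samples,) or (samples, channels) together with its original sampling rate.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 11025  # target sampling rate in Hz

def to_mono_11025(samples, sr):
    """Down-mix to one channel and resample to 11025 Hz."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 2:
        # Average the channels: (n_samples, n_channels) -> (n_samples,)
        samples = samples.mean(axis=1)
    if sr == TARGET_SR:
        return samples
    # Polyphase resampling by the reduced ratio TARGET_SR / sr.
    g = gcd(TARGET_SR, sr)
    return resample_poly(samples, TARGET_SR // g, sr // g)
```

Reading the .flac files and writing the .wav files is left out here; an audio I/O library such as soundfile could be used for that part.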
Compatibility
This code has been developed and tested on Windows using Jupyter Notebook (Python) in Visual Studio Code.
Special Notes
- To avoid the warning about a singular matrix (one whose determinant is zero), we have modified the code to perform the calculation with the pseudo-inverse (pinv). Additionally, we have set r[k][0] = 0.0001 to work around NaN issues that accompanied this warning. This patch may or may not result in a loss of accuracy.
- Conversion from LPC to LPCC has been implemented, but the results are not as expected. Feel free to modify the code for testing purposes, and let us know your results!
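To illustrate the pseudo-inverse workaround from the first note, here is a minimal sketch of LPC estimation by the autocorrelation method. It is not the project's implementation: the function name `lpc_pinv` and the way the autocorrelation matrix is built are assumptions for this example, and the r[k][0] = 0.0001 patch is not reproduced.

```python
import numpy as np

def lpc_pinv(frame, order):
    """Estimate LPC coefficients with the autocorrelation method and pinv."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz autocorrelation matrix: R[i, j] = r[|i - j|].
    idx = np.arange(order)
    R = r[np.abs(idx[:, None] - idx[None, :])]
    # pinv still returns a result when R is singular (e.g. a silent frame),
    # where a plain matrix inverse would fail or warn.
    return np.linalg.pinv(R) @ r[1:]
```

For a well-conditioned frame the result matches what a regular inverse would give; for a singular matrix pinv returns the minimum-norm least-squares solution, which is one reason accuracy may differ.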
On a database of 8 speakers, both the LPC and MFCC algorithms achieved 100% accuracy.
References
- LPC literature
- MFCC literature
- Auto Correlation technique was taken from here
- K-means clustering reference
- This is our reference code flow
- Additional reference papers can be found here