
Automatic Speaker Recognition

Description

This project applies basic signal-processing techniques to the task of speaker recognition.

The following is the project workflow:

  1. Pre-processing of the input audio signal
  2. Feature Extraction with LPC (linear predictive coding) or MFCC (mel-frequency cepstral coefficients)
  3. Feature Matching with LBG (Linde-Buzo-Gray) vector quantization
  4. Dataset Training
  5. User Matching
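The matching stages (steps 3–5) can be sketched as below. This is an illustrative reconstruction, not the repository's code: it assumes feature vectors are rows of a NumPy array, and the function names (`lbg`, `distortion`) are ours.

```python
import numpy as np

def lbg(features, n_codewords, eps=0.01, n_iter=20):
    """Train an LBG codebook: start from the global mean of the
    feature vectors, split each codeword in two, refine with k-means."""
    codebook = features.mean(axis=0, keepdims=True)
    while len(codebook) < n_codewords:
        # Split every codeword into a (1+eps) and a (1-eps) copy.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):
            # Assign each feature vector to its nearest codeword.
            d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
            assign = d.argmin(axis=1)
            # Move each codeword to the centroid of its assigned vectors.
            for k in range(len(codebook)):
                pts = features[assign == k]
                if len(pts):
                    codebook[k] = pts.mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average distance from each feature vector to its nearest codeword;
    the matched user is the speaker whose codebook minimizes this."""
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()
```

At test time, one codebook is trained per speaker, and an unknown utterance is matched to the speaker whose codebook yields the lowest distortion.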

Data

  • The speaker data was taken from the CSTR VCTK Corpus
  • 8 speakers were selected from the full dataset of 110 speakers
  • Each speaker says the sentence "Please Call Stella" (the first sentence in the corpus)
  • For each speaker, the mic1 recordings were used as training data and the mic2 recordings as test data
  • The .flac files were pre-processed into single-channel .wav files resampled to 11025 Hz
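The core of that pre-processing (down-mix to mono, resample to 11025 Hz) can be sketched as follows. This is a minimal sketch assuming SciPy is available; `to_mono_11025` is an illustrative name, and the actual .flac reading / .wav writing is omitted.

```python
import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 11025  # target sampling rate used by the project

def to_mono_11025(x, sr):
    """Down-mix an (n_samples, n_channels) signal to mono and
    resample it from sr to 11025 Hz with polyphase filtering."""
    if x.ndim == 2:
        x = x.mean(axis=1)            # average channels to mono
    g = np.gcd(TARGET_SR, sr)         # reduce the rational ratio
    return resample_poly(x, TARGET_SR // g, sr // g)
```

For example, one second of 48 kHz stereo audio (48000 × 2 samples) comes out as 11025 mono samples.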

Compatibility

This code has been developed and tested on Windows, using Jupyter Notebook (Python) in Visual Studio Code.

Special Notes

  • To fix the warning about a singular matrix (one whose determinant is zero), the solver was modified to use the pseudo-inverse (pinv). Additionally, r[k][0] was set to 0.0001 to avoid NaN issues associated with that warning. This patch may cause a slight loss of accuracy.
  • Conversion from LPC to LPCC has been implemented, but the results are not as expected. Feel free to modify the code for testing purposes and let us know your results!
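The two notes above can be sketched as follows. This is an illustrative reconstruction, not the repository's code: LPC coefficients are solved from the autocorrelation normal equations with `np.linalg.pinv` (with a small-value guard on the zeroth autocorrelation lag, analogous to the r[k][0] patch), followed by the standard LPC-to-cepstrum recursion c_n = a_n + Σ_{k=1}^{n-1} (k/n) c_k a_{n-k}. The function names are ours, and sign conventions for the predictor coefficients vary between texts.

```python
import numpy as np

def lpc_pinv(frame, order):
    """LPC via the autocorrelation normal equations R a = r,
    using a pseudo-inverse so a singular R does not raise."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if r[0] == 0.0:
        r[0] = 1e-4               # silent-frame guard against NaNs
    # Toeplitz autocorrelation matrix: R[i, j] = r[|i - j|]
    idx = np.abs(np.arange(order)[:, None] - np.arange(order))
    R = r[idx]
    return np.linalg.pinv(R) @ r[1:order + 1]

def lpc_to_lpcc(a, n_ceps):
    """LPC-to-cepstrum recursion:
    c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    dropping the a terms once n exceeds the LPC order p."""
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```

Because `pinv` returns the minimum-norm least-squares solution, a singular autocorrelation matrix (e.g. from a silent frame) yields finite coefficients instead of a warning followed by NaNs.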

Results

On a database of 8 speakers, both the LPC and MFCC pipelines achieved 100% accuracy.

References