
Automatic Speaker Recognition

Description

This project applies some basic signal-processing techniques to speaker recognition.

The following is the project workflow:

  1. Pre-processing of input audio signal
  2. Feature Extraction with LPC/MFCC
  3. Feature Matching with LBG (Linde-Buzo-Gray vector quantization)
  4. Dataset Training
  5. User Matching
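As a sketch of step 3, the LBG split-and-refine procedure can be written roughly as follows. Function and parameter names here are illustrative assumptions, not necessarily the ones used in the notebook:

```python
import numpy as np

def lbg(features, codebook_size=8, eps=0.01, tol=1e-6):
    """Train a vector-quantization codebook with the Linde-Buzo-Gray
    split-and-refine algorithm (workflow step 3)."""
    # Start from the global centroid of all feature vectors.
    codebook = features.mean(axis=0, keepdims=True)
    while codebook.shape[0] < codebook_size:
        # Split every centroid into a +eps / -eps pair.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev_distortion = np.inf
        while True:
            # Assign each vector to its nearest centroid (Euclidean distance).
            dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            distortion = dists.min(axis=1).mean()
            # Re-estimate centroids; keep the old centroid if a cell is empty.
            for k in range(codebook.shape[0]):
                if np.any(labels == k):
                    codebook[k] = features[labels == k].mean(axis=0)
            if prev_distortion - distortion < tol:
                break
            prev_distortion = distortion
    return codebook

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 12))   # stand-in for 12-dim LPC/MFCC frames
cb = lbg(feats, codebook_size=4)
```

Because each round doubles the number of centroids by splitting, codebook sizes produced this way are powers of two.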

Data

  • The speaker data was taken from the CSTR VCTK Corpus
  • 8 of the 110 speakers in the corpus were selected
  • Each speaker says the sentence "Please Call Stella" (first sentence from the corpus)
  • For each speaker, the mic1 recordings were taken as train data and mic2 recordings were taken as test data
  • The .flac files were pre-processed to single-channel .wav files sampled at a frequency of 11025 Hz

Compatibility

This code has been developed and tested on Windows, using Jupyter Notebook (Python) in Visual Studio Code.

Special Notes

  • To avoid the singular-matrix warning (a zero-determinant autocorrelation matrix), the calculations have been modified to use the pseudo-inverse (pinv) instead of a direct inverse. Additionally, r[k][0] has been set to 0.0001 because of NaN issues that accompany the warning. This patch may or may not cost some accuracy.
  • Conversion from LPC to LPCC has been implemented, but the results are not as expected. Feel free to modify the code for testing purposes, and let us know your results!
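A minimal sketch of both notes, assuming the LPC coefficients come from the Yule-Walker normal equations solved with a pseudo-inverse, followed by the standard LPC-to-cepstrum recursion. Helper names are illustrative, and `eps` mirrors the 0.0001 floor mentioned above:

```python
import numpy as np

def lpc_pinv(frame, order=12, eps=1e-4):
    """LPC by solving the Yule-Walker normal equations with np.linalg.pinv,
    the singular-matrix workaround described in the notes."""
    # Autocorrelation for lags 0..order.
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    if r[0] == 0.0:
        r[0] = eps                      # floor against all-zero (silent) frames
    # Toeplitz autocorrelation matrix.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # pinv instead of inv: tolerates a singular / near-singular R.
    return np.linalg.pinv(R) @ r[1:]

def lpc_to_lpcc(a, n_ceps=None):
    """Standard LPC -> LPC-cepstrum recursion (the conversion the note
    refers to), ignoring the gain term c0 = ln(G)."""
    p = len(a)
    n_ceps = n_ceps or p
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```

On a long frame drawn from a stable AR process, `lpc_pinv` should recover the generating coefficients to within sampling noise.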

Results

On a database of 8 speakers, both the LPC and MFCC algorithms achieved 100% accuracy.
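In a VQ system like this one, the user-matching step (workflow step 5) reduces to picking the speaker whose codebook gives the lowest average quantisation distortion on the test utterance. A minimal sketch, with hypothetical names:

```python
import numpy as np

def identify(test_feats, codebooks):
    """Return the speaker whose codebook quantises the test features with
    the lowest average distortion (workflow step 5; illustrative sketch)."""
    best, best_dist = None, np.inf
    for speaker, cb in codebooks.items():
        # Distance from every test frame to every codeword of this speaker.
        d = np.linalg.norm(test_feats[:, None, :] - cb[None, :, :], axis=2)
        dist = d.min(axis=1).mean()     # mean distance to nearest codeword
        if dist < best_dist:
            best, best_dist = speaker, dist
    return best
```

For each enrolled speaker, `codebooks[speaker]` would be the codebook trained on that speaker's mic1 features; `test_feats` is the frame-by-frame feature matrix of a mic2 utterance.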
