This project is a speaker identification system based on MFCC (Mel Frequency Cepstral Coefficients) feature extraction and a TDNN (Time Delay Neural Network) model for classification. It aims to accurately determine the identity of a speaker from an input voice recording.
Speaker identification is the task of recognizing who is speaking by analyzing their voice characteristics. This project processes voice recordings through a pipeline that includes feature extraction, model training using a TDNN architecture, and speaker matching.
The entire speaker identification pipeline is summarized in the diagram below:
MFCCs are a widely used feature set in speech and speaker recognition systems due to their effectiveness at capturing the timbral aspects of human voice, which are key for distinguishing speakers.
- Mimics the human auditory system using the Mel scale
- Captures perceptually relevant features of speech
- Low-dimensional yet information-rich representation
- Robust to noise in many real-world environments
The MFCC feature extraction process transforms a raw audio signal into a set of coefficients that represent the short-term power spectrum of sound. Here's how the pipeline works:
- Pre-processing + Normalization: Removes DC offset and normalizes the signal.
- Frame Blocking & Windowing: Segments signal into frames and applies a Hamming window.
- FFT: Converts each frame from time to frequency domain.
- Mel Filter Banks: Emphasizes frequencies relevant to human hearing.
- Log & DCT (DCT-II): Reduces dimensionality and decorrelates features.
- Cepstral Mean Subtraction: Improves robustness by removing channel and recording-condition effects.
- Output: A compact set of MFCC features.
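The steps above can be sketched in NumPy/SciPy. Parameter values (16 kHz sample rate, 25 ms frames with 10 ms hop, 26 mel filters, 13 coefficients) are common defaults used here for illustration, not values taken from this project:

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_mels=26, n_ceps=13):
    # Pre-processing + normalization: remove DC offset, scale to [-1, 1]
    signal = signal - np.mean(signal)
    signal = signal / (np.max(np.abs(signal)) + 1e-8)

    # Frame blocking & windowing: overlapping frames, Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # FFT: power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Mel filter bank: triangular filters evenly spaced on the mel scale
    hz2mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel2hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    # Log & DCT-II: compress dynamic range, decorrelate, keep n_ceps coeffs
    feats = dct(np.log(power @ fbank.T + 1e-8),
                type=2, axis=1, norm='ortho')[:, :n_ceps]

    # Cepstral mean subtraction: zero the per-coefficient mean
    return feats - feats.mean(axis=0)

t = np.arange(16000) / 16000
sig = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone
feats = mfcc(sig)
print(feats.shape)  # (98, 13): 98 frames, 13 coefficients each
```

Each row of the output is one frame's MFCC vector; after cepstral mean subtraction the per-coefficient mean over the utterance is zero.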
We use a TDNN (Time Delay Neural Network) to model the temporal dependencies in speech, which helps capture the speaker's unique voice characteristics.
- Efficiently handles variable-length inputs
- Captures long-range dependencies in the signal
- Proven performance in speaker verification and recognition tasks
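Conceptually, a TDNN layer is an affine transform applied to a spliced window of frames around each time step (equivalently, a 1-D convolution over time). A minimal NumPy sketch of one layer follows; the context offsets, layer width, and ReLU choice are illustrative assumptions, not this project's exact architecture:

```python
import numpy as np

def tdnn_layer(x, weights, bias, context):
    """One TDNN layer: for each output frame, splice the input frames at
    the given context offsets and apply an affine transform + ReLU.

    x:       (T, d_in) input features, e.g. MFCC frames
    weights: (len(context) * d_in, d_out)
    context: temporal offsets such as (-2, -1, 0, 1, 2)
    """
    T, _ = x.shape
    lo, hi = -min(context), max(context)
    out = []
    for t in range(lo, T - hi):
        # Splice the context frames into one vector, then affine + ReLU
        spliced = np.concatenate([x[t + c] for c in context])
        out.append(np.maximum(spliced @ weights + bias, 0.0))
    return np.array(out)

rng = np.random.default_rng(0)
mfccs = rng.standard_normal((100, 13))        # 100 frames of 13-dim MFCCs
w = rng.standard_normal((5 * 13, 64)) * 0.1   # 5 context frames -> 64 units
b = np.zeros(64)
h = tdnn_layer(mfccs, w, b, context=(-2, -1, 0, 1, 2))
print(h.shape)  # (96, 64): each output frame sees 5 input frames
```

Stacking such layers with wider offsets at higher layers is what lets a TDNN cover long temporal context while keeping each layer's weight matrix small.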
The TDNN is trained on the MFCC features extracted from audio recordings during the enrollment phase. During the matching phase, the input features are passed through the same pipeline and compared to enrolled speaker models.
- Enrollment Phase: Register known speakers' voiceprints.
- Matching Phase: Identify input speaker by comparing with enrolled data.
The system calculates a similarity score to determine the speaker's identity, returning "Unknown" when the confidence falls below a threshold.
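The enrollment and matching logic might look like the sketch below. Cosine similarity, embedding averaging, and the 0.7 rejection threshold are illustrative assumptions here; the project's actual scoring function and threshold may differ:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two speaker embeddings
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

class SpeakerDB:
    def __init__(self, threshold=0.7):
        self.voiceprints = {}      # speaker name -> enrolled embedding
        self.threshold = threshold

    def enroll(self, name, embeddings):
        # Average per-utterance embeddings into a single voiceprint
        self.voiceprints[name] = np.mean(embeddings, axis=0)

    def identify(self, embedding):
        # Score against every enrolled speaker; reject if below threshold
        if not self.voiceprints:
            return "Unknown", 0.0
        scores = {n: cosine(embedding, v)
                  for n, v in self.voiceprints.items()}
        best = max(scores, key=scores.get)
        if scores[best] < self.threshold:
            return "Unknown", scores[best]
        return best, scores[best]

rng = np.random.default_rng(1)
db = SpeakerDB(threshold=0.7)
alice = rng.standard_normal(256)  # stand-in for a TDNN embedding
db.enroll("alice", [alice, alice + 0.01 * rng.standard_normal(256)])
print(db.identify(alice))                      # -> ("alice", score near 1.0)
print(db.identify(rng.standard_normal(256)))   # -> ("Unknown", low score)
```

The threshold trades off false acceptances against false rejections, so in practice it would be tuned on held-out enrollment data.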
- Add voice activity detection (VAD)
- Use x-vectors for speaker embeddings
- Enhance with real-time inference capabilities
flow.png: Speaker identification process overview
mfccdiag.png: MFCC feature extraction pipeline
Built by Akram — full-stack developer passionate about voice tech and deep learning.
This project is licensed under the MIT License.

