Emotion and Speaker Recognition from Audio

This project detects emotions and recognizes speakers from audio files using deep learning models. Two pre-trained models serve these tasks: one for emotion detection and one for speaker recognition. The code for the models is in Emotion_Detection_Ravdess_Dataset.ipynb and Speaker_Identification_Ravdess_Dataset.ipynb.

Prerequisites

Before running the code, ensure you have the following prerequisites installed:

  • Python 3.x
  • TensorFlow
  • Keras

Installation

  1. Clone the repository to your local machine:

    git clone https://github.com/pranay-5374/VoiceSculptor.git
  2. Navigate to the project directory:

    cd VoiceSculptor
  3. Install the required dependencies:

    pip install tensorflow keras
  4. Download the pre-trained models (if not already provided) and place them in the project directory.

Usage

You can use the code to perform emotion detection and speaker recognition on audio files.
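Both models take a fixed-length feature vector (the shape=(input_shape,) input defined below), so each audio file generally needs to be converted into a feature vector first. Below is a minimal sketch of one common approach using librosa; note that librosa is not among the listed prerequisites, and the features the pre-trained models actually expect may differ:

    # Hypothetical feature extraction; the pre-trained models may expect
    # different features than time-averaged MFCCs.
    import numpy as np
    import librosa

    def extract_features(path, n_mfcc=40):
        y, sr = librosa.load(path, sr=None)                     # audio at native sample rate
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
        return np.mean(mfcc, axis=1)                            # time-average -> (n_mfcc,) vector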

  1. Import the necessary libraries:

    from tensorflow import keras
    from tensorflow.keras import layers
  2. Load the pre-trained models:

    emotion_model = keras.models.load_model("emotion_detection_model.h5")
    speaker_model = keras.models.load_model("speaker_recognition_model.h5")
  3. Freeze the layers of the loaded models to retain their pre-trained weights:

    for layer in emotion_model.layers:
        layer.trainable = False
    
    for layer in speaker_model.layers:
        layer.trainable = False
  4. Define an input layer for your audio data and connect it to both models:

    input_layer = layers.Input(shape=(input_shape,))
    
    emotion_output = emotion_model(input_layer)
    speaker_output = speaker_model(input_layer)
  5. Add additional layers for custom classification:

    emotion_output = layers.Dense(128, activation='relu')(emotion_output)
    emotion_output = layers.Dense(num_emotion_classes, activation='softmax', name='emotion_output')(emotion_output)
    
    speaker_output = layers.Dense(128, activation='relu')(speaker_output)
    speaker_output = layers.Dense(num_speaker_classes, activation='softmax', name='speaker_output')(speaker_output)
  6. Create a combined model that outputs both emotion and speaker predictions:

    combined_model = keras.Model(inputs=input_layer, outputs=[emotion_output, speaker_output])
  7. Compile the combined model with suitable loss functions, optimizers, and metrics:

    combined_model.compile(optimizer='adam',
                          loss={'emotion_output': 'categorical_crossentropy', 'speaker_output': 'categorical_crossentropy'},
                          metrics={'emotion_output': 'accuracy', 'speaker_output': 'accuracy'})
  8. Train and evaluate the combined model using your audio data, as sketched below.
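A minimal training and evaluation sketch, assuming feature arrays X / X_test and one-hot label arrays y_emotion, y_speaker, y_emotion_test, y_speaker_test (placeholder names, not defined in this repository):

    # Placeholder data; substitute your own preprocessed features and labels.
    combined_model.fit(
        X,
        {'emotion_output': y_emotion, 'speaker_output': y_speaker},
        epochs=20, batch_size=32, validation_split=0.1)

    combined_model.evaluate(
        X_test,
        {'emotion_output': y_emotion_test, 'speaker_output': y_speaker_test})

    # Inference on a new clip (reusing the hypothetical extract_features above);
    # a multi-output model returns one prediction array per named output.
    features = extract_features('some_clip.wav').reshape(1, -1)  # batch of one
    emotion_probs, speaker_probs = combined_model.predict(features)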

For more details, see the notebooks and their inline comments.

Summary

The provided code allows you to integrate pre-trained emotion detection and speaker recognition models into a single model for audio analysis. Please make sure to replace input_shape, num_emotion_classes, and num_speaker_classes with appropriate values for your specific use case.
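For the RAVDESS dataset referenced by the notebooks, plausible values are 8 emotion classes and 24 speakers (RAVDESS contains recordings from 24 actors across 8 emotion categories); input_shape depends on your feature extraction, for example 40 if you use the averaged-MFCC sketch above:

    input_shape = 40           # matches the hypothetical 40-MFCC features (assumption)
    num_emotion_classes = 8    # RAVDESS defines 8 emotion categories
    num_speaker_classes = 24   # RAVDESS includes 24 actors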


About

This project aims not only to recognize speakers but also to extract meaningful features from their voices, revealing insights about emotional state, physiological characteristics, age, gender, and even regional accent.
