Commit 51836c5

cog-ified
1 parent 2a1bd63 commit 51836c5

3 files changed: +92 -2 lines changed

README.md

Lines changed: 46 additions & 2 deletions
@@ -1,12 +1,16 @@
 # Speech Emotion Recognition
 ## Introduction
+<a href="https://replicate.ai/x4nth055/emotion-recognition-using-speech"><img src="https://img.shields.io/static/v1?label=Replicate&message=Demo and Docker Image&color=darkgreen" height=20></a>
+
+
 - This repository handles building and training a Speech Emotion Recognition system.
 - The basic idea behind this tool is to build and train/test a suitable machine learning (as well as deep learning) algorithm that can recognize and detect human emotions from speech.
 - This is useful for many industry fields, such as making product recommendations, affective computing, etc.
 - Check this [tutorial](https://www.thepythoncode.com/article/building-a-speech-emotion-recognizer-using-sklearn) for more information.
 ## Requirements
 - **Python 3.6+**
 ### Python Packages
+- **tensorflow**
 - **librosa==0.6.3**
 - **numpy**
 - **pandas**
@@ -38,7 +42,7 @@ Feature extraction is the main part of the speech emotion recognition system. It
 
 In this repository, we use the most common features available in the [librosa](https://github.com/librosa/librosa) library, including:
 - [MFCC](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)
-- Chromagram
+- Chromagram
 - MEL Spectrogram Frequency (mel)
 - Contrast
 - Tonnetz (tonal centroid features)
@@ -102,6 +106,7 @@ print("Prediction:", rec.predict("data/tess_ravdess/validation/Actor_25/25_01_01
 Prediction: neutral
 Prediction: sad
 ```
+You can pass any audio file; if it is not in the appropriate format (16000 Hz, mono channel), it will be converted automatically. Make sure `ffmpeg` is installed on your system and added to *PATH*.
 ## Example 2: Using RNNs for 5 Emotions
 ```python
 from deep_emotion_recognition import DeepEmotionRecognizer
@@ -143,6 +148,45 @@ true_neutral 3.846154 8.974360 82.051285 2.564103
 true_ps 2.564103 0.000000 1.282051 83.333328 12.820514
 true_happy 20.512821 2.564103 2.564103 2.564103 71.794876
 ```
+## Example 3: Not Passing any Model and Removing the Custom Dataset
+The code below initializes `EmotionRecognizer` with 3 chosen emotions, removes the custom dataset, and sets `balance` to `False`:
+```python
+from emotion_recognition import EmotionRecognizer
+# initialize the instance; this will take a while the first time it runs,
+# as it extracts the features and calls determine_best_model() automatically
+# to load the best performing model on the picked dataset
+rec = EmotionRecognizer(emotions=["angry", "neutral", "sad"], balance=False, verbose=1, custom_db=False)
+# the model is already trained, so there is no need to train it this time
+# print the confusion matrix on the test set
+print(rec.confusion_matrix())
+# predict angry audio sample
+prediction = rec.predict('data/validation/Actor_10/03-02-05-02-02-02-10_angry.wav')
+print(f"Prediction: {prediction}")
+```
+**Output:**
+```
+[+] Best model determined: RandomForestClassifier with 93.454% test accuracy
+
+              predicted_angry  predicted_neutral  predicted_sad
+true_angry          98.275864           1.149425       0.574713
+true_neutral         0.917431          88.073395      11.009174
+true_sad             6.250000           1.875000      91.875000
+
+Prediction: angry
+```
+You can print the number of samples in each class:
+```python
+rec.get_samples_by_class()
+```
+**Output:**
+```
+         train  test  total
+angry      910   174   1084
+neutral    650   109    759
+sad        862   160   1022
+total     2422   443   2865
+```
+In this case, the dataset comes only from TESS and RAVDESS and is not balanced. You can pass `True` to `balance` on the `EmotionRecognizer` instance to balance the data.
 ## Algorithms Used
 This repository can be used to build machine learning classifiers as well as regressors for the case of 3 emotions {'sad': 0, 'neutral': 1, 'happy': 2} and the case of 5 emotions {'angry': 1, 'sad': 2, 'neutral': 3, 'ps': 4, 'happy': 5}
 ### Classifiers
### Classifiers
@@ -207,4 +251,4 @@ plot_histograms(classifiers=True)
 **Output:**
 
 <img src="images/Figure.png">
-<p align="center">A Histogram shows different algorithms metric results on different data sizes as well as time consumed to train/predict.</p>
+<p align="center">A Histogram shows different algorithms metric results on different data sizes as well as time consumed to train/predict.</p>
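
The Example 3 output added to the README reports both a per-class confusion matrix and an overall test accuracy. As a rough consistency check, the sketch below recomputes the overall figure from the diagonal of the matrix and the test-set counts printed by `get_samples_by_class()`. It assumes each row of the matrix is normalized to percentages of the true class (the rows do sum to about 100); the helper name `overall_accuracy` is hypothetical and not part of the repository.

```python
# Diagonal of the Example 3 confusion matrix: percent of each true class
# that was predicted correctly (values copied from the README output).
recall_percent = {"angry": 98.275864, "neutral": 88.073395, "sad": 91.875000}

# Test-set sizes per class, copied from the get_samples_by_class() output.
test_counts = {"angry": 174, "neutral": 109, "sad": 160}


def overall_accuracy(recall_percent, test_counts):
    """Weight each class's recall by its test-set size (hypothetical helper)."""
    correct = sum(recall_percent[c] / 100 * test_counts[c] for c in test_counts)
    total = sum(test_counts.values())
    return 100 * correct / total


accuracy = overall_accuracy(recall_percent, test_counts)
print(f"{accuracy:.3f}%")  # prints 93.454%
```

Under that assumption, the weighted figure agrees with the `93.454% test accuracy` line in the output, which is also a reminder that the unbalanced class sizes influence the headline number.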

cog.yaml

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+build:
+  python_version: "3.6"
+  gpu: false
+  python_packages:
+    - pandas==1.1.5
+    - numpy==1.17.3
+    - wave==0.0.2
+    - sklearn==0.0
+    - librosa==0.6.3
+    - soundfile==0.9.0
+    - tqdm==4.28.1
+    - matplotlib==2.2.3
+    - pyaudio==0.2.11
+    - numba==0.48
+  system_packages:
+    - "ffmpeg"
+    - "portaudio19-dev"
+predict: "predict.py:EmoPredictor"
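
The `ffmpeg` system package listed here is what the README's note about automatic conversion to 16000 Hz mono depends on. The following is a minimal sketch of such a conversion using the standard `ffmpeg` options `-ar` (sample rate) and `-ac` (channel count). It is an illustration only, not the repository's own conversion code; the function names `build_ffmpeg_command` and `convert_audio` are hypothetical.

```python
import shutil
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Build an ffmpeg argument list that resamples src into dst (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", str(src),          # input file
        "-ac", str(channels),    # number of audio channels (1 = mono)
        "-ar", str(sample_rate), # output sample rate in Hz
        str(dst),
    ]


def convert_audio(src, dst):
    """Run the conversion, failing early if ffmpeg is not on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Calling `convert_audio("input.mp3", "input_16k.wav")` would write a mono 16 kHz file, provided `ffmpeg` is installed, which is the same precondition the README states.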

predict.py

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+import json
+import os
+import tempfile
+from pathlib import Path
+
+import cog
+from emotion_recognition import EmotionRecognizer
+
+
+class EmoPredictor(cog.Predictor):
+    def setup(self):
+        """Load the emotion recognition model and (quickly) train it"""
+        # self.rec = EmotionRecognizer(None, emotions=["boredom", "neutral"], features=["mfcc"])
+        self.rec = EmotionRecognizer(
+            None,
+            emotions=["sad", "neutral", "happy"],
+            features=["mfcc"],
+            probability=True,
+        )
+        # evaluate all models in `grid` folder and determine the best one in terms of test accuracy
+        self.rec.determine_best_model()
+
+    @cog.input("input", type=Path, help="Speech audio file")
+    def predict(self, input):
+        """Compute emotion prediction"""
+        prediction = self.rec.predict_proba(str(input))
+
+        return prediction
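
The predictor returns whatever `predict_proba` produces rather than a single label. As a sketch of how a caller might reduce that to one emotion, the snippet below assumes the result is a mapping from emotion name to probability; that shape is an assumption here, and both the `top_emotion` helper and the sample values are hypothetical.

```python
def top_emotion(probabilities):
    """Return the (label, probability) pair with the highest probability.

    Assumes `probabilities` maps emotion names to floats (an assumption
    about the predictor's output, not something the diff confirms).
    """
    if not probabilities:
        raise ValueError("no probabilities were returned")
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Hypothetical output for the three emotions configured in setup().
example = {"sad": 0.12, "neutral": 0.70, "happy": 0.18}
label, score = top_emotion(example)
print(f"{label} ({score:.2f})")
```

Keeping the full probability mapping in the predictor and reducing it on the client side leaves room for thresholds or for showing the runner-up emotion.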
