|
1 | 1 | # Speech Emotion Recognition
|
2 | 2 | ## Introduction
|
| 3 | +<a href="https://replicate.ai/x4nth055/emotion-recognition-using-speech"><img src="https://img.shields.io/static/v1?label=Replicate&message=Demo and Docker Image&color=darkgreen" height=20></a> |
| 4 | + |
| 5 | + |
3 | 6 | - This repository handles building and training a Speech Emotion Recognition System.
|
4 | 7 | - The basic idea behind this tool is to build and train/test a suitable machine learning (as well as deep learning) algorithm that can recognize and detect human emotions from speech.
|
5 | 8 | - This is useful in many industry fields, such as product recommendation and affective computing.
|
6 | 9 | - Check this [tutorial](https://www.thepythoncode.com/article/building-a-speech-emotion-recognizer-using-sklearn) for more information.
|
7 | 10 | ## Requirements
|
8 | 11 | - **Python 3.6+**
|
9 | 12 | ### Python Packages
|
| 13 | +- **tensorflow** |
10 | 14 | - **librosa==0.6.3**
|
11 | 15 | - **numpy**
|
12 | 16 | - **pandas**
|
@@ -38,7 +42,7 @@ Feature extraction is the main part of the speech emotion recognition system. It
|
38 | 42 |
|
39 | 43 | In this repository, we use the most common features available in the [librosa](https://github.com/librosa/librosa) library, including:
|
40 | 44 | - [MFCC](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)
|
41 |
| -- Chromagram |
| 45 | +- Chromagram |
42 | 46 | - MEL Spectrogram Frequency (mel)
|
43 | 47 | - Contrast
|
44 | 48 | - Tonnetz (tonal centroid features)
|
@@ -102,6 +106,7 @@ print("Prediction:", rec.predict("data/tess_ravdess/validation/Actor_25/25_01_01
|
102 | 106 | Prediction: neutral
|
103 | 107 | Prediction: sad
|
104 | 108 | ```
|
| 109 | +You can pass any audio file. If it is not in the appropriate format (16000 Hz, mono channel), it will be converted automatically; make sure `ffmpeg` is installed on your system and added to *PATH*. |
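For reference, a conversion to 16000 Hz mono can also be done by hand with `ffmpeg`. The sketch below is only an illustration: `build_ffmpeg_command` and `convert_audio` are hypothetical helpers written for this example, not functions from this repository, and the file names are placeholders.

```python
import shutil
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Hypothetical helper: ffmpeg arguments to resample `src` to mono at `sample_rate` Hz."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it already exists
        "-i", src,                # input audio file
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz
        dst,
    ]


def convert_audio(src, dst, sample_rate=16000):
    """Run the conversion, failing early if ffmpeg is not on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)


# show the command that would be executed for placeholder file names
print(build_ffmpeg_command("input.mp3", "output.wav"))
```

Keeping the command construction separate from its execution makes it easy to inspect the exact arguments before running anything.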
105 | 110 | ## Example 2: Using RNNs for 5 Emotions
|
106 | 111 | ```python
|
107 | 112 | from deep_emotion_recognition import DeepEmotionRecognizer
|
@@ -143,6 +148,45 @@ true_neutral 3.846154 8.974360 82.051285 2.564103
|
143 | 148 | true_ps 2.564103 0.000000 1.282051 83.333328 12.820514
|
144 | 149 | true_happy 20.512821 2.564103 2.564103 2.564103 71.794876
|
145 | 150 | ```
|
| 151 | +## Example 3: Not Passing any Model and Removing the Custom Dataset |
| 152 | +The code below initializes `EmotionRecognizer` with 3 chosen emotions while removing the Custom dataset and setting `balance` to `False`: |
| 153 | +```python |
| 154 | +from emotion_recognition import EmotionRecognizer |
| 155 | +# initialize the instance; this takes a while the first time it is executed, |
| 156 | +# as it extracts the features and calls determine_best_model() automatically |
| 157 | +# to load the best performing model on the picked dataset |
| 158 | +rec = EmotionRecognizer(emotions=["angry", "neutral", "sad"], balance=False, verbose=1, custom_db=False) |
| 159 | +# the model is already trained, so there is no need to train it again |
| 160 | +# print the confusion matrix computed on the test set |
| 161 | +print(rec.confusion_matrix()) |
| 162 | +# predict angry audio sample |
| 163 | +prediction = rec.predict('data/validation/Actor_10/03-02-05-02-02-02-10_angry.wav') |
| 164 | +print(f"Prediction: {prediction}") |
| 165 | +``` |
| 166 | +**Output:** |
| 167 | +``` |
| 168 | +[+] Best model determined: RandomForestClassifier with 93.454% test accuracy |
| 169 | +
|
| 170 | + predicted_angry predicted_neutral predicted_sad |
| 171 | +true_angry 98.275864 1.149425 0.574713 |
| 172 | +true_neutral 0.917431 88.073395 11.009174 |
| 173 | +true_sad 6.250000 1.875000 91.875000 |
| 174 | +
|
| 175 | +Prediction: angry |
| 176 | +``` |
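The confusion matrix above is expressed in row percentages: each row sums to 100 and describes how the samples of one true class were distributed over the predicted classes. The sketch below shows the arithmetic in plain Python. The raw counts are not printed by the library; they are inferred here from the percentages and from the per-class test-set sizes (174, 109, 160) that `get_samples_by_class()` reports in this example, so treat them as an assumption.

```python
labels = ["angry", "neutral", "sad"]

# counts[i][j] = test samples of true class i predicted as class j
# (inferred from the reported percentages and test-set sizes, not library output)
counts = [
    [171, 2, 1],
    [1, 96, 12],
    [10, 3, 147],
]


def row_percentages(matrix):
    """Convert raw counts to percentages of each row total."""
    return [[100.0 * value / sum(row) for value in row] for row in matrix]


def accuracy(matrix):
    """Overall accuracy in percent: correct predictions (diagonal) over all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


for label, row in zip(labels, row_percentages(counts)):
    print(f"true_{label}", [round(value, 6) for value in row])
print(f"accuracy: {accuracy(counts):.3f}%")  # accuracy: 93.454%
```

The diagonal holds the per-class recall, and the overall accuracy computed this way agrees with the 93.454% reported for the best model.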
| 177 | +You can print the number of samples in each class: |
| 178 | +```python |
| 179 | +rec.get_samples_by_class() |
| 180 | +``` |
| 181 | +**Output:** |
| 182 | +``` |
| 183 | + train test total |
| 184 | +angry 910 174 1084 |
| 185 | +neutral 650 109 759 |
| 186 | +sad 862 160 1022 |
| 187 | +total 2422 443 2865 |
| 188 | +``` |
| 189 | +In this case, the dataset comes only from TESS and RAVDESS and is not balanced. You can pass `balance=True` when creating the `EmotionRecognizer` instance to balance the data. |
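To give an idea of what balancing a dataset can mean, the sketch below downsamples every class to the size of the smallest one. This is one common strategy and is shown only as an assumption; the repository's own `balance` option may work differently. `balance_by_downsampling` and the file names are hypothetical, while the class sizes are the training counts from the table above.

```python
import random


def balance_by_downsampling(samples_by_class, seed=0):
    """Hypothetical helper: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    smallest = min(len(samples) for samples in samples_by_class.values())
    return {
        label: rng.sample(samples, smallest)
        for label, samples in samples_by_class.items()
    }


# placeholder file names; only the class sizes come from the table above
train = {
    "angry": [f"angry_{i}.wav" for i in range(910)],
    "neutral": [f"neutral_{i}.wav" for i in range(650)],
    "sad": [f"sad_{i}.wav" for i in range(862)],
}
balanced = balance_by_downsampling(train)
print({label: len(samples) for label, samples in balanced.items()})
```

Downsampling discards data from the larger classes, which is the usual trade-off against a classifier biased toward the most frequent emotion.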
146 | 190 | ## Algorithms Used
|
147 | 191 | This repository can be used to build machine learning classifiers as well as regressors for the case of 3 emotions {'sad': 0, 'neutral': 1, 'happy': 2} and the case of 5 emotions {'angry': 1, 'sad': 2, 'neutral': 3, 'ps': 4, 'happy': 5}
|
148 | 192 | ### Classifiers
|
@@ -207,4 +251,4 @@ plot_histograms(classifiers=True)
|
207 | 251 | **Output:**
|
208 | 252 |
|
209 | 253 | <img src="images/Figure.png">
|
210 |
| -<p align="center">A Histogram shows different algorithms metric results on different data sizes as well as time consumed to train/predict.</p> |
| 254 | +<p align="center">A histogram showing the metric results of different algorithms on different data sizes, as well as the time consumed to train/predict.</p> |