# Speech Emotion Recognition
## Introduction
<a href="https://replicate.ai/x4nth055/emotion-recognition-using-speech"><img src="https://img.shields.io/static/v1?label=Replicate&message=Demo and Docker Image&color=darkgreen" height=20></a>

- This repository handles building and training a Speech Emotion Recognition system.
- The basic idea behind this tool is to build and train/test a suitable machine learning (as well as deep learning) algorithm that can recognize and detect human emotions from speech.
- This is useful in many industry fields, such as product recommendations, affective computing, etc.
- Check this [tutorial](https://www.thepythoncode.com/article/building-a-speech-emotion-recognizer-using-sklearn) for more information.
## Requirements
- **Python 3.6+**
### Python Packages
- **tensorflow**
- **librosa==0.6.3**
- **numpy**
- **pandas**
### Feature Extraction
Feature extraction is the main part of the speech emotion recognition system.

In this repository, we use the most commonly used features available in the [librosa](https://github.com/librosa/librosa) library (see the extraction sketch after this list), including:
- [MFCC](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)
- Chromagram
- MEL Spectrogram Frequency (mel)
- Contrast
- Tonnetz (tonal centroid features)
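A minimal sketch of how such features might be computed with librosa; the `extract_features` helper name, the 16 kHz sample rate, and the choice of 40 MFCC coefficients are illustrative assumptions rather than the repository's exact utility:
```python
import numpy as np
import librosa

def extract_features(path, sr=16000):
    # Hypothetical helper: load the file as 16 kHz mono and average each
    # feature over time so every file maps to a fixed-length vector.
    X, sr = librosa.load(path, sr=sr)
    stft = np.abs(librosa.stft(X))
    mfcc = np.mean(librosa.feature.mfcc(y=X, sr=sr, n_mfcc=40).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sr).T, axis=0)
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr).T, axis=0)
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sr).T, axis=0)
    return np.hstack([mfcc, chroma, mel, contrast, tonnetz])
```
Averaging each feature over time yields one fixed-length vector per audio file, which is the kind of input classical classifiers (e.g., random forests or SVMs) expect.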
Calling `rec.predict()` on sample audio files from the validation data (as in the examples below) prints, for example:
```
Prediction: neutral
Prediction: sad
```
You can pass any audio file; if it's not in the appropriate format (16000 Hz, mono channel), it will be converted automatically. Make sure you have `ffmpeg` installed on your system and added to *PATH*.
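For reference, that conversion step is roughly equivalent to resampling with `ffmpeg`; a minimal sketch (the `convert_audio` helper name here is hypothetical, not necessarily the repository's own utility):
```python
import subprocess

def convert_audio(src, dst):
    # Re-encode any input file to 16 kHz mono WAV using ffmpeg (must be on PATH).
    subprocess.run(["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst], check=True)

convert_audio("my_recording.mp3", "my_recording_16k.wav")
```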
## Example 2: Using RNNs for 5 Emotions
```python
from deep_emotion_recognition import DeepEmotionRecognizer
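# A sketch of how this example might continue; the constructor arguments below
# (n_rnn_layers, n_dense_layers, rnn_units, dense_units) are illustrative
# assumptions -- check deep_emotion_recognition.py for the exact options.
deeprec = DeepEmotionRecognizer(emotions=['angry', 'sad', 'neutral', 'ps', 'happy'],
                                n_rnn_layers=2, n_dense_layers=2,
                                rnn_units=128, dense_units=128)
deeprec.train()
# print the confusion matrix over the test set (percentages per true emotion)
print(deeprec.confusion_matrix())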
```
**Output** (last rows of the confusion matrix):
```
true_ps              2.564103       0.000000           1.282051     83.333328        12.820514
true_happy          20.512821       2.564103           2.564103      2.564103        71.794876
```
## Example 3: Not Passing any Model and Removing the Custom Dataset
The code below initializes `EmotionRecognizer` with 3 chosen emotions, excludes the custom dataset, and sets `balance` to `False`:
```python
from emotion_recognition import EmotionRecognizer
# initialize the instance; this takes a while the first time it is executed,
# as it extracts the features and calls determine_best_model() automatically
# to load the best performing model on the chosen dataset
rec = EmotionRecognizer(emotions=["angry", "neutral", "sad"], balance=False, verbose=1, custom_db=False)
# the model is already trained, so there is no need to train it again
# show the confusion matrix on the test set
print(rec.confusion_matrix())
# predict an angry audio sample
prediction = rec.predict('data/validation/Actor_10/03-02-05-02-02-02-10_angry.wav')
print(f"Prediction: {prediction}")
```
**Output:**
```
[+] Best model determined: RandomForestClassifier with 93.454% test accuracy

              predicted_angry  predicted_neutral  predicted_sad
true_angry          98.275864           1.149425       0.574713
true_neutral         0.917431          88.073395      11.009174
true_sad             6.250000           1.875000      91.875000

Prediction: angry
```
You can print the number of samples in each class:
```python
print(rec.get_samples_by_class())
```
**Output:**
```
         train  test  total
angry      910   174   1084
neutral    650   109    759
sad        862   160   1022
total     2422   443   2865
```
In this case, the dataset consists only of TESS and RAVDESS and is not balanced; you can pass `balance=True` to the `EmotionRecognizer` constructor to balance the data, as shown below.
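For example, re-creating the recognizer with balancing enabled (same arguments as above, only `balance` changed):
```python
from emotion_recognition import EmotionRecognizer

# same setup as Example 3, but with class balancing enabled
rec = EmotionRecognizer(emotions=["angry", "neutral", "sad"], balance=True, verbose=1, custom_db=False)
print(rec.get_samples_by_class())
```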
## Algorithms Used
This repository can be used to build machine learning classifiers as well as regressors, both for the case of 3 emotions {'sad': 0, 'neutral': 1, 'happy': 2} and the case of 5 emotions {'angry': 1, 'sad': 2, 'neutral': 3, 'ps': 4, 'happy': 5}.
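As a rough illustration of the classifier case, a specific scikit-learn estimator could be plugged in like this; the `model` keyword argument and the explicit `train()` call are assumptions inferred from the examples above, not something shown in this section:
```python
from sklearn.svm import SVC
from emotion_recognition import EmotionRecognizer

# hypothetical: supply a specific scikit-learn classifier instead of letting
# determine_best_model() pick one
my_model = SVC(kernel="rbf", C=10, gamma=0.001)
rec = EmotionRecognizer(model=my_model, emotions=["sad", "neutral", "happy"], verbose=0)
rec.train()
print(rec.confusion_matrix())
```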
### Classifiers

```python
plot_histograms(classifiers=True)
```
**Output:**

<img src="images/Figure.png">
<p align="center">A histogram showing different algorithms' metric results on different data sizes, as well as the time consumed to train/predict.</p>