
Committed to my fork by 12, forgot to initiate the pull request. Team MLX #8

Open · wants to merge 17 commits into base: main
Binary file added Demo Video.mp4
Binary file not shown.
123 changes: 122 additions & 1 deletion README.md
@@ -1,6 +1,127 @@
# FER May Hackathon

Facial Emotion Detection Hackathon Project: create a model and test it using 5 to 10 second videos to detect emotions.

**Please watch the demonstration video along with testing the app**, as in the video I discuss an *alternate approach* to this problem which I was not able to implement in time. In the video, I used clips from the CREMA-D dataset. CREMA-D is a dataset of 7,442 original clips from 91 actors. These clips were from 48 male and 43 female actors between the ages of 20 and 74, coming from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified). Actors spoke from a selection of 12 sentences. The sentences were presented using one of six different emotions (Anger, Disgust, Fear, Happy, Neutral, and Sad) and four different emotion levels (Low, Medium, High, and Unspecified).

### Team Name: MLX
### Member: Suhail Ahmed | [email protected]
### Deployed Link: https://fer-dds-mlx.streamlit.app

### Model Accuracy: 35.77% on the validation dataset - FER2013 Partial (as provided in the repository).

## Implemented Methodology:

### Machine Learning Model

I implemented a CNN model and trained it on the FER2013 partial dataset for 75 epochs.

The model is built using the Keras Sequential API. It consists of multiple convolutional, batch normalization, max-pooling, and dropout layers to prevent overfitting. The architecture is designed to progressively extract higher-level features from the input images; the main layers are listed below, followed by an illustrative sketch of the stack.

- **Input Layer:** The input layer expects images of shape (48, 48, 1), which corresponds to grayscale images of size 48x48 pixels.

- **Convolutional Layers:** These layers use 3x3 filters to convolve the input and extract features. The activation function used is ReLU (Rectified Linear Unit).

- **Batch Normalization:** This layer normalizes the outputs of the previous layer to stabilize and accelerate the training process.

- **Max-Pooling Layers:** These layers downsample the input by taking the maximum value in each 2x2 pool, reducing the spatial dimensions.

- **Dropout Layers:** These layers randomly drop a fraction of the units during training to prevent overfitting.

- **Flatten Layer:** This layer flattens the 3D output from the convolutional layers into a 1D vector, which is fed into the dense (fully connected) layers.

- **Dense Layers:** These layers perform the final classification. The last dense layer uses a softmax activation function to output probabilities for each of the seven emotion classes.
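
For illustration, below is a minimal sketch of a Sequential model following this layout. The filter counts, dropout rates, and L2 regularization strength shown here are assumptions for illustration only; the trained model in the repository may use different values.

```python
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, MaxPooling2D, Dropout, Flatten, Dense
from keras.regularizers import l2


def build_model(num_classes=7):
    model = Sequential([
        # Block 1: 48x48x1 grayscale input, L2 regularization on the first layer
        Conv2D(32, (3, 3), activation='relu', padding='same',
               kernel_regularizer=l2(0.01), input_shape=(48, 48, 1)),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.25),

        # Block 2: more filters to capture higher-level features
        Conv2D(64, (3, 3), activation='relu', padding='same'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.25),

        # Block 3
        Conv2D(128, (3, 3), activation='relu', padding='same'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.25),

        # Classifier head: flatten, dense layer, softmax over the 7 emotion classes
        Flatten(),
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax'),
    ])
    return model
```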

#### Compilation
The model is compiled using the Adam optimizer with the specified learning rate. The loss function used is categorical cross-entropy, which is suitable for multi-class classification problems. Accuracy is used as the evaluation metric.
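
A sketch of the compilation step, assuming a learning rate of 1e-3 (the actual value is not stated here) and the hypothetical `build_model` helper from the sketch above:

```python
from keras.optimizers import Adam

model = build_model()
model.compile(
    optimizer=Adam(learning_rate=1e-3),  # assumed learning rate
    loss='categorical_crossentropy',     # multi-class loss over the 7 emotion classes
    metrics=['accuracy'],
)
```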

##### Callbacks
Three callbacks are used during training to improve performance and prevent overfitting:

- **ReduceLROnPlateau:** Reduces the learning rate when the validation loss plateaus, helping the model converge.
- **EarlyStopping:** Stops training when the validation accuracy does not improve for a specified number of epochs, preventing overfitting.
- **ModelCheckpoint:** Saves the model weights when the validation loss improves, ensuring the best model is saved.
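
A sketch of how these three callbacks might be configured. The monitored metrics follow the description above, while the patience, factor, and file name values are assumptions (`model.h5` matches the file loaded by the Streamlit app):

```python
from keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

callbacks = [
    # Reduce the learning rate when the validation loss plateaus.
    ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3, min_lr=1e-6),
    # Stop training when validation accuracy stops improving.
    EarlyStopping(monitor='val_accuracy', patience=10, restore_best_weights=True),
    # Save the weights whenever the validation loss improves.
    ModelCheckpoint('model.h5', monitor='val_loss', save_best_only=True),
]
```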

#### Model Training
The model is trained using the fit method. The training data is split into training and validation sets, and the model is trained for the specified number of epochs and batch size. The shuffle parameter ensures that the data is shuffled before each epoch.
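
A sketch of the training call, assuming pre-split arrays `x_train`, `y_train`, `x_val`, `y_val` (48x48x1 images with one-hot labels) and an assumed batch size of 64; the 75 epochs match the figure stated above:

```python
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=75,
    batch_size=64,        # assumed batch size
    shuffle=True,         # shuffle the data before each epoch
    callbacks=callbacks,  # the callbacks configured above
)
```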

#### Rationale

##### Convolutional Neural Network
The use of a convolutional neural network (CNN) is appropriate for image classification tasks due to its ability to automatically learn spatial hierarchies of features from input images. The multiple convolutional layers with increasing filter sizes help the model capture complex patterns in the data.

##### Regularization
Dropout and batch normalization are used extensively throughout the network to prevent overfitting and improve generalization. The L2 regularization on the first layer also helps in reducing overfitting by penalizing large weights.

##### Optimizer
The Adam optimizer is chosen for its efficiency and adaptive learning rate, which helps in faster convergence compared to traditional stochastic gradient descent.


### Streamlit Application

Please navigate to the dds directory and, once you are in it, run:

```console
streamlit run app.py
```

Upon uploading your video, it will take some time to process, depending on the resolution and length of the video. After processing, you will be able to see all the detected emotions per frame, and the most frequently detected emotion is reported as the predicted emotion.

<br>

<img width="766" alt="image" src="https://github.com/Suhail270/fer-may-hackathon/assets/57321434/fa700a38-f84c-441a-8a6c-5ef44daeff36">

<br>
<br>
You may also scroll below and use the slider to view the emotion detected at a particular frame. This feature can be especially useful when multiple emotions are covered in the same video.
<br>
<br>

![image](https://github.com/Suhail270/fer-may-hackathon/assets/57321434/b0146f09-74a4-4e17-b0cf-3917d8c01072)

<img width="614" alt="image" src="https://github.com/Suhail270/fer-may-hackathon/assets/57321434/f733cf20-1ae5-4cdc-af34-1c3fc831eb11">


#### Implementation Methodology and Rationale

##### Libraries and Imports
The following libraries are used in this project:

- *streamlit* for creating the web application.
- *cv2 (OpenCV)* for image processing and face detection.
- *numpy* for array manipulation.
- *keras* for loading the pre-trained emotion detection model.
- *tempfile* for handling temporary files.
- *streamlit_webrtc* for handling real-time video processing.

##### Model and Classifier Loading
The custom-trained emotion detection model and Haar Cascade face classifier are loaded at the beginning of the script. This ensures that the models are ready for use when processing the video frames.

##### Emotion Counts Initialization
An emotion count dictionary is initialized to keep track of the number of times each emotion is detected in the video frames.

##### Video Transformer Class
A custom VideoTransformer class is defined to process each video frame. This class uses the transform method to:

- Convert the frame to grayscale.
- Detect faces in the frame.
- Predict the emotion for each detected face.
- Draw a rectangle around each face and annotate it with the predicted emotion label.
- Update the emotion counts for each prediction.

##### Video Processing Function
The process_video function handles the uploaded video file. It:

- Saves the uploaded file to a temporary location.
- Reads the video frame-by-frame.
- Processes each frame to detect faces and emotions.
- Stores the processed frames in a list.
- Displays the detected emotions and their counts.
- Provides a slider to navigate through the frames.

##### Main Function
The main function sets up the Streamlit interface. It sets the title of the app, provides a file uploader for the user to upload a video file, and calls the process_video function if a file is uploaded.


# Facial Emotion Recognition

1 change: 1 addition & 0 deletions dds/Aptfile
@@ -0,0 +1 @@
poppler-utils
37 changes: 37 additions & 0 deletions dds/README.md
@@ -0,0 +1,37 @@


### Prerequisites

Please install poppler-utils from the terminal.

#### Linux

```bash
apt-get install poppler-utils
```

If the above causes an error, please run it using sudo.

```bash
sudo apt-get install poppler-utils
```

#### Mac

```bash
brew install poppler
```

After installation of poppler-utils, please install the rest of the requirements.

```bash
pip install -r requirements.txt
```

### Running the Project

Please ensure that you are in the root directory (suhail-aaico), then run the following command in the terminal. This will open the web application locally.

```bash
streamlit run app.py
```
86 changes: 86 additions & 0 deletions dds/app.py
@@ -0,0 +1,86 @@
from streamlit_webrtc import VideoTransformerBase, webrtc_streamer
import av
import streamlit as st
import cv2
import numpy as np
from keras.models import load_model
from keras.preprocessing.image import img_to_array
import tempfile


# Load the trained emotion model and the Haar Cascade face detector once at startup.
try:
    classifier = load_model('model.h5')
    face_classifier = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
    emotion_labels = ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']
except Exception as e:
    st.write(f"Error loading model or cascade classifier: {e}")

# Running tally of how often each emotion is predicted across frames.
emotion_counts = {label: 0 for label in emotion_labels}


class VideoTransformer(VideoTransformerBase):
    def transform(self, frame):
        # Convert the incoming frame to grayscale, detect faces, and classify each face.
        img = frame.to_ndarray(format="bgr24")
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = face_classifier.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            roi_gray = gray[y:y+h, x:x+w]
            roi_gray = cv2.resize(roi_gray, (48, 48), interpolation=cv2.INTER_AREA)
            roi = roi_gray.astype('float') / 255.0
            roi = img_to_array(roi)
            roi = np.expand_dims(roi, axis=0)
            prediction = classifier.predict(roi)[0]
            label = emotion_labels[prediction.argmax()]
            emotion_counts[label] += 1
            cv2.putText(img, label, (x, y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
        return img


def process_video(uploaded_file):
    image_list = []
    # Save the upload to a temporary file so OpenCV can read it by path.
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        tmp_file.write(uploaded_file.read())
    video_capture = cv2.VideoCapture(tmp_file.name)
    while True:
        ret, frame = video_capture.read()
        if not ret:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_classifier.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            roi_gray = gray[y:y+h, x:x+w]
            roi_gray = cv2.resize(roi_gray, (48, 48), interpolation=cv2.INTER_AREA)
            roi = roi_gray.astype('float') / 255.0
            roi = img_to_array(roi)
            roi = np.expand_dims(roi, axis=0)
            prediction = classifier.predict(roi)[0]
            label = emotion_labels[prediction.argmax()]
            emotion_counts[label] += 1
            cv2.putText(frame, label, (x, y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image_list.append(frame_rgb)
    video_capture.release()

    # The most frequently detected emotion across all frames is reported as the prediction.
    st.write('Detected Emotion: ', max(emotion_counts, key=emotion_counts.get))
    st.write('Confidence Level (%): ', round((emotion_counts[max(emotion_counts, key=emotion_counts.get)] / len(image_list) * 100), 2))
    st.write('------------------------')

    st.write('Emotion Detected Per Frame:')
    for label, count in emotion_counts.items():
        st.write(f'{label}: {count}')

    # Slider to browse the annotated frames.
    image_index = st.slider("Please use the slider to drag across different frames", 0, len(image_list) - 1, 0)
    st.image(image_list[image_index], use_column_width=True)


def main():
    st.title("Emotion Detection in Uploaded Videos")
    uploaded_file = st.file_uploader("Choose a video file", type=["mp4", "avi", "flv"])
    if uploaded_file is not None:
        process_video(uploaded_file)


if __name__ == "__main__":
    main()