
Committed to my fork by 12, forgot to initiate the pull request. Team MLX #8

Open · wants to merge 17 commits into base: main
Binary file added Demo Video.mp4
Binary file not shown.
123 changes: 122 additions & 1 deletion README.md
@@ -1,6 +1,127 @@
# FER May Hackathon

Facial Emotion Detection Hackathon Project: create a model and test it using 5 to 10 second videos to detect emotions.

**Please watch the demonstration video along with testing the app**, as in the video I discuss an *alternate approach* to this problem which I was not able to implement in time. In the video, I used clips from the CREMA-D dataset. CREMA-D is a dataset of 7,442 original clips from 91 actors. These clips were from 48 male and 43 female actors between the ages of 20 and 74, coming from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified). Actors spoke from a selection of 12 sentences. The sentences were presented using one of six different emotions (Anger, Disgust, Fear, Happy, Neutral, and Sad) and four different emotion levels (Low, Medium, High, and Unspecified).

### Team Name: MLX
### Member: Suhail Ahmed | [email protected]
### Deployed Link: https://fer-dds-mlx.streamlit.app

### Model Accuracy: 35.77% on the validation dataset - FER2013 Partial (as provided in the repository).

## Implemented Methodology:

### Machine Learning Model

I implemented a CNN model and trained it on the FER2013 partial dataset for 75 epochs.

The model is built using the Keras Sequential API. It consists of multiple convolutional, batch normalization, max-pooling, and dropout layers to prevent overfitting. The architecture is designed to progressively extract higher-level features from the input images; the main layers are listed below, followed by an illustrative sketch of the stack.

- **Input Layer:** The input layer expects images of shape (48, 48, 1), which corresponds to grayscale images of size 48x48 pixels.

- **Convolutional Layers:** These layers use 3x3 filters to convolve the input and extract features. The activation function used is ReLU (Rectified Linear Unit).

- **Batch Normalization:** This layer normalizes the outputs of the previous layer to stabilize and accelerate the training process.

- **Max-Pooling Layers:** These layers downsample the input by taking the maximum value in each 2x2 pool, reducing the spatial dimensions.

- **Dropout Layers:** These layers randomly drop a fraction of the units during training to prevent overfitting.

- **Flatten Layer:** This layer flattens the 3D output from the convolutional layers into a 1D vector, which is fed into the dense (fully connected) layers.

- **Dense Layers:** These layers perform the final classification. The last dense layer uses a softmax activation function to output probabilities for each of the seven emotion classes.
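
For illustration, below is a minimal sketch of a Sequential model following this layout. The filter counts, dropout rates, and L2 regularization strength shown here are assumptions for illustration only; the trained model in the repository may use different values.

```python
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, MaxPooling2D, Dropout, Flatten, Dense
from keras.regularizers import l2


def build_model(num_classes=7):
    model = Sequential([
        # Block 1: 48x48x1 grayscale input, L2 regularization on the first layer
        Conv2D(32, (3, 3), activation='relu', padding='same',
               kernel_regularizer=l2(0.01), input_shape=(48, 48, 1)),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.25),

        # Block 2: more filters to capture higher-level features
        Conv2D(64, (3, 3), activation='relu', padding='same'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.25),

        # Block 3
        Conv2D(128, (3, 3), activation='relu', padding='same'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Dropout(0.25),

        # Classifier head: flatten, dense layer, softmax over the 7 emotion classes
        Flatten(),
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax'),
    ])
    return model
```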

#### Compilation
The model is compiled using the Adam optimizer with the specified learning rate. The loss function used is categorical cross-entropy, which is suitable for multi-class classification problems. Accuracy is used as the evaluation metric.
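
A sketch of the compilation step, assuming a learning rate of 1e-3 (the actual value is not stated here) and the hypothetical `build_model` helper from the sketch above:

```python
from keras.optimizers import Adam

model = build_model()
model.compile(
    optimizer=Adam(learning_rate=1e-3),  # assumed learning rate
    loss='categorical_crossentropy',     # multi-class loss over the 7 emotion classes
    metrics=['accuracy'],
)
```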

##### Callbacks
Three callbacks are used during training to improve performance and prevent overfitting:

- **ReduceLROnPlateau:** Reduces the learning rate when the validation loss plateaus, helping the model converge.
- **EarlyStopping:** Stops training when the validation accuracy does not improve for a specified number of epochs, preventing overfitting.
- **ModelCheckpoint:** Saves the model weights when the validation loss improves, ensuring the best model is saved.
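
A sketch of how these three callbacks might be configured. The monitored metrics follow the description above, while the patience, factor, and file name values are assumptions (`model.h5` matches the file loaded by the Streamlit app):

```python
from keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

callbacks = [
    # Reduce the learning rate when the validation loss plateaus.
    ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3, min_lr=1e-6),
    # Stop training when validation accuracy stops improving.
    EarlyStopping(monitor='val_accuracy', patience=10, restore_best_weights=True),
    # Save the weights whenever the validation loss improves.
    ModelCheckpoint('model.h5', monitor='val_loss', save_best_only=True),
]
```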

#### Model Training
The model is trained using the fit method. The training data is split into training and validation sets, and the model is trained for the specified number of epochs and batch size. The shuffle parameter ensures that the data is shuffled before each epoch.
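
A sketch of the training call, assuming pre-split arrays `x_train`, `y_train`, `x_val`, `y_val` (48x48x1 images with one-hot labels) and an assumed batch size of 64; the 75 epochs match the figure stated above:

```python
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=75,
    batch_size=64,        # assumed batch size
    shuffle=True,         # shuffle the data before each epoch
    callbacks=callbacks,  # the callbacks configured above
)
```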

#### Rationale

##### Convolutional Neural Network
The use of a convolutional neural network (CNN) is appropriate for image classification tasks due to its ability to automatically learn spatial hierarchies of features from input images. The multiple convolutional layers with increasing filter sizes help the model capture complex patterns in the data.

##### Regularization
Dropout and batch normalization are used extensively throughout the network to prevent overfitting and improve generalization. The L2 regularization on the first layer also helps in reducing overfitting by penalizing large weights.

##### Optimizer
The Adam optimizer is chosen for its efficiency and adaptive learning rate, which helps in faster convergence compared to traditional stochastic gradient descent.


### Streamlit Application

Please navigate to the dds directory and, once you are in it, run:

```console
streamlit run app.py
```

Upon uploading your video, it will take some time to process, depending on the resolution and length of the video. After processing, you will be able to see all the detected emotions per frame, and the most frequently detected emotion is reported as the predicted emotion.

<br>

<img width="766" alt="image" src="https://github.com/Suhail270/fer-may-hackathon/assets/57321434/fa700a38-f84c-441a-8a6c-5ef44daeff36">

<br>
<br>
You may also scroll below and use the slider to view the emotion detected at a particular frame. This feature can be especially useful when multiple emotions are covered in the same video.
<br>
<br>

![image](https://github.com/Suhail270/fer-may-hackathon/assets/57321434/b0146f09-74a4-4e17-b0cf-3917d8c01072)

<img width="614" alt="image" src="https://github.com/Suhail270/fer-may-hackathon/assets/57321434/f733cf20-1ae5-4cdc-af34-1c3fc831eb11">


#### Implementation Methodology and Rationale

##### Libraries and Imports
The following libraries are used in this project:

- *streamlit* for creating the web application.
- *cv2 (OpenCV)* for image processing and face detection.
- *numpy* for array manipulation.
- *keras* for loading the pre-trained emotion detection model.
- *tempfile* for handling temporary files.
- *streamlit_webrtc* for handling real-time video processing.

##### Model and Classifier Loading
The custom-trained emotion detection model and Haar Cascade face classifier are loaded at the beginning of the script. This ensures that the models are ready for use when processing the video frames.

##### Emotion Counts Initialization
An emotion count dictionary is initialized to keep track of the number of times each emotion is detected in the video frames.

##### Video Transformer Class
A custom VideoTransformer class is defined to process each video frame. This class uses the transform method to:

- Convert the frame to grayscale.
- Detect faces in the frame.
- Predict the emotion for each detected face.
- Draw a rectangle around each face and annotate it with the predicted emotion label.
- Update the emotion counts for each prediction.

##### Video Processing Function
The process_video function handles the uploaded video file. It:

- Saves the uploaded file to a temporary location.
- Reads the video frame-by-frame.
- Processes each frame to detect faces and emotions.
- Stores the processed frames in a list.
- Displays the detected emotions and their counts.
- Provides a slider to navigate through the frames.

##### Main Function
The main function sets up the Streamlit interface. It sets the title of the app, provides a file uploader for the user to upload a video file, and calls the process_video function if a file is uploaded.


# Facial Emotion Recognition

1 change: 1 addition & 0 deletions dds/Aptfile
@@ -0,0 +1 @@
poppler-utils
37 changes: 37 additions & 0 deletions dds/README.md
@@ -0,0 +1,37 @@


### Prerequisites

Please install poppler-utils from the terminal.

#### Linux

```bash
apt-get install poppler-utils
```

If the above causes an error, please run it using sudo.

```bash
sudo apt-get install poppler-utils
```

#### Mac

```bash
brew install poppler
```

After installation of poppler-utils, please install the rest of the requirements.

```bash
pip install -r requirements.txt
```

### Running the Project

Please ensure that you are in the root directory (suhail-aaico), then run the following command in the terminal. This will open the web application locally.

```bash
streamlit run app.py
```
86 changes: 86 additions & 0 deletions dds/app.py
@@ -0,0 +1,86 @@
from streamlit_webrtc import VideoTransformerBase, webrtc_streamer
import av
import streamlit as st
import cv2
import numpy as np
from keras.models import load_model
from keras.preprocessing.image import img_to_array
import tempfile


# Load the trained emotion model and the Haar Cascade face detector once at startup.
try:
    classifier = load_model('model.h5')
    face_classifier = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
    emotion_labels = ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']
except Exception as e:
    st.write(f"Error loading model or cascade classifier: {e}")

# Running tally of how often each emotion is predicted across frames.
emotion_counts = {label: 0 for label in emotion_labels}


class VideoTransformer(VideoTransformerBase):
    def transform(self, frame):
        # Convert the incoming frame to grayscale, detect faces, and classify each face.
        img = frame.to_ndarray(format="bgr24")
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = face_classifier.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            roi_gray = gray[y:y+h, x:x+w]
            roi_gray = cv2.resize(roi_gray, (48, 48), interpolation=cv2.INTER_AREA)
            roi = roi_gray.astype('float') / 255.0
            roi = img_to_array(roi)
            roi = np.expand_dims(roi, axis=0)
            prediction = classifier.predict(roi)[0]
            label = emotion_labels[prediction.argmax()]
            emotion_counts[label] += 1
            cv2.putText(img, label, (x, y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
        return img


def process_video(uploaded_file):
    image_list = []
    # Save the upload to a temporary file so OpenCV can read it by path.
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        tmp_file.write(uploaded_file.read())
    video_capture = cv2.VideoCapture(tmp_file.name)
    while True:
        ret, frame = video_capture.read()
        if not ret:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_classifier.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            roi_gray = gray[y:y+h, x:x+w]
            roi_gray = cv2.resize(roi_gray, (48, 48), interpolation=cv2.INTER_AREA)
            roi = roi_gray.astype('float') / 255.0
            roi = img_to_array(roi)
            roi = np.expand_dims(roi, axis=0)
            prediction = classifier.predict(roi)[0]
            label = emotion_labels[prediction.argmax()]
            emotion_counts[label] += 1
            cv2.putText(frame, label, (x, y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
            cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image_list.append(frame_rgb)
    video_capture.release()

    # The most frequently detected emotion across all frames is reported as the prediction.
    st.write('Detected Emotion: ', max(emotion_counts, key=emotion_counts.get))
    st.write('Confidence Level (%): ', round((emotion_counts[max(emotion_counts, key=emotion_counts.get)] / len(image_list) * 100), 2))
    st.write('------------------------')

    st.write('Emotion Detected Per Frame:')
    for label, count in emotion_counts.items():
        st.write(f'{label}: {count}')

    # Slider to browse the annotated frames.
    image_index = st.slider("Please use the slider to drag across different frames", 0, len(image_list) - 1, 0)
    st.image(image_list[image_index], use_column_width=True)


def main():
    st.title("Emotion Detection in Uploaded Videos")
    uploaded_file = st.file_uploader("Choose a video file", type=["mp4", "avi", "flv"])
    if uploaded_file is not None:
        process_video(uploaded_file)


if __name__ == "__main__":
    main()