Skip to content

Fine-tuning a Vision Transformer (ViT) model for emotion classification using the FER2013 dataset – including preprocessing, augmentation, training, and evaluation with confusion matrix and poster.

Notifications You must be signed in to change notification settings

Sivanajani/Emotion-Recognition-with-Vision-Transformer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 

Repository files navigation

🤖 Emotion Detection with Vision Transformer (ViT)

This project focuses on classifying facial emotions using a fine-tuned Vision Transformer (ViT) model.
We used the ViT-Base-Patch16-224 pretrained model and adapted its classification head to predict seven emotions from grayscale face images.

The work was done as part of a team project. This repository contains my contribution to the ViT model, including data augmentation, model training, and evaluation.


🧠 Dataset

We used the FER 2013 dataset on Kaggle, which contains 35,685 grayscale images (48×48 pixels), categorized into seven emotions:

  • 😠 angry
  • 🤢 disgusted
  • 😨 fearful
  • 😀 happy
  • 😐 neutral
  • 😢 sad
  • 😲 surprised

The dataset is split into training and test sets. The disgusted class was underrepresented and required augmentation (see below).


🔍 Challenges

  • Imbalanced Classes: e.g., disgusted was underrepresented
    → Resolved via targeted augmentation (rotation, flips)
  • Low Image Quality: Some images were blurry or contained text artifacts
  • Similar Emotions: Even for humans, emotions like fearful and sad are hard to distinguish

🔧 Tech Stack & Methoden:

  • ViT Model: Google ViT-Base-Patch16-224

  • Libraries: PyTorch · Hugging Face Transformers

  • Training Techniques:

    • Data Augmentation (flips, rotation)
    • Weight Decay Regularization
    • Early Stopping · Learning Rate Scheduler
    • Cross-Entropy Loss
    • Self-Attention Mechanism

📈 Confusion Matrix – ViT Model

This matrix shows the model’s performance on the test set (normalized):

Confusion Matrix


🧾 Project Poster

The final poster summarizes all three models (CNN, Transfer Learning & ViT), their performance, challenges, and findings:

Project Poster


📄 License

This project was created as part of a university project at FHNW and is intended for demonstration and educational purposes only.


📚 Quelle des Datensatzes

You can view and download the original data here:
➡️ Kaggle – Facial Emotion Recognition (FER2013)

About

Fine-tuning a Vision Transformer (ViT) model for emotion classification using the FER2013 dataset – including preprocessing, augmentation, training, and evaluation with confusion matrix and poster.

Topics

Resources

Stars

Watchers

Forks