MuSE is an NLP model for explaining sarcastic social media posts using both text and images. It combines BART and ViT with a shared fusion module and sarcasm target input. Trained on the MORE+ dataset, it achieves strong scores on BLEU, ROUGE, METEOR, and BERTScore metrics.

A-WASIF/Multimodal-Sarcasm-Explanation-MuSE-

Multimodal Sarcasm Explanation (MuSE)

🧠 Project Overview

This project addresses Multimodal Sarcasm Explanation (MuSE): generating natural-language explanations for sarcastic social media posts from both their text and their images. It is based on a simplified version of the TURBO architecture proposed in this PAPER.


📁 Dataset: MORE+ (MuSE)

The dataset contains sarcastic posts from Twitter, Instagram, and Tumblr. Each post comes with:

  • An image
  • A text caption
  • A sarcasm explanation
  • A sarcasm target

Files Used:

  • train_df.tsv, val_df.tsv, test_df.tsv: Main data with the columns pid, text, explanation, target_of_sarcasm
  • D_*.pkl: Image descriptions (from BLIP or a similar model)
  • O_*.pkl: Object detection labels (from YOLOv9)
  • images/: Folder with all post images
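
The file layout above could be read along the following lines. This is only a sketch under several assumptions that the README does not confirm: that the `*` in `D_*.pkl` and `O_*.pkl` is the split name (`train`, `val`, `test`), that each pickle holds a dict keyed by `pid`, that the TSV files have a header row, and that images are stored as `<pid>.jpg`. The function name `load_split` is hypothetical.

```python
import csv
import pickle
from pathlib import Path


def load_split(data_dir, split):
    """Load one split (e.g. "train") into a list of per-post dicts.

    ASSUMPTIONS (not confirmed by the README):
      - pickles are named D_<split>.pkl / O_<split>.pkl
      - each pickle is a dict mapping pid -> description / object labels
      - images live at images/<pid>.jpg
    """
    data_dir = Path(data_dir)

    # Main table: pid, text, explanation, target_of_sarcasm
    with open(data_dir / f"{split}_df.tsv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))

    # Image descriptions and object detection labels
    with open(data_dir / f"D_{split}.pkl", "rb") as f:
        descriptions = pickle.load(f)
    with open(data_dir / f"O_{split}.pkl", "rb") as f:
        objects = pickle.load(f)

    # Attach the per-image side information to each post
    for row in rows:
        pid = row["pid"]
        row["description"] = descriptions.get(pid, "")
        row["objects"] = objects.get(pid, [])
        row["image_path"] = data_dir / "images" / f"{pid}.jpg"
    return rows
```

If the pickles turn out to be lists or DataFrames rather than dicts, only the two `.get(...)` lookups would need to change.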

🏗️ Model Architecture

🔤 Text Encoder & Decoder

BART serves as the text encoder and as the decoder that generates the explanation.

🖼️ Vision Encoder

ViT encodes the post image.

🔀 Shared Fusion Mechanism

A custom module that:

  • Applies multi-head self-attention to text and image embeddings
  • Computes gated cross-modal attention (text-guided vision and vision-guided text)
  • Produces a fused representation used as inputs_embeds to BART
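
The three steps above can be sketched in PyTorch as follows. This is an illustrative reading, not the repository's actual module: the class name `SharedFusionSketch`, the sigmoid gate over concatenated features, and the choice to concatenate the two fused streams along the sequence axis are all assumptions. "Vision-guided text" is interpreted here as text queries attending over image tokens, and "text-guided vision" as the reverse.

```python
import torch
from torch import nn


class SharedFusionSketch(nn.Module):
    """Hypothetical gated cross-modal fusion; details are assumptions."""

    def __init__(self, dim=768, heads=8):
        super().__init__()
        # Step 1: per-modality multi-head self-attention
        self.text_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Step 2: cross-modal attention in both directions
        self.text_from_image = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_from_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Gates decide how much cross-modal context to mix in (assumed form)
        self.text_gate = nn.Linear(2 * dim, dim)
        self.image_gate = nn.Linear(2 * dim, dim)

    def forward(self, text, image):
        # text: (batch, text_len, dim), image: (batch, image_len, dim)
        t, _ = self.text_self(text, text, text)
        v, _ = self.image_self(image, image, image)

        # Vision-guided text: text queries, image keys/values
        t_ctx, _ = self.text_from_image(t, v, v)
        # Text-guided vision: image queries, text keys/values
        v_ctx, _ = self.image_from_text(v, t, t)

        g_t = torch.sigmoid(self.text_gate(torch.cat([t, t_ctx], dim=-1)))
        g_v = torch.sigmoid(self.image_gate(torch.cat([v, v_ctx], dim=-1)))
        t_fused = g_t * t + (1 - g_t) * t_ctx
        v_fused = g_v * v + (1 - g_v) * v_ctx

        # Step 3: one sequence of embeddings for the language model
        return torch.cat([t_fused, v_fused], dim=1)
```

The returned tensor has shape `(batch, text_len + image_len, dim)`, which is the shape a Hugging Face BART model expects for its `inputs_embeds` argument when `dim` matches the model's hidden size.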

🎯 Sarcasm Target

The sarcasm target is concatenated to the input and influences the explanation.
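
A minimal sketch of that concatenation, assuming the target is appended to the caption with BART's `</s>` separator token. The helper name `build_model_input` and the choice of separator are assumptions; the README does not say how the two strings are joined.

```python
def build_model_input(caption, target, sep=" </s> "):
    """Join the post caption and its sarcasm target into one input string.

    ASSUMPTION: the target follows the caption, separated by `sep`.
    """
    return sep.join([caption.strip(), target.strip()])


example = build_model_input("great weather for a picnic", "the rain")
print(example)
```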


🧪 Evaluation Metrics

Model outputs were evaluated on the validation and test sets using:

Metric            Score
BLEU-1            0.5394
BLEU-2            0.4449
BLEU-3            0.3830
BLEU-4            0.3296
ROUGE-1           0.5127
ROUGE-2           0.3536
ROUGE-L           0.4835
ROUGE-Lsum        0.4837
METEOR            0.5167
BERTScore (F1)    0.4835

These scores are competitive with the results reported for the state-of-the-art TURBO model.
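
To make one of these metrics concrete, here is a small pure-Python sketch of ROUGE-L as an F1 score over the longest common subsequence (LCS) of reference and generated tokens. It assumes lowercase whitespace tokenization and no stemming, so it will not reproduce the numbers in the table exactly; those were presumably computed with a standard evaluation library. The function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 with simple whitespace tokenization (an assumption)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("a b c d", "a c")` has an LCS of length 2, giving precision 1.0, recall 0.5, and an F1 of 2/3.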


📂 Project Structure

├── main.ipynb                      # Main notebook
├── shared_fusion_epochN.py         # Saved custom fusion module checkpoints
├── bart_gen_epochN.pt              # Saved BART model checkpoints
├── shared_fusion_epochN.pt         # Saved fusion model checkpoints
├── MORE-PLUS-DATASET/              # Folder for .tsv, .pkl, and images
├── test_predictions.tsv            # Generated sarcasm explanations
├── README.md                       # This file

✍️ Author

Feel free to fork and improve!
