This project focuses on Multimodal Sarcasm Explanation (MuSE): generating natural language explanations for sarcastic social media posts by leveraging both textual and visual information. It is based on a simplified version of the TURBO architecture proposed in this PAPER.
The dataset consists of sarcastic posts from Twitter, Instagram, and Tumblr. Each post includes:
- An image
- A text caption
- A sarcasm explanation
- A sarcasm target
The dataset folder contains:

- `train_df.tsv`, `val_df.tsv`, `test_df.tsv`: main data with `pid`, `text`, `explanation`, and `target_of_sarcasm` columns
- `D_*.pkl`: image descriptions (from BLIP or a similar model)
- `O_*.pkl`: object detection labels (from YOLOv9)
- `images/`: folder with all post images
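These files can be loaded with standard tooling; a minimal sketch, assuming the pickles are dicts keyed by `pid` and that the pickle names carry a per-split suffix (both assumptions about the exact layout):

```python
# Minimal data-loading sketch; the pickle file names and their structure
# (dicts keyed by pid) are assumptions, not confirmed by the repo.
import pickle
import pandas as pd

DATA_DIR = "MORE-PLUS-DATASET"

train_df = pd.read_csv(f"{DATA_DIR}/train_df.tsv", sep="\t")

with open(f"{DATA_DIR}/D_train.pkl", "rb") as f:   # image descriptions
    descriptions = pickle.load(f)
with open(f"{DATA_DIR}/O_train.pkl", "rb") as f:   # object detection labels
    objects = pickle.load(f)

print(train_df[["pid", "text", "explanation", "target_of_sarcasm"]].head())
```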
The model builds on two pretrained backbones:

- `facebook/bart-base`: pretrained BART model for conditional text generation
- `google/vit-base-patch16-224-in21k`: pretrained Vision Transformer for image encoding
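Both can be loaded straight from the Hugging Face Hub; the image file name below is hypothetical:

```python
# Load the two pretrained backbones and encode one example image.
from PIL import Image
from transformers import (BartForConditionalGeneration, BartTokenizer,
                          ViTImageProcessor, ViTModel)

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

image = Image.open("MORE-PLUS-DATASET/images/example.jpg")  # hypothetical file
pixels = processor(images=image, return_tensors="pt")
img_emb = vit(**pixels).last_hidden_state  # (1, 197, 768) patch embeddings
```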
Their embeddings are combined by a custom fusion module that:

- Applies multi-head self-attention to the text and image embeddings
- Computes gated cross-modal attention (text-guided vision and vision-guided text)
- Produces a fused representation that is passed to BART as `inputs_embeds`
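A minimal PyTorch sketch of such a module, assuming 768-dimensional embeddings (the hidden size of both backbones); the head count, gating formulation, and the way the two fused streams are combined are assumptions rather than the reference implementation:

```python
# Sketch of a gated cross-modal fusion module; dimensions, head count, and
# the gating scheme are assumptions, not the reference implementation.
import torch
import torch.nn as nn

class SharedFusion(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.text_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate_t = nn.Linear(2 * dim, dim)
        self.gate_v = nn.Linear(2 * dim, dim)

    def forward(self, text_emb, img_emb):
        # Self-attention within each modality.
        t, _ = self.text_self(text_emb, text_emb, text_emb)
        v, _ = self.img_self(img_emb, img_emb, img_emb)
        # Cross-modal attention: each modality queries the other.
        t_cross, _ = self.img_to_text(t, v, v)   # text enriched by vision
        v_cross, _ = self.text_to_img(v, t, t)   # vision enriched by text
        # Sigmoid gates decide how much cross-modal signal to mix in.
        g_t = torch.sigmoid(self.gate_t(torch.cat([t, t_cross], dim=-1)))
        g_v = torch.sigmoid(self.gate_v(torch.cat([v, v_cross], dim=-1)))
        fused_t = g_t * t_cross + (1 - g_t) * t
        fused_v = g_v * v_cross + (1 - g_v) * v
        # Concatenate along the sequence axis; fed to BART as inputs_embeds.
        return torch.cat([fused_t, fused_v], dim=1)
```

The concatenated output has shape `(batch, text_len + image_len, 768)`, which matches BART-base's hidden size, so it can be fed to the generator as `bart(inputs_embeds=fused, labels=...)`.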
In addition, the sarcasm target is concatenated to the textual input, so the generated explanation is conditioned on it.
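The exact template is not shown here; the snippet below is one hypothetical way to splice the target in (field order and separators are assumptions):

```python
# Hypothetical input template; the separators and field order used in the
# actual notebook may differ.
def build_input(text, target, description, objects):
    return (f"{text} </s> target: {target} </s> "
            f"description: {description} </s> objects: {objects}")
```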
Model outputs were evaluated on the validation and test sets using:
| Metric | Score |
|---|---|
| BLEU-1 | 0.5394 |
| BLEU-2 | 0.4449 |
| BLEU-3 | 0.3830 |
| BLEU-4 | 0.3296 |
| ROUGE-1 | 0.5127 |
| ROUGE-2 | 0.3536 |
| ROUGE-L | 0.4835 |
| ROUGE-Lsum | 0.4837 |
| METEOR | 0.5167 |
| BERTScore (F1) | 0.4835 |
These scores are competitive with the results reported for the state-of-the-art TURBO model.
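All of these metrics are available through Hugging Face's `evaluate` library; a sketch with dummy strings (the actual notebook may use different packages or settings):

```python
# Metric computation sketch using the `evaluate` library; preds/refs are
# dummy strings, and the real pipeline may configure these differently.
import evaluate

preds = ["the user mocks how slow the airport queue is"]
refs = ["the author mocks the slow queue at the airport"]

bleu = evaluate.load("bleu")
for n in range(1, 5):  # BLEU-1 .. BLEU-4
    score = bleu.compute(predictions=preds, references=[[r] for r in refs], max_order=n)
    print(f"BLEU-{n}:", score["bleu"])

rouge = evaluate.load("rouge")  # reports rouge1, rouge2, rougeL, rougeLsum
print(rouge.compute(predictions=preds, references=refs))

meteor = evaluate.load("meteor")
print("METEOR:", meteor.compute(predictions=preds, references=refs)["meteor"])

bertscore = evaluate.load("bertscore")
print("BERTScore F1:", bertscore.compute(predictions=preds, references=refs, lang="en")["f1"])
```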
```
├── main.ipynb              # Main notebook
├── bart_gen_epochN.pt      # Saved BART model checkpoints (one per epoch)
├── shared_fusion_epochN.pt # Saved fusion module checkpoints (one per epoch)
├── MORE-PLUS-DATASET/      # Folder for the .tsv, .pkl, and image files
├── test_predictions.tsv    # Generated sarcasm explanations
└── README.md               # This file
```
Feel free to fork and improve!