Code for the MM 2024 paper
This paper proposes a knowledge-enhanced self-supervised balanced representation approach (KEBR) that captures common sentimental knowledge in unlabeled videos and addresses the optimization issue of information imbalance between modalities.
```
$ git clone https://github.com/aoqzhu/KEBR.git
$ conda create -n envir_name python=3.8
$ source activate envir_name
$ pip install -r requirements.txt
```

The raw pretraining datasets VoxCeleb1 and VoxCeleb2 can be acquired from this website (you may need to apply for an account and password to obtain download permission). The raw CMU-MOSI and CMU-MOSEI datasets can be acquired from this website.
The baseline model is not pretrained with unlabeled video data.
```
$ CUDA_VISIBLE_DEVICES=0 python baseline.py
```

You can change the command-line arguments to train different models on different datasets and with different backbone language models.
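As a rough illustration of what "change the command-line arguments" might look like, here is a minimal `argparse` sketch; the flag names (`--dataset`, `--language_model`, `--lr`) and their defaults are assumptions for illustration, not the repository's actual interface:

```python
# Hypothetical sketch of a training script's CLI; flag names and
# defaults are illustrative assumptions, not the repo's actual arguments.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Train a baseline model")
    parser.add_argument("--dataset", choices=["mosi", "mosei"], default="mosi",
                        help="which labeled dataset to train on")
    parser.add_argument("--language_model", default="bert-base-uncased",
                        help="backbone language model identifier")
    parser.add_argument("--lr", type=float, default=1e-4,
                        help="learning rate")
    return parser

# Parse an explicit argument list (as a shell invocation would supply).
args = build_parser().parse_args(["--dataset", "mosei", "--lr", "5e-5"])
print(args.dataset, args.language_model, args.lr)
```

Under this pattern, a run such as `python baseline.py --dataset mosei --lr 5e-5` would switch the dataset and learning rate while keeping the default backbone.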
Sentiment-knowledge-enhanced pretraining.
```
$ CUDA_VISIBLE_DEVICES=0 python pretrain.py
$ CUDA_VISIBLE_DEVICES=0 python language_model_classifier.py
```

You can change the command-line arguments to train different models on different datasets and with different language models.
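The two commands above imply a two-stage workflow: pretrain on unlabeled video, save a checkpoint, then initialize the classifier from it. A minimal sketch of that handoff, using plain dictionaries and JSON as stand-ins (the file name, key names, and helper functions are all hypothetical, not the repository's actual code):

```python
# Illustrative two-stage workflow: pretrain -> checkpoint -> fine-tune.
# All names (file path, weight keys, helpers) are assumptions for clarity.
import json
import os
import tempfile

def pretrain_encoder():
    # Stand-in for pretrain.py: returns "learned" encoder weights.
    return {"encoder.layer0.weight": [0.1, 0.2], "encoder.layer0.bias": [0.0]}

def save_checkpoint(weights, path):
    # Persist the pretrained weights for the fine-tuning stage.
    with open(path, "w") as f:
        json.dump(weights, f)

def finetune_classifier(ckpt_path):
    # Stand-in for language_model_classifier.py: load pretrained encoder
    # weights, then attach a freshly initialized task head.
    with open(ckpt_path) as f:
        weights = json.load(f)
    model = dict(weights)                    # start from pretrained encoder
    model["classifier.weight"] = [0.0, 0.0]  # fresh classification head
    return model

ckpt = os.path.join(tempfile.gettempdir(), "kebr_pretrained.json")
save_checkpoint(pretrain_encoder(), ckpt)
model = finetune_classifier(ckpt)
```

The point of the sketch is only the ordering: the classifier stage consumes whatever the pretraining stage writes to disk.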
```
$ CUDA_VISIBLE_DEVICES=0 python main.py
```