Hi, thanks for sharing this great work.
I re-ran OneLLM on the MOSI dataset and obtained results that are lower than those reported in the paper. Below are the steps I followed and the issue I encountered.
Step 1: Separate audio and video without information loss
I first separated audio and video streams using ffmpeg.
(I also tried VideoFileClip from moviepy, and the final results were very similar.)
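For clarity, the two ffmpeg calls I use can be summarized in a small helper (`build_demux_cmds` is just my own name; it only constructs the argv lists, matching the flags shown below):

```python
from pathlib import Path

def build_demux_cmds(in_path: Path, mp4_path: Path, wav_path: Path):
    """Build the two ffmpeg argv lists: video-only (stream copy) and audio-only (16-bit PCM WAV)."""
    video_cmd = ["ffmpeg", "-y", "-i", str(in_path),
                 "-an", "-c:v", "copy", str(mp4_path)]       # -an drops audio; -c:v copy avoids re-encoding
    audio_cmd = ["ffmpeg", "-y", "-i", str(in_path),
                 "-vn", "-c:a", "pcm_s16le", str(wav_path)]  # -vn drops video; pcm_s16le is lossless WAV
    return video_cmd, audio_cmd
```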
```python
subprocess.run([
    "ffmpeg", "-y",
    "-i", str(in_path),
    "-an", "-c:v", "copy", str(mp4_path)
])
subprocess.run([
    "ffmpeg", "-y",
    "-i", str(in_path),
    "-vn", "-c:a", "pcm_s16le", str(wav_path)
])
```

Step 2: Create IB prototypes for both training and test sets
```shell
python collect_IB_embeddings_MOSI_SF.py
python read_IB_embeddings_mosi.py
```

Step 3: Precompute modality tokens for the training set
(This step follows the instructions in the repository.)
Step 4: MissRAG + prompt engineering for the complete modality
```shell
python audiovideo_sentimentAnalysis_MOSI_retrieval.py \
    --modal video audio \
    --use_text_modality True \
    --task_modals audio video \
    --prototipe_prompt True \
    --prompt_template <PROMPT>
```

(I could not find a `--prompt_template` argument in the script.)
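For context on how I understand the retrieval step: I assume MissRAG selects the nearest IB prototypes by cosine similarity, roughly as in the sketch below (my own simplification for discussion, not the repository's code):

```python
import numpy as np

def retrieve_prototypes(query: np.ndarray, prototypes: np.ndarray, k: int = 1) -> np.ndarray:
    """Return indices of the k prototypes most cosine-similar to the query embedding."""
    q = query / np.linalg.norm(query)                                  # normalize the query
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)  # normalize each prototype row
    sims = p @ q                                                        # cosine similarities
    return np.argsort(-sims)[:k]                                        # top-k, highest similarity first
```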
Results
- My result (Complete): 70.7
- Reported in the paper: 75.51
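In case the gap is partly a metric-convention issue, this is how I compute the accuracy above (treating sentiment >= 0 as positive; some papers exclude zero labels from Acc-2, which shifts the number):

```python
def mosi_acc2(preds, labels):
    """Binary accuracy on MOSI: sentiment >= 0 counts as positive (one common Acc-2 convention)."""
    correct = sum((p >= 0) == (l >= 0) for p, l in zip(preds, labels))
    return 100.0 * correct / len(labels)
```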
Questions
- Is there any step that I may have missed or implemented incorrectly?
- Is the `prompt_template` critical for reproducing the reported results, and could you clarify how it should be set?
- If possible, could you share the precomputed IB_embeddings and modality_tokens (`.h5` files) for MOSI, so that I can verify whether the discrepancy comes from feature extraction or the downstream pipeline?
Thanks in advance for your help!