Hi, thanks for sharing this great work.
I re-ran OneLLM on the MOSI dataset and obtained results that are lower than those reported in the paper. Below are the steps I followed and the issue I encountered.
Step 1: Separate audio and video without information loss
I first separated audio and video streams using ffmpeg.
(I also tried VideoFileClip from moviepy, and the final results were very similar.)
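For clarity, the two ffmpeg calls I use can be summarized in a small helper (`build_demux_cmds` is just my own name; it only constructs the argv lists, matching the flags shown below):

```python
from pathlib import Path

def build_demux_cmds(in_path: Path, mp4_path: Path, wav_path: Path):
    """Build the two ffmpeg argv lists: video-only (stream copy) and audio-only (16-bit PCM WAV)."""
    video_cmd = ["ffmpeg", "-y", "-i", str(in_path),
                 "-an", "-c:v", "copy", str(mp4_path)]       # -an drops audio; -c:v copy avoids re-encoding
    audio_cmd = ["ffmpeg", "-y", "-i", str(in_path),
                 "-vn", "-c:a", "pcm_s16le", str(wav_path)]  # -vn drops video; pcm_s16le is lossless WAV
    return video_cmd, audio_cmd
```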
```python
subprocess.run([
    "ffmpeg", "-y",
    "-i", str(in_path),
    "-an", "-c:v", "copy", str(mp4_path)
])
subprocess.run([
    "ffmpeg", "-y",
    "-i", str(in_path),
    "-vn", "-c:a", "pcm_s16le", str(wav_path)
])
```

Step 2: Create IB prototypes for both training and test sets
```shell
python collect_IB_embeddings_MOSI_SF.py
python read_IB_embeddings_mosi.py
```

Step 3: Precompute modality tokens for the training set
(This step follows the instructions in the repository.)
Step 4: MissRAG + prompt engineering for the complete modality
```shell
python audiovideo_sentimentAnalysis_MOSI_retrieval.py \
    --modal video audio \
    --use_text_modality True \
    --task_modals audio video \
    --prototipe_prompt True \
    --prompt_template <PROMPT>
```

(I could not find a `--prompt_template` argument in the script.)
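For context on how I understand the retrieval step: I assume MissRAG selects the nearest IB prototypes by cosine similarity, roughly as in the sketch below (my own simplification for discussion, not the repository's code):

```python
import numpy as np

def retrieve_prototypes(query: np.ndarray, prototypes: np.ndarray, k: int = 1) -> np.ndarray:
    """Return indices of the k prototypes most cosine-similar to the query embedding."""
    q = query / np.linalg.norm(query)                                  # normalize the query
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)  # normalize each prototype row
    sims = p @ q                                                        # cosine similarities
    return np.argsort(-sims)[:k]                                        # top-k, highest similarity first
```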
Results
- My result (Complete): 70.7
- Reported in the paper: 75.51
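In case the gap is partly a metric-convention issue, this is how I compute the accuracy above (treating sentiment >= 0 as positive; some papers exclude zero labels from Acc-2, which shifts the number):

```python
def mosi_acc2(preds, labels):
    """Binary accuracy on MOSI: sentiment >= 0 counts as positive (one common Acc-2 convention)."""
    correct = sum((p >= 0) == (l >= 0) for p, l in zip(preds, labels))
    return 100.0 * correct / len(labels)
```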
Questions
- Is there any step that I may have missed or implemented incorrectly?
- Is the `prompt_template` critical for reproducing the reported results, and could you clarify how it should be set?
- If possible, could you share the precomputed IB_embeddings and modality_tokens (`.h5` files) for MOSI, so that I can verify whether the discrepancy comes from feature extraction or the downstream pipeline?
Thanks in advance for your help!