Unable to reproduce the Dreamgen Bench metrics in the paper

Hello, and thank you for your work. I was unable to reproduce the Dreamgen Bench metrics reported in the paper using the Cosmos-Predict2-14B-Sample-GR00T-Dreams-GR1 model. Did I miss something? Thank you.
Here is the procedure I followed:
1. Generated videos by running inference with Cosmos-Predict2-14B-Sample-GR00T-Dreams-GR1:
```
python -m examples.video2world_gr00t \
--model_size 14B \
--gr00t_variant gr1 \
--batch_input_json dream_gen_benchmark/gr1_object/batch_input.json \
--disable_guardrail
```
2. Evaluated the generated videos on the Dreamgen benchmark with `--zeroshot` set to `false`:
```
python -m dreamgenbench.eval_sr_qwen_whole \
    --video_dir "$video_dir" \
    --output_csv "$csv_path" \
    --device "$device" \
    --zeroshot false
```
The results I obtained:

<img width="1006" height="237" alt="Image" src="https://github.com/user-attachments/assets/836d6cb1-4278-40d0-ba71-b0a0a856f6a5" />

The results reported in the paper, for comparison:

<img width="1280" height="450" alt="Image" src="https://github.com/user-attachments/assets/22332f92-b548-4470-a2d0-cc6d9cb0613a" />
