Merged
Commits
40 commits
39ccf58
enable gemini veo3.1; replicate: tested on google and minimax; openai…
snyhlxde1 Oct 28, 2025
1bb6736
enable gemini veo3.1; replicate: tested on google and minimax; openai…
snyhlxde1 Oct 28, 2025
785e837
enable gemini veo3.1; replicate: tested on google and minimax; openai…
snyhlxde1 Oct 28, 2025
7e0b5af
support wan and klong
snyhlxde1 Oct 28, 2025
72a50f6
merge with main
snyhlxde1 Oct 28, 2025
9bf5a57
Merge branch 'main' into lanxiang/videogen_backend_dev
snyhlxde1 Oct 30, 2025
bdb4e09
Merge branch 'main' into lanxiang/videogen_backend_dev
snyhlxde1 Oct 30, 2025
a5fd5e2
add preliminary expvid filtering code
snyhlxde1 Oct 30, 2025
062e168
update readme
snyhlxde1 Oct 30, 2025
0bb101c
Merge branch 'main' into lanxiang/videogen_backend_dev
snyhlxde1 Oct 30, 2025
b07a23b
add preliminary vlm as a judge; passed functional unit test
snyhlxde1 Nov 5, 2025
3438250
add preliminary vlm as a judge; passed functional unit test
snyhlxde1 Nov 5, 2025
be7674c
add luma, veo3gen support; add vlm as a judge basics
snyhlxde1 Nov 6, 2025
caba4fe
standardize rubrics
snyhlxde1 Nov 7, 2025
2eb4310
standardize rubrics
snyhlxde1 Nov 7, 2025
4f02fe2
merge with main
snyhlxde1 Nov 7, 2025
8086f03
merge with main
snyhlxde1 Nov 8, 2025
fab1e29
add tools for data processing, gcloud upload; update prepare_experime…
snyhlxde1 Nov 9, 2025
b41d5e8
merge with update on main: checklist-based vlm as a judge
snyhlxde1 Nov 9, 2025
d46e003
add tools for data processing, gcloud upload; update prepare_experime…
snyhlxde1 Nov 9, 2025
c9e0461
Merge branch 'main' into lanxiang/videogen_backend_dev
snyhlxde1 Nov 9, 2025
4b12b5c
clean up redundancy and update gitignore
snyhlxde1 Nov 9, 2025
c7346ba
Merge branch 'main' into lanxiang/videogen_backend_dev
snyhlxde1 Nov 9, 2025
2c0caac
update vlm as a judge: batched runs
snyhlxde1 Nov 12, 2025
dd3fd3e
Merge branch 'main' into lanxiang/videogen_backend_dev
snyhlxde1 Nov 12, 2025
63806db
update readme for vlm as a judge instruction
snyhlxde1 Nov 21, 2025
888294a
update readme for vlm as a judge instruction
snyhlxde1 Nov 21, 2025
4bc552a
add data
snyhlxde1 Jan 8, 2026
85015b0
upload data
snyhlxde1 Jan 8, 2026
32616d5
merge with main
snyhlxde1 Jan 8, 2026
3590e22
merge with main
snyhlxde1 Feb 13, 2026
d6b9ca0
update README 12/13
snyhlxde1 Feb 13, 2026
11dff84
refine README text layout
snyhlxde1 Feb 13, 2026
b254f6e
clarify data download
snyhlxde1 Feb 13, 2026
83c1398
clarify topics
snyhlxde1 Feb 13, 2026
ffaf8dd
add logo
snyhlxde1 Feb 13, 2026
c013aa7
add logo
snyhlxde1 Feb 13, 2026
3f73cb8
add logo figs
snyhlxde1 Feb 13, 2026
8f135db
adjust fig dimensions
snyhlxde1 Feb 13, 2026
5e57f89
add license
snyhlxde1 Feb 13, 2026
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Hao AI Lab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
183 changes: 172 additions & 11 deletions README.md
@@ -1,4 +1,98 @@
# VideoScience
<p align="center">
<img src="assets/VideoScience-Logo-v1.png" alt="vsci-bench-logo" width="220" align="center">
</p>


<div align="center"><h1>&nbsp;VideoScience-Bench: Benchmarking Scientific Reasoning in Video Generations</h1></div>


<p align="center">
<a href="https://arxiv.org/pdf/2512.02942">📄 Paper</a> •
<a href="https://hao-ai-lab.github.io/blogs/videoscience/">📝 Blog</a> •
<a href="https://huggingface.co/datasets/lmgame/VideoScienceBench">🤗 Dataset</a> •
<a href="https://huggingface.co/spaces/lmgame/videoscience-bench">🚀 Demo</a>
</p>

---

## What this repo provides

**VideoScience-Bench** evaluates whether video models can go beyond *looking plausible* to *being scientifically correct*.

- **200** undergraduate-level scientific scenarios (physics + chemistry)
- 160 for T2V evaluation
- 40 for I2V evaluation
- **12 topics** and **103 concepts**, with **multi-concept scientific reasoning** required within a single prompt
- Evaluation along **5 dimensions** (Prompt Consistency, Phenomenon Congruency, Correct Dynamism, Immutability, Spatio-Temporal Coherence)

**VideoScience-Judge** is an auto evaluation pipeline that supports:
1) **Prompt-specific checklist** generation
2) **CV-grounded evidence extraction** (e.g., object detection, object tracking, motion tracking)
3) **Salient key frames selection** where scientific phenomena occur
4) **Final grading** with a reasoning-capable VLM
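The checklist step above can be pictured as a simple weighted rubric that the grader fills in per video. The structure, field names, and scoring rule below are illustrative assumptions, not the actual VideoScience-Judge schema.

```python
# Illustrative sketch only: a hypothetical checklist for one prompt.
# Field names and weights are assumptions, not the real judge schema.
checklist = {
    "vid": "phys_0042",  # hypothetical instance id
    "items": [
        {"criterion": "The ball accelerates while rolling down the incline", "weight": 2},
        {"criterion": "No object changes size or identity mid-video", "weight": 1},
    ],
}

def score(judgments, items):
    """Weighted fraction of satisfied checklist items (judgments: list of bools)."""
    total = sum(it["weight"] for it in items)
    hit = sum(it["weight"] for it, ok in zip(items, judgments) if ok)
    return hit / total

print(score([True, False], checklist["items"]))
```

In the actual pipeline, the booleans would come from the VLM's per-item verdicts rather than being supplied by hand.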

---

## Table of Contents

- [Dataset Overview](#dataset-overview)
- [Installation](#installation)
- [Usage](#usage)
- [Evaluation Metrics](#evaluation-metrics)
- [VideoScience-Judge vs. Human Annotations](#videoscience-judge-vs-human-annotations)
- [Citation](#citation)
- [License](#license)

---

## Dataset Overview

VideoScience-Bench is curated to stress **scientific reasoning** in video generation: each prompt typically requires **at least 2 interacting scientific concepts** to produce the correct phenomenon.

### Topics (12)

**Physics (7):**
- Classical Mechanics
- Thermodynamics
- Electromagnetism
- Optics
- Fluid Mechanics
- Material Mechanics
- Modern Physics

**Chemistry (5):**
- Redox Reactions
- Acid-Base
- Reaction Kinetics
- Solution and Phase Chemistry
- Materials and Solid-State Chemistry

### What each example contains

The prompt suite is lightweight and easy to integrate into any video generation harness.

Common fields (as in the HF release):
- `prompt`: the experimental setup + procedure
- `expected phenomenon`: a concise description of what should happen if the laws are obeyed
- `keywords`: fine-grained scientific concepts involved
- `field`: Physics / Chemistry
- `vid`: instance id

### Loading from Hugging Face

```python
from datasets import load_dataset

ds = load_dataset("lmgame/VideoScienceBench")
data = ds["test"]

# Sanity-check an example
print(data[0]["prompt"])
print(data[0]["expected phenomenon"])
print(data[0]["keywords"])
```

---

## Installation

@@ -42,22 +136,89 @@ export FASTVIDEO_MODEL_PATH="FastVideo/FastWan2.1-T2V-1.3B-Diffusers"
- CUDA-capable GPU with sufficient VRAM
- PyTorch with CUDA support

---

## Usage

### 1) Batched video generation

1. Download the data file to `data/database/data_filtered.jsonl`.
2. Launch the script:

```bash
bash scripts/batched_generation_using_csv.sh
```
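Conceptually, the batched driver iterates over the JSONL prompt records and dispatches each `prompt` to a provider backend. The sketch below is a hypothetical stand-in: `generate_video`, the provider name, and the model name are placeholders, not the repo's actual API.

```python
import json

# Hypothetical sketch of a batched-generation loop over a JSONL prompt file.
# generate_video is a placeholder for whichever provider backend you use.
def generate_video(provider, model, prompt):
    return f"{provider}/{model}: {prompt[:30]}..."  # placeholder result

def run_batch(jsonl_text, provider="replicate", model="some-model"):
    outputs = []
    for line in jsonl_text.strip().splitlines():
        record = json.loads(line)  # one scenario per line
        outputs.append(generate_video(provider, model, record["prompt"]))
    return outputs

# In practice this would be read from data/database/data_filtered.jsonl.
sample = '{"vid": "v1", "prompt": "Drop a feather and a hammer in vacuum.", "field": "Physics"}'
print(run_batch(sample))
```

The supported entry point remains the shell script above; this only illustrates the shape of the loop it drives.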

### 2) Single video generation

```bash
python3 single_generation_frontend.py \
--provider {provider_name} \
--model {model_name} \
--prompt "{your_prompt}"
```

### 3) VLM-as-a-judge evaluation

```bash
bash judge/batched_evaluate_all_models.sh
```

---

## Evaluation Metrics

We evaluate each generated video on **five dimensions** (Likert **1–4**):

- **Prompt Consistency (PCS)**: is the setup/procedure faithful to the prompt?
- **Phenomenon Congruency (PCG)**: does the correct scientific outcome occur?
- **Correct Dynamism (CDN)**: are motions / dynamics physically consistent?
- **Immutability (IMB)**: are static attributes preserved (no flicker/identity drift)?
- **Spatio-Temporal Coherence (STC)**: is the video coherent over time and space?
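As one way to picture how the five per-dimension scores combine for a video, the sketch below averages them into an overall score. This uniform average is an illustrative assumption; the paper's actual aggregation may differ.

```python
# Illustrative only: averaging the five Likert (1-4) dimension scores.
DIMENSIONS = ["PCS", "PCG", "CDN", "IMB", "STC"]

def overall(scores):
    """Mean of the five per-dimension Likert scores for one video."""
    assert set(scores) == set(DIMENSIONS)
    assert all(1 <= v <= 4 for v in scores.values())
    return sum(scores.values()) / len(scores)

video_scores = {"PCS": 4, "PCG": 2, "CDN": 3, "IMB": 4, "STC": 3}
print(overall(video_scores))  # → 3.2
```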

---

## VideoScience-Judge vs. Human Annotations

Manual scientific evaluation is expensive. VideoScience-Judge aims to stay aligned with human experts while remaining scalable.

### Ranking correlation with expert ratings

We report ranking correlations between automatic metrics and **domain-expert annotations** across 7 evaluated video models.

| Metric | Kendall τ | Spearman ρ |
|---|---:|---:|
| **VSci-Judge** | **0.81** | **0.89** |
| **VSci-Judge (Checklist)** | **0.90** | **0.96** |
| **VSci-Judge (Checklist + CV evidence)** | **0.90** | **0.96** |
| PhyGenEval | 0.52 | 0.61 |
| VideoScore2 | 0.24 | 0.29 |

> Note: adding prompt-specific checklists (and optional CV evidence) makes the judge align **near-perfectly** with expert-ranked model quality on VideoScience-Bench.
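For reference, the two rank correlations in the table can be computed from per-model judge and expert scores as below (assuming no ties). The seven score pairs here are hypothetical placeholders, not the reported results.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank correlation (tau-a; assumes no ties)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties closed form."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical per-model mean scores (NOT the paper's numbers).
judge  = [3.1, 2.4, 2.9, 1.8, 2.2, 3.4, 2.7]
expert = [3.0, 2.5, 2.8, 1.9, 2.1, 3.5, 2.6]
print(kendall_tau(judge, expert), spearman_rho(judge, expert))
```

With ties present, tau-b and tie-corrected Spearman (e.g. `scipy.stats.kendalltau` / `spearmanr`) would be the appropriate variants.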

### VideoScience-Judge Features

1. **[Optional] Checklist generation**: create an evaluative rubric tied to the prompt
2. **[Optional] CV-based evidence extraction** (recommended): tracking, motion, attribute changes, key frames
3. **Final grading**: the VLM-as-a-judge reasons over the checklist and all evidence

---

## Citation

If you use VideoScience in your research, please cite:

```bibtex
@article{hu2025videoscience,
title={Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench},
author={Hu, Lanxiang and Shankarampeta, Abhilash and Huang, Yixin and Dai, Zilin and Yu, Haoyang and Zhao, Yujie and Kang, Haoqiang and Zhao, Daniel and Rosing, Tajana and Zhang, Hao},
journal={arXiv preprint arXiv:2512.02942},
year={2025}
}
```

---

## License

This project is released under the **MIT License**. See [LICENSE](LICENSE).
Binary file added assets/VideoScience-Logo-v1.png