Commit 4175644

update gh-pages
1 parent b78b43c commit 4175644

1 file changed: +34 -22 lines changed

index.html

Lines changed: 34 additions & 22 deletions
Original file line number | Diff line number | Diff line change
@@ -163,8 +163,8 @@ <h2 class="subtitle is-3 publication-subtitle"> Think before You Score in Genera
163163
<sup>6</sup><span class="author-block">Yi Lu,</span>
164164
<sup>2</sup><span class="author-block">Keming Wu,</span>
165165
<sup>2</sup><span class="author-block">Benjamin Schneider,</span>
166-
<sup>2</sup><span class="author-block">Quy Duc Do,</span>
167166
<br>
167+
<sup>2</sup><span class="author-block">Quy Duc Do,</span>
168168
<sup>2</sup><span class="author-block">Zhuofeng Li,</span>
169169
<sup>6</sup><span class="author-block">Yiming Jia,</span>
170170
<sup>2</sup><span class="author-block">Yuxuan Zhang,</span>
@@ -189,6 +189,7 @@ <h2 class="subtitle is-3 publication-subtitle"> Think before You Score in Genera
189189
<sup>3</sup>Independent,
190190
<sup>4</sup>2077AI,
191191
<sup>5</sup>M-A-P,
192+
<br>
192193
<sup>6</sup>University of Toronto,
193194
<sup>7</sup>Zhejiang University,
194195
<sup>8</sup>Abaka AI,
@@ -198,7 +199,7 @@ <h2 class="subtitle is-3 publication-subtitle"> Think before You Score in Genera
198199
<br>
199200
<div class="is-size-5 publication-authors">
200201
<span class="author-block">
201-
*Equal Contribution, Xuan He lead the project
202+
*Equal Contribution, Xuan leads the project
202203
</span>
203204
</div>
204205
<div class="is-size-5 publication-authors">
@@ -365,56 +366,67 @@ <h2 class="title is-4">
365366

366367
<div class="hero-body has-text-centered">
367368
<h2 class="title is-4">
368-
<span style="vertical-align: middle">Evaluation Results</span>
369+
<span style="vertical-align: middle">Evaluation on VideoScore2-Bench</span>
369370
</h2>
370371
</div>
371372
<div class="container is-max-desktop">
372373
<div class="columns is-centered">
373374
<div class="column is-full-width">
374-
<h2 class="title is-5"><span style="font-size: 100%;">
375-
VideoScore2-Bench</span></h2>
376375
<p>
377376
We test VideoScore2 on the test set of VideoFeedback2, containing 500 videos with human scores from three dimensions.
378377
Below we show the results of some scoring/reward models for image or video, like VideoReward, VisionReward, Q-Insight, DeQA-Score, and MLLM-prompting methods like Claude-Sonnet-4, GPT-5, Gemini-2.5-Pro, etc and our VideoScore2.
379378
</p>
380-
<img id="res_vs2_bench" width="100%" src="static/images/res_vs2_bench.png">
379+
<img id="res_vs2_bench" width="70%" src="static/images/res_vs2_bench.png">
380+
</div>
381+
</div>
382+
</div>
383+
384+
381385

382-
<br>
383-
<h2 class="title is-5"><span style="font-size: 100%;">
384-
<img id="painting_icon" width="3%" src="static/images/ec_icon.png">
385-
Out-of-Domain Benchmark</span></h2>
386+
<div class="hero-body has-text-centered">
387+
<h2 class="title is-4">
388+
<span style="vertical-align: middle">Evaluation on Out-of-Domain Benchmarks</span>
389+
</h2>
390+
</div>
391+
<div class="container is-max-desktop">
392+
<div class="columns is-centered">
393+
<div class="column is-full-width">
386394
<p>
387-
We further test on four out-of-domain (OOD) benchmarks: two pairwise preference and two point
388-
score. Preference benchmark results include ties. As shown in Table below, while VideoScore2 is
395+
We further test on four out-of-domain (OOD) benchmarks: two pairwise preference (VideoGenReward-Bench and T2VQA-DB (preference version)) and two point score (MJ-Bench-Video and Video-Phy2-test).
396+
397+
Preference benchmark results include ties. As shown in Table below, while VideoScore2 is
389398
not always the top model on each benchmark, it achieves the highest overall average.
390399
</p>
391-
<img id="res_ood_bench" width="100%" src="static/images/res_ood_bench.png">
400+
<img id="res_ood_bench" width="70%" src="static/images/res_ood_bench.png">
392401
</div>
393402
</div>
394403
</div>
395404

405+
<div class="hero-body has-text-centered">
406+
<h2 class="title is-4">
407+
<span style="vertical-align: middle">Best-of-N Sampling with VideoScore2</span>
408+
</h2>
409+
</div>
410+
396411
<div class="container is-max-desktop">
397412
<div class="columns is-centered">
398413
<div class="column is-full-width">
399-
<h2 class="title is-5"><span style="font-size: 100%;">
400-
Best-of-N Sampling</span></h2>
401414
<p>
402415
We evaluate VIDEOSCORE2 with best-of-n (BoN) sampling (n = 5), where the model selects the
403416
best video among candidates. Six T2V models of moderate or poor quality are used, avoiding very
404417
strong ones to highlight the BoN effect. For 500 prompts, each model generates 500 × 5 videos.
405418
Comparison on VBench shows BoN consistently outperforms random sampling, confirm
406419
ing that VIDEOSCORE2 effectively guides higher-quality selection.
407420
</p>
408-
<img id="BoN" width="100%" src="static/images/BoN.png">
421+
<img id="BoN" width="70%" src="static/images/BoN.png">
409422
</div>
410423
</div>
411424
</div>
425+
426+
</section>
412427

413428

414-
415429

416-
417-
</section>
418430

419431
<section class="hero is-light is-small">
420432
<div class="hero-body has-text-centered">
@@ -424,9 +436,9 @@ <h2 class="title is-3">
424436
</div>
425437
</section>
426438

427-
<img id="cs1" width="100%" src="static/images/case1.png">
428-
<img id="cs2" width="100%" src="static/images/case2.png">
429-
<img id="cs3" width="100%" src="static/images/case3.png">
439+
<img id="cs1" width="70%" src="static/images/case1.png">
440+
<img id="cs2" width="70%" src="static/images/case2.png">
441+
<img id="cs3" width="70%" src="static/images/case3.png">
430442

431443

432444
<section class="section" id="BibTeX">
