update gh-pages

hexuan21 · hexuan21 · commit b78b43c4dc81 · 2025-10-01T07:49:33.000Z
diff --git a/index.html b/index.html
@@ -295,12 +295,6 @@ <h2 class="subtitle is-3 publication-subtitle"> Think before You Score in Genera
               </div>
             </div>
 
-            <centering>
-              <div style="text-align: center;">
-                <img id="teaser" width="85%" src="static/images/teaser.png">     
-              </div>
-            </centering> 
-
           </div>
         </div>
       </div>
@@ -345,15 +339,24 @@ <h2 class="title is-3">
   </div>
 </section>  
 
+  <div class="hero-body has-text-centered">
+    <h2 class="title is-4">
+      <span style="vertical-align: middle">Overview</span>
+    </h2>
+  </div>
+  <div style="text-align: center;">
+    <img id="teaser" width="85%" src="static/images/teaser.png">     
+  </div>
+
 <section class="section"> 
   <div class="container is-max-desktop">
     <div class="columns is-centered">
       <div class="column is-full-width">
         <div class="content has-text-justified"> 
           <p>
-VideoScore2 is trained on the VideoFeedback2 dataset containing 27K human-annotated videos with both scores and rationales across three dimensions. We adopt a two-stage pipeline: first, supervised fine-tuning (SFT) on Qwen2.5-VL-7B-Instruct to establish format-following and scoring ability; then, reinforcement learning with Group Relative Policy Optimization (GRPO) to further align model outputs with human judgment and enhance analytical robustness.
+        VideoScore2 is trained on the VideoFeedback2 dataset containing 27K human-annotated videos with both scores and rationales across three dimensions. We adopt a two-stage pipeline: first, supervised fine-tuning (SFT) on Qwen2.5-VL-7B-Instruct to establish format-following and scoring ability; then, reinforcement learning with Group Relative Policy Optimization (GRPO) to further align model outputs with human judgment and enhance analytical robustness.
 
-Compared to VideoScore (v1), VS2 introduces interpretable scoring for three dimensions (Visual Quality, Text Alignment, Physical/Common-sense Consistency) and CoT-style rationales, achieving stronger generalization on out-of-domain benchmarks while providing transparent and human-aligned video evaluation.
+        Compared to VideoScore (v1), VideoScore2 introduces interpretable scoring for three dimensions (Visual Quality, Text Alignment, Physical/Common-sense Consistency) and CoT-style rationales, achieving stronger generalization on out-of-domain benchmarks while providing transparent and human-aligned video evaluation.
           </p>
         </div>          
       </div>
@@ -362,7 +365,7 @@ <h2 class="title is-3">
 
   <div class="hero-body has-text-centered">
     <h2 class="title is-4">
-      <span style="vertical-align: middle">Evaluation Benchmarks</span>
+      <span style="vertical-align: middle">Evaluation Results</span>
     </h2>
   </div>
   <div class="container is-max-desktop">
@@ -389,13 +392,27 @@ <h2 class="title is-5"><span style="font-size: 100%;">
       </div>
     </div>
   </div>
-
-  <div class="hero-body has-text-centered">
-    <h2 class="title is-4">
-      <span style="vertical-align: middle">Best-of-N Sampling</span>
-    </h2>
+  
+  <div class="container is-max-desktop">
+    <div class="columns is-centered">
+      <div class="column is-full-width">
+        <h2 class="title is-5"><span style="font-size: 100%;">
+          Best-of-N Sampling</span></h2>
+        <p>
+          We evaluate VideoScore2 with best-of-n (BoN) sampling (n = 5), where the model selects the
+          best video among the n candidates. We use six T2V models of moderate or poor quality,
+          avoiding very strong ones so that the BoN effect is visible. With 500 prompts, each model generates 500 × 5 = 2,500 videos.
+          Comparison on VBench shows that BoN consistently outperforms random sampling,
+          confirming that VideoScore2 effectively guides higher-quality selection.
+        </p>
+        <img id="BoN" width="100%" src="static/images/BoN.png">
+      </div>
+    </div>
   </div>
 
+
+  
+
     
 </section>