Add news, update numbers for local reproduction

nielstron · nielstron · commit d51c79fde884 · 2025-09-13T20:53:02.000+02:00
diff --git a/docs/index.template.html b/docs/index.template.html
@@ -289,6 +289,7 @@
               <p>News</p>
             </div>
             <div class="message-body">
+            <p><strong><time>2025-09-13</time></strong> <a href="https://logicstar.ai">LogicStar</a> claims first place on SWT-Verified, achieving almost 80% accuracy. Meanwhile, we release a new version of SWT-Bench that resolves various issues in evaluation grading. This generally increases previously reported scores by 2-3%. Special thanks to all contributors!</p>
             <p><strong><time>2025-08-22</time></strong> The 1st and 3rd place on SWT-Verified are reclaimed by the latest release of <a href="https://all-hands.dev">OpenHands</a>, equipped with the newly released <a href="https://openai.com/index/introducing-gpt-5/">GPT-5</a> and <a href="https://openai.com/index/introducing-gpt-5/">GPT-5-mini</a>, respectively.</p>
             <p><strong><time>2025-08-11</time></strong> <a href="https://arxiv.org/abs/2508.06365">e-Otter++</a> claims the first position on the leaderboard with 50.7% and 60.7% on Lite and Verified respectively. They improve upon prior <a href="https://arxiv.org/abs/2502.05368v2">Otter</a> by more deeply integrating execution feedback and heterogeneous prompts in the generation loop.</p>
             <p><strong><time>2025-07-28</time></strong> <a href="https://github.com/uw-swag/AssertFlip">AssertFlip</a> demonstrates a method to generate test cases by flipping the semantics of generated passing tests, achieving superior performance with a success rate of 35.1% on SWT-Bench Lite and 43.4% on Verified.</p>
diff --git a/docs/runs.csv b/docs/runs.csv
@@ -27,4 +27,4 @@ verified,,Otter,GPT-4o,31.6,37.6,2025-03-10,unittest
 verified,,OpenHands,Cl. Sonnet 3.5,27.7,52.9,2025-02-28,unittest
 verified,,LIBRO,GPT-4o,17.8,38.0,2025-02-28,unittest
 verified,,Zero-Shot Plus,GPT-4o + BM25,14.3,34.0,2025-02-28,unittest
-verified,new,LogicStar AI,L*Agent v1, 79.9, 66.5,2025-09-12,unittest
+verified,new,LogicStar AI,L*Agent v1, 79.9, 66.5,2025-09-13,unittest