docs/index.template.html (1 addition & 0 deletions)
@@ -289,6 +289,7 @@
<p>News</p>
</div>
<div class="message-body">
+ <p><strong><time>2025-09-13</time></strong> <a href="https://logicstar.ai">LogicStar</a> claims the first place on SWT-Verified, achieving almost 80% accuracy. Meanwhile, we release a new version of SWT-Bench, resolving various issues in evaluation grading. This results generally in increasing previously reported scores between 2-3%. Special thanks to all contributors!</p>
<p><strong><time>2025-08-22</time></strong> The 1st and 3rd place on SWT-Verified are reclaimed by the latest release of <a href="https://all-hands.dev">OpenHands</a>, equipped with the newly released <a href="https://openai.com/index/introducing-gpt-5/">GPT-5</a> and <a href="https://openai.com/index/introducing-gpt-5/">GPT-5-mini</a>, respectively.</p>
<p><strong><time>2025-08-11</time></strong> <a href="https://arxiv.org/abs/2508.06365">e-Otter++</a> claims the first position on the leaderboard with 50.7% and 60.7% on Lite and Verified respectively. They improve upon prior <a href="https://arxiv.org/abs/2502.05368v2">Otter</a> by more deeply integrating execution feedback and heterogeneous prompts in the generation loop.</p>
<p><strong><time>2025-07-28</time></strong> <a href="https://github.com/uw-swag/AssertFlip">AssertFlip</a> demonstrates a method to generate test cases by flipping the semantics of generated passing tests, achieving superior performance with a success rate of 35.1% on SWT-Bench Lite and 43.4% on Verified.</p>