docs/index.html
4 additions & 7 deletions
@@ -138,13 +138,10 @@
138 138   <tr>
139 139   <td>
140 140   <p class="subtitle">
141     - SWT-Bench is a benchmark for software testing capabilities on real-world code bases.
    141 + TLDR; Can your model write reproduction tests for real-world issues?
142 142   </p>
143 143   <p style="max-width: 750px;">
144     - SWT-Bench is based on real-world GitHub issues and pull-requests of popular Python libraries. Given the code base and a user-reported issue, the task is to generate a reproducing test. A test is considered to be reproducing if it fails on the original code base but passes after the pull-request resolving the issue has been applied.
145     - </p>
146     - <p style="margin-top: 0.5em;">
147     - SWT-Bench is a project by <a href="https://logicstar.ai">LogicStar AI</a> and the <a href="https://www.sri.inf.ethz.ch/">Secure, Reliable, and Intelligent Systems Lab</a> at ETH Zürich.<br/>
    144 + The task of SWT-Bench is to reproduce a reported issue by adding an appropriate test case to the projects' test suite. The test should fail on the original code base and pass after the issue is resolved.

          <h5 class="subtitle is-5">Generate software tests which reproduce user-reported issues and increase code coverage.</h5>
398     - <p>Each SWT-Bench task is based on a pull-request from a GitHub repository which resolves a reported issue and contains the code patch fixing the issue and unit tests testing the fix. Given the original state of the code base and the user issue, the task is to generate tests that reproduce this issue. These generated tests are expected to fail on the original code base and pass after the issue is resolved.
    395 + <p>Each SWT-Bench task is based on a real-world pull-request from a GitHub repository which resolves a user-reported issue and contains the code patch fixing the issue and unit tests testing the fix. Given the original state of the code base and the user issue, the task is to generate tests that reproduce this issue. These generated tests are expected to fail on the original code base and pass after the issue is resolved.