Releases: logic-star-ai/swt-bench

Release 1.3.0 - Parser patches

13 Sep 19:04
443e03b

This release updates some parsers to more accurately grade evaluation results. Especially for more capable methods, this can affect benchmark scores by up to 4%.

It also includes small bug fixes, for example for failures when running without coverage or when running only specific instances.

What's Changed

Full Changelog: 1.2.0...1.3.0

Release 1.2.0 - Reproduction script mode

06 Mar 08:43
5d6e00e

This release adds a "reproduction script" mode for SWT-Bench. In this mode (which was used by, e.g., AEGIS), the test does not have to fit into the repository's unit test framework but can be a standalone script. We compute the coverage delta as usual and count a non-zero exit code of the script as failing and a zero exit code as passing. In this setting, the test cannot adversely affect other test cases in the framework.
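The exit-code rule described above can be illustrated with a minimal sketch. This is not the SWT-Bench implementation: the function names `script_passes` and `source_passes`, the timeout, and the choice to treat a hanging script as failing are assumptions made for illustration, and the coverage computation is left out entirely.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def script_passes(script_path: str, timeout: float = 60.0) -> bool:
    """Run a standalone script; a zero exit code counts as passing."""
    try:
        result = subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Assumption: a script that does not finish is treated as failing.
        return False
    return result.returncode == 0


def source_passes(source: str) -> bool:
    """Write `source` to a temporary file and grade it by its exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "reproduce.py"  # hypothetical file name
        script.write_text(source)
        return script_passes(str(script))
```

With this sketch, a script whose assertion holds (for instance `source_passes("assert 1 + 1 == 2")`) is graded as passing, while a script that raises an uncaught exception or calls `sys.exit` with a non-zero code is graded as failing.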

What's Changed

Full Changelog: 1.1.0...1.2.0

Release 1.1.0 - SWT-Bench Verified

04 Mar 06:21

This release ports a number of further patches that have been reported as useful in SWE-Bench and adds support for SWT-Bench Verified, which was obtained using the same quality criteria as SWT-Bench Lite.

We released SWT-Bench Verified and published the three best performing methods of SWT-Bench Lite as baselines on our website.

What's Changed

Full Changelog: 1.0.1...1.1.0

Release 1.0.1 - Patch instances

16 Nov 10:46
f42d9fe

What's Changed

Full Changelog: 1.0.0...1.0.1

Release 1.0.0 - Initial Release

01 Nov 16:18

This version is the original code of the paper "SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents", published at NeurIPS.

Full Changelog: https://github.com/logic-star-ai/swt-bench/commits/1.0.0