Releases · logic-star-ai/swt-bench

13 Sep 19:04

1.3.0

443e03b

Latest

This release updates some parsers to more accurately grade evaluation results. Especially for more capable methods, this can affect benchmark scores by up to 4%.

It also includes small bug fixes for issues such as being unable to run without coverage or being unable to run specific instances.

What's Changed

Just eval raw output when not computing coverage by @nielstron in #32
Fix grading of unmatched tests. by @mnmueller in #33
Allow filtering of instances to official SWT during evaluation. by @mnmueller in #34
Implement suggested Fix #29 by @nielstron in #31
Fix copying too large files into docker containers by @nielstron in #35

Full Changelog: 1.2.0...1.3.0

Contributors

nielstron and mnmueller

06 Mar 08:43

nielstron

1.2.0

5d6e00e

Release 1.2.0 - Reproductions script mode

This release adds a "reproduction script" mode for SWT-Bench. In this mode (used by, e.g., AEGIS), the test does not need to fit into the repository's unit test framework and can instead be a standalone script. We compute the coverage delta as usual and count a non-zero exit code of the script as failing and a zero exit code as passing. In this setting, the script cannot adversely affect other test cases in the framework.
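
The exit-code rule described above can be illustrated with a minimal sketch. This is not the SWT-Bench implementation: the function name grade_reproduction_script, the timeout parameter, and the choice to treat a timeout as a failure are all assumptions made for illustration, and the coverage delta computation is left out.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def grade_reproduction_script(script_path: str, timeout: float = 60.0) -> str:
    """Run a standalone script and map its exit code to a verdict.

    Hypothetical helper, not part of SWT-Bench: exit code 0 counts as
    PASSED, any non-zero exit code counts as FAILED.
    """
    try:
        result = subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Assumption: a script that does not finish in time counts as failing.
        return "FAILED"
    return "PASSED" if result.returncode == 0 else "FAILED"


def demo() -> dict:
    """Grade one passing and one failing throwaway script."""
    with tempfile.TemporaryDirectory() as tmp:
        passing = Path(tmp) / "passing.py"
        failing = Path(tmp) / "failing.py"
        # An uncaught AssertionError makes the interpreter exit with code 1.
        passing.write_text("assert 1 + 1 == 2\n")
        failing.write_text("assert 1 + 1 == 3\n")
        return {
            passing.name: grade_reproduction_script(str(passing)),
            failing.name: grade_reproduction_script(str(failing)),
        }


if __name__ == "__main__":
    for name, verdict in demo().items():
        print(f"{name}: {verdict}")
```

Because each script runs in its own process, its outcome depends only on its own exit code, which matches the point that a standalone script cannot interfere with other test cases in the framework.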

What's Changed

Add reproduction script mode by @nielstron in #20

Full Changelog: 1.1.0...1.2.0

Contributors

nielstron

04 Mar 06:21

nielstron

1.1.0

5c85ecf

Release 1.1.0 - SWT-Bench Verified

This release ports a number of further patches that have been reported as useful in SWE-Bench and adds support for SWT-Bench Verified, which was obtained with the same quality criteria as SWT-Bench Lite.

We released SWT-Bench Verified and published the three best-performing methods of SWT-Bench Lite as baselines on our website.

What's Changed

Fix sklearn constants by @nielstron in #11
Reproduce docker image fixes (pinning versions) from SWE-Bench by @nielstron in #12
Add leaderboard website and submission instructions by @nielstron in #14
Add SWT-Verified by @nielstron in #18

Full Changelog: 1.0.1...1.1.0

Contributors

nielstron

16 Nov 10:46

nielstron

1.0.1

f42d9fe

Release 1.0.1 - Patch instances

What's Changed

Fix building django images by @zyone1991 in #6
Run install and test on several Python versions by @nielstron in #7

New Contributors

@zyone1991 made their first contribution in #6

Full Changelog: 1.0.0...1.0.1

Contributors

nielstron and zyone1991

01 Nov 16:18

nielstron

1.0.0

dce4aee

Release 1.0.0 - Initial Release

This version is the original code of the NeurIPS-published paper "SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents".

Full Changelog: https://github.com/logic-star-ai/swt-bench/commits/1.0.0

Release 1.3.0 - Parser patches
