- 
                Notifications
    
You must be signed in to change notification settings  - Fork 20
 
Open
Description
While trying to extract the test cases used for assigning a successful or failed eval, we noticed the test cases in fail_to_pass are not valid JSON:
example from benchmark instance_NodeBB__NodeBB-04998908ba6721d64eba79ae3b65a351dcfbc5b5-vnan:
["test/database.js | Test database test/database/keys.js::Key methods should return multiple keys and null if key doesn't exist", 'test/database.js | Test database test/database/keys.js::Key methods should return empty array if keys is empty array or falsy', 'test/user/emails.js | email confirmation (library methods) canSendValidation should return true if it has been long enough to re-send confirmation']
This means that when trying to load those test cases we are getting an error:
import json
from datasets import load_dataset
ds = load_dataset("ScaleAI/SWE-bench_Pro")
json.loads(ds['test'][0]['fail_to_pass'])
# json.decoder.JSONDecodeError: Expecting value: line 1 column 131 (char 130)Metadata
Metadata
Assignees
Labels
No labels