Format of fail_to_pass in Hugging Face is not valid JSON #14

@jtmcg

While trying to extract the test cases used to decide whether an eval passes or fails, we noticed that the fail_to_pass field is not valid JSON. The value mixes double-quoted and single-quoted strings, and JSON only allows double quotes.

Example from benchmark instance_NodeBB__NodeBB-04998908ba6721d64eba79ae3b65a351dcfbc5b5-vnan:

["test/database.js | Test database test/database/keys.js::Key methods should return multiple keys and null if key doesn't exist", 'test/database.js | Test database test/database/keys.js::Key methods should return empty array if keys is empty array or falsy', 'test/user/emails.js | email confirmation (library methods) canSendValidation should return true if it has been long enough to re-send confirmation']

As a result, loading those test cases with json.loads raises an error at the first single-quoted string:

import json
from datasets import load_dataset

# Load the benchmark and try to parse fail_to_pass for the first test instance
ds = load_dataset("ScaleAI/SWE-bench_Pro")
json.loads(ds['test'][0]['fail_to_pass'])

# Raises:
# json.decoder.JSONDecodeError: Expecting value: line 1 column 131 (char 130)
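
A possible workaround until the field is fixed, sketched under the assumption that the stored value is a Python list literal (which the mixed quoting above suggests but which we have not confirmed for every row): try `json.loads` first and fall back to `ast.literal_eval`. The helper name `parse_test_list` and the sample string are hypothetical and only serve as illustration.

```python
import ast
import json


def parse_test_list(raw):
    """Parse a list of test names stored as JSON or as a Python list literal.

    Hypothetical helper: assumes the field holds a Python list repr whenever
    it is not valid JSON.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Python literal syntax accepts single-quoted strings; JSON does not.
        parsed = ast.literal_eval(raw)
    if not isinstance(parsed, list):
        raise ValueError("expected a list of test names")
    return parsed


# Made-up sample that mixes quote styles the same way the dataset field does.
raw = """["suite a | it doesn't fail", 'suite b | returns empty array']"""
tests = parse_test_list(raw)

# Re-serializing with json.dumps yields a string that json.loads accepts.
print(json.dumps(tests))
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` here. It still raises `ValueError` or `SyntaxError` on input that is neither JSON nor a Python literal.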
