2 files changed: +10 -2 lines changed
@@ -189,6 +189,8 @@ You are strongly recommended to use a sandbox such as [docker](https://docs.dock
 docker run -v $(pwd):/bigcodebench terryzho/bigcodebench-evaluate:latest --subset [complete|instruct] --samples samples.jsonl
 # ...Or locally ⚠️
 bigcodebench.evaluate --subset [complete|instruct] --samples samples.jsonl
+# ...If the ground truth is working
+bigcodebench.evaluate --subset [complete|instruct] --samples samples.jsonl --no-gt
 ```

 ...Or if you want to try it locally regardless of the risks ⚠️:
@@ -118,8 +118,11 @@ def evaluate(flags):
         results = compatible_eval_result(results)
     else:
         problems = get_bigcodebench()
-        dataset_hash = get_bigcodebench_hash()
-        expected_time = get_groundtruth(problems, dataset_hash, flags.check_gt_only)
+        dataset_hash = get_bigcodebench_hash()
+        if flags.no_gt:
+            expected_time = [20]*len(problems)
+        else:
+            expected_time = get_groundtruth(problems, dataset_hash, flags.check_gt_only)

         if flags.check_gt_only:
             return
@@ -253,6 +256,9 @@ def main():
     parser.add_argument(
         "--check-gt-only", action="store_true", help="Check the groundtruth"
     )
+    parser.add_argument(
+        "--no-gt", action="store_true", help="Skip the groundtruth check"
+    )
     args = parser.parse_args()

     evaluate(args)
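
For reviewers who want to try the new flag's behaviour in isolation, the following is a minimal sketch rather than the project's actual code. It assumes that `--no-gt` only replaces the ground-truth timing with the constant `20` per problem, as the hunk above suggests. `build_parser`, `resolve_expected_time`, and `fake_get_groundtruth` are hypothetical names introduced for illustration, the help string for `--no-gt` is illustrative wording, and the stand-in does not reproduce what the real `get_groundtruth` returns.

```python
import argparse

# The constant used in the diff when --no-gt is set; the diff does not state its unit.
DEFAULT_EXPECTED_TIME = 20


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: only the two flags relevant to this change."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--check-gt-only", action="store_true", help="Check the groundtruth"
    )
    parser.add_argument(
        "--no-gt", action="store_true", help="Skip the groundtruth check"
    )
    return parser


def fake_get_groundtruth(problems):
    """Hypothetical stand-in for get_groundtruth: one made-up timing per problem."""
    return [1.5 for _ in problems]


def resolve_expected_time(problems, no_gt):
    """Mirror the branch added in evaluate(): constant times when no_gt is set."""
    if no_gt:
        return [DEFAULT_EXPECTED_TIME] * len(problems)
    return fake_get_groundtruth(problems)


if __name__ == "__main__":
    # Parse an explicit argument list so the demo does not depend on sys.argv.
    args = build_parser().parse_args(["--no-gt"])
    problems = {"task/0": {}, "task/1": {}, "task/2": {}}  # placeholder problems
    print(resolve_expected_time(problems, args.no_gt))
```

Note that `argparse` exposes `--no-gt` as `args.no_gt`, which is why the diff reads `flags.no_gt`.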