Commit 7c7886e

feat: make gt assertion optional
1 parent 7d186ec commit 7c7886e

2 files changed: +10 -2 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -189,6 +189,8 @@ You are strongly recommended to use a sandbox such as [docker](https://docs.dock
 docker run -v $(pwd):/bigcodebench terryzho/bigcodebench-evaluate:latest --subset [complete|instruct] --samples samples.jsonl
 # ...Or locally ⚠️
 bigcodebench.evaluate --subset [complete|instruct] --samples samples.jsonl
+# ...Or skip the ground-truth check when it is already known to pass
+bigcodebench.evaluate --subset [complete|instruct] --samples samples.jsonl --no-gt
 ```
 
 ...Or if you want to try it locally regardless of the risks ⚠️:

bigcodebench/evaluate.py

Lines changed: 8 additions & 2 deletions
@@ -118,8 +118,11 @@ def evaluate(flags):
         results = compatible_eval_result(results)
     else:
         problems = get_bigcodebench()
-        dataset_hash = get_bigcodebench_hash()
-        expected_time = get_groundtruth(problems, dataset_hash, flags.check_gt_only)
+        dataset_hash = get_bigcodebench_hash()
+        if flags.no_gt:
+            expected_time = [20]*len(problems)
+        else:
+            expected_time = get_groundtruth(problems, dataset_hash, flags.check_gt_only)
 
         if flags.check_gt_only:
             return
@@ -253,6 +256,9 @@ def main():
     parser.add_argument(
         "--check-gt-only", action="store_true", help="Check the groundtruth"
     )
+    parser.add_argument(
+        "--no-gt", action="store_true", help="Skip the groundtruth check"
+    )
     args = parser.parse_args()
 
     evaluate(args)
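
The two hunks above work together: `argparse` exposes `--no-gt` as `flags.no_gt`, and `evaluate` then substitutes a uniform placeholder of 20 per problem instead of calling `get_groundtruth`. The following is a minimal, self-contained sketch of that pattern, not the project's actual code. `build_parser`, `measure_groundtruth`, `expected_times`, and the sample task names are hypothetical stand-ins, and the `--no-gt` help text is illustrative wording.

```python
import argparse


def build_parser():
    """Build a parser with the two ground-truth flags shown in the diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--check-gt-only", action="store_true", help="Check the groundtruth"
    )
    # argparse maps "--no-gt" to the attribute name "no_gt".
    parser.add_argument(
        "--no-gt", action="store_true", help="Skip the groundtruth check"
    )
    return parser


def measure_groundtruth(problems):
    """Hypothetical stand-in for get_groundtruth(); returns dummy timings."""
    return [1.0 for _ in problems]


def expected_times(problems, no_gt):
    """Return one expected time per problem, mirroring the branch in the diff."""
    if no_gt:
        # Same placeholder as the diff: a flat value of 20 for every problem.
        return [20] * len(problems)
    return measure_groundtruth(problems)


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(["--no-gt"])
problems = ["task_a", "task_b", "task_c"]  # hypothetical task ids
times = expected_times(problems, args.no_gt)
print(times)  # [20, 20, 20]
```

Because both flags use `action="store_true"`, omitting `--no-gt` leaves `no_gt` as `False` and the ground-truth path runs as before, so existing invocations keep their behavior.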
