Commit 05490a2

fix: update readme
1 parent 45ac535 commit 05490a2

1 file changed: +7 -7 lines changed

README.md

Lines changed: 7 additions & 7 deletions
@@ -227,12 +227,12 @@ docker run -it --entrypoint bigcodebench.syncheck -v $(pwd):/app bigcodebench/bi
 You are strongly recommended to use a sandbox such as [docker](https://docs.docker.com/get-docker/):
 
 ```bash
-# mount the current directory to the container
-docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --subset [complete|instruct] --samples samples.jsonl
+# Mount the current directory to the container
+docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --subset [complete|instruct] --samples samples-sanitized-calibrated
 # ...Or locally ⚠️
-bigcodebench.evaluate --subset [complete|instruct] --samples samples.jsonl
+bigcodebench.evaluate --subset [complete|instruct] --samples samples-sanitized-calibrated
 # ...If the ground truth is working locally (due to some flaky tests)
-bigcodebench.evaluate --subset [complete|instruct] --samples samples.jsonl --no-gt
+bigcodebench.evaluate --subset [complete|instruct] --samples samples-sanitized-calibrated --no-gt
 ```
 
 ...Or if you want to try it locally regardless of the risks ⚠️:
@@ -247,9 +247,9 @@ Then, run the evaluation:
 
 ```bash
 # ...Or locally ⚠️
-bigcodebench.evaluate --subset [complete|instruct] --samples samples-calibrated.jsonl
+bigcodebench.evaluate --subset [complete|instruct] --samples samples-sanitized-calibrated.jsonl
 # ...If the ground truth is not working locally
-bigcodebench.evaluate --subset [complete|instruct] --samples samples-calibrated.jsonl --no-gt
+bigcodebench.evaluate --subset [complete|instruct] --samples samples-sanitized-calibrated --no-gt
 ```
 
 > [!Tip]
@@ -303,7 +303,7 @@ Here are some tips to speed up the evaluation:
 You can inspect the failed samples by using the following command:
 
 ```bash
-bigcodebench.inspect --eval-results sample-sanitized_eval_results.json --in-place
+bigcodebench.inspect --eval-results sample-sanitized-calibrated_eval_results.json --in-place
 ```
 
 ## Full Script
