Commit 3c89437

fix: update readme
1 parent dbf10f0 commit 3c89437

1 file changed: +2 -2 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -231,7 +231,7 @@ You are strongly recommended to use a sandbox such as [docker](https://docs.dock
 docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --subset [complete|instruct] --samples samples.jsonl
 # ...Or locally ⚠️
 bigcodebench.evaluate --subset [complete|instruct] --samples samples.jsonl
-# ...If the ground truth is working locally
+# ...If the ground truth is working locally (due to some flaky tests)
 bigcodebench.evaluate --subset [complete|instruct] --samples samples.jsonl --no-gt
 ```

@@ -335,7 +335,7 @@ We share pre-generated code samples from LLMs we have [evaluated](https://bigcod

 - [ ] We notice that some tasks heavily use memory for scientific modeling during testing. It will lead to timeout issues on some machines. If you get an error message like `Check failed: ret == 0 (11 vs. 0)Thread creation via pthread_create() failed.` in Tensorflow, it is very likely due to the memory issue. Try to allocate more memory to the process or reduce the number of parallel processes.

-- [ ] Due to the flakes in the evaluation, the execution results may vary slightly (~0.5%) between runs. We are working on improving the evaluation stability.
+- [ ] Due to the flakes in the evaluation, the execution results may vary slightly (~0.2%) between runs. We are working on improving the evaluation stability.

 - [ ] We are aware of the issue that some users may need to use a proxy to access the internet. We are working on a subset of the tasks that do not require internet access to evaluate the code.
