
Commit b5ef467

update remarks section and remove old scores
1 parent: 8b5a5c8

2 files changed: +2 -84 lines

README.md

Lines changed: 2 additions & 4 deletions
@@ -70,7 +70,7 @@ We use [`accelerate`](https://huggingface.co/docs/accelerate/index) to generate
 accelerate config
 ```
 
-This evaluation harness can also be used in an evaluation only mode, you can use a Multi-CPU setting. For large model, set up the precision of the model using the `--precision` flag instead of accelerate config to have only one copy of the model in memory.
+This evaluation harness can also be used in an evaluation-only mode; you can use a multi-CPU setting. For large models, we recommend specifying the precision of the model using the `--precision` flag instead of accelerate config to have only one copy of the model in memory.
 
 The evaluation part (solutions execution) for [MultiPL-E](https://github.com/nuprl/MultiPL-E) requires extra dependencies for some programming languages; we provide a Dockerfile with all dependencies, see the [Docker](#docker-containers) section for more details.

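To make the `--precision` advice in the hunk above concrete, here is a minimal sketch of a run for a large model. Only the `--precision` flag is named in this diff; `main.py` as the entry point, the `--model`, `--tasks`, and `--allow_code_execution` flags, and the model/task names are assumptions added for illustration (check `python main.py --help` for the actual interface):

```bash
# Hedged sketch, not the project's documented command line.
# --precision is taken from the text above; every other flag and name
# here is an assumption to make the example concrete.
python main.py \
  --model bigcode/starcoder \
  --tasks humaneval \
  --precision bf16 \
  --allow_code_execution
```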
@@ -172,9 +172,7 @@ To implement a new task in this evaluation harness, see the guide in [`docs/guid
 We provide documentation for the existing benchmarks and how we run the evaluation in [`docs/README.md`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/README.md).
 
 ## Remarks
-* Currenltly, we use parallel evaluation across multiple GPUs using `accelerate`, this assumes that you can fit the model in one GPU.
-* Please note this evaluation harness tries to cover a wide set of models, but there could still be room for improvement based on each model, some might require different prompt engineering or post-processing of the code generations.
-* For some scores of ongoing experiments please refer to [`example_scores/README.md`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/master/example_scores/README.md).
+* Currently, we use data-parallel evaluation across multiple GPUs with `accelerate`; this assumes that the model fits on one GPU.
 
 ## Acknowledgements
 We thank EleutherAI for their work on the [lm-evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness), which inspired this repository.
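As a sketch of the data-parallel remark in the hunk above: `accelerate launch` starts one worker process per GPU, and each worker loads a full copy of the model, which is why the model has to fit on a single GPU. The `--num_processes` value and all harness flags and names below are illustrative assumptions, not the project's documented interface:

```bash
# Hedged sketch of data-parallel evaluation: one process per GPU,
# each holding a complete copy of the model (hence "fits on one GPU").
accelerate launch --num_processes 8 main.py \
  --model bigcode/starcoder \
  --tasks humaneval \
  --allow_code_execution
```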

example_scores/README.md

Lines changed: 0 additions & 80 deletions
This file was deleted.
