For safety, we provide Dockerfiles for running the execution inside a Docker container. To do that, first run the generation on your machine and save the outputs in `generations.json` by adding the flag `--generation_only` to the command (see the example below). Then build the Docker container and run the evaluation inside it.
### Building Docker image
Here's how to build a Docker image for the evaluation harness:
```bash
$ sudo make DOCKERFILE=Dockerfile all
```
This creates an image called `evaluation-harness` and runs a test on it. To skip the test, remove `all` from the command.
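For instance, to build the image without running the test:

```bash
$ sudo make DOCKERFILE=Dockerfile
```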
If you want to evaluate on MultiPL-E, we provide a separate Dockerfile, since it requires more dependencies; use:
```bash
$ sudo make DOCKERFILE=Dockerfile-multiple all
```
This creates an image called `evaluation-harness-multiple`.
### Evaluating inside a container
Suppose you generated text with the `bigcode/santacoder` model and saved it in `generations_py.json` with:
```bash
accelerate launch main.py \
    --model bigcode/santacoder \
    --tasks multiple-py \
    --max_length_generation 650 \
    --temperature 0.8 \
    --do_sample True \
    --n_samples 200 \
    --batch_size 200 \
    --trust_remote_code \
    --generation_only \
    --save_generations \
    --save_generations_path generations_py.json
```
To run the container (here from the image `evaluation-harness`) and evaluate on `generations_py.json`, or another file, mount it with `-v`, specify `--n_samples`, and allow code execution with `--allow_code_execution` (add `--limit` with the number of problems if it was used during generation):
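A minimal sketch of such an invocation, assuming the harness lives under `/app` inside the image (adjust the mount path and the generation flags to match your setup):

```bash
# Mount the generations file read-only into the container and re-run main.py
# in evaluation mode, loading the saved generations instead of regenerating.
$ sudo docker run -v $(pwd)/generations_py.json:/app/generations_py.json:ro \
    -it evaluation-harness python3 main.py \
    --model bigcode/santacoder \
    --tasks multiple-py \
    --load_generations_path /app/generations_py.json \
    --allow_code_execution \
    --temperature 0.8 \
    --n_samples 200
```

For MultiPL-E generations, run the same command against the `evaluation-harness-multiple` image instead.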
## Implementing new tasks

To implement a new task in this evaluation harness, see the guide in [`docs/guide`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/guide.md). There are also contribution guidelines in [`CONTRIBUTING.md`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/CONTRIBUTING.md).