Commit 3ad3b8d

Merge pull request #44 from loubnabnl/integrate_multipl-e
Integrate MultiPL-E
2 parents 8b28d3a + afc0c69 commit 3ad3b8d
55 files changed: +1962 / -193 lines

Dockerfile

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+FROM ubuntu:22.04
+
+RUN apt-get update && apt-get install -y python3 python3-pip
+
+COPY . /app
+
+WORKDIR /app
+
+RUN test -f /app/generations.json && rm /app/generations.json || true
+
+RUN pip3 install .
+
+CMD ["python3", "main.py"]
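For reference, here is a minimal sketch of building and smoke-testing this image directly with `docker`, assuming it is run from the repository root; the supported path is the `make DOCKERFILE=Dockerfile all` target described in the README diff below, and the `--help` probe is only an illustrative check:

```bash
# Manual build of the plain Dockerfile (assumed equivalent of the Makefile target)
sudo docker build -t evaluation-harness -f Dockerfile .

# Smoke test: the harness entry point should start and print its CLI options
sudo docker run --rm evaluation-harness python3 main.py --help
```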

Dockerfile-multiple

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+FROM ubuntu:22.04
+RUN apt-get update -yqq && apt-get install -yqq curl build-essential python3-pip python3-tqdm
+RUN apt-get install racket -yqq
+ARG DEBIAN_FRONTEND=noninteractive
+ENV TZ=Etc/UTC
+RUN apt-get install -yqq \
+    default-jdk-headless \
+    golang-go \
+    php-cli \
+    ruby \
+    lua5.3 \
+    r-base \
+    rustc \
+    scala
+
+RUN apt-get install -yqq libtest-deep-perl
+RUN apt-get install -yqq wget
+
+# JS/TS
+RUN curl -fsSL https://deb.nodesource.com/setup_current.x | bash -
+RUN apt-get install -y nodejs
+RUN npm install -g typescript
+
+# Dlang
+RUN wget https://netcologne.dl.sourceforge.net/project/d-apt/files/d-apt.list -O /etc/apt/sources.list.d/d-apt.list
+RUN apt-get update --allow-insecure-repositories
+RUN apt-get -y --allow-unauthenticated install --reinstall d-apt-keyring
+RUN apt-get update && apt-get install -yqq dmd-compiler dub
+
+# C#
+RUN apt install gnupg ca-certificates
+RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF
+RUN echo "deb https://download.mono-project.com/repo/ubuntu stable-focal main" | tee /etc/apt/sources.list.d/mono-official-stable.list
+RUN apt update
+RUN apt install -yqq mono-devel
+
+# Post-processing
+
+# Julia
+RUN curl https://julialang-s3.julialang.org/bin/linux/x64/1.8/julia-1.8.2-linux-x86_64.tar.gz | tar xz
+ENV PATH="/julia-1.8.2/bin:${PATH}"
+# Swift
+RUN curl https://download.swift.org/swift-5.7-release/ubuntu2204/swift-5.7-RELEASE/swift-5.7-RELEASE-ubuntu22.04.tar.gz | tar xz
+ENV PATH="/swift-5.7-RELEASE-ubuntu22.04/usr/bin:${PATH}"
+# Javatuples
+RUN mkdir /usr/multiple && wget https://repo.mavenlibs.com/maven/org/javatuples/javatuples/1.2/javatuples-1.2.jar -O /usr/multiple/javatuples-1.2.jar
+# Luaunit
+RUN apt-get update -yqq && apt-get install -yqq lua-unit
+
+# Standard requirements
+COPY . /app
+WORKDIR /app
+RUN test -f /app/generations.json && rm /app/generations.json || true
+
+RUN pip3 install .
+CMD ["python3", "main.py"]
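Because this image bakes in many language toolchains, a quick sanity check after building it can save a failed evaluation run later. A hedged sketch, assuming the image was built as `evaluation-harness-multiple` via the Makefile target in the README below; the probed languages are just examples:

```bash
# Spot-check that a few of the MultiPL-E toolchains are installed and on PATH
sudo docker run --rm evaluation-harness-multiple bash -c '
  node --version &&
  tsc --version &&
  julia --version &&
  swift --version &&
  racket --version
'
```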

README.md

Lines changed: 43 additions & 0 deletions
@@ -110,6 +110,49 @@ Below is an example; be mindful of specifying arguments appropriate to the task you are
 ```bash
 accelerate launch main.py --tasks mbpp --allow_code_execution --load_generations_path generations.json --model incoder-temperature-08
 ```
+## Docker containers
+For safety, we provide Dockerfiles to run the execution inside a docker container. First, run the generation on your machine and save it in `generations.json` by adding the `--generation_only` flag to the command, then build the docker image and run the evaluation inside a container.
+
+### Building the Docker image
+Here's how to build a docker image for the evaluation harness:
+```bash
+$ sudo make DOCKERFILE=Dockerfile all
+```
+This creates an image called `evaluation-harness` and runs a test on it. To skip the test, remove `all` from the command.
+
+To evaluate on MultiPL-E, which requires more dependencies, use the dedicated Dockerfile:
+```bash
+$ sudo make DOCKERFILE=Dockerfile-multiple all
+```
+This creates an image called `evaluation-harness-multiple`.
+
+### Evaluating inside a container
+Suppose you generated text with the `bigcode/santacoder` model and saved it in `generations_py.json` with:
+```bash
+accelerate launch main.py \
+    --model bigcode/santacoder \
+    --tasks multiple-py \
+    --max_length_generation 650 \
+    --temperature 0.8 \
+    --do_sample True \
+    --n_samples 200 \
+    --batch_size 200 \
+    --trust_remote_code \
+    --generation_only \
+    --save_generations \
+    --save_generations_path generations_py.json
+```
+
+To run the container (here from the image `evaluation-harness-multiple`) and evaluate `generations_py.json` (or another file, mounted with `-v`), specify `--n_samples` and allow code execution with `--allow_code_execution` (and add `--limit` with the number of problems if it was used during generation):
+```bash
+$ sudo docker run -v $(pwd)/generations_py.json:/app/generations_py.json:ro -it evaluation-harness-multiple python3 main.py \
+    --model bigcode/santacoder \
+    --tasks multiple-py \
+    --load_generations_path /app/generations_py.json \
+    --allow_code_execution \
+    --temperature 0.8 \
+    --n_samples 200
+```

 ## Implementing new tasks
 To implement a new task in this evaluation harness, see the guide in [`docs/guide`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/docs/guide.md). There are also contribution guidelines in this [`CONTRIBUTING.md`](https://github.com/bigcode-project/bigcode-evaluation-harness/blob/main/CONTRIBUTING.md)
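When generation was restricted to a subset of problems with `--limit`, the same value has to be repeated at evaluation time, as the added README text notes. A hedged end-to-end sketch under that assumption; the limit value and file name are illustrative:

```bash
# Host-side generation on the first 50 problems only (illustrative subset)
accelerate launch main.py \
    --model bigcode/santacoder \
    --tasks multiple-py \
    --limit 50 \
    --n_samples 200 \
    --temperature 0.8 \
    --do_sample True \
    --generation_only \
    --save_generations \
    --save_generations_path generations_py.json

# Container-side evaluation: mount the file read-only and repeat --limit
sudo docker run -v $(pwd)/generations_py.json:/app/generations_py.json:ro \
    -it evaluation-harness-multiple python3 main.py \
    --model bigcode/santacoder \
    --tasks multiple-py \
    --load_generations_path /app/generations_py.json \
    --allow_code_execution \
    --temperature 0.8 \
    --n_samples 200 \
    --limit 50
```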

finetuning/APPS/apps_dataset.py

Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,9 @@ class APPSBaseDataset(torch.utils.data.Dataset):
     def __init__(self, dataset, max_tokens, tokenizer_path):
         self.dataset = dataset
         self.max_tokens = max_tokens
-        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, use_auth_token=True)
+        self.tokenizer = AutoTokenizer.from_pretrained(
+            tokenizer_path, use_auth_token=True
+        )
         self.samples = []  # Should be set in initialize()

         self.initialize(self.tokenizer)

finetuning/APPS/apps_train.py

Lines changed: 15 additions & 16 deletions
@@ -4,17 +4,12 @@

 import argparse
 import os
-import torch

+import torch
 from apps_dataset import APPSBaseDataset
 from datasets import load_dataset
-from transformers import (
-    AutoModelForCausalLM,
-    Trainer,
-    TrainingArguments,
-    logging,
-    set_seed,
-)
+from transformers import (AutoModelForCausalLM, Trainer, TrainingArguments,
+                          logging, set_seed)


 def get_args():
@@ -59,22 +54,20 @@ def run_training(args, train_data, val_data):
     training_args = TrainingArguments(
         output_dir=args.output_dir,
         dataloader_drop_last=True,
-        evaluation_strategy = "steps",
+        evaluation_strategy="steps",
         num_train_epochs=args.num_epochs,
-        max_steps = args.max_steps,
-        eval_steps = args.eval_freq,
+        max_steps=args.max_steps,
+        eval_steps=args.eval_freq,
         save_steps=args.save_freq,
         logging_steps=args.log_freq,
-
         per_device_train_batch_size=args.batch_size,
         per_device_eval_batch_size=args.batch_size,
         learning_rate=args.learning_rate,
         lr_scheduler_type=args.lr_scheduler_type,
-        warmup_steps = args.num_warmup_steps,
+        warmup_steps=args.num_warmup_steps,
         gradient_accumulation_steps=args.gradient_accumulation_steps,
         weight_decay=args.weight_decay,
         fp16=args.fp16,
-
         run_name="apps-train",
         report_to="wandb",
     )
@@ -99,8 +92,14 @@ def main(args):
     dataset.shuffle(seed=args.seed)
     data = get_dataset(dataset, args)
     train_size = int(0.95 * len(data))
-    train_data, val_data = torch.utils.data.random_split(data, [train_size, len(data) - train_size], generator=torch.Generator().manual_seed(args.seed))
-    print(f"size of training data {len(train_data)}\nsize of validation data {len(val_data)}")
+    train_data, val_data = torch.utils.data.random_split(
+        data,
+        [train_size, len(data) - train_size],
+        generator=torch.Generator().manual_seed(args.seed),
+    )
+    print(
+        f"size of training data {len(train_data)}\nsize of validation data {len(val_data)}"
+    )
     run_training(args, train_data, val_data)

finetuning/Code-to-text/train.py

Lines changed: 20 additions & 14 deletions
@@ -1,19 +1,15 @@
 import argparse

 from datasets import load_dataset
-
-from transformers import (
-    AutoModelForSequenceClassification,
-    AutoTokenizer,
-    Trainer,
-    TrainingArguments,
-    set_seed,
-)
+from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
+                          Trainer, TrainingArguments, set_seed)


 def get_args():
     parser = argparse.ArgumentParser()
-    parser.add_argument("--model_ckpt", type=str, default="microsoft/unixcoder-base-nine")
+    parser.add_argument(
+        "--model_ckpt", type=str, default="microsoft/unixcoder-base-nine"
+    )
     parser.add_argument("--language", type=str, default="Python")
     parser.add_argument("--max_length", type=int, default=1024)
     parser.add_argument("--num_epochs", type=int, default=5)
@@ -40,7 +36,9 @@ def main():
     print("Loading tokenizer and model")
     tokenizer = AutoTokenizer.from_pretrained(args.model_ckpt)
     tokenizer.pad_token = tokenizer.eos_token
-    model = AutoModelForSequenceClassification.from_pretrained(args.model_ckpt, num_labels=2)
+    model = AutoModelForSequenceClassification.from_pretrained(
+        args.model_ckpt, num_labels=2
+    )
     model.config.pad_token_id = model.config.eos_token_id

     if args.freeze:
@@ -49,13 +47,20 @@ def main():

     def tokenize(example):
         if args.language == "Python":
-            #remove docstring from code
+            # remove docstring from code
             chunks = example["code"].split('"""')
             code = chunks[0].strip() + chunks[2]
         else:
             code = example["code"]
-        inputs = tokenizer(code, padding="max_length", truncation=True, max_length=args.max_length)
-        labels = tokenizer(example["docstring"], padding="max_length", truncation=True, max_length=args.max_length).input_ids
+        inputs = tokenizer(
+            code, padding="max_length", truncation=True, max_length=args.max_length
+        )
+        labels = tokenizer(
+            example["docstring"],
+            padding="max_length",
+            truncation=True,
+            max_length=args.max_length,
+        ).input_ids
         labels_with_ignore_index = []
         for labels_example in labels:
             labels_example = [label if label != 0 else -100 for label in labels_example]
@@ -99,10 +104,11 @@ def tokenize(example):

     print("Training...")
     trainer.train()
-
+
     # push the model to the Hugging Face hub
     if args.push_to_hub:
         model.push_to_hub(args.model_hub_name)

+
 if __name__ == "__main__":
     main()

finetuning/CodeClone/train.py

Lines changed: 21 additions & 16 deletions
@@ -3,22 +3,17 @@

 import numpy as np
 from datasets import ClassLabel, load_dataset
-
 from evaluate import load
-from transformers import (
-    AutoModelForSequenceClassification,
-    AutoTokenizer,
-    DataCollatorWithPadding,
-    Trainer,
-    TrainerCallback,
-    TrainingArguments,
-    set_seed,
-)
+from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
+                          DataCollatorWithPadding, Trainer, TrainerCallback,
+                          TrainingArguments, set_seed)


 def get_args():
     parser = argparse.ArgumentParser()
-    parser.add_argument("--model_ckpt", type=str, default="microsoft/unixcoder-base-nine")
+    parser.add_argument(
+        "--model_ckpt", type=str, default="microsoft/unixcoder-base-nine"
+    )
     parser.add_argument("--max_length", type=int, default=1024)
     parser.add_argument("--num_epochs", type=int, default=5)
     parser.add_argument("--batch_size", type=int, default=6)
@@ -52,7 +47,9 @@ def __init__(self, trainer) -> None:
     def on_epoch_end(self, args, state, control, **kwargs):
         if control.should_evaluate:
             control_copy = deepcopy(control)
-            self._trainer.evaluate(eval_dataset=self._trainer.train_dataset, metric_key_prefix="train")
+            self._trainer.evaluate(
+                eval_dataset=self._trainer.train_dataset, metric_key_prefix="train"
+            )
             return control_copy


@@ -61,21 +58,28 @@ def main():
     set_seed(args.seed)

     ds = load_dataset("code_x_glue_cc_clone_detection_big_clone_bench")
-    labels = ClassLabel(num_classes = 2, names=[True, False])
+    labels = ClassLabel(num_classes=2, names=[True, False])
     ds = ds.cast_column("label", labels)

     print("Loading tokenizer and model")
     tokenizer = AutoTokenizer.from_pretrained(args.model_ckpt)
     tokenizer.pad_token = tokenizer.eos_token
-    model = AutoModelForSequenceClassification.from_pretrained(args.model_ckpt, num_labels=2)
+    model = AutoModelForSequenceClassification.from_pretrained(
+        args.model_ckpt, num_labels=2
+    )
     model.config.pad_token_id = model.config.eos_token_id

     if args.freeze:
         for param in model.roberta.parameters():
             param.requires_grad = False

     def tokenize(example):
-        inputs = tokenizer(example["func1"], example["func2"], truncation=True, max_length=args.max_length)
+        inputs = tokenizer(
+            example["func1"],
+            example["func2"],
+            truncation=True,
+            max_length=args.max_length,
+        )
         return {
             "input_ids": inputs["input_ids"],
             "attention_mask": inputs["attention_mask"],
@@ -121,10 +125,11 @@ def tokenize(example):

     result = trainer.evaluate(eval_dataset=tokenized_datasets["test"])
     print(f"Evaluation accuracy on the test set: {result['eval_accuracy']}")
-
+
     # push the model to the Hugging Face hub
     if args.push_to_hub:
         model.push_to_hub(args.model_hub_name)

+
 if __name__ == "__main__":
     main()
