
Commit c452039

update gorilla reproduction code
1 parent 0b8f7ba commit c452039

File tree: 201 files changed, +180980 -0 lines changed

gorilla/README.md

Lines changed: 121 additions & 0 deletions
# Environment Setup

* Set up a new conda env and install the required packages:
```bash
# create conda env
conda create -n minigpt python=3.10 -y
conda activate minigpt
# install packages
pip install -r requirements.txt
```

* This project relies on [apex](https://github.com/NVIDIA/apex), which, unfortunately, you need to compile from source. Please follow the [official instructions](https://github.com/NVIDIA/apex#from-source) to compile it.
* Some tips from our experience for compiling it successfully:
1. `git clone https://github.com/NVIDIA/apex`
2. Make sure the CUDA version on your machine is equal to the CUDA version your installed PyTorch was built with (see the check after this list).
3. Make sure `pip >= 23.1`; otherwise run `pip install --upgrade pip`.
4. `pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./`
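
A quick way to check item 2, i.e. which CUDA version your installed PyTorch build expects (compare the output with `nvcc --version` on your machine); this is a convenience check, not part of the original setup steps:
```python
# Print the CUDA version PyTorch was built against and confirm a GPU is visible.
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA version this PyTorch build was compiled with
print(torch.cuda.is_available())  # True if a CUDA device is usable
```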

* LLaMA checkpoint preparation: please request access to the pre-trained LLaMA weights via [this form](https://forms.gle/jk851eBVbX1m5TAv5), and organize the directory like:
```
{path/to/llama}/
|- consolidated.00.pth
|- params.json
|- tokenizer.model
```

* **The `torchhub_train.json` from the [Gorilla official repository](https://github.com/ShishirPatil/gorilla/tree/main/data/apibench) has a different format from `tensorflow_train.json` and `huggingface_train.json`, so we have not run experiments on it yet.**

# Full finetune

## Model training

* First specify `llama_path` in `finetune/scripts/finetune/finetune_7B_gorilla_{tf,hf,th}.sh`.

* Then run the script:
```bash
cd finetune
bash scripts/finetune/finetune_7B_gorilla_{tf,hf,th}.sh sdp 1
```
The last argument is the model parallel size; increase it if you run out of GPU memory. For A100/A6000 you can leave it at 1.

## Inference

* To evaluate model performance, we first need to generate responses with the finetuned model.

* First copy `params.json` into the output model folder:
```bash
cp {path/to/llama}/params.json finetune/output/{exp_name}/{epoch*}/
```

* Then run the command:
```bash
cd inference
torchrun --nproc_per_node 1 gorilla_inference_full_finetune.py --dataset_path ../gorilla-main/eval/eval-data/questions/{tensorflowhub, huggingface, torchhub}/questions_{tensorflowhub, huggingface, torchhub}_0_shot.jsonl --ckpt_dir ../finetune/output/{exp_name}/{epoch*}/ --tokenizer_path {path/to/llama}/tokenizer.model
```
**Note**: `ckpt_dir` should be a **FOLDER**, not a `.pth` file. Only single-GPU inference is supported.

# LLaMA adapter finetune

## Model training

* First specify `llama_path` in `alpaca_finetuning_v1/finetune_{tf,hf,th}.sh`.

* Then run the script:
```bash
cd alpaca_finetuning_v1
bash finetune_{tf,hf,th}.sh
```
Note: `--blr` is the base learning rate; the actual learning rate is computed as `lr = blr * eff_batch_size / 256` in `alpaca_finetuning_v1/finetuning.py` (line 237). Adjust `--blr` when you change the number of GPUs.
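
As a quick illustration of that scaling rule (the numbers below are hypothetical, not the script defaults, and `eff_batch_size` is assumed to be per-GPU batch size × number of GPUs × `accum_iter`):
```python
# Worked example of lr = blr * eff_batch_size / 256 with made-up values.
blr = 1e-3                  # hypothetical value passed via --blr
eff_batch_size = 4 * 8 * 1  # assumed: per-GPU batch size 4, 8 GPUs, accum_iter 1
lr = blr * eff_batch_size / 256
print(lr)                   # 0.000125
```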

* After adapter finetuning, extract the adapter parameters from the checkpoint (the sketch below shows what the output contains):
```bash
python extract_adapter_from_checkpoint.py --model_path ./checkpoint/{exp_name}/{pth_file}
```
This writes the extracted adapter weights to a `*-adapter.pth` file next to the original checkpoint.
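
For reference, a minimal sketch of inspecting the extracted file; the path is hypothetical, so substitute your own `{exp_name}` and checkpoint name:
```python
# Load the extracted adapter (hypothetical path) and check its contents:
# 32 per-layer attention gates plus adapter_query.weight.
import torch

adapter_path = "./checkpoint/exp_name/checkpoint-adapter.pth"
adapter = torch.load(adapter_path, map_location="cpu")
print(len(adapter))                           # 33 tensors
print(sorted(adapter.keys())[:3])
print(adapter["adapter_query.weight"].shape)
```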

## Inference

* Run the command:
```bash
cd inference
torchrun --nproc_per_node 1 gorilla_inference_llama_adapter_v1.py --ckpt_dir {path/to/llama} --tokenizer_path {path/to/llama}/tokenizer.model --adapter_path ../alpaca_finetuning_v1/checkpoint/{exp_name}/{adapter_pth_file} --dataset_path ../gorilla-main/eval/eval-data/questions/{tensorflowhub, huggingface, torchhub}/questions_{tensorflowhub, huggingface, torchhub}_0_shot.jsonl
```
**Note**: `ckpt_dir` should be a **FOLDER**, not a `.pth` file. Only single-GPU inference is supported.

# Evaluation

* Run the Gorilla official evaluation code:
```bash
cd gorilla-main/eval/eval-scripts/

# For full finetune
python ast_eval_{tf,hf,th}.py --api_dataset ../../data/api/{tensorflowhub_api, huggingface_api, torchhub_api}.jsonl --apibench ../../data/apibench/{tensorflow,huggingface,torchhub}_eval.json --llm_responses ../../../finetune/output/{exp_name}/{epoch*}/model_prediction_results.jsonl

# For llama-adapter
python ast_eval_{tf,hf,th}.py --api_dataset ../../data/api/{tensorflowhub_api, huggingface_api, torchhub_api}.jsonl --apibench ../../data/apibench/{tensorflow,huggingface,torchhub}_eval.json --llm_responses ../../../alpaca_finetuning_v1/checkpoint/{exp_name}/model_prediction_results.jsonl
```

# Results

Our finetuned LLaMA-adapter models and their predictions can be found at [this link](https://drive.google.com/drive/folders/1PN5QjOlMVnmSSFi68CubvQfGYeodvO8w?usp=sharing).

| Methods       | TensorFlow Hub overall $\uparrow$ | TensorFlow Hub hallu $\downarrow$ | HuggingFace overall $\uparrow$ | HuggingFace hallu $\downarrow$ |
| ------------- | --------------------------------- | --------------------------------- | ------------------------------ | ------------------------------ |
| Official      | 83.79                             | 5.40                              | 71.68                          | 10.95                          |
| Full finetune | 88.02                             | 1.02                              | 69.69                          | 10.29                          |
| LLaMA-adapter | 86.90                             | 0.74                              | 63.62                          | 11.83                          |
Lines changed: 132 additions & 0 deletions
import math
import sys
from typing import Iterable

import torch
import util.lr_sched as lr_sched
import util.misc as misc


def train_one_epoch(
    model: torch.nn.Module,
    data_loader: Iterable,
    optimizer: torch.optim.Optimizer,
    device: torch.device,
    epoch: int,
    loss_scaler,
    log_writer=None,
    args=None,
):

    model.train(True)
    metric_logger = misc.MetricLogger(delimiter=" ")
    metric_logger.add_meter("lr", misc.SmoothedValue(window_size=1, fmt="{value:.6f}"))
    header = "Epoch: [{}]".format(epoch)
    print_freq = 10

    # number of gradient-accumulation steps between optimizer updates
    accum_iter = args.accum_iter

    optimizer.zero_grad()

    if log_writer is not None:
        print("log_dir: {}".format(log_writer.log_dir))
    for data_iter_step, (examples, labels, example_mask) in enumerate(
        metric_logger.log_every(data_loader, print_freq, header)
    ):
        # we use a per iteration (instead of per epoch) lr scheduler
        if data_iter_step % accum_iter == 0:
            lr_sched.adjust_learning_rate(optimizer, data_iter_step / len(data_loader) + epoch, args)

        c_loss = model(examples, labels)
        loss = c_loss
        loss_value = loss.item()
        c_loss_value = c_loss.item()

        if not math.isfinite(loss_value):
            print("Loss is {}, stopping training".format(loss_value))
            sys.exit(1)

        # scale the loss so gradients accumulate to an average over accum_iter steps
        loss /= accum_iter

        # backward pass (and, when update_grad is True, the optimizer step) happens inside loss_scaler
        loss_scaler(loss, optimizer, parameters=model.parameters(), update_grad=(data_iter_step + 1) % accum_iter == 0)
        if (data_iter_step + 1) % accum_iter == 0:
            optimizer.zero_grad()

        torch.cuda.synchronize()

        metric_logger.update(closs=c_loss_value)

        lr = optimizer.param_groups[0]["lr"]
        metric_logger.update(lr=lr)

        misc.all_reduce_mean(loss_value)
        c_loss_value_reduce = misc.all_reduce_mean(c_loss_value)

        if log_writer is not None and (data_iter_step + 1) % accum_iter == 0:
            """We use epoch_1000x as the x-axis in tensorboard.
            This calibrates different curves when batch size changes.
            """
            epoch_1000x = int((data_iter_step / len(data_loader) + epoch) * 1000)
            log_writer.add_scalar("c_train_loss", c_loss_value_reduce, epoch_1000x)
            log_writer.add_scalar("lr", lr, epoch_1000x)

    # gather the stats from all processes
    metric_logger.synchronize_between_processes()
    print("Averaged stats:", metric_logger)
    return {k: meter.global_avg for k, meter in metric_logger.meters.items()}


def val_one_epoch(
    model: torch.nn.Module,
    data_loader: Iterable,
    optimizer: torch.optim.Optimizer,
    device: torch.device,
    epoch: int,
    loss_scaler,
    log_writer=None,
    args=None,
):
    model.eval()
    metric_logger = misc.MetricLogger(delimiter=" ")
    metric_logger.add_meter("lr", misc.SmoothedValue(window_size=1, fmt="{value:.6f}"))
    header = "Epoch: [{}]".format(epoch)
    print_freq = 10

    accum_iter = args.accum_iter

    if log_writer is not None:
        print("log_dir: {}".format(log_writer.log_dir))
    for data_iter_step, (examples, labels, example_mask) in enumerate(
        metric_logger.log_every(data_loader, print_freq, header)
    ):

        with torch.no_grad():
            c_loss = model(examples, labels)
        loss = c_loss
        loss_value = loss.item()

        c_loss_value = c_loss.item()

        if not math.isfinite(loss_value):
            print("Loss is {}, stopping training".format(loss_value))
            sys.exit(1)

        metric_logger.update(closs=c_loss_value)

        lr = optimizer.param_groups[0]["lr"]
        metric_logger.update(lr=lr)

        misc.all_reduce_mean(loss_value)
        c_loss_value_reduce = misc.all_reduce_mean(c_loss_value)
        if log_writer is not None and (data_iter_step + 1) % accum_iter == 0:
            """We use epoch_1000x as the x-axis in tensorboard.
            This calibrates different curves when batch size changes.
            """
            epoch_1000x = int((data_iter_step / len(data_loader) + epoch) * 1000)
            # note: validation loss reuses the "c_train_loss" tag from training
            log_writer.add_scalar("c_train_loss", c_loss_value_reduce, epoch_1000x)
            log_writer.add_scalar("lr", lr, epoch_1000x)

    # gather the stats from all processes
    metric_logger.synchronize_between_processes()
    print("Averaged stats:", metric_logger)
    return {k: meter.global_avg for k, meter in metric_logger.meters.items()}
Lines changed: 23 additions & 0 deletions
import argparse

import torch

# Extract the adapter-specific parameters (the per-layer attention gates and the
# shared adapter queries) from a finetuning checkpoint.
parser = argparse.ArgumentParser("extract", add_help=False)
parser.add_argument("--model_path", type=str)
args = parser.parse_args()

model = torch.load(args.model_path, map_location="cpu")

weight_list = ["layers." + str(i) + ".attention.gate" for i in range(32)]
weight_list = weight_list + ["adapter_query.weight"]

print(weight_list)
print(model["model"]["adapter_query.weight"].shape)

new_model = dict()
for name in weight_list:
    new_model[name] = model["model"][name]

# Save the extracted adapter next to the original checkpoint.
save_path = args.model_path.replace(".pth", "-adapter.pth")
torch.save(new_model, save_path)
