
Commit af41ee9

rasbt, awaelchli, and carmocca committed

Tinyllama configs (#1113)

Co-authored-by: awaelchli <[email protected]>
Co-authored-by: Carlos Mocholí <[email protected]>

1 parent 3a982fa · commit af41ee9

File tree

5 files changed: +232 additions, -3 deletions


config_hub/finetune/README.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
## Config files

The table below lists the performance you can expect from the provided config files. Note that you can achieve lower memory consumption by lowering the micro batch size as needed. See [Dealing with out-of-memory (OOM) errors](../../tutorials/oom.md) for tips on lowering the memory requirements.

|                       | Size | Dataset   | Epochs | Val loss | Peak memory | Max seq length | Micro batch size | Precision | Training runtime |
| --------------------- | ---- | --------- | ------ | -------- | ----------- | -------------- | ---------------- | --------- | ---------------- |
| tiny-llama/lora.yaml  | 1.1B | Alpaca 2k | 4      | 1.053    | 10.54 GB    | 512            | 8                | bfloat16  | 9.24 min (A10G)  |
| tiny-llama/qlora.yaml | 1.1B | Alpaca 2k | 4      | 1.074    | 13.32 GB    | 512            | 8                | bfloat16  | 9.89 min (A10G)  |
| tiny-llama/full.yaml  | 1.1B | Alpaca 2k | 1      | 1.105    | 14.10 GB    | 512            | 4                | bfloat16  | 2.59 min (A10G)  |
| llama-2-7b/qlora.yaml | 7B   | Alpaca 2k | 4      | 0.814    | 13.68 GB    | 512            | 2                | bfloat16  | 45.68 min (A10G) |
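
The README's note about lowering the micro batch size maps directly onto the `train:` section of the configs added in this commit. Below is a minimal sketch, not part of the commit, of how the memory/speed trade-off could be tuned in `tiny-llama/qlora.yaml`; the specific value of 2 is an illustrative assumption:

```yaml
# Sketch: a lower-memory variant of config_hub/finetune/tiny-llama/qlora.yaml.
# Only the train section changes; all other keys stay as in the shipped config.
train:
  # Samples between optimizer steps across data-parallel ranks (unchanged).
  global_batch_size: 8
  # Lowered from 8 to 2 to cut peak GPU memory; the gap to the global batch
  # size is covered by gradient accumulation (8 / 2 = 4 micro batches per step).
  micro_batch_size: 2
```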
config_hub/finetune/tiny-llama/full.yaml

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@

# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/full-tiny-llama-1.1b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.03847
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:

  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 800

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 32

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 4

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 1000

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 1

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  # (type: float, default: 0.0003)
  learning_rate: 0.0002

  # (type: float, default: 0.02)
  weight_decay: 0.0

  # (type: float, default: 0.9)
  beta1: 0.9

  # (type: float, default: 0.95)
  beta2: 0.95

  # (type: Optional[float], default: null)
  max_norm:

  # (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:

  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

config_hub/finetune/tiny-llama/lora.yaml

Lines changed: 3 additions & 3 deletions
@@ -43,7 +43,7 @@ lora_head: false
 
 # Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
 data:
-  class_path: litgpt.data.AlpacaGPT4
+  class_path: litgpt.data.Alpaca2k
   init_args:
     mask_prompt: false
     val_split_fraction: 0.03847
@@ -71,7 +71,7 @@ train:
   lr_warmup_steps: 10
 
   # Number of epochs to train on (type: Optional[int], default: 5)
-  epochs: 1
+  epochs: 4
 
   # Total number of tokens to train on (type: Optional[int], default: null)
   max_tokens:
@@ -107,7 +107,7 @@ train:
 eval:
 
   # Number of optimizer steps between evaluation calls (type: int, default: 100)
-  interval: 400
+  interval: 100
 
   # Number of tokens to generate (type: Optional[int], default: 100)
   max_new_tokens: 100
config_hub/finetune/tiny-llama/qlora.yaml

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@

# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-tiny-llama-1.1b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.03847
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:

  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 800

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 8

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 4

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  # (type: float, default: 0.0003)
  learning_rate: 0.0002

  # (type: float, default: 0.02)
  weight_decay: 0.0

  # (type: float, default: 0.9)
  beta1: 0.9

  # (type: float, default: 0.95)
  beta2: 0.95

  # (type: Optional[float], default: null)
  max_norm:

  # (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:

  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

tests/test_config_hub.py

Lines changed: 5 additions & 0 deletions
@@ -5,6 +5,7 @@
 from unittest.mock import Mock
 
 import pytest
+from lightning.fabric.plugins import Precision
 
 
 @pytest.mark.parametrize(["script_file", "config_file"], [
@@ -14,7 +15,10 @@
     ("litgpt/pretrain.py", "https://raw.githubusercontent.com/Lightning-AI/litgpt/wip/config_hub/pretrain/tinystories.yaml"),
     ("litgpt/finetune/full.py", "finetune/llama-2-7b/full.yaml"),
     ("litgpt/finetune/lora.py", "finetune/llama-2-7b/lora.yaml"),
+    ("litgpt/finetune/lora.py", "finetune/llama-2-7b/qlora.yaml"),
+    ("litgpt/finetune/full.py", "finetune/tiny-llama/full.yaml"),
     ("litgpt/finetune/lora.py", "finetune/tiny-llama/lora.yaml"),
+    ("litgpt/finetune/lora.py", "finetune/tiny-llama/qlora.yaml"),
 ])
 def test_config_help(script_file, config_file, monkeypatch, tmp_path):
     """Test that configs validate against the signature in the scripts."""
@@ -32,6 +36,7 @@ def test_config_help(script_file, config_file, monkeypatch, tmp_path):
 
     module.main = Mock()
     module.Tokenizer = Mock()
+    module.BitsandbytesPrecision = Mock(return_value=Precision())
 
     with mock.patch("sys.argv", [script_file.name, "--config", str(config_file), "--devices", "1"]):
         CLI(module.setup)

0 commit comments