
Commit 134a071

Add mistral 7b 0.2 checkpoint (#1211)
1 parent fa0085e commit 134a071

File tree

5 files changed: +274 -2 lines changed


config_hub/finetune/README.md

Lines changed: 5 additions & 2 deletions
@@ -22,8 +22,11 @@ For more information, see the [Dealing with out-of-memory (OOM) errors](../../tu
 | llama-2-7b/qlora.yaml | 7B | Alpaca 2k | 4 | 0.814 | 13.68 GB | 512 | 2 | bfloat16 | 45.68 min (A10G) |
 | llama-2-7b/full.yaml | 7B | Alpaca 2k | 1 | 0.941 | 26.81 GB | 512 | 4 | bfloat16 | 1.78 min (4xA100) |
 | | | | | | | | | | |
- | mistral-7b/lora.yaml | 7B | Alpaca 2k | 4 | 0.796 | 20.65 GB | 512 | 2 | bfloat16 | 31.04 min (1xA10G) |
- | mistral-7b/qlora.yaml | 7B | Alpaca 2k | 4 | 0.803 | 14.29 GB | 512 | 2 | bfloat16 | 44.69 min (1xA10G) |
+ | mistral-7b/lora.yaml (v0.1) | 7B | Alpaca 2k | 4 | 0.796 | 20.65 GB | 512 | 2 | bfloat16 | 31.04 min (1xA10G) |
+ | mistral-7b/qlora.yaml (v0.1) | 7B | Alpaca 2k | 4 | 0.803 | 14.29 GB | 512 | 2 | bfloat16 | 44.69 min (1xA10G) |
+ | | | | | | | | | | |
+ | mistral-7b-v0.2/lora.yaml | 7B | Alpaca 2k | 4 | 0.801 | 20.65 GB | 512 | 2 | bfloat16 | 30.96 min (1xA10G) |
+ | mistral-7b-v0.2/qlora.yaml | 7B | Alpaca 2k | 4 | 0.813 | 14.29 GB | 512 | 2 | bfloat16 | 44.68 min (1xA10G) |
 | | | | | | | | | | |
 | phi-2/lora.yaml | 2B | Alpaca 2k | 1 | 0.832 | 13.98 GB | 512 | 4 | bfloat16 | 3.82 min (1xA10G) |
 | phi-2/qlora.yaml | 2B | Alpaca 2k | 1 | 0.846 | 14.27 GB | 512 | 4 | bfloat16 | 4.55 min (1xA10G) |
config_hub/finetune/mistral-7b-v0.2/lora.yaml

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/unsloth/Mistral-7B-v0.2

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/lora-mistral-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:

  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 4

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  # (type: float, default: 0.0003)
  learning_rate: 0.0002

  # (type: float, default: 0.02)
  weight_decay: 0.0

  # (type: float, default: 0.9)
  beta1: 0.9

  # (type: float, default: 0.95)
  beta2: 0.95

  # (type: Optional[float], default: null)
  max_norm:

  # (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:

  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337
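
Two of the values above interact in ways worth spelling out: the global and micro batch sizes imply gradient accumulation, and `lora_r`/`lora_alpha` set the LoRA update scale. Below is a minimal sketch of that arithmetic, assuming the common LoRA scaling of `alpha / r` and a single device; it mirrors the config values by hand rather than describing litgpt internals.

```python
# Rough sanity check of the batch and LoRA hyperparameters in the config above.
# Values are copied from the YAML by hand, not parsed from it.

global_batch_size = 8   # samples per optimizer step across all ranks
micro_batch_size = 2    # samples per forward/backward pass on one GPU
devices = 1

# With one device, gradients are accumulated over 8 / (2 * 1) = 4 micro-batches
# before each optimizer step.
grad_accum_steps = global_batch_size // (micro_batch_size * devices)
assert grad_accum_steps == 4

# In the common LoRA formulation the low-rank update is scaled by alpha / r,
# so r=32 with alpha=16 gives a scale of 0.5; raising the rank without
# raising alpha shrinks the effective update.
lora_r, lora_alpha = 32, 16
lora_scaling = lora_alpha / lora_r
assert lora_scaling == 0.5
```
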
config_hub/finetune/mistral-7b-v0.2/qlora.yaml

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/unsloth/Mistral-7B-v0.2

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-mistral-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.05
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4
    download_dir: data/alpaca2k

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:

  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 4

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
  tie_embeddings:

  # (type: float, default: 0.0003)
  learning_rate: 0.0002

  # (type: float, default: 0.02)
  weight_decay: 0.0

  # (type: float, default: 0.9)
  beta1: 0.9

  # (type: float, default: 0.95)
  beta2: 0.95

  # (type: Optional[float], default: null)
  max_norm:

  # (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:

  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337
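
The functional differences from the LoRA config are `quantize: bnb.nf4`, the adjusted data arguments, and the output directory. As a rough, back-of-envelope illustration of why NF4 lowers the weight-memory footprint (the parameter count is an approximation, and the estimate deliberately ignores activations, LoRA parameters, optimizer state, and quantization metadata, which is why the measured peak in the README table is 14.29 GB rather than ~3.4 GB):

```python
# Back-of-envelope weight-memory estimate for the `quantize: bnb.nf4` setting above.
# Rough numbers only; real peak memory includes activations and trainable state.

n_params = 7.24e9              # approximate parameter count of Mistral 7B (assumption)

bf16_bytes = n_params * 2      # 2 bytes per weight in bf16
nf4_bytes = n_params * 0.5     # NF4 stores roughly 4 bits per weight

print(f"bf16 weights: {bf16_bytes / 2**30:.1f} GiB")   # ~13.5 GiB
print(f"nf4 weights:  {nf4_bytes / 2**30:.1f} GiB")    # ~3.4 GiB
```
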

litgpt/config.py

Lines changed: 18 additions & 0 deletions
@@ -1387,6 +1387,24 @@ def norm_class(self) -> Type:
         copy["name"] = c["name"].format(kind)
         copy["hf_config"]["name"] = c["hf_config"]["name"].format(kind)
         configs.append(copy)
+configs.append(
+    # https://huggingface.co/unsloth/mistral-7b-v0.2/blob/main/config.json
+    dict(
+        name="Mistral-7B-v0.2",
+        hf_config=dict(org="unsloth", name="Mistral-7B-v0.2"),
+        padded_vocab_size=32000,
+        block_size=32768,
+        n_layer=32,
+        n_query_groups=8,
+        rotary_percentage=1.0,
+        parallel_residual=False,
+        bias=False,
+        norm_class_name="RMSNorm",
+        norm_eps=1e-05,
+        mlp_class_name="LLaMAMLP",
+        intermediate_size=14336,
+    )
+)
 configs.append(
     # https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/blob/main/config.json
     dict(
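
Once this entry is registered, the architecture can be looked up by its `name` field. A minimal sketch, assuming `litgpt` from this checkout is importable:

```python
# Look up the config entry added above by name and inspect a few of its fields.
from litgpt.config import Config

cfg = Config.from_name("Mistral-7B-v0.2")
print(cfg.block_size)          # 32768 -- the extended 32k context of v0.2
print(cfg.n_query_groups)      # 8 -- grouped-query attention
print(cfg.intermediate_size)   # 14336
```
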

tutorials/download_model_weights.md

Lines changed: 7 additions & 0 deletions
@@ -146,8 +146,15 @@ togethercomputer/RedPajama-INCITE-Chat-7B-v0.1
 togethercomputer/RedPajama-INCITE-Instruct-3B-v1
 togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
 Trelis/Llama-2-7b-chat-hf-function-calling-v2
+unsloth/Mistral-7B-v0.2
 ```
 
+&nbsp;
+
+> [!TIP]
+> To sort the list above by model name after the `/`, use `litgpt download | sort -f -t'/' -k2`.
+
+
 &nbsp;
 ### 2. Download Model Weights
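
The shell tip added above sorts the model list case-insensitively on the field after the `/`. The same idea in Python, using a hypothetical short list of repo IDs for illustration:

```python
# Sort Hugging Face repo IDs by the model name after the "/", ignoring case,
# mirroring `sort -f -t'/' -k2` from the tip above.
model_ids = [
    "unsloth/Mistral-7B-v0.2",
    "togethercomputer/RedPajama-INCITE-Instruct-3B-v1",
    "Trelis/Llama-2-7b-chat-hf-function-calling-v2",
]

by_model_name = sorted(model_ids, key=lambda repo: repo.split("/", 1)[1].lower())
for repo in by_model_name:
    print(repo)
# Llama-2-... sorts before Mistral-..., which sorts before RedPajama-...
```
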
