
Commit b67bf3c

hanjian.thu123 committed
[update] add fine-tuning instructions

1 parent e693260 commit b67bf3c

File tree

3 files changed: +7 -4 lines changed

README.md

Lines changed: 4 additions & 0 deletions

@@ -155,6 +155,10 @@ We provide [eval.sh](scripts/eval.sh) for evaluation on various benchmarks with
 bash scripts/eval.sh
 ```
 
+## Fine-tuning
+Fine-tuning Infinity is quite simple: you only need to append ```--rush_resume=[infinity_vae_d32reg.pth]``` to [train.sh](scripts/train.sh).
+
+
 ## One More Thing: Infinity-20B is coming soon 📆
 Infinity shows strong scaling capabilities as illustrated before. Thus we are encouraged to continue to scale up the model size to 20B. Here we present the side-by-side comparison results between Infinity-2B and Infinity-20B.
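
As a concrete illustration of the new README instruction, here is a sketch of how the tail of the train.py argument list in [train.sh](scripts/train.sh) could look after the edit. Only the last flag is new; the two flags above it already appear in the script, and the checkpoint path is an assumed local download location, not a path shipped with the repository.

```bash
# Sketch only: end of the train.py invocation in scripts/train.sh after appending
# --rush_resume as the README describes. The checkpoint path below is an assumed
# location for infinity_vae_d32reg.pth and should point at your downloaded weights.
  --use_flex_attn=True \
  --pad=128 \
  --rush_resume=weights/infinity_vae_d32reg.pth
```

Note that `--pad=128`, previously the final flag, needs a trailing backslash once another flag follows it.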

scripts/train.sh

Lines changed: 2 additions & 2 deletions

@@ -103,7 +103,7 @@ train.py \
 --wpe=1 \
 --dynamic_resolution_across_gpus 1 \
 --enable_dynamic_length_prompt 1 \
---reweight_loss_by_scale 0 \
+--reweight_loss_by_scale 1 \
 --add_lvl_embeding_only_first_block 1 \
 --rope2d_each_sa_layer 1 \
 --rope2d_normalized_by_hw 2 \
@@ -117,6 +117,6 @@ train.py \
 --prefetch_factor=16 \
 --noise_apply_strength 0.3 \
 --noise_apply_layers 13 \
---apply_spatial_patchify 1 \
+--apply_spatial_patchify 0 \
 --use_flex_attn=True \
 --pad=128

trainer.py

Lines changed: 1 addition & 2 deletions

@@ -51,7 +51,6 @@ def __init__(
 
         self.gpt: Union[DDP, FSDP, nn.Module]
         self.gpt, self.vae_local, self.quantize_local = gpt, vae_local, vae_local.quantize
-        self.quantize_local: VectorQuantizer2
         self.gpt_opt: AmpOptimizer = gpt_opt
         self.gpt_wo_ddp: Union[Infinity, torch._dynamo.eval_frame.OptimizedModule] = gpt_wo_ddp # after torch.compile
         self.gpt_wo_ddp_ema = gpt_wo_ddp_ema
@@ -208,7 +207,7 @@ def train_step(
             last_scale_area = np.sqrt(scale_schedule[-1].prod())
             for (pt, ph, pw) in scale_schedule[:training_scales]:
                 this_scale_area = np.sqrt(pt * ph * pw)
-                lw.extend([last_scale_area / this_scale_area for _ in range(ph * pw)])
+                lw.extend([last_scale_area / this_scale_area for _ in range(pt * ph * pw)])
             lw = torch.tensor(lw, device=loss.device)[None, ...]
             lw = lw / lw.sum()
         else:
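
For readers checking the second hunk, a small self-contained sketch of the reweighting logic it fixes may help (the scale_schedule values below are illustrative, not taken from any repo config): each scale (pt, ph, pw) contributes pt * ph * pw token positions, so the per-scale weight has to be repeated pt * ph * pw times for the weight vector to line up with the flattened token sequence; repeating it only ph * pw times, as before this commit, under-counts whenever pt > 1.

```python
import numpy as np
import torch

# Toy reconstruction of the loss-reweighting block touched by this commit.
# scale_schedule here is a made-up example; each entry is a (t, h, w) grid of
# token counts, matching the (pt, ph, pw) unpacking in the loop above.
scale_schedule = [(1, 1, 1), (1, 2, 2), (1, 4, 4), (1, 8, 8)]
training_scales = len(scale_schedule)

last_scale_area = np.sqrt(np.prod(scale_schedule[-1]))
lw = []
for (pt, ph, pw) in scale_schedule[:training_scales]:
    this_scale_area = np.sqrt(pt * ph * pw)
    # one weight per token of this scale: smaller (earlier) scales get larger weights
    lw.extend([last_scale_area / this_scale_area for _ in range(pt * ph * pw)])

lw = torch.tensor(lw)[None, ...]  # shape (1, total_tokens), matching the loss layout
lw = lw / lw.sum()                # normalize so the weights sum to 1

# with the fix, the weight vector length matches the token count even when pt > 1
assert lw.shape[-1] == sum(pt * ph * pw for (pt, ph, pw) in scale_schedule[:training_scales])
```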
