Commit cdc4d5f

Update mixed precision training to bf16 (#78)
1 parent: bab4016

File tree

2 files changed (+5, -5 lines)

README.md

Lines changed: 4 additions & 4 deletions
@@ -107,7 +107,7 @@ torchrun --standalone --nproc_per_node=2 train.py \
 We disable all speedup methods by default. Here are details of some key arguments for training:
 - `--nproc_per_node`: The GPU number you want to use for the current node.
 - `--plugin`: The booster plugin used by ColossalAI, `zero2` and `ddp` are supported. The default value is `zero2`. Recommend to enable `zero2`.
-- `--mixed_precision`: The data type for mixed precision training. The default value is `fp16`.
+- `--mixed_precision`: The data type for mixed precision training. The default value is `bf16`.
 - `--grad_checkpoint`: Whether enable the gradient checkpointing. This saves the memory cost during training process. The default value is `False`. Recommend to disable it when memory is enough.
 - `--enable_layernorm_kernel`: Whether enable the layernorm kernel optimization. This speeds up the training process. The default value is `False`. Recommend to enable it.
 - `--enable_flashattn`: Whether enable the FlashAttention. This speeds up the training process. The default value is `False`. Recommend to enable.
@@ -165,7 +165,7 @@ torchrun --standalone --nproc_per_node=2 train.py \
     --frame_interval 3

 # preprocess
-# our code read video from csv as the demo shows
+# our code read video from csv using our toy data
 # we provide a code to transfer ucf101 to csv format
 python preprocess.py
 ```
@@ -188,7 +188,7 @@ python sample.py \
     --frame_interval 3
 ```

-Inference tips: 1) EMA model requires quite long time to converge and produce meaningful results. So you can sample base model (`--ckpt /epochXX-global_stepXX/model`) instead of ema model (`--ckpt /epochXX-global_stepXX/ema.pt`) to check your training process. 2) Modify the text condition in `sample.py` which aligns with your datasets helps to produce better results in the early stage of training.
+Inference tips: 1) EMA model requires quite long time to converge and produce meaningful results. So you can sample base model (`--ckpt /epochXX-global_stepXX/model`) instead of ema model (`--ckpt /epochXX-global_stepXX/ema.pt`) to check your training process. But ema model should be your final result. 2) Modify the text condition in `sample.py` which aligns with your datasets helps to produce better results in the early stage of training.

 ## FastSeq

@@ -225,7 +225,7 @@ torchrun --standalone --nproc_per_node=8 train.py \
     --batch_size 180 \
     --enable_layernorm_kernel \
     --enable_flashattn \
-    --mixed_precision fp16 \
+    --mixed_precision bf16 \
     --num_classes 1000
 ```
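The README change above swaps the default from `fp16` to `bf16`. The practical difference is that bf16 keeps fp32's 8 exponent bits while giving up mantissa precision, so it has a far larger representable range and is less prone to the overflows that fp16 loss scaling exists to work around. A minimal pure-Python sketch of the two formats' limits (standard IEEE half / bfloat16 bit layouts, no PyTorch dependency; `max_finite` is a hypothetical helper, not part of this repo):

```python
def max_finite(exp_bits: int, frac_bits: int) -> float:
    """Largest finite value of a binary float format with the given
    exponent and fraction (mantissa) bit counts."""
    bias = 2 ** (exp_bits - 1) - 1
    # The all-ones exponent is reserved for inf/NaN, so the largest
    # usable unbiased exponent equals the bias.
    return (2 - 2 ** -frac_bits) * 2.0 ** bias

fp16_max = max_finite(exp_bits=5, frac_bits=10)  # IEEE half precision
bf16_max = max_finite(exp_bits=8, frac_bits=7)   # bfloat16

print(fp16_max)  # 65504.0
print(bf16_max)  # ~3.39e38, nearly the same range as fp32
```

Activations or gradients above 65504 overflow to inf in fp16 but are comfortably representable in bf16, which is the usual motivation for a change like this commit's.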

train.py

Lines changed: 1 addition & 1 deletion
@@ -365,7 +365,7 @@ def main(args):
     parser.add_argument("--log_every", type=int, default=10)
     parser.add_argument("--ckpt_every", type=int, default=1000)

-    parser.add_argument("--mixed_precision", type=str, default="fp16", choices=["bf16", "fp16", "fp32"])
+    parser.add_argument("--mixed_precision", type=str, default="bf16", choices=["bf16", "fp16", "fp32"])
     parser.add_argument("--grad_clip", type=float, default=1.0, help="Gradient clipping value")
     parser.add_argument("--lr", type=float, default=1e-4, help="Gradient clipping value")
     parser.add_argument("--grad_checkpoint", action="store_true", help="Use gradient checkpointing")
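After this one-line change, running `train.py` with no flags selects bf16, while fp16 and fp32 remain selectable. The behavior of the changed argument can be sketched in isolation (a standalone stand-in parser for illustration, not the repo's full CLI):

```python
import argparse

# Stand-in parser mirroring the changed train.py argument.
parser = argparse.ArgumentParser()
parser.add_argument("--mixed_precision", type=str, default="bf16",
                    choices=["bf16", "fp16", "fp32"])

print(parser.parse_args([]).mixed_precision)
# bf16  (the new default when the flag is omitted)

print(parser.parse_args(["--mixed_precision", "fp16"]).mixed_precision)
# fp16  (old behavior still available by passing the flag explicitly)
```

Because `choices` is unchanged, existing launch scripts that pass `--mixed_precision fp16` keep working; only omitted-flag runs pick up the new default.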
