Loss increases sharply after the first epoch #287
Description
My training loss surges after the first epoch. I tried adjusting the LR and momentum_teacher, but the surge always appears, although the iteration at which it occurs differs. My dataset contains 40 million images.
torchrun $DISTRIBUTED_ARGS main_dino.py \
  --nproc_per_node $NPUS_PER_NODE \
  --node_rank $NODE_RANK \
  --arch vit_large \
  --patch_size 16 \
  --out_dim 65536 \
  --norm_last_layer True \
  --momentum_teacher 0.994 \
  --use_bn_in_head False \
  --warmup_teacher_temp 0.04 \
  --teacher_temp 0.04 \
  --warmup_teacher_temp_epochs 6 \
  --use_fp16 True \
  --weight_decay 0.04 \
  --weight_decay_end 0.4 \
  --clip_grad 3.0 \
  --batch_size_per_gpu 64 \
  --epochs 300 \
  --freeze_last_layer 1 \
  --lr 0.00001 \
  --warmup_epochs 5 \
  --min_lr 1e-6 \
  --optimizer adamw \
  --drop_path_rate 0.1 \
  --global_crops_scale 0.4 1.0 \
  --local_crops_number 8 \
  --local_crops_scale 0.05 0.4 \
  --restart_strict \
  --data_path ${DATA_DIR} \
  --output_dir ${CKPT_DIR} \
  --saveckp_freq 1 \
  --seed 10086 \
  --num_workers 16 \
  --dist_url env://
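For context on the `--momentum_teacher` flag being tuned here: DINO updates the teacher as an EMA of the student and ramps the momentum from the `--momentum_teacher` value toward 1.0 with a cosine schedule over training. A minimal sketch of that ramp (the function name is illustrative, not the repo's API; steps-per-epoch is a placeholder value):

```python
import math

def cosine_momentum_schedule(base_m, final_m, epochs, steps_per_epoch):
    """Cosine ramp of the teacher EMA momentum from base_m up to final_m,
    one value per training iteration (DINO-style)."""
    total = epochs * steps_per_epoch
    return [final_m - (final_m - base_m) * (math.cos(math.pi * i / total) + 1) / 2
            for i in range(total)]

# With the settings from the command above (0.994, ramped to 1.0 over 300 epochs);
# steps_per_epoch=100 is a placeholder, not derived from the original post.
sched = cosine_momentum_schedule(0.994, 1.0, epochs=300, steps_per_epoch=100)
print(sched[0])   # starts at 0.994
print(sched[-1])  # approaches 1.0 by the end of training
```

A larger starting momentum makes the teacher change more slowly early on, which is why this flag is commonly tuned when the loss destabilizes after warmup.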
I also tried changing lr -> 0.0005 and momentum_teacher -> 0.997, but the surge still appears.
dataset size = 40M × 0.125
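For anyone reproducing this, a quick sanity check of the iterations per epoch implied by the numbers above (the total GPU count is an assumption, not stated in the post; substitute your actual world size):

```python
# Iterations per epoch for the run above.
dataset_size = int(40e6 * 0.125)   # 40M * 0.125 = 5,000,000 images, as stated
batch_size_per_gpu = 64            # from the command line above
world_size = 8                     # ASSUMED total number of GPUs, not from the post
global_batch = batch_size_per_gpu * world_size
steps_per_epoch = dataset_size // global_batch
print(steps_per_epoch)  # -> 9765 under the assumed 8-GPU world size
```

This matters because the iteration at which the surge appears shifts with the effective global batch size.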

