ResNet-a2
#1409
-
Hi @rwightman,
I'm currently reproducing ResNet50 with the A2 procedure ("ResNet strikes back: An improved training procedure in timm"). Can the following command faithfully reproduce A2 training for ResNet50?
I trained on 4 Tesla V100s:
./distributed_train.sh 4 imagenet/ --model resnet50 --aa rand-m7-mstd0.5-inc1 --mixup .1 --cutmix 1.0 --aug-repeats 3 --remode pixel --reprob 0.0 --crop-pct 0.95 --drop-path .05 --smoothing 0.0 --bce-loss --bce-target-thresh 0.2 --opt lamb --weight-decay .02 --sched cosine --epochs 300 --warmup-epochs 5 --lr 5e-3 --warmup-lr 1e-4 -b 512 -j 16 --amp --channels-last --seed 42
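As a quick cross-check of the launch parameters (a minimal sketch; per the paper, the A2 recipe was tuned for a global batch of 2048 with LAMB at lr 5e-3 over 300 epochs, which this command matches):

```python
# Sanity check on the effective (global) batch size implied by the command.
# Assumption: A2 in "ResNet strikes back" targets global batch 2048,
# LAMB optimizer, lr 5e-3, 300 epochs.
num_gpus = 4            # ./distributed_train.sh 4 ...
per_gpu_batch = 512     # -b 512
global_batch = num_gpus * per_gpu_batch
assert global_batch == 2048, f"global batch {global_batch} != 2048 expected by A2"
print(f"global batch {global_batch}, lr 5e-3 (LAMB), 300 epochs -- consistent with A2")
```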
-
When I run this command on a DGX server, GPU utilization is below 100% most of the time; the GPUs sit idle waiting on the CPU to load data, so training is much slower than on my desktop (2x 1080 Ti). Does anybody know why? @rwightman
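One way to confirm the input pipeline is the bottleneck is to measure how fast the DataLoader alone can deliver batches and compare that against the GPUs' training throughput (a minimal sketch, assuming a standard PyTorch DataLoader; `loader_throughput` is a hypothetical helper, not part of timm):

```python
import time
from torch.utils.data import DataLoader

def loader_throughput(loader: DataLoader, num_batches: int = 100) -> float:
    """Images/sec the DataLoader alone can deliver, with the GPUs out of the picture."""
    it = iter(loader)
    next(it)  # warm up worker processes and filesystem caches
    start = time.perf_counter()
    n = 0
    for _ in range(num_batches):
        images, _ = next(it)
        n += images.size(0)
    return n / (time.perf_counter() - start)
```

If the result comes out below the images/sec the GPUs sustain during training, the CPU/storage side is the limit; raising `-j`, moving the dataset to fast local storage, or pre-resizing images are the usual remedies.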
-
FYI, the best cloud setup I've found for training is Lambda Labs GPU Cloud; their 4-GPU A100 or A6000 instances have a decent number of CPUs and fast local SSDs, which is good enough for a standard ImageNet (files-and-folders) dataset.