Description
I'm training a BigGAN with differentiable augmentation (DiffAug) and LeCam regularization on a custom dataset. My setup has four NVIDIA RTX 3070 GPUs and runs Ubuntu 20.04. I observe that training on all 4 GPUs with Distributed Data Parallel (DDP) takes the same time as training on a single GPU. Am I doing something wrong?
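To rule out a data-sharding problem, I put together a small standalone check (my own sketch, not StudioGAN code; the dummy tensors just mimic the 96x96 images). Launched with `torchrun --nproc_per_node=4 ddp_check.py`, every rank should report roughly a quarter of the samples; if each rank reports the full dataset, the `DistributedSampler` isn't wired in and DDP cannot shorten an epoch:

```python
import os
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # torchrun sets LOCAL_RANK, RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    rank, world = dist.get_rank(), dist.get_world_size()

    # Dummy 96x96 images standing in for vw_coco2014_96_GAN.
    dataset = TensorDataset(torch.randn(10_000, 3, 96, 96))
    sampler = DistributedSampler(dataset)  # shards the dataset across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    # Each rank should see ~10000 / world_size samples per epoch. If every
    # rank reports the full 10000, the sampler is not in effect and DDP
    # cannot reduce the per-epoch wall-clock time.
    print(f"rank {rank}/{world}: {len(sampler)} samples, {len(loader)} batches")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```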
For training on a single GPU, I use the following command:
```
CUDA_VISIBLE_DEVICES=0 python3 src/main.py -t -hdf5 -l -std_stat -std_max 64 -std_step 64 -metrics fid is prdc -ref "train" -cfg src/configs/VWW/BigGAN-DiffAug-LeCam.yaml -data ../Datasets/vw_coco2014_96_GAN -save SAVE_PATH_VWW -mpc --post_resizer "friendly" --eval_backbone "InceptionV3_tf"
```
For training on all 4 GPUs, I use the following commands:
```
export MASTER_ADDR=localhost
export MASTER_PORT=1234
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 src/main.py -t -DDP -tn 1 -cn 0 -std_stat -std_max 64 -std_step 64 -metrics fid is prdc -ref "train" -cfg src/configs/VWW/BigGAN-DiffAug-LeCam.yaml -data ../Datasets/vw_coco2014_96_GAN -save SAVE_PATH_VWW -mpc --post_resizer "friendly" --eval_backbone "InceptionV3_tf"
```
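One thing I'm unsure about (an assumption on my part, I haven't dug into StudioGAN's internals): whether the batch size in the config is interpreted per GPU or globally. If it is per GPU, each DDP iteration takes about the same wall-clock time as a single-GPU iteration but processes 4x the images, so identical per-iteration timings would actually be expected; the speedup would show up as roughly 4x fewer iterations per epoch. A small probe like the hypothetical helper below (`measure_throughput` and `step_fn` are my own names, not part of the repo) makes the comparison in images/sec instead:

```python
import time
import torch

def measure_throughput(step_fn, batch_size, world_size, n_iters=50):
    """Time n_iters calls of step_fn (one optimizer step on this rank).

    step_fn is a placeholder for whatever runs a single training step;
    it is not a StudioGAN API.
    """
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn()
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    per_rank = n_iters * batch_size / elapsed   # images/sec on this rank
    return per_rank, per_rank * world_size      # per-rank and aggregate rate
```

If the aggregate rate is not close to 4x the single-GPU rate, I would suspect a data-loading or interconnect bottleneck rather than the DDP launch itself.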