Hello,
I followed the README closely and launched a training run with the following command on a server with 8 V100 GPUs:
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train_imagenet.py --config-file rn50_configs/rn50_88_epochs.yaml \
--data.train_dataset=$HOME/data/imagenet_ffcv/train_500_0.50_90.ffcv \
--data.val_dataset=$HOME/data/imagenet_ffcv/val_500_0.50_90.ffcv \
--data.num_workers=3 --data.in_memory=1 \
--logging.folder=$HOME/experiments/ffcv/rn50_88_epochs
Training took almost an hour per epoch, and the second epoch was almost as slow as the first one. The output of the log file is as follows:
cat ~/experiments/ffcv/rn50_88_epochs/d9ef0d7f-17a3-4e57-8d93-5e7c9a110d66/log
{"timestamp": 1650641704.0822473, "relative_time": 2853.3256430625916, "current_lr": 0.8473609134615385, "top_1": 0.07225999981164932, "top_5": 0.19789999723434448, "val_time": 103.72948884963989, "train_loss": null, "epoch": 0}
{"timestamp": 1650644358.3394542, "relative_time": 5507.582849979401, "current_lr": 1.6972759134615385, "top_1": 0.16143999993801117, "top_5": 0.3677400052547455, "val_time": 92.9171462059021, "train_loss": null, "epoch": 1}
Is there anything I should check?
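The per-epoch wall-clock time can be read off the log above by differencing consecutive relative_time values. The following is a minimal sketch, assuming that relative_time is the cumulative number of seconds since the start of the run (so the first value may also include startup time) and that the log holds one JSON object per line. The helper name epoch_durations is hypothetical and is not part of the training script.

```python
import json

# The two log lines quoted above, copied verbatim (one JSON object per line).
LOG_LINES = [
    '{"timestamp": 1650641704.0822473, "relative_time": 2853.3256430625916, '
    '"current_lr": 0.8473609134615385, "top_1": 0.07225999981164932, '
    '"top_5": 0.19789999723434448, "val_time": 103.72948884963989, '
    '"train_loss": null, "epoch": 0}',
    '{"timestamp": 1650644358.3394542, "relative_time": 5507.582849979401, '
    '"current_lr": 1.6972759134615385, "top_1": 0.16143999993801117, '
    '"top_5": 0.3677400052547455, "val_time": 92.9171462059021, '
    '"train_loss": null, "epoch": 1}',
]


def epoch_durations(lines):
    """Return (epoch, seconds) pairs.

    Assumption: relative_time is cumulative, so each epoch's duration is the
    difference from the previous entry (or from zero for the first entry).
    """
    durations = []
    previous = 0.0
    for line in lines:
        record = json.loads(line)
        durations.append((record["epoch"], record["relative_time"] - previous))
        previous = record["relative_time"]
    return durations


if __name__ == "__main__":
    for epoch, seconds in epoch_durations(LOG_LINES):
        print(f"epoch {epoch}: {seconds / 60:.1f} min")
    # epoch 0: 47.6 min
    # epoch 1: 44.2 min
```

Under these assumptions the two epochs took roughly 48 and 44 minutes, which confirms that the second epoch was only slightly faster than the first.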
Thank you in advance for your response.