Training & Testing Details
Junyong Lee edited this page Mar 29, 2021 · 2 revisions
# multi GPU (with DistributedDataParallel) example
CUDA_VISIBLE_DEVICES=0,1,2,3 python -B -m torch.distributed.launch --nproc_per_node=4 --master_port=9000 run.py \
--is_train \
--mode PVDNet_DVD \
--config config_PVDNet \
--trainer trainer \
--data DVD \
-LRS CA \
-b 2 \
-th 8 \
-dl \
-ss \
-dist
# resuming example (the trainer loads the checkpoint saved at epoch 100, and training resumes from epoch 101)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -B -m torch.distributed.launch --nproc_per_node=4 --master_port=9000 run.py \
... \
-th 8 \
-r 100 \
-ss \
-dist
# single GPU (with DataParallel) example
CUDA_VISIBLE_DEVICES=0 python -B run.py \
... \
-ss

- Options
  - `--is_train`: If specified, `run.py` trains the network. Default: `False`
  - `--mode`: The name of the model to train. A logging folder named after `[mode]` is created at `[LOG_ROOT]/PVDNet_TOG2021/[mode]/`. Default: `PVDNet_DVD`
  - `--config`: The name of the config file, located at `./configs/[config].py`. Default: `None`, and the default should not be changed.
  - `--trainer`: The name of the trainer file, located at `./models/trainers/[trainer].py`. Default: `trainer`
  - `--data`: The name of the dataset: `DVD` | `nah`. Default: `DVD`
    - The data structure can be modified in the function `set_train_path(..)` in `./configs/config.py`.
  - `--network`: The name of the network file (of PVDNet), located at `./models/archs/[network].py`. Default: `PVDNet`
  - `-LRS`: The learning rate scheduler for training: `CA` (cosine annealing schedule) | `LD` (step decay schedule). Default: `CA`
  - `-b`, `--batch_size`: The batch size. For multi-GPU training (`DistributedDataParallel`), the total batch size is `nproc_per_node * b`. Default: 8
  - `-th`, `--thread_num`: The number of threads (`num_workers`) used by the data loader. Default: 8
  - `-dl`, `--delete_log`: Whether to delete the logs under `[mode]` (i.e., `[LOG_ROOT]/PVDNet_TOG2021/[mode]/*`). Works only when `--is_train` is specified. Default: `False`
  - `-r`, `--resume`: Resume training from the specified epoch (e.g., `-r 100`). Note that `-dl` should not be specified with this option.
  - `-ss`, `--save_sample`: Save sample images for both training and testing. Images are saved in `[LOG_ROOT]/PVDNet_TOG2021/[mode]/sample/`. Default: `False`
  - `-dist`: Enables multi-processing with `DistributedDataParallel`. Default: `False`
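
As a rough illustration of two rules from the option list (the total batch size under `-dist`, and `-r` not being combined with `-dl`), here is a minimal `argparse` sketch. It is not the repository's actual parser: `build_parser` and `total_batch_size` are hypothetical names, the `None` default for `-r` is assumed, and the mutually exclusive group is just one possible way to enforce the `-r`/`-dl` rule. Only the flag names and the other defaults come from the list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a few of the documented flags."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--is_train", action="store_true")
    parser.add_argument("--mode", default="PVDNet_DVD")
    parser.add_argument("-b", "--batch_size", type=int, default=8)
    parser.add_argument("-dist", action="store_true")
    # Assumption: enforce "-dl should not be specified with -r" through a
    # mutually exclusive group. -dl clears [LOG_ROOT]/PVDNet_TOG2021/[mode]/*,
    # the same folder the checkpoints are documented to live under.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-r", "--resume", type=int, default=None)  # default assumed
    group.add_argument("-dl", "--delete_log", action="store_true")
    return parser


def total_batch_size(nproc_per_node: int, batch_size: int) -> int:
    """Total batch size under DistributedDataParallel: nproc_per_node * b."""
    return nproc_per_node * batch_size


if __name__ == "__main__":
    args = build_parser().parse_args(["--is_train", "-b", "2", "-dist"])
    # Mirrors the multi-GPU example: 4 processes with 2 samples each.
    print(total_batch_size(nproc_per_node=4, batch_size=args.batch_size))  # 8
```

With this sketch, passing `-r 100 -dl` together makes `argparse` exit with a usage error instead of silently deleting the logs that hold the checkpoint to resume from.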
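
For readers unfamiliar with the two `-LRS` choices, the sketch below shows the textbook forms of a cosine annealing schedule (`CA`) and a step decay schedule (`LD`). This page does not state the repository's actual scheduler settings, so the function names and every hyperparameter here (`total_epochs`, `min_lr`, `decay_every`, `gamma`) are assumptions for illustration only.

```python
import math


def cosine_annealing_lr(epoch: int, total_epochs: int, base_lr: float,
                        min_lr: float = 0.0) -> float:
    """Textbook cosine annealing: decays smoothly from base_lr to min_lr."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


def step_decay_lr(epoch: int, base_lr: float, decay_every: int,
                  gamma: float) -> float:
    """Textbook step decay: multiply the rate by gamma every decay_every epochs."""
    return base_lr * gamma ** (epoch // decay_every)


if __name__ == "__main__":
    # Hypothetical settings, chosen only to make the two curves comparable.
    for epoch in (0, 50, 100):
        ca = cosine_annealing_lr(epoch, total_epochs=100, base_lr=1e-4)
        ld = step_decay_lr(epoch, base_lr=1e-4, decay_every=50, gamma=0.5)
        print(f"epoch {epoch:3d}: CA={ca:.2e}  LD={ld:.2e}")
```

The cosine schedule changes the rate a little every epoch, while the step schedule holds it constant between drops.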
CUDA_VISIBLE_DEVICES=0 python run.py --mode [mode] --data [DATASET]
# e.g., CUDA_VISIBLE_DEVICES=0 python run.py --mode PVDNet_DVD --data DVD

Note:
- Specify only the `[mode]` of the trained model. `[config]` doesn't have to be specified, as it is loaded automatically.
- Testing results are saved in `[LOG_ROOT]/PVDNet_TOG2021/[mode]/result/quanti_quali/[mode]_[epoch]/[data]/`.
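
To locate the outputs of a test run from a script, the result path pattern in the note can be assembled as in the following sketch. `result_dir` is a hypothetical helper, not part of the repository, and the exact formatting of `[epoch]` inside the folder name is not specified on this page, so the epoch is passed through as a string unchanged.

```python
from pathlib import Path


def result_dir(log_root: str, mode: str, epoch: str, data: str) -> Path:
    """Build [LOG_ROOT]/PVDNet_TOG2021/[mode]/result/quanti_quali/[mode]_[epoch]/[data]/."""
    return (
        Path(log_root)
        / "PVDNet_TOG2021"
        / mode
        / "result"
        / "quanti_quali"
        / f"{mode}_{epoch}"  # epoch formatting is an assumption; pass it as written on disk
        / data
    )


if __name__ == "__main__":
    # "/logs" and "00100" are placeholder values for illustration.
    print(result_dir("/logs", "PVDNet_DVD", "00100", "DVD").as_posix())
```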
- Options
  - `--mode`: The name of the model to test.
  - `--data`: The name of the dataset to evaluate: `DVD` | `nah` | `random`. Default: `DVD`
    - The data structure can be modified in the function `set_eval_path(..)` in `./configs/config.py`.
    - `random` is for testing models on arbitrary video frames, which should be placed at `[DATASET_ROOT]/random/[video_name]/*.[jpg|png]`.
  - `-ckpt_name`: Loads the checkpoint with the given name under `[LOG_ROOT]/PVDNet_TOG2021/[mode]/checkpoint/train/epoch/ckpt/` (e.g., `python run.py --mode PVDNet_DVD --data DVD --ckpt_name PVDNet_DVD_00100.pytorch`).
  - `-ckpt_abs_name`: Loads the checkpoint at the given absolute path (e.g., `python run.py --mode PVDNet_DVD --data DVD --ckpt_abs_name ./ckpt/PVDNet_DVD.pytorch`).
  - `-ckpt_epoch`: Loads the checkpoint of the specified epoch (e.g., `python run.py --mode PVDNet_DVD --data DVD --ckpt_epoch 100`).
  - `-ckpt_sc`: Loads the checkpoint with the best validation score (e.g., `python run.py --mode PVDNet_DVD --data DVD --ckpt_sc`).
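
The checkpoint options can be read as different ways of naming one file. The sketch below shows one possible resolution; it is not the repository's loading code. `checkpoint_path` is a hypothetical helper, the precedence between options is assumed, and the `[mode]_[5-digit epoch].pytorch` filename pattern is only inferred from the `PVDNet_DVD_00100.pytorch` example. `-ckpt_sc` is left out because this page does not say where validation scores are recorded.

```python
from pathlib import Path
from typing import Optional


def checkpoint_path(log_root: str, mode: str,
                    ckpt_name: Optional[str] = None,
                    ckpt_abs_name: Optional[str] = None,
                    ckpt_epoch: Optional[int] = None) -> Path:
    """Resolve a checkpoint file from one of the documented options."""
    ckpt_dir = (Path(log_root) / "PVDNet_TOG2021" / mode
                / "checkpoint" / "train" / "epoch" / "ckpt")
    if ckpt_abs_name is not None:
        # -ckpt_abs_name: use the given path as is (precedence is assumed).
        return Path(ckpt_abs_name)
    if ckpt_name is not None:
        # -ckpt_name: a file name inside the documented checkpoint folder.
        return ckpt_dir / ckpt_name
    if ckpt_epoch is not None:
        # -ckpt_epoch: file name pattern inferred from the example, not confirmed.
        return ckpt_dir / f"{mode}_{ckpt_epoch:05d}.pytorch"
    raise ValueError("specify ckpt_name, ckpt_abs_name, or ckpt_epoch")


if __name__ == "__main__":
    # "/logs" is a placeholder for [LOG_ROOT].
    print(checkpoint_path("/logs", "PVDNet_DVD", ckpt_epoch=100).as_posix())
```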
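
Before running with `--data random`, it can help to check that the frames sit where the `[DATASET_ROOT]/random/[video_name]/*.[jpg|png]` layout expects them. The following is a small sketch for that check only; `list_random_videos` is a hypothetical helper, and sorting frames by file name is an assumption about the intended frame order.

```python
from pathlib import Path
from typing import Dict, List

FRAME_SUFFIXES = {".jpg", ".png"}


def list_random_videos(dataset_root: str) -> Dict[str, List[Path]]:
    """Map each video folder under [DATASET_ROOT]/random/ to its frame files."""
    random_dir = Path(dataset_root) / "random"
    videos: Dict[str, List[Path]] = {}
    for video_dir in sorted(p for p in random_dir.iterdir() if p.is_dir()):
        # Keep only .jpg/.png files; name-based ordering is an assumption.
        frames = sorted(
            f for f in video_dir.iterdir()
            if f.is_file() and f.suffix in FRAME_SUFFIXES
        )
        if frames:
            videos[video_dir.name] = frames
    return videos
```

Calling `list_random_videos` with the dataset root returns an empty dictionary when no video folder contains usable frames, which is a quick sign that the layout needs fixing.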