This document contains detailed training recipes for reproducing the results reported in the main README.
Custom Dataset:

```
python customdata.py -a vgg16 --dataset dataset --checkpoint checkpoints/dataset/vgg16 --epochs 164 --schedule 81 122 --gamma 0.1
python customdata.py -a resnet50 --data dataset --epochs 90 --schedule 31 61 --gamma 0.1 -c checkpoints/dataset/resnet50
```

CIFAR-10:

```
python cifar.py -a alexnet --epochs 164 --schedule 81 122 --gamma 0.1 --checkpoint checkpoints/cifar10/alexnet
python cifar.py -a vgg19_bn --epochs 164 --schedule 81 122 --gamma 0.1 --checkpoint checkpoints/cifar10/vgg19_bn
python cifar.py -a resnet --depth 110 --epochs 164 --schedule 81 122 --gamma 0.1 --wd 1e-4 --checkpoint checkpoints/cifar10/resnet-110
python cifar.py -a resnet --depth 1202 --epochs 164 --schedule 81 122 --gamma 0.1 --wd 1e-4 --checkpoint checkpoints/cifar10/resnet-1202
python cifar.py -a preresnet --depth 110 --epochs 164 --schedule 81 122 --gamma 0.1 --wd 1e-4 --checkpoint checkpoints/cifar10/preresnet-110
python cifar.py -a resnext --depth 29 --cardinality 8 --widen-factor 4 --schedule 150 225 --wd 5e-4 --gamma 0.1 --checkpoint checkpoints/cifar10/resnext-8x64d
python cifar.py -a resnext --depth 29 --cardinality 16 --widen-factor 4 --schedule 150 225 --wd 5e-4 --gamma 0.1 --checkpoint checkpoints/cifar10/resnext-16x64d
python cifar.py -a wrn --depth 28 --widen-factor 10 --drop 0.3 --epochs 200 --schedule 60 120 160 --wd 5e-4 --gamma 0.2 --checkpoint checkpoints/cifar10/WRN-28-10-drop
```

Note:
- DenseNet uses weight decay 1e-4; a larger weight decay (5e-4) is harmful for accuracy (95.46% vs. 94.05%).
- The official batch size is 64; there is no significant difference between batch size 64 and 128 (95.46% vs. 95.11%).
```
python cifar.py -a densenet --depth 100 --growthRate 12 --train-batch 64 --epochs 300 --schedule 150 225 --wd 1e-4 --gamma 0.1 --checkpoint checkpoints/cifar10/densenet-bc-100-12
python cifar.py -a densenet --depth 190 --growthRate 40 --train-batch 64 --epochs 300 --schedule 150 225 --wd 1e-4 --gamma 0.1 --checkpoint checkpoints/cifar10/densenet-bc-L190-k40
```

CIFAR-100:

```
python cifar.py -a alexnet --dataset cifar100 --checkpoint checkpoints/cifar100/alexnet --epochs 164 --schedule 81 122 --gamma 0.1
python cifar.py -a vgg19_bn --dataset cifar100 --checkpoint checkpoints/cifar100/vgg19_bn --epochs 164 --schedule 81 122 --gamma 0.1
python cifar.py -a resnet --dataset cifar100 --depth 110 --epochs 164 --schedule 81 122 --gamma 0.1 --wd 1e-4 --checkpoint checkpoints/cifar100/resnet-110
python cifar.py -a resnet --dataset cifar100 --depth 1202 --epochs 164 --schedule 81 122 --gamma 0.1 --wd 1e-4 --checkpoint checkpoints/cifar100/resnet-1202
python cifar.py -a preresnet --dataset cifar100 --depth 110 --epochs 164 --schedule 81 122 --gamma 0.1 --wd 1e-4 --checkpoint checkpoints/cifar100/preresnet-110
python cifar.py -a resnext --dataset cifar100 --depth 29 --cardinality 8 --widen-factor 4 --checkpoint checkpoints/cifar100/resnext-8x64d --schedule 150 225 --wd 5e-4 --gamma 0.1
python cifar.py -a resnext --dataset cifar100 --depth 29 --cardinality 16 --widen-factor 4 --checkpoint checkpoints/cifar100/resnext-16x64d --schedule 150 225 --wd 5e-4 --gamma 0.1
python cifar.py -a wrn --dataset cifar100 --depth 28 --widen-factor 10 --drop 0.3 --epochs 200 --schedule 60 120 160 --wd 5e-4 --gamma 0.2 --checkpoint checkpoints/cifar100/WRN-28-10-drop
python cifar.py -a densenet --dataset cifar100 --depth 100 --growthRate 12 --train-batch 64 --epochs 300 --schedule 150 225 --wd 1e-4 --gamma 0.1 --checkpoint checkpoints/cifar100/densenet-bc-100-12
python cifar.py -a densenet --dataset cifar100 --depth 190 --growthRate 40 --train-batch 64 --epochs 300 --schedule 150 225 --wd 1e-4 --gamma 0.1 --checkpoint checkpoints/cifar100/densenet-bc-L190-k40
```

ImageNet:

```
python imagenet.py -a resnet18 --data ~/dataset/ILSVRC2012/ --epochs 90 --schedule 31 61 --gamma 0.1 -c checkpoints/imagenet/resnet18
python imagenet.py -a resnet50 --data ~/dataset/ILSVRC2012/ --epochs 90 --schedule 31 61 --gamma 0.1 -c checkpoints/imagenet/resnet50
```

Originally trained on 8 GPUs:
```
python imagenet.py -a resnext50 --base-width 4 --cardinality 32 --data ~/dataset/ILSVRC2012/ --epochs 90 --schedule 31 61 --gamma 0.1 -c checkpoints/imagenet/resnext50-32x4d
python imagenet.py -a efficientnet_b0 --data ~/dataset/ILSVRC2012/ --epochs 350 --schedule 175 262 --gamma 0.1 --lr 0.256 --wd 1e-5 -c checkpoints/imagenet/efficientnet_b0
python imagenet.py -a vit_b_16 --data ~/dataset/ILSVRC2012/ --epochs 300 --schedule 150 225 --gamma 0.1 --lr 3e-3 --opt adamw --wd 0.3 -c checkpoints/imagenet/vit_b_16
```

Common Arguments:
- `-a, --arch`: Model architecture (e.g., `resnet`, `vgg19_bn`, `densenet`)
- `--depth`: Model depth (for depth-configurable architectures)
- `--epochs`: Number of training epochs
- `--schedule`: Learning rate decay milestones (epochs)
- `--gamma`: Learning rate decay factor
- `--lr`: Initial learning rate (default: 0.1)
- `--wd`: Weight decay (default: 1e-4)
- `--train-batch`: Training batch size (default: 128)
- `--test-batch`: Testing batch size (default: 100)
- `-c, --checkpoint`: Path to save checkpoints
- `--resume`: Path to checkpoint to resume training
- `--gpu-id`: GPU device ID(s) to use
Architecture-Specific:
- `--cardinality`: Number of groups for ResNeXt
- `--widen-factor`: Width multiplier for WRN and ResNeXt
- `--growthRate`: Growth rate for DenseNet
- `--drop`: Dropout rate for WRN
DataParallel (Single Node):
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python cifar.py -a resnet --depth 110 --gpu-id 0,1,2,3
```

DistributedDataParallel (Recommended):
```
python -m torch.distributed.launch --nproc_per_node=4 cifar.py -a resnet --depth 110
```

To resume from a checkpoint:
```
python cifar.py -a resnet --depth 110 --resume checkpoints/cifar10/resnet-110/checkpoint.pth.tar
```

Training logs are automatically saved to the checkpoint directory. To visualize training curves:
```python
from utils.logger import Logger

logger = Logger('checkpoints/cifar10/resnet-110')
logger.plot()
```

If you encounter out-of-memory errors:
- Reduce batch size: `--train-batch 64`
- Use gradient accumulation (modify the training script)
- Enable mixed precision training (requires apex or native PyTorch AMP)
- Use smaller model variants
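The last two list items can be combined in one loop. Below is a minimal sketch using native PyTorch AMP together with gradient accumulation; `train_one_epoch` is a hypothetical helper, not part of the repo's scripts, and the CPU fallback is only there so the sketch runs anywhere:

```python
import torch
from torch import nn

def train_one_epoch(model, loader, optimizer, accum_steps=4, device="cuda"):
    """One epoch with native AMP and gradient accumulation.

    Gradients are accumulated over accum_steps batches before each
    optimizer update, emulating a batch size of accum_steps * train_batch.
    """
    use_amp = device == "cuda" and torch.cuda.is_available()
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # no-op when disabled
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        with torch.autocast(device_type="cuda" if use_amp else "cpu", enabled=use_amp):
            loss = nn.functional.cross_entropy(model(inputs), targets)
        # divide so the accumulated gradient averages over accum_steps batches
        scaler.scale(loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)  # unscales gradients, then optimizer.step()
            scaler.update()
            optimizer.zero_grad()
    return loss.item()  # last batch's loss, for logging
```

Note that any trailing batches that don't fill a full `accum_steps` window are dropped here for brevity; a production loop should flush them with one final `scaler.step`.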
Hyperparameter Guidelines:
- Learning Rate: Start with 0.1 for SGD, 3e-4 for Adam/AdamW
- Weight Decay: Use 1e-4 for most models, 5e-4 for ResNeXt/WRN
- Batch Size: 128 for CIFAR, 256 for ImageNet (adjust based on GPU memory)
- Data Augmentation: Random crop, horizontal flip for CIFAR; standard ImageNet augmentation
- Warmup: Consider LR warmup for large batch training or Vision Transformers
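The recipes above all share one shape of schedule: step decay by `--gamma` at the `--schedule` milestones, optionally preceded by warmup. Written out as a plain function (the warmup length and defaults here are illustrative, not values the scripts use):

```python
def lr_at(epoch, base_lr=0.1, warmup_epochs=5, milestones=(81, 122), gamma=0.1):
    """Linear warmup for the first warmup_epochs, then step decay:
    the LR is multiplied by gamma once per milestone already passed."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr * gamma ** sum(epoch >= m for m in milestones)

# Per-epoch usage with a PyTorch optimizer (hypothetical):
# for group in optimizer.param_groups:
#     group["lr"] = lr_at(epoch)
```

With the defaults above this gives 0.02 at epoch 0, 0.1 from epoch 4, 0.01 from epoch 81, and 0.001 from epoch 122, matching the 164-epoch CIFAR recipes apart from the added warmup.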
For exact reproduction:
- Use the same PyTorch and CUDA versions
- Set random seeds: `--manualSeed 42`
- Use the provided hyperparameters exactly
- Ensure proper data preprocessing
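A seed flag like `--manualSeed` typically drives a helper along these lines (a sketch; the repo's exact seeding code may differ):

```python
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Seed every RNG the training loop touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuDNN autotuning picks algorithms nondeterministically; disable it
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Even with seeds set, some CUDA kernels are nondeterministic, so bit-identical accuracy across runs is not guaranteed.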
Poor Convergence:
- Check learning rate (try reducing by 10x)
- Verify data normalization
- Check for gradient explosion (reduce LR or add gradient clipping)
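Gradient clipping is a one-line addition between `backward()` and `step()`; a sketch on a toy model (not the repo's training loop):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))

optimizer.zero_grad()
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
# rescale all gradients together so their global L2 norm is at most 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

`max_norm=1.0` is a common starting point; clipping only kicks in when the global norm exceeds it, so it is safe to leave on permanently.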
Overfitting:
- Increase weight decay
- Add dropout (for supported architectures)
- Use data augmentation
- Consider early stopping
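If the training script does not provide early stopping, a small helper can track validation accuracy across epochs (a sketch; the `patience` semantics here are my assumption):

```python
class EarlyStopping:
    """Signal a stop when validation accuracy hasn't improved for `patience` epochs."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_acc: float) -> bool:
        """Record one epoch's validation accuracy; return True to stop training."""
        if val_acc > self.best:
            self.best = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Usage: call `step(val_acc)` once per epoch after validation and break out of the epoch loop when it returns True.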
Underfitting:
- Increase model capacity (depth/width)
- Train for more epochs
- Reduce regularization (weight decay, dropout)