Reference Paper: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- Written in PyTorch, this model has a total of 26 high-level blocks and can classify up to 1001 different classes of images.
- It has an overall depth of 164 layers
- Input size: `3 x 299 x 299`
- The individual blocks have been defined separately, with explicit `in_channels` and `out_channels` for each layer, to maintain a visual flow of how images move through the network
- A custom `LambdaScale` layer has been introduced to scale the residuals, as discussed in the original paper, to tackle the problem of the later layers dying early in training (a minimal sketch follows this list)
- Batch Normalization has been applied to ensure regularization
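For reference, a minimal sketch of such a residual-scaling layer is shown below; the default `scale` of 0.2 is an assumption (the paper suggests factors between 0.1 and 0.3), and the repository's actual `LambdaScale` may differ in detail:

```python
import torch.nn as nn

class LambdaScale(nn.Module):
    """Scale the residual branch by a constant factor before it is
    added back to the shortcut, keeping very deep residual nets stable."""
    def __init__(self, scale=0.2):  # assumed default; paper suggests 0.1-0.3
        super().__init__()
        self.scale = scale

    def forward(self, x):
        return x * self.scale
```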

Layer design - Overview
- Loss function: `torch.nn.CrossEntropyLoss()`
- Optimizer: `torch.optim.Adam(amsgrad=True)`
- Scheduler: `torch.optim.lr_scheduler.ReduceLROnPlateau(mode='min', factor=0.2, threshold=0.01, patience=5)`
- `prefetch_generator.BackgroundGenerator` has been used for computational efficiency, pre-loading the next mini-batch during training (see the combined training sketch after this list)
- The `state_dict` of each epoch is stored in the `resnet-v2-epochs` directory (created if it does not exist)
- By default, training runs on a CUDA GPU, falling back to the CPU if one cannot be detected
- As a design choice, parallelization has not been implemented, in order to keep the training function readable and easy to follow
- The results of the training session can be viewed interactively using `TensorBoard`, with logs stored in the `/runs` directory
- A benchmark time of 00:30:03 (hh:mm:ss) per epoch was observed on an NVIDIA GTX 1650 Ti (4 GB), Intel i7-10750H, 16 GB RAM, SSD-enabled machine
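As a rough illustration of how these training pieces fit together, here is a hedged sketch; the stand-in model, learning-rate defaults and checkpoint filename pattern are assumptions rather than the repository's exact code:

```python
import os
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from prefetch_generator import BackgroundGenerator

class DataLoaderX(DataLoader):
    """DataLoader whose iterator pre-loads the next mini-batch in a
    background thread while the current batch is being processed."""
    def __iter__(self):
        return BackgroundGenerator(super().__iter__())

# Use a CUDA GPU when one is detected, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-in model for illustration; the repository builds the real
# network in resnet_model.py.
model = nn.Linear(10, 7).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), amsgrad=True)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.2, threshold=0.01, patience=5)

def save_checkpoint(model, epoch, folder='resnet-v2-epochs'):
    """Persist this epoch's state_dict; the filename pattern here is
    an assumption, not taken from the repository."""
    os.makedirs(folder, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(folder, f'epoch_{epoch}.pt'))

# After each validation pass, step the scheduler on the watched metric:
# scheduler.step(val_loss)
```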
- Using the Face-Expression-Recognition-Dataset from `jonathanoheix` on Kaggle, we train on a total of 28,821 images across 7 classes: 'Anger', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad' and 'Surprise'
- We perform some simple preprocessing using `torchvision.transforms.Resize()`, `torchvision.transforms.ToTensor()` and `torchvision.transforms.Normalize()` to get a good representation of the images in tensor format
- We make use of `torch.utils.data.DataLoader()` to improve load times and process images in random mini-batches in an efficient and optimized manner (see the sketch below)
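A minimal sketch of this input pipeline, assuming an `ImageFolder`-style directory layout; the dataset path, normalization statistics and batch size are illustrative, not the repository's exact values:

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

transform = T.Compose([
    T.Resize((299, 299)),  # match the network's 3 x 299 x 299 input
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # assumed stats
])

# 'images/train' is a placeholder path for the extracted Kaggle dataset.
train_set = ImageFolder('images/train', transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True,
                          num_workers=2, pin_memory=True)
```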
You can choose to run either the Jupyter notebook or the scripts present within the Scripts folder of the repository
- Run the cells in order, adjusting parameters as you see fit. The number of `epochs` can easily be increased if the hardware allows it
- There are helper functions present within the cells that you can use to generate predictions for images using the models; feel free to use them (a sketch of such a helper follows this list)
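For illustration, a hypothetical prediction helper along those lines; the function name is made up here, and the `transform` and `device` arguments correspond to the ones in the earlier sketches:

```python
import torch
from PIL import Image

CLASSES = ['Anger', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']

@torch.no_grad()
def predict(model, image_path, transform, device):
    """Return the predicted expression label for a single image."""
    model.eval()
    img = Image.open(image_path).convert('RGB')
    x = transform(img).unsqueeze(0).to(device)  # add a batch dimension
    logits = model(x)
    return CLASSES[logits.argmax(dim=1).item()]
```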
- Make sure you have the dependencies set up. To be on the safe side, you can run `pip install -r requirements.txt --no-index`
- Make changes as needed to the parameters in `train.py`, as it contains the code required for training the model defined in `resnet_model.py`
- If using VS Code, you can launch a TensorBoard session directly by clicking on `Launch TensorBoard session` above the `Tensorboard` import present in the file
- Otherwise, you can deploy one by following the steps in Using TensorBoard with PyTorch (a minimal logging sketch follows this list)
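For reference, a minimal logging sketch; the scalar tag and value are purely illustrative:

```python
from torch.utils.tensorboard import SummaryWriter

# SummaryWriter writes to a ./runs/ subdirectory by default, which is
# where the training logs mentioned above are stored.
writer = SummaryWriter()
writer.add_scalar('Loss/train', 0.693, global_step=0)  # illustrative values
writer.flush()
writer.close()
```

You can then inspect the logs with `tensorboard --logdir=runs`.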
- Paper: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- Authors: Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke and Alexander A. Alemi
- Image dataset source: Face Expression Recognition Dataset