
Resnet_Imagenet_with_Cutout

Overview of Resnet

ResNet, short for Residual Network, is a type of deep neural network architecture. In this project we explore the challenges researchers faced, prior to the advent of the ResNet architecture, in training neural networks with a large number of layers. The primary obstacle was the vanishing gradient problem during backpropagation, which hindered the efficient updating of kernel values once the network's depth exceeded a certain threshold.

Vanishing Gradient Problem

The vanishing gradient problem refers to the phenomenon where gradients used for updating neural network weights diminish exponentially as they are propagated back through the network. This results in very small updates to the weights of the initial layers, causing the network to learn very slowly, if at all. This problem was particularly evident in deeper networks, making it difficult to train networks effectively beyond a certain depth.
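The effect can be illustrated with a toy calculation. By the chain rule, the gradient reaching an early layer is the product of the local derivatives of every layer above it; if each factor is small (the sigmoid's derivative never exceeds 0.25, for example), the product shrinks exponentially with depth. This is a minimal sketch, not code from this repository:

```python
def backprop_gradient_magnitude(depth, local_grad=0.25):
    """Gradient magnitude reaching layer 1 of a `depth`-layer chain.

    Each layer contributes one multiplicative factor (chain rule).
    0.25 is the maximum derivative of the sigmoid activation, so this
    is an optimistic upper bound for a sigmoid network.
    """
    grad = 1.0
    for _ in range(depth):
        grad *= local_grad
    return grad

shallow = backprop_gradient_magnitude(5)   # about 1e-3: still usable
deep = backprop_gradient_magnitude(50)     # about 8e-31: effectively zero
```

With 50 layers the gradient is vanishingly small, so the early layers' weights barely move during training.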

Impact on Training Deep Neural Networks

Before the introduction of the ResNet architecture, traditional neural network architectures struggled to maintain performance as the number of layers increased. As shown in the graph below, the training and testing errors were higher for a 56-layer model than for a 20-layer model. This indicates that increasing the number of layers did not necessarily lead to better performance and, in fact, often resulted in worse performance due to the vanishing gradient problem.

Graph: Training and Testing Error


Figure 1: Training and Testing Error Comparison Between 20-layer and 56-layer Models

ResNet Architecture

The ResNet architecture was developed to address these issues, enabling the training of much deeper neural networks by mitigating the vanishing gradient problem. ResNet introduced innovations that help maintain effective gradient flow, allowing networks with significantly more layers to be trained without the degradation in performance that was previously observed.

Importance of Skip Connections

Skip connections, also known as identity shortcuts, play a pivotal role in deep learning models like ResNet by addressing the vanishing gradient problem. By allowing gradients to flow more effectively through the network, skip connections ensure that earlier layers receive meaningful updates during training, mitigating the issue of gradients diminishing to insignificance. This improves training efficiency by enabling more effective kernel learning across layers, and it enhances the model's capacity to extract complex features from the data. Furthermore, skip connections enable the training of deeper networks without compromising performance, leading to more sophisticated and accurate predictive models.
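The core idea is that a residual block computes F(x) + x rather than F(x) alone: the identity shortcut gives both the signal and the gradient a path that bypasses the block's layers entirely. A minimal NumPy sketch, with a single linear layer plus ReLU standing in for the block's convolutions (the names and shapes here are illustrative, not from this repo):

```python
import numpy as np

def residual_block(x, weight):
    """A minimal residual block: output = F(x) + x.

    `weight` and ReLU stand in for the block's conv layers; the
    identity shortcut `+ x` lets gradients bypass F entirely.
    """
    f_x = np.maximum(0.0, x @ weight)  # F(x): linear layer + ReLU
    return f_x + x                     # skip connection

x = np.ones(4)
w = np.zeros((4, 4))   # even if F learns nothing (all-zero weights)...
y = residual_block(x, w)
# ...the block still passes x through unchanged: y == x
```

Because the identity path is always present, a residual block can do no worse than pass its input through, which is why stacking many such blocks does not degrade performance the way stacking plain layers does.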

Cutout Data Augmentation

Data augmentation is one way to avoid overfitting to a training dataset of images.

Cutout augmentation is a technique used in image data augmentation where random square patches of pixels are masked out during training. This method helps in regularizing deep learning models by preventing overfitting and encouraging the learning of more robust features. By obscuring different parts of the images in each training iteration, cutout augmentation diversifies the training dataset and improves the model's ability to generalize to unseen data. It is a cost-effective and straightforward approach to enhancing the performance and robustness of image classification models.
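A minimal NumPy sketch of the technique (an illustrative implementation, not the code used in this repo): pick a random center, zero out a square patch around it, and clip the patch at the image border.

```python
import numpy as np

def cutout(image, patch_size, rng=None):
    """Mask a random square patch of an H x W (x C) image with zeros."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    # Clip so patches near the border are applied partially,
    # as in the original Cutout paper (DeVries & Taylor, 2017).
    y1, y2 = max(0, cy - patch_size // 2), min(h, cy + patch_size // 2)
    x1, x2 = max(0, cx - patch_size // 2), min(w, cx + patch_size // 2)
    out = image.copy()
    out[y1:y2, x1:x2] = 0.0
    return out
```

Applied with a fresh random center at every training iteration, each image is seen with a different region obscured, forcing the model to rely on context rather than any single local feature.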


Results

This project compares the performance of three convolutional neural network architectures built for image classification: a plain ConvNet, ResNet-18, and ResNet-18 with Cutout. The evaluation focuses on their training dynamics, final validation metrics, and the impact of augmentation strategies on model performance.

Models Created

1. Plain ConvNet

  • Started with a high initial loss of 11.02.
  • Plateaued in performance after epoch 10 with a final validation loss of 11%, indicating poor generalization.
  • Highlighted the challenges of deep networks without effective optimization strategies.

2. ResNet-18

  • Initially exhibited instability with a high loss of 6.9 but improved steadily.
  • Achieved a final validation loss of 4.3 and a validation accuracy of 39%.
  • Demonstrated robust learning and effective classification capabilities compared to Plain ConvNet.

3. ResNet-18 with Cutout

  • Implemented with cutout data augmentation to enhance robustness and generalization.
  • Showed significant improvement early in training, achieving a validation accuracy of 46.28% by epoch 10.
  • Highlighted the effectiveness of cutout augmentation in improving model performance and potential for further optimization.
