
Commit f706a64

modify : VGG tutorial
Lint and Technical Details and flexible code style

1 parent 7938635 commit f706a64

File tree: 2 files changed (+74, −56 lines)

.ci/docker/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 # --extra-index-url https://download.pytorch.org/whl/cu117/index.html # Use this to run/publish tutorials against the latest binaries during the RC stage. Comment out after the release. Each release verify the correct cuda version.
 # Refer to ./jenkins/build.sh for tutorial build instructions
 
+albumentations
 sphinx==5.0.0
 sphinx-gallery==0.11.1
 sphinx_design

beginner_source/Pretraining_Vgg_from_scratch.py

Lines changed: 73 additions & 56 deletions
@@ -5,21 +5,33 @@
 
 **Author:** `WoongJoon Choi <https://github.com/woongjoonchoi>`_
 
-In this tutorial, we will embark on an exciting journey to build and
-train a VGG network from scratch using Python and popular deep learning
-libraries such as PyTorch. We will dive into the details of the VGG
+VGG (Visual Geometry Group) is a convolutional neural network architecture that is particularly
+effective for image classification tasks. In this tutorial, we will guide you through building
+and training a VGG network from scratch using Python and PyTorch. We will dive into the details of the VGG
 architecture, understanding its components and the rationale behind its
 design.
 
 Our tutorial is designed for both beginners who are new to deep learning
 and seasoned practitioners looking to deepen their understanding of CNN
 architectures.
 
-Before you start
+.. grid:: 2
+
+   .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn
+      :class-card: card-prerequisites
+
+      * Understand the VGG architecture and train it from scratch using PyTorch.
+      * Use PyTorch tools to evaluate the VGG model's performance.
+
+   .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
+      :class-card: card-prerequisites
+
+      * Complete the `Learn the Basics tutorials <https://pytorch.org/tutorials/beginner/basics/intro.html>`__
+      * Familiarity with basic machine learning concepts and terms
+
+If you are running this in Google Colab, install ``albumentations``.
 
-.. code-block:: sh
 
-   pip install albumentations
 """
 import subprocess
 import sys
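Note that the surviving context keeps ``import subprocess`` and ``import sys``, which suggests the ``pip install`` code-block removed from the docstring was superseded by a programmatic install. The actual call is outside this diff's window, so the following is only a sketch of the usual pattern:

    import subprocess
    import sys

    # Sketch (not shown in the diff): install albumentations from inside the
    # script, mirroring the shell instruction removed from the docstring.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "albumentations"])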
@@ -55,7 +67,7 @@
 
 
 ######################################################################
-# Worth point of this tutorial
+# Purpose of this tutorial
 # ----------------------------
 #
 
@@ -77,17 +89,23 @@
 
 
 ######################################################################
-# Why VGG is so popular ?
+# Background
 # -----------------------
 #
 
 
 ######################################################################
 # VGG became a model that attracted attention because it succeeded in
 # building deeper layers and dramatically shortening the training time
-# compared to ``alexnet``, which was the SOTA model at the time.:
+# compared to ``AlexNet``, which was the SOTA model at the time.
 #
-
+# Unlike ``AlexNet``'s larger 5x5 and 11x11 filters, VGG only uses 3x3 filters.
+# Stacking multiple 3x3 filters yields the same receptive field as a single larger filter while reducing the number of parameters.
+# In addition, because the stack passes through multiple nonlinear activations, the network's nonlinearity increases.
+#
+# VGG applies a max pooling layer after several convolutional layers to reduce the spatial size.
+# This downsamples the feature map while preserving important information.
+# As a result, the network can learn high-level features in deeper layers while helping to prevent overfitting.
 
 ######################################################################
 # VGG Configuration
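A quick sanity check of the receptive-field claim in the added comments (an editorial illustration, not part of the commit): two stacked 3x3 convolutions cover the same 5x5 window as one 5x5 convolution, with fewer weights.

    import torch
    from torch import nn

    C = 64  # channel width, chosen only for illustration
    x = torch.randn(1, C, 32, 32)

    # Two 3x3 convs (padding=1) vs one 5x5 conv (padding=2): identical output
    # shape and a 5x5 receptive field, but 2*(3*3*C*C) = 18*C^2 weights
    # instead of 25*C^2.
    stacked = nn.Sequential(nn.Conv2d(C, C, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(C, C, 3, padding=1))
    single = nn.Conv2d(C, C, 5, padding=2)

    assert stacked(x).shape == single(x).shape
    print(sum(p.numel() for p in stacked.parameters()))  # 73856
    print(sum(p.numel() for p in single.parameters()))   # 102464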
@@ -96,11 +114,11 @@
 
 
 ######################################################################
-# We define some configurations suggested in VGG paper . Details about
+# We define some configurations suggested in the VGG paper. Details of
 # this configuration will be explained below section.
 #
 
-DatasetName = 'Cifar' # CIFAR ,CIFAR10, MNIST , ImageNet
+DatasetName = 'CIFAR' # CIFAR, CIFAR10, MNIST, ImageNet
 
 ## model configuration
 
@@ -143,8 +161,8 @@
 
 
 ######################################################################
-# | If your GPU memory is 24GB ,The maximum batch size is 128. But, if you
-# use Colab , We recommend using 32 .
+# | If your GPU memory is 24GB, the maximum batch size is 128. But if you
+# use Colab, we recommend a batch size of 32.
 # | You can modify the batch size according to your preference.
 #
 
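For readers following along, the guidance above amounts to a single configuration value (the variable name is assumed here, since the defining line sits outside this hunk):

    # Assumed name: 128 fits a 24GB GPU; drop to 32 on Colab.
    batch_size = 32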
@@ -156,10 +174,15 @@
 
 
 ######################################################################
-# We use ``CIFAR100`` Dataset in this tutorial. In VGG paper , the authors
-# scales image ``isotropically`` . Then , they apply
-# ``Normalization``,``RandomCrop``,``HorizontalFlip`` . So , we need to override
-# CIFAR100 class to apply preprocessing.
+# We use the ``CIFAR100`` dataset in this tutorial. In the VGG paper, the authors scale images ``isotropically``.
+# Isotropic scaling increases the size of an image while maintaining its proportions, preventing distortion and keeping objects consistent.
+# They then apply ``Normalization``, ``RandomCrop``, and ``HorizontalFlip``.
+# Normalizing input data to a common range (for example, 0 to 1) tends to lead to faster convergence, because weight updates become more uniform. A network's weight updates can vary significantly with the range of its input values.
+# Neural networks generally work best when inputs fall within a consistent range. If RGB values are not normalized, the model receives inputs of differing ranges, which makes consistent processing difficult; scaling all features to the same range lets the model treat each feature evenly, improving performance.
+# If the training and test data have different ranges, the model may struggle to generalize, so it is important to map all data to the same range. This helps the model perform well on both test data and real data.
+# Using normalized images as input therefore lets the network learn more effectively and perform stably.
+# Data augmentation such as ``RandomCrop`` and ``HorizontalFlip`` helps prevent overfitting and makes models robust in varied environments. When the dataset is small or limited, augmentation effectively supplies more data, improving generalization through varied transformed samples.
+# So we need to override the CIFAR100 class to apply this preprocessing.
 #
 
 class Custom_Cifar(CIFAR100) :
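For reference, the preprocessing described in these comments maps directly onto an albumentations pipeline. The tutorial's actual rescale bound, crop size, and normalization statistics are outside this hunk, so the values below are placeholders:

    import albumentations as A

    # Placeholder numbers: S=256 isotropic rescale, 224x224 crop,
    # ImageNet-style mean/std. The tutorial's own values may differ.
    train_transform = A.Compose([
        A.SmallestMaxSize(max_size=256),         # isotropic rescale, aspect ratio kept
        A.RandomCrop(height=224, width=224),     # augmentation: random crop
        A.HorizontalFlip(p=0.5),                 # augmentation: random flip
        A.Normalize(mean=(0.485, 0.456, 0.406),  # bring inputs to a common range
                    std=(0.229, 0.224, 0.225)),
    ])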
@@ -224,10 +247,8 @@ def __getitem__(self, index: int) :
 
 
 ######################################################################
-# | In VGG paper, they do experiment over 6 models. model A is 11 layers,
-# model B is 13 layers, model C is 16 layers , model D is 16 layers and
-# model D is 19 layers . you can train all version of models to
-# reproduce VGG .
+# | The VGG paper experiments with 6 models of varying layer depth. The configurations
+# | are enumerated below for full reproduction of the results.
 # | ``Config_Channels`` means output channels and ``Config_kernels`` means
 # kernel size.
 #
@@ -236,26 +257,26 @@ def __getitem__(self, index: int) :
 from torch import nn
 
 
-Config_channels = {
-"A" : [64,"M" , 128, "M" , 256,256,"M" ,512,512 ,"M" , 512,512,"M"] ,
-"A_lrn" : [64,"LRN","M" , 128, "M" , 256,256,"M" ,512,512 ,"M" , 512,512,"M"] ,
-"B" :[64,64,"M" , 128,128, "M" , 256,256,"M" ,512,512 ,"M" , 512,512,"M"] ,
-"C" : [64,64,"M" , 128,128, "M" , 256,256,256,"M" ,512,512 ,512,"M" , 512,512,512,"M"] ,
-"D" :[64,64,"M" , 128,128, "M" , 256,256,256,"M" ,512,512 ,512,"M" , 512,512,512,"M"] ,
-"E" :[64,64,"M" , 128,128, "M" , 256,256,256,256,"M" ,512,512 ,512,512,"M" , 512,512,512,512,"M"] ,
+# Config_channels -> number: output channels, "M": max pooling layer
 
+Config_channels = {
+    "A":[64,"M",128,"M",256,256,"M",512,512,"M",512,512,"M"],
+    "A_lrn":[64,"LRN","M",128,"M",256,256,"M",512,512,"M",512,512,"M"],
+    "B":[64,64,"M",128,128,"M",256,256,"M",512,512,"M",512,512,"M"],
+    "C":[64,64,"M",128,128,"M",256,256,256,"M",512,512,512,"M",512,512,512,"M"],
+    "D":[64,64,"M",128,128,"M",256,256,256,"M",512,512,512,"M",512,512,512,"M"],
+    "E":[64,64,"M",128,128,"M",256,256,256,256,"M",512,512,512,512,"M",512,512,512,512,"M"],
 }
 
 
-
+# Config_kernel -> kernel_size
 Config_kernel = {
-"A" : [3,2 , 3, 2 , 3,3,2 ,3,3 ,2 , 3,3,2] ,
-"A_lrn" : [3,2,2 , 3, 2 , 3,3,2 ,3,3 ,2 , 3,3,2] ,
-"B" :[3,3,2 , 3,3, 2 , 3,3,2 ,3,3 ,2 , 3,3,2] ,
-"C" : [3,3,2 , 3,3, 2 , 3,3,1,2 ,3,3 ,1,2 , 3,3,1,2] ,
-"D" :[3,3,2 , 3,3, 2 , 3,3,3,2 ,3,3 ,3,2 , 3,3,3,2] ,
-"E" :[3,3,2 , 3,3, 2 , 3,3,3,3,2 ,3,3 ,3,3,2 , 3,3,3,3,2] ,
-
+    "A":[3,2,3,2,3,3,2,3,3,2,3,3,2],
+    "A_lrn":[3,2,2,3,2,3,3,2,3,3,2,3,3,2],
+    "B":[3,3,2,3,3,2,3,3,2,3,3,2,3,3,2],
+    "C":[3,3,2,3,3,2,3,3,1,2,3,3,1,2,3,3,1,2],
+    "D":[3,3,2,3,3,2,3,3,3,2,3,3,3,2,3,3,3,2],
+    "E":[3,3,2,3,3,2,3,3,3,3,2,3,3,3,3,2,3,3,3,3,2],
 }
 
 
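The paired lists above are walked in lockstep by the file's ``make_feature_extractor`` (visible in the next hunk's header). The following is a hypothetical re-implementation, written only to illustrate how the encoding reads; the file's actual function may differ:

    from torch import nn

    def build_features(cfg_c, cfg_k):
        # Illustrative sketch: numbers are conv output channels, "M" is a max
        # pooling layer, "LRN" is local response normalization; the kernel
        # list supplies each layer's kernel (or pooling window) size.
        layers, in_c = [], 3
        for c, k in zip(cfg_c, cfg_k):
            if c == "M":
                layers.append(nn.MaxPool2d(kernel_size=k, stride=2))
            elif c == "LRN":
                layers.append(nn.LocalResponseNorm(size=5))
            else:
                layers += [nn.Conv2d(in_c, c, kernel_size=k, padding=k // 2),
                           nn.ReLU(inplace=True)]
                in_c = c
        return nn.Sequential(*layers)

For example, ``build_features(Config_channels["B"], Config_kernel["B"])`` would yield the ten-convolution feature stack of model B.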
@@ -284,7 +305,8 @@ def make_feature_extractor(cfg_c,cfg_k):
 
 
 class Model_vgg(nn.Module) :
-    def __init__(self,version , num_classes):
+    # def __init__(self,version , num_classes):
+    def __init__(self, conf_channels, conf_kernels, num_classes):
         conv_5_out_w ,conv_5_out_h = 7,7
         conv_5_out_dim =512
         conv_1_by_1_1_outchannel = 4096
@@ -296,7 +318,7 @@ def __init__(self,version , num_classes):
         self.except_xavier = except_xavier
 
         super().__init__()
-        self.feature_extractor = make_feature_extractor(Config_channels[version] , Config_kernel[version])
+        self.feature_extractor = make_feature_extractor(conf_channels, conf_kernels)
         self.avgpool = nn.AdaptiveAvgPool2d((1,1))
         self.output_layer = nn.Sequential(
             nn.Conv2d(conv_5_out_dim ,conv_1_by_1_1_outchannel ,7) ,
@@ -354,17 +376,12 @@ def _init_weights(self,m):
             nn.init.constant_(m.bias, 0)
 
 
-######################################################################
-# Parameter Initialization
-# ~~~~~~~~~~~~~~~~~~~~~~~~
-#
-
 
 ######################################################################
-# When training VGG , the authors first train model A , then initialized
-# the weights of other models with the weights of model A. Waiting for
+# When training VGG, the authors first train model A, then continue training from
+# the resulting weights for the other variants. Waiting for
 # Model A to be trained takes a long time . The authors mention how to
-# train with ``xavier`` initialization rather than initializing with the
+# train with ``Xavier`` initialization rather than initializing with the
 # weights of model A. But, they do not mention how to initialize .
 #
 # | To Reproduce VGG , we use ``xavier`` initialization method to initialize
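Since the hunk shows only the tail of ``_init_weights``, here is a minimal sketch of Xavier initialization as PyTorch exposes it, for readers unfamiliar with the initializer being discussed (the tutorial's own function may differ in detail):

    from torch import nn

    def init_weights(m):
        # Xavier (Glorot) uniform initialization for conv and linear layers;
        # biases are zeroed, matching the line visible at the top of the hunk.
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.xavier_uniform_(m.weight)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)

    # usage: model.apply(init_weights)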
@@ -418,7 +435,7 @@ def accuracy(output, target, topk=(1,)):
 #
 
 model_version='B'
-model = Model_vgg(model_version,num_classes)
+model = Model_vgg(Config_channels[model_version], Config_kernel[model_version], num_classes)
 criterion = nn.CrossEntropyLoss()
 
 optimizer = optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay,momentum=momentum)
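The new constructor signature is what makes the commit's "flexible code style" concrete: the model is no longer tied to the built-in configuration tables. A hedged sketch of instantiating a custom variant (the lists below are invented for illustration and do not come from the paper):

    # Hypothetical shallow VGG-style configuration.
    custom_channels = [32, "M", 64, "M", 128, 128, "M"]
    custom_kernels  = [3,  2,   3,  2,   3,   3,   2]

    tiny_model = Model_vgg(custom_channels, custom_kernels, num_classes)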
@@ -575,7 +592,7 @@
 # -------------------
 #
 
-class Cusotm_ImageNet(ImageNet) :
+class Custom_ImageNet(ImageNet) :
     def __init__(self,root,transform = None,multi=False,s_max=None,s_min=256,split=None,val=False):
 
         self.multi = multi
@@ -615,7 +632,6 @@ def __getitem__(self, index: int) :
         if img.mode == 'L' : img = img.convert('RGB')
         img=np.array(img,dtype=np.float32)
 
-        # print(self.transform)
 
         if self.transform is not None:
             img = self.transform(image=img)
@@ -631,8 +647,8 @@ def __getitem__(self, index: int) :
         return img, target
 
 if DatasetName == 'ImageNet' :
-    train_data= Cusotm_ImageNet(root='ImageNet',split='train')
-    val_data= Cusotm_ImageNet('ImageNet',split='val',val=True)
+    train_data = Custom_ImageNet(root='ImageNet', split='train')
+    val_data = Custom_ImageNet('ImageNet', split='val', val=True)
     val_data.val= True
     val_data.s_min = test_min
     val_data.transform= A.Compose(
@@ -647,14 +663,15 @@ def __getitem__(self, index: int) :
 ######################################################################
 # Conclusion
 # ----------
-# We have seen how ``pretraining`` VGG from scratch . This Tutorial will be helpful to reproduce another Foundation Model .
+# We have seen how to ``pretrain`` VGG from scratch.
+# This tutorial should be helpful for reproducing other foundation models.
 
 ######################################################################
 # More things to try
 # ------------------
-# - Trying On ImageNet
-# - Try All version of Model
-# - Try All evaluation method in VGG paper
+# - Apply the model to ImageNet
+# - Try all model variants
+# - Try additional evaluation methods
 
 
 ######################################################################
