
MNIST Classification using CNN

This repository contains code for training a Convolutional Neural Network (CNN) on the MNIST dataset.

Model Architecture

The model uses a sequential CNN architecture with BatchNorm layers:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 8, 26, 26]              72
              ReLU-2            [-1, 8, 26, 26]               0
       BatchNorm2d-3            [-1, 8, 26, 26]              16
            Conv2d-4           [-1, 16, 24, 24]           1,152
              ReLU-5           [-1, 16, 24, 24]               0
       BatchNorm2d-6           [-1, 16, 24, 24]              32
         MaxPool2d-7           [-1, 16, 12, 12]               0
            Conv2d-8            [-1, 8, 12, 12]             128
              ReLU-9            [-1, 8, 12, 12]               0
      BatchNorm2d-10            [-1, 8, 12, 12]              16
           Conv2d-11           [-1, 16, 10, 10]           1,152
             ReLU-12           [-1, 16, 10, 10]               0
      BatchNorm2d-13           [-1, 16, 10, 10]              32
        MaxPool2d-14             [-1, 16, 5, 5]               0
           Linear-15                   [-1, 10]           4,010
================================================================
Total params: 6,610
Trainable params: 6,610
Non-trainable params: 0
================================================================
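The totals above can be checked by hand. A quick arithmetic sanity check (pure Python; it assumes the convolutions are bias-free, which the per-layer counts imply):

```python
def conv_params(c_in, c_out, k):
    """Conv2d parameter count with bias=False: in_ch * out_ch * k * k."""
    return c_in * c_out * k * k

def bn_params(c):
    """BatchNorm2d: one learnable weight and one bias per channel."""
    return 2 * c

total = (
    conv_params(1, 8, 3) + bn_params(8)      # Conv2d-1 (72) + BatchNorm2d-3 (16)
    + conv_params(8, 16, 3) + bn_params(16)  # Conv2d-4 (1152) + BatchNorm2d-6 (32)
    + conv_params(16, 8, 1) + bn_params(8)   # Conv2d-8 (128) + BatchNorm2d-10 (16)
    + conv_params(8, 16, 3) + bn_params(16)  # Conv2d-11 (1152) + BatchNorm2d-13 (32)
    + (16 * 5 * 5) * 10 + 10                 # Linear-15: 4000 weights + 10 biases
)
print(total)  # 6610
```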

Architecture Details

  1. Input Layer: 28x28x1 (MNIST images)
  2. First Convolution Block:
    • Conv2d(1, 8, 3) → ReLU → BatchNorm2d → 26x26x8
    • Conv2d(8, 16, 3) → ReLU → BatchNorm2d → 24x24x16
    • MaxPool2d(2,2) → 12x12x16
  3. Second Convolution Block:
    • Conv2d(16, 8, 1) → ReLU → BatchNorm2d → 12x12x8
    • Conv2d(8, 16, 3) → ReLU → BatchNorm2d → 10x10x16
    • MaxPool2d(2,2) → 5x5x16
  4. Output Block:
    • Flatten → 16x5x5 = 400
    • Linear(400, 10)
    • LogSoftmax
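The model code itself is not reproduced in this README; a minimal PyTorch sketch consistent with the summary table (bias=False on the convolutions is inferred from the parameter counts) might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(1, 8, 3, bias=False), nn.ReLU(), nn.BatchNorm2d(8),    # 26x26x8
            nn.Conv2d(8, 16, 3, bias=False), nn.ReLU(), nn.BatchNorm2d(16),  # 24x24x16
            nn.MaxPool2d(2, 2),                                              # 12x12x16
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(16, 8, 1, bias=False), nn.ReLU(), nn.BatchNorm2d(8),   # 12x12x8
            nn.Conv2d(8, 16, 3, bias=False), nn.ReLU(), nn.BatchNorm2d(16),  # 10x10x16
            nn.MaxPool2d(2, 2),                                              # 5x5x16
        )
        self.fc = nn.Linear(16 * 5 * 5, 10)

    def forward(self, x):
        x = self.block2(self.block1(x))
        x = x.view(x.size(0), -1)  # flatten to 400 features
        return F.log_softmax(self.fc(x), dim=1)

print(sum(p.numel() for p in Net().parameters()))  # 6610
```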

Training Details

  • Batch Size: 512
  • Optimizer: SGD with momentum (lr=0.1, momentum=0.9)
  • Scheduler: StepLR (step_size=15, gamma=0.1)
  • Loss Function: Cross Entropy Loss
  • Epochs: 1
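Wiring up these settings in PyTorch is straightforward; the sketch below uses a stand-in linear model, since the real script would pass the CNN's parameters instead:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(28 * 28, 10)  # stand-in; the real script uses the CNN above

optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(1):  # Epochs: 1
    # ... iterate the train loader (batch_size=512), compute
    # criterion(output, target), loss.backward(), optimizer.step() per batch ...
    scheduler.step()  # lr drops by 10x every 15 epochs

print(optimizer.param_groups[0]["lr"])  # 0.1 (no decay yet after 1 epoch)
```

One caveat: nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, so with a model that already ends in LogSoftmax the matching criterion would be nn.NLLLoss.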

Results

Training metrics are collected for:

  • Training Loss
  • Training Accuracy
  • Test Loss
  • Test Accuracy

Training Metrics

The code includes plotting functionality (currently commented out) for visualizing:

  • Training and Test Loss curves
  • Training and Test Accuracy curves
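If uncommented, that plotting code presumably resembles the sketch below (the metric lists here are placeholder values, not results from this repository):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Placeholder metric lists, one entry per epoch.
train_losses, test_losses = [0.35], [0.08]
train_acc, test_acc = [91.2], [97.5]

fig, axs = plt.subplots(2, 2, figsize=(12, 8))
axs[0, 0].plot(train_losses); axs[0, 0].set_title("Training Loss")
axs[0, 1].plot(train_acc);    axs[0, 1].set_title("Training Accuracy")
axs[1, 0].plot(test_losses);  axs[1, 0].set_title("Test Loss")
axs[1, 1].plot(test_acc);     axs[1, 1].set_title("Test Accuracy")
fig.savefig("metrics.png")
```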

Key Features

  1. BatchNorm Integration: Each convolution layer is followed by BatchNorm
  2. Parameter Efficient: ~6.6K total parameters
  3. Sequential Blocks: Organized architecture using nn.Sequential
  4. Data Augmentation:
    • Random Center Crop (22x22)
    • Resize to 28x28
    • Random Rotation (-15° to +15°)
    • Normalization (mean=0.1307, std=0.3081)

Requirements

  • Python 3.x
  • PyTorch
  • torchvision
  • matplotlib
  • tqdm
  • torchsummary

Usage

python erav4session4assignment1cnn.py

Model Strategy

  1. Uses BatchNorm for better training stability
  2. Efficient channel management (1→8→16→8→16)
  3. Strategic MaxPooling after convolution blocks
  4. Mix of 3x3 and 1x1 convolutions
  5. Single fully connected layer at the end

Data Pipeline

Training Transforms

from torchvision import transforms

transforms.Compose([
    transforms.RandomApply([transforms.CenterCrop(22)], p=0.1),  # 22x22 center crop, 10% of the time
    transforms.Resize((28, 28)),
    transforms.RandomRotation((-15., 15.), fill=0),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # standard MNIST mean and std
])

Test Transforms

transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # standard MNIST mean and std
])