
MNIST Classification using CNN

This repository contains code for training a Convolutional Neural Network (CNN) on the MNIST dataset.

Model Architecture

The model uses a sequential CNN architecture with BatchNorm layers:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 8, 26, 26]              72
              ReLU-2            [-1, 8, 26, 26]               0
       BatchNorm2d-3            [-1, 8, 26, 26]              16
            Conv2d-4           [-1, 16, 24, 24]           1,152
              ReLU-5           [-1, 16, 24, 24]               0
       BatchNorm2d-6           [-1, 16, 24, 24]              32
         MaxPool2d-7           [-1, 16, 12, 12]               0
            Conv2d-8            [-1, 8, 12, 12]             128
              ReLU-9            [-1, 8, 12, 12]               0
      BatchNorm2d-10            [-1, 8, 12, 12]              16
           Conv2d-11           [-1, 16, 10, 10]           1,152
             ReLU-12           [-1, 16, 10, 10]               0
      BatchNorm2d-13           [-1, 16, 10, 10]              32
        MaxPool2d-14             [-1, 16, 5, 5]               0
           Linear-15                   [-1, 10]           4,010
================================================================
Total params: 6,610
Trainable params: 6,610
Non-trainable params: 0
================================================================
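The totals above can be checked by hand. A quick arithmetic sanity check (pure Python; it assumes the convolutions are bias-free, which the per-layer counts imply):

```python
def conv_params(c_in, c_out, k):
    """Conv2d parameter count with bias=False: in_ch * out_ch * k * k."""
    return c_in * c_out * k * k

def bn_params(c):
    """BatchNorm2d: one learnable weight and one bias per channel."""
    return 2 * c

total = (
    conv_params(1, 8, 3) + bn_params(8)      # Conv2d-1 (72) + BatchNorm2d-3 (16)
    + conv_params(8, 16, 3) + bn_params(16)  # Conv2d-4 (1152) + BatchNorm2d-6 (32)
    + conv_params(16, 8, 1) + bn_params(8)   # Conv2d-8 (128) + BatchNorm2d-10 (16)
    + conv_params(8, 16, 3) + bn_params(16)  # Conv2d-11 (1152) + BatchNorm2d-13 (32)
    + (16 * 5 * 5) * 10 + 10                 # Linear-15: 4000 weights + 10 biases
)
print(total)  # 6610
```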

Architecture Details

  1. Input Layer: 28x28x1 (MNIST images)
  2. First Convolution Block:
    • Conv2d(1, 8, 3) → ReLU → BatchNorm2d → 26x26x8
    • Conv2d(8, 16, 3) → ReLU → BatchNorm2d → 24x24x16
    • MaxPool2d(2,2) → 12x12x16
  3. Second Convolution Block:
    • Conv2d(16, 8, 1) → ReLU → BatchNorm2d → 12x12x8
    • Conv2d(8, 16, 3) → ReLU → BatchNorm2d → 10x10x16
    • MaxPool2d(2,2) → 5x5x16
  4. Output Block:
    • Flatten → 16x5x5 = 400
    • Linear(400, 10)
    • LogSoftmax
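The model code itself is not reproduced in this README; a minimal PyTorch sketch consistent with the summary table (bias=False on the convolutions is inferred from the parameter counts) might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(1, 8, 3, bias=False), nn.ReLU(), nn.BatchNorm2d(8),    # 26x26x8
            nn.Conv2d(8, 16, 3, bias=False), nn.ReLU(), nn.BatchNorm2d(16),  # 24x24x16
            nn.MaxPool2d(2, 2),                                              # 12x12x16
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(16, 8, 1, bias=False), nn.ReLU(), nn.BatchNorm2d(8),   # 12x12x8
            nn.Conv2d(8, 16, 3, bias=False), nn.ReLU(), nn.BatchNorm2d(16),  # 10x10x16
            nn.MaxPool2d(2, 2),                                              # 5x5x16
        )
        self.fc = nn.Linear(16 * 5 * 5, 10)

    def forward(self, x):
        x = self.block2(self.block1(x))
        x = x.view(x.size(0), -1)  # flatten to 400 features
        return F.log_softmax(self.fc(x), dim=1)

print(sum(p.numel() for p in Net().parameters()))  # 6610
```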

Training Details

  • Batch Size: 512
  • Optimizer: SGD with momentum (lr=0.1, momentum=0.9)
  • Scheduler: StepLR (step_size=15, gamma=0.1)
  • Loss Function: Cross Entropy Loss
  • Epochs: 1
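Wiring up these settings in PyTorch is straightforward; the sketch below uses a stand-in linear model, since the real script would pass the CNN's parameters instead:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(28 * 28, 10)  # stand-in; the real script uses the CNN above

optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(1):  # Epochs: 1
    # ... iterate the train loader (batch_size=512), compute
    # criterion(output, target), loss.backward(), optimizer.step() per batch ...
    scheduler.step()  # lr drops by 10x every 15 epochs

print(optimizer.param_groups[0]["lr"])  # 0.1 (no decay yet after 1 epoch)
```

One caveat: nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, so with a model that already ends in LogSoftmax the matching criterion would be nn.NLLLoss.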

Results

Training metrics are collected for:

  • Training Loss
  • Training Accuracy
  • Test Loss
  • Test Accuracy

Training Metrics

The code includes plotting functionality (currently commented out) for visualizing:

  • Training and Test Loss curves
  • Training and Test Accuracy curves
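If uncommented, that plotting code presumably resembles the sketch below (the metric lists here are placeholder values, not results from this repository):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Placeholder metric lists, one entry per epoch.
train_losses, test_losses = [0.35], [0.08]
train_acc, test_acc = [91.2], [97.5]

fig, axs = plt.subplots(2, 2, figsize=(12, 8))
axs[0, 0].plot(train_losses); axs[0, 0].set_title("Training Loss")
axs[0, 1].plot(train_acc);    axs[0, 1].set_title("Training Accuracy")
axs[1, 0].plot(test_losses);  axs[1, 0].set_title("Test Loss")
axs[1, 1].plot(test_acc);     axs[1, 1].set_title("Test Accuracy")
fig.savefig("metrics.png")
```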

Key Features

  1. BatchNorm Integration: Each convolution layer is followed by BatchNorm
  2. Parameter Efficient: ~6.6K total parameters
  3. Sequential Blocks: Organized architecture using nn.Sequential
  4. Data Augmentation:
    • Random Center Crop (22x22)
    • Resize to 28x28
    • Random Rotation (-15° to +15°)
    • Normalization (mean=0.1307, std=0.3081)

Requirements

  • Python 3.x
  • PyTorch
  • torchvision
  • matplotlib
  • tqdm
  • torchsummary

Usage

python erav4session4assignment1cnn.py

Model Strategy

  1. Uses BatchNorm for better training stability
  2. Efficient channel management (1→8→16→8→16)
  3. Strategic MaxPooling after convolution blocks
  4. Mix of 3x3 and 1x1 convolutions
  5. Single fully connected layer at the end

Data Pipeline

Training Transforms

from torchvision import transforms

transforms.Compose([
    transforms.RandomApply([transforms.CenterCrop(22)], p=0.1),  # 22x22 center crop, 10% of the time
    transforms.Resize((28, 28)),
    transforms.RandomRotation((-15., 15.), fill=0),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # standard MNIST mean and std
])

Test Transforms

transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # standard MNIST mean and std
])