Image Classification with Vision Transformers: An Experimental Study

This project implements an image classification pipeline using the CIFAR-100 dataset by leveraging a Vision Transformer (ViT) model, as described in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Dosovitskiy et al., 2021). The project includes scripts for preprocessing, training, and testing the model.


Table of Contents

  • Environment Setup
  • Dataset Preparation
  • Training the Model
  • Testing and Evaluation
  • Customization and Hyperparameters
  • Summary of Steps


Environment Setup

  1. Python Version:
    This project requires Python 3.10+.

  2. Dependencies: Create a new Conda (or Miniconda) environment and install the required Python packages by running:

conda env create -f environment.yml -n vit
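The repository's environment.yml is not reproduced here; as a rough sketch, based on the Key Libraries list below, it might look like the following (package names and version pins are assumptions, not the actual file contents):

```yaml
# Hypothetical environment.yml sketch; the repository's actual file may differ.
name: vit
channels:
  - pytorch
  - conda-forge
dependencies:
  - python>=3.10
  - pytorch
  - torchvision
  - opencv
  - scikit-learn
  - tqdm
  - numpy
  - pillow
```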

Key Libraries

  • PyTorch
  • torchvision
  • OpenCV
  • scikit-learn
  • tqdm
  • numpy
  • Pillow

Hardware Requirements

A CUDA-enabled GPU is recommended for training. The code automatically detects GPU availability.
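The GPU-detection step typically amounts to a single line; a minimal sketch of the standard PyTorch idiom:

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")
```

Models and batches are then moved to this device with `.to(device)`.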


Dataset Preparation

Preprocessing and Partitioning the Data

Before training, the raw images must be resized, normalized, and partitioned into the training, validation, and test datasets. Data augmentation (RandomHorizontalFlip) is applied to the training set to increase image diversity. The preprocessing module includes functions for resizing, normalizing, augmenting, and partitioning the data.

To run the preprocessing script, enter the following at the command line:

python preprocess.py

The preprocess.py script will download the CIFAR-100 dataset via torchvision and create the DataLoaders for the training, validation, and test datasets.
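The training overview below notes that the split uses stratified sampling, i.e. every class keeps the same train/validation ratio. A minimal pure-Python sketch of that idea (the actual script may instead rely on scikit-learn's `train_test_split` with `stratify=`):

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.1, seed=0):
    """Split sample indices into train/val so each class keeps the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = max(1, int(len(indices) * val_frac))  # at least one val sample per class
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx

# Toy labels: 2 classes with 10 samples each -> 1 val sample per class at 10%.
labels = [0] * 10 + [1] * 10
train_idx, val_idx = stratified_split(labels, val_frac=0.1)
print(len(train_idx), len(val_idx))  # 18 2
```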


Training the Model

Step 1: Run Training

Execute the training script from your terminal; the redirection saves the console output to a log file:

python run_train_test.py --mode train > train_log.txt

During training, the script will:

  • Load the CIFAR-100 dataset.
  • Split the dataset into training, validation, and test sets using stratified sampling.
  • Apply data augmentation techniques (resizing, random flips, normalization).
  • Create custom PyTorch Datasets and DataLoaders.
  • Initialize the ViT model using the ViT Base-16 architecture.
  • Set up the loss function, optimizer, and learning rate scheduler.
  • Run the training loop while tracking loss and accuracy, saving the best model weights.
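The last few steps above can be sketched in miniature. Everything in this snippet is a stand-in (a linear classifier on random data rather than ViT-B/16 on CIFAR-100, with illustrative hyperparameter values); only the loop structure of loss, optimizer, scheduler, and best-weights tracking mirrors the description:

```python
import copy
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(16, 100)                    # stand-in for the ViT-B/16 model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

x, y = torch.randn(64, 16), torch.randint(0, 100, (64,))  # fake batch of data
best_acc, best_state = 0.0, None
for epoch in range(5):                        # n_epochs
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()

    model.eval()
    with torch.no_grad():
        acc = (model(x).argmax(dim=1) == y).float().mean().item()
    if best_state is None or acc > best_acc:  # keep the best weights seen so far
        best_acc, best_state = acc, copy.deepcopy(model.state_dict())
        # the real script would persist these, e.g. torch.save(best_state, ...)
print(f"best accuracy: {best_acc:.3f}")
```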

Testing and Evaluation

Step 2: Run Testing

Execute the testing script from your terminal:

python run_train_test.py --mode test > test_log.txt

Testing Script Overview

The testing script will:

  • Load the dataset splits (previously saved during training).
  • Create a DataLoader for the test set.
  • Load the trained model checkpoint.
  • Evaluate the model on the test data by computing overall accuracy, generating classification reports, and optionally producing confusion matrices.
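The core accuracy and confusion-matrix bookkeeping can be sketched in a few lines of pure Python (the actual script may use scikit-learn's `classification_report` and `confusion_matrix` instead; the predictions below are made up for illustration):

```python
from collections import Counter

def accuracy(preds, targets):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)

preds   = [0, 1, 1, 2, 2, 2]   # hypothetical model predictions
targets = [0, 1, 2, 2, 2, 1]   # hypothetical ground-truth labels
print(accuracy(preds, targets))          # 4 of 6 correct

# A confusion matrix can be accumulated as (true, predicted) pair counts:
confusion = Counter(zip(targets, preds))
print(confusion[(2, 2)])                 # class-2 samples predicted as class 2
```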

Customization and Hyperparameters

You can modify several parameters to experiment with different settings:

Model Parameters

  • --mode: Choose between train (default) or test.

Training Parameters

  • --batch_size, --learning_rate, --n_epochs, and --dropout control the training dynamics.

By tweaking these parameters, you can study their impact on model performance and experiment with different network configurations.
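The flags above suggest an argparse CLI along these lines (a sketch only: the default values shown here are assumptions, not the script's actual settings):

```python
import argparse

# Hypothetical CLI mirroring the documented flags; defaults are illustrative.
parser = argparse.ArgumentParser(description="ViT CIFAR-100 train/test driver")
parser.add_argument("--mode", choices=["train", "test"], default="train")
parser.add_argument("--batch_size", type=int, default=64)
parser.add_argument("--learning_rate", type=float, default=3e-4)
parser.add_argument("--n_epochs", type=int, default=10)
parser.add_argument("--dropout", type=float, default=0.1)

# Parse an example argument list instead of sys.argv, for demonstration.
args = parser.parse_args(["--mode", "test", "--batch_size", "128"])
print(args.mode, args.batch_size)  # test 128
```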


Summary of Steps

  • Step 0: Dataset Preparation
    Download and organize the CIFAR-100 dataset. The preprocess.py script will do this for you.

  • Step 1: Run Training
    Execute run_train_test.py after configuring the --mode and other hyperparameters to train the model.

  • Step 2: Run Testing
    Execute run_train_test.py after updating the --mode argument. The script will load the weights from the best model checkpoint to evaluate the model.
