This repository implements a Vision Transformer (ViT) from scratch, designed for medical image classification using the NIH Chest X-ray 14 (CXR14) dataset. It includes everything from data preprocessing to model training and evaluation, providing a complete pipeline for experimenting with ViTs in medical imaging. This can be found in vit_base_model.
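To give a sense of what a from-scratch ViT pipeline involves, here is a minimal sketch of the kind of model vit_base_model builds: patch embedding, a learnable class token, positional embeddings, a transformer encoder, and a classification head. The class name, dimensions, and the 14-label output here are illustrative assumptions, not the repository's actual code.

```python
# Hedged sketch of a from-scratch ViT; names and hyperparameters are assumed,
# not taken from vit_base_model.
import torch
import torch.nn as nn

class MiniViT(nn.Module):
    def __init__(self, image_size=224, patch_size=16, embed_dim=768,
                 depth=12, num_heads=12, num_classes=14):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Patch embedding: a strided convolution splits the image into
        # non-overlapping patches and projects each patch to embed_dim.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size,
                                     stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=4 * embed_dim,
            activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)  # e.g. 14 CXR14 findings

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, D)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.norm(self.encoder(x))
        return self.head(x[:, 0])                             # classify from [CLS]

model = MiniViT()
logits = model(torch.randn(2, 3, 224, 224))  # shape (2, 14)
```

Because CXR14 images can carry multiple findings at once, the output is naturally treated as multi-label logits (one per finding) rather than a single softmax class.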
The repository also implements fine-tuning of a pretrained Vision Transformer (following Google's seminal ViT paper), together with the associated evaluation metrics. This can be found in vit_pre_trained_fine_tune.
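The sketch below shows one common way to set up such a fine-tuning run: load an ImageNet-pretrained ViT, swap the classification head for the 14 CXR14 findings, and train with a multi-label loss. The torchvision backbone, the frozen-encoder warm-up, and the loss choice are assumptions about the general approach, not the repository's exact code.

```python
# Hedged fine-tuning sketch; backbone, freezing strategy, and loss are assumed.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
# Replace the 1000-way ImageNet head with a 14-way head for the CXR14 findings.
model.heads.head = nn.Linear(model.heads.head.in_features, 14)

# Optionally freeze the encoder and train only the new head at first.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("heads")

criterion = nn.BCEWithLogitsLoss()  # CXR14 is a multi-label problem
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)

images = torch.randn(2, 3, 224, 224)  # stand-in for a preprocessed CXR14 batch
targets = torch.zeros(2, 14)          # one column per finding
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
```

For multi-label chest X-ray classification, evaluation is typically reported as per-finding ROC AUC rather than plain accuracy, since the label distribution is heavily imbalanced.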