I am using publicly available KID datasets for training and testing purposes.
KID1 dataset consists of nine abnormalities: Angioectasias, Aphthae, Bleeding, ChylousCysts, Lymphangectasias, Polypoids, Stenoses, Ulcers, and Villous Oedemas.
KID2 dataset contains four abnormalities: Ampulla of Vater, Inflammatory, Polypoids, and Vascular.
Since the annotated data are inadequate for good classification accuracy, I am using Albumentations for data augmentation.
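The exact Albumentations pipeline is not specified here; a typical one for endoscopy frames might combine flips, 90° rotations, and brightness jitter. The dependency-free NumPy sketch below mimics those three transforms (the commented lines show the assumed Albumentations equivalent):

```python
import numpy as np

# Assumed Albumentations equivalent (not the project's actual pipeline):
#   import albumentations as A
#   transform = A.Compose([A.HorizontalFlip(p=0.5),
#                          A.RandomRotate90(p=0.5),
#                          A.RandomBrightnessContrast(p=0.2)])
#   augmented = transform(image=image)["image"]

def augment(image, rng):
    """Randomly flip, rotate by multiples of 90 deg, and jitter brightness.

    Expects an (H, W, C) uint8 image; returns an image of the same shape.
    """
    if rng.random() < 0.5:
        image = np.fliplr(image)                 # horizontal flip
    if rng.random() < 0.5:
        image = np.rot90(image, k=rng.integers(1, 4))  # 90/180/270 rotation
    if rng.random() < 0.2:
        shift = rng.integers(-30, 31)            # brightness shift
        image = np.clip(image.astype(np.int16) + shift, 0, 255).astype(np.uint8)
    return np.ascontiguousarray(image)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
aug = augment(img, rng)
print(aug.shape, aug.dtype)  # (64, 64, 3) uint8
```

Applying such a pipeline per epoch effectively multiplies the number of distinct training images the model sees.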
Figure 1: Images used for training the model.
Figure 2: Images used for testing the model.
Figure 3: Outputs of filters performing various image processing techniques such as edge detection, colour inversion, etc.
Figure 4: Learning rate finder - the loss is at its minimum around 1x10^{-3}.
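The figure comes from a learning-rate range test (as in fastai's `lr_find`): train for a number of iterations while increasing the learning rate exponentially, record the loss at each step, and pick a rate near where the loss bottoms out. The toy quadratic below stands in for the real training loss; the numbers are illustrative only:

```python
# Sketch of a learning-rate range test: sweep the LR exponentially from
# lr_min to lr_max while taking gradient steps on a toy loss f(w) = w^2,
# recording (lr, loss) pairs. With a quadratic, steps with lr > 1 diverge,
# so the recorded curve dips and then shoots up, like a real lr_find plot.

def lr_range_test(lr_min=1e-6, lr_max=10.0, steps=100):
    w = 5.0                                     # toy parameter, optimum at w = 0
    results = []
    for i in range(steps):
        lr = lr_min * (lr_max / lr_min) ** (i / (steps - 1))
        results.append((lr, w ** 2))            # loss before this step
        w -= lr * 2 * w                         # gradient step: f'(w) = 2w
    return results

results = lr_range_test()
best_lr, best_loss = min(results, key=lambda t: t[1])
print(f"loss is lowest near lr = {best_lr:.1e}")
```

In practice one picks a rate slightly below the loss minimum (here, the project settled on about 1e-3) so that training stays on the stable side of the curve.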
Implementation of the Vision Transformers paper for the classification of image trails generated by a Wireless Capsule Endoscopy (WCE).
Figure 5: Overview of the model. The images are first divided into patches, to which position embeddings are added, and the resulting sequence is passed to the Transformer encoder. To perform classification, an extra learnable “classification token” is prepended to the sequence.
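The patch-and-embed front end described above can be sketched in NumPy as follows. The random matrices stand in for learned weights, and the patch/embedding sizes are illustrative, not the paper's:

```python
import numpy as np

# Minimal sketch of the ViT input pipeline: split the image into patches,
# linearly project each flattened patch, prepend a learnable [class] token,
# and add position embeddings before the Transformer encoder.

def vit_embed(image, patch=8, dim=32, rng=None):
    rng = rng or np.random.default_rng(0)
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    # (H, W, C) -> (num_patches, patch*patch*C): carve out non-overlapping tiles
    patches = (image.reshape(h // patch, patch, w // patch, patch, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * c))
    proj = rng.standard_normal((patches.shape[1], dim))  # stands in for the learned linear projection
    tokens = patches @ proj                              # (num_patches, dim)
    cls = rng.standard_normal((1, dim))                  # learnable [class] token
    seq = np.concatenate([cls, tokens], axis=0)          # (1 + num_patches, dim)
    pos = rng.standard_normal(seq.shape)                 # learnable position embeddings
    return seq + pos                                     # input to the Transformer encoder

img = np.random.default_rng(1).standard_normal((32, 32, 3))
emb = vit_embed(img)
print(emb.shape)  # (17, 32): 16 patches of 8x8 plus one class token
```

After the encoder, only the output at the class-token position is fed to the classification head.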
Abnormalities: Ampulla of Vater, Inflammatory, Polypoids, and Vascular.
Train images = 478 | Validation images = 54 | Testing images = 61
Table 1: Comparative analysis of multiclass classification of ResNet and Vision Transformer models on KID2.
| Model | #Images | Accuracy |
|---|---|---|
| ResNet50 | 593 | 0.84 |
| Vision Transformer | 593 | 0.85 |
Official Vision Transformer GitHub Repo | PyTorch Model of Vision Transformer