This project contains a process for training polyp classification models using images from the PIBAdb database.
The pipeline performs the following main tasks:
- Generation of polyp-level stratified k-fold partitions
- Optional inclusion of localized images of the same polyp to increase training data (automatically detected frames containing the same polyp, cropped using the bounding box metadata)
- Oversampling of minority classes in the training split
- Training of ImageNet-pretrained CNN models using GluonCV
- Automatic generation of metrics, plots, and HTML reports
To use the pipeline, you must request access to the PIBAdb Cohort through the Biobank of the Instituto de Investigación Sanitaria Galicia Sur. Images for polyp classification must be cropped according to the bounding boxes in metadata.csv and placed in the experiments/datasets folder (a cropping sketch follows the field list below).
The required fields in the metadata.csv file are:
- `polyp_id`: Unique identifier of the polyp
- `image_id`: Identifier of the source image
- `x`, `y`: Top-left corner of the bounding box
- `width`, `height`: Dimensions of the bounding box
- `light`: Illumination type (WL = White Light, NBI = Narrow-Band Imaging)
- `condition`: Histological type of the polyp (e.g., adenoma, hyperplastic, sessile serrated adenoma, etc.)
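The cropping step is performed outside the pipeline. A minimal sketch of how it could be done with pandas and Pillow, assuming the full frames are stored as `<image_id>.png` under a `source_images/` folder (both the folder and the file naming are assumptions, not part of this repository):

```python
# crop_polyps.py - illustrative sketch, not part of the pipeline.
import pandas as pd
from PIL import Image
from pathlib import Path

SRC = Path("source_images")  # assumed location of the full frames
DST = Path("experiments/datasets/cropped_polyps_dataset/images")
DST.mkdir(parents=True, exist_ok=True)

metadata = pd.read_csv("metadata.csv")
for row in metadata.itertuples():
    frame = Image.open(SRC / f"{row.image_id}.png")
    # Bounding box: top-left corner (x, y) plus width/height.
    box = (row.x, row.y, row.x + row.width, row.y + row.height)
    frame.crop(box).save(DST / f"{row.polyp_id}_{row.image_id}.png")
```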
Based on Nogueira-Rodríguez (2022):

Nogueira-Rodríguez, A. *Deep Learning Techniques for Computer-Aided Diagnosis in Colorectal Cancer*. Ph.D. Thesis (supervisors: D. González-Peña, H. López-Fernández), University of Vigo, 2022.
- Adenoma = Adenoma, Sessile Serrated Adenoma (SSA), Traditional Serrated Adenoma (TSA)
- Hyperplastic = Hyperplastic
Images from other histological categories were excluded (Non-Epithelial Neoplastic, Invasive, No Histology).
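For reference, this two-class grouping can be expressed as a simple lookup. The exact condition strings below are illustrative and must match the values actually used in metadata.csv:

```python
# Illustrative mapping from the `condition` field to the two training classes.
CONDITION_TO_CLASS = {
    "adenoma": "adenoma",
    "sessile serrated adenoma": "adenoma",
    "traditional serrated adenoma": "adenoma",
    "hyperplastic": "hyperplastic",
}
# Conditions absent from the mapping (e.g., invasive, no histology) are excluded.
```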
The datasets differ in how the images were collected and whether additional automatically detected frames are included.
| Dataset | Description | #Polyps |
|---|---|---|
| D1 | Manual selection from full withdrawal videos | 358 |
| D2 | D1 + automatically localized images (training only) | 358 |
| D3 | Manual selection of large/close polyps from new videos focused on the classification task | 129 |
| D4 | Combination D1 + D3 | 487 |
| D5 | Combination D2 + D3 | 487 |
Note: The dataset names D1, D2, D3, D4, and D5 are labels used in the thesis for clarity and do not correspond to actual folders or dataset identifiers in this repository.
To reproduce the experiments, you must prepare your own dataset folders inside experiments/datasets/ and reference them through the dataset_name parameter in pipeline.params.
All models are pretrained on ImageNet (a loading sketch follows the list):
- ResNet50
- VGG19
- InceptionV3
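As a reference, a minimal GluonCV transfer-learning setup might look as follows. This is a sketch following the GluonCV model-zoo API, not the pipeline's actual training code; the model variant `resnet50_v2` is an assumption:

```python
# Illustrative GluonCV setup: load an ImageNet-pretrained backbone and
# replace its classifier with a 2-class output (adenoma vs. hyperplastic).
import mxnet as mx
from mxnet.gluon import nn
from gluoncv.model_zoo import get_model

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()

net = get_model("resnet50_v2", pretrained=True, ctx=ctx)
with net.name_scope():
    net.output = nn.Dense(2)
net.output.initialize(mx.init.Xavier(), ctx=ctx)
```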
The training process consists of:
- 150 epochs
- Optional oversampling of minority classes in the training partition
- 5-fold stratified cross-validation at the polyp level (images from the same polyp cannot appear in different partitions; see the sketch below)
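A minimal sketch of such a polyp-level split with optional oversampling, using scikit-learn's `StratifiedGroupKFold` as a stand-in for the pipeline's own partitioning logic (the `condition` column is assumed to already hold the two training classes):

```python
# Illustrative polyp-level stratified 5-fold split with naive oversampling.
import pandas as pd
from sklearn.model_selection import StratifiedGroupKFold

metadata = pd.read_csv("metadata.csv")
labels = metadata["condition"]
groups = metadata["polyp_id"]  # all images of a polyp stay together

skf = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(metadata, labels, groups), start=1):
    train = metadata.iloc[train_idx]
    # Naive oversampling: duplicate minority-class rows until classes match.
    counts = train["condition"].value_counts()
    minority = counts.idxmin()
    extra = train[train["condition"] == minority].sample(
        counts.max() - counts.min(), replace=True, random_state=42
    )
    balanced_train = pd.concat([train, extra])
    print(f"fold {fold}: {len(balanced_train)} train / {len(val_idx)} val images")
```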
Requirements:
- Docker
- Access to PIBAdb datasets (via request to the PIBAdb Cohort)
The pipeline runs inside a Docker container to ensure reproducibility. To build the Docker image:
```
docker build -t polydeep/classification .
```

It is recommended to organize datasets and experiments as follows:
```
/classification-project
│
├── experiments/
│   ├── datasets/
│   │   └── cropped_polyps_dataset/
│   │       ├── images/
│   │       ├── metadata.csv
│   │       └── polyp-metadata.csv
│   ├── Exp_K1_CNN/
│   │   ├── pipeline.params
│   │   └── ...
│   ├── Exp_K2_CNN/
│   │   ├── pipeline.params
│   │   └── ...
│   └── ...
└── ...
```

Note: The folder experiments/datasets/ contains all datasets used by the different experiments.
Note: Each experiment folder (Exp_K1_CNN, Exp_K2_CNN, etc.) contains a pipeline.params file specifying its configuration. Example:
```
dataset_name=cropped_polyps_dataset
model_name=vgg19
num_gpus=1
epochs=150
kfolds=5
fold=1
with_balanced_train
batch_size=96
learning_rate=0.00001
```

To run an experiment:
```
docker run --rm \
  -v /local/path/to/experiment:/experiment \
  polydeep/classification \
  -p /pipeline.xml \
  --params /experiment/pipeline.params
```

Where:
- `-v /local/path/to/experiment:/experiment` mounts the local experiment folder inside the container.
- `-p /pipeline.xml` specifies the pipeline to run (already included in the container).
- `--params /experiment/pipeline.params` provides the experiment configuration.
| Flag | Description |
|---|---|
| with_balanced_train | Enables oversampling in the training set |
| with_localized_images | Adds additional automatically detected images of the same polyp (localized images) from another dataset exclusively to the training set |
| with_merge_ttv | Merges train/val partitions from two prior experiments |
The experiments described in the thesis fall into three categories, depending on dataset composition and training strategy. This section describes how to reproduce each one from this repository using the Compi pipeline.
Applies to:
- D1 – PIBAdb Manually Selected NBI
- D3 – PIBAdb Manually Selected for Classification NBI
In the experiment folder (e.g., Exp_K1_ResNet50/), the pipeline.params file must specify the dataset name. For example, if the dataset folder is named D1 or D3:
```
dataset_name = D1   # or D3
```

Run with:
```
docker run --rm \
  -v /local/path/to/experiment:/experiment \
  polydeep/classification \
  -p /pipeline.xml \
  --params /experiment/pipeline.params
```

These experiments include additional images of the same polyps, obtained from a secondary dataset (e.g., “PIBAdb Automatically Selected NBI”), and add them only to the training split (a sketch of this train-only merge follows the list below).
Applicable to:
- D2 – D1 + localized images (training only)
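Conceptually, the localized images are appended to the training partition only. A minimal pandas sketch of the idea, with all file paths assumed (the pipeline performs this step internally when `with_localized_images` is set):

```python
# Illustrative train-only merge of localized images; not the pipeline's code.
import pandas as pd

# Base dataset metadata, already split for the current fold (assumed files).
train = pd.read_csv("ttv/train/metadata.csv")
val = pd.read_csv("ttv/validation/metadata.csv")

localized = pd.read_csv("datasets-localized/metadata.csv")

# Keep only localized frames whose polyp is already in this fold's train split,
# so no polyp leaks from train into validation.
localized = localized[localized["polyp_id"].isin(train["polyp_id"])]

train_augmented = pd.concat([train, localized], ignore_index=True)
# `val` stays untouched: localized images are never used for evaluation.
```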
```
dataset_name = D2
with_localized_images
```

Run with:
```
docker run --rm \
  -v /local/path/to/dataset:/datasets-localized \
  -v /local/path/to/experiment:/experiment \
  polydeep/classification \
  -p /pipeline.xml \
  --params /experiment/pipeline.params
```

These experiments require merging partitions from two distinct preliminary experiments.
Steps:
- Create two baseline experiments. For example:
  - Exp_K1_D1_ResNet50
  - Exp_K1_D3_ResNet50
- Enable `with_merge_ttv` to merge the `train` and `val` partitions before training (a conceptual sketch follows the example parameters below).
```
exp1=Exp_K1_D1_ResNet50
exp2=Exp_K1_D3_ResNet50
model_name=resnet50
epochs=150
kfolds=5
fold=1
with_merge_ttv
```
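Conceptually, the merge concatenates the corresponding partitions of both experiments. A rough sketch under the assumption that each prior experiment keeps its partitions as `ttv/<split>/metadata.csv` (the file name is an assumption; the pipeline performs the merge itself from the `exp1`/`exp2` parameters):

```python
# Illustrative sketch of what with_merge_ttv does conceptually.
import pandas as pd
from pathlib import Path

def load_split(experiment: str, split: str) -> pd.DataFrame:
    # Assumes each prior experiment stores its partitions under ttv/<split>/.
    return pd.read_csv(Path(experiment) / "ttv" / split / "metadata.csv")

for split in ("train", "validation"):
    merged = pd.concat(
        [load_split("Exp_K1_D1_ResNet50", split),
         load_split("Exp_K1_D3_ResNet50", split)],
        ignore_index=True,
    )
    out = Path("ttv") / split
    out.mkdir(parents=True, exist_ok=True)
    merged.to_csv(out / "metadata.csv", index=False)
```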
After execution, the experiment directory contains (a loading snippet is sketched after the list):

- `results/`: Training and validation metrics (CSV format)
- `summaries/`: Performance and loss plots (PNG format)
- `index.html`: Summary report with all metrics and plots
- `ttv/`: `train/` and `validation/` folders with metadata and image list
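The metric CSVs can then be inspected directly, for example (the file names inside `results/` depend on the pipeline and are not assumed here):

```python
# Illustrative: concatenate all metric CSVs produced by an experiment.
import pandas as pd
from pathlib import Path

frames = [pd.read_csv(f) for f in sorted(Path("results").glob("*.csv"))]
metrics = pd.concat(frames, ignore_index=True)
print(metrics.head())
```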
Note: Results are written to the mounted experiment folder inside the container, so they are available in the local experiment folder once the run finishes.