
dionvou/vesuvius_ink_detection


Vesuvius Ink Detection

About

Deep learning models for detecting ink on ancient Vesuvius scroll fragments.

Overview

This project implements state-of-the-art computer vision models to identify ink writing on 2,000-year-old carbonized papyrus scrolls from Herculaneum, buried by the eruption of Mount Vesuvius in 79 AD. The models process 3D volumetric data from CT scans, treating the stacked depth layers as temporal sequences to perform semantic segmentation of ink locations.

Key Features

  • Multiple Model Architectures: SWIN Transformer, VideoMAE, TimeSformer, 3D ResNet
  • Self-Supervised Pretraining: VideoMAE for learning from unlabeled scroll data
  • Experiment Tracking: Full integration with Weights & Biases

Project Structure

vesuvius_ink_detection/
├── models/                      # Model architectures
│   ├── swin.py                 # SWIN Transformer (primary model)
│   ├── vmae.py                 # VideoMAE
│   ├── timesformer_hug.py      # TimeSformer (HuggingFace)
│   ├── resnetall.py            # 3D ResNet variants
│   ├── i3dallnl.py             # I3D with non-local blocks
│   └── unetr.py                # UNETR segmentation
│
├── pretraining/                 # Self-supervised pretraining
│   ├── mae.py                  # VideoMAE pretraining
│   ├── mae_swin.py             # MAE for SWIN
│   ├── prepare_data.py         # Tile extraction for pretraining
│   └── download.sh             # Download pretraining segments
│
├── train_scripts/               # Training utilities
│   ├── vmae_train.py           # VideoMAE training wrapper
│   └── utils.py                # Helper functions
│
├── swin_train.py                # SWIN Transformer training (root)
├── timesformer_hug_train.py     # TimeSformer training (root)
├── train_resnet3d.py            # 3D ResNet training (root)
├── z_cv.py                      # Cross-validation experiments (root)
├── utils.py                     # Shared utilities (root)
│
├── train_scrolls/               # Training data (per-fragment)
│   ├── frag5/                  # Fragment 5 (primary)
│   │   ├── layers/             # CT scan layers (22.tif, 23.tif, ...)
│   │   ├── frag5_inklabels.png # Ground truth annotations
│   │   └── frag5_mask.png      # Fragment boundary mask
│   └── [other fragments]
│
├── checkpoints/                 # Saved model weights
├── outputs/                     # Predictions and results
├── notebooks/                   # Exploratory analysis

Installation

Requirements

  • Python 3.8+
  • CUDA-capable GPU (recommended: 16GB+ VRAM)
  • PyTorch with CUDA support

Setup

# Clone the repository
git clone <repository-url>
cd vesuvius_ink_detection

# Install dependencies
pip install -r requirements.txt

Data Organization

Downloading Training Data

The project includes an automated download script (download.sh) to fetch Vesuvius Challenge data from the official repository.

Quick Start

# Make the script executable
chmod +x download.sh

# Run the download script
./download.sh

What Gets Downloaded

The script downloads two types of data:

1. Fragment Data (smaller pieces with known ink labels):

  • Fragment 1 (Frag1): PHercParis2Fr47, scanned at 54 keV with 3.24 µm resolution
  • Fragment 5 (Frag5): PHerc1667Cr1Fr3, scanned at 70 keV with 3.24 µm resolution

2. Full Scroll Data (larger intact scrolls):

  • Scroll 4 (20231210132040): PHerc1667 segment from full scroll

For each dataset, the script downloads:

  • Layer files: CT scan slices (layers 15-45) in TIF/PNG format
  • Auxiliary files:
    • *_mask.png - Fragment boundary masks
    • *_inklabels.png - Ground truth ink annotations

How It Works

The download script uses two main functions:

download_layers: Downloads a range of numbered layer files

  • Tries multiple file extensions (tif, png, jpg) until finding the correct format
  • Checks file existence before downloading to avoid errors
  • Downloads layers 15-45 by default (configurable)

download_aux_files: Downloads mask and inklabels files

  • Searches directory listings for files ending in mask or inklabels
  • Handles unknown filename prefixes automatically
  • Renames files to standardized format: {fragment_id}_{suffix}.{ext}
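The two helpers above can be illustrated with a small Python sketch. This is a hedged mirror of the described logic, not the actual bash implementation; `layer_candidates` and `standardized_name` are hypothetical names:

```python
from pathlib import PurePosixPath

def layer_candidates(base_url, idx, extensions=("tif", "png", "jpg")):
    """Candidate URLs for one numbered layer file, tried in order
    until one exists on the server (mirrors download_layers)."""
    return [f"{base_url}/{idx}.{ext}" for ext in extensions]

def standardized_name(fragment_id, remote_name):
    """Map an auxiliary file with an unknown prefix to the standardized
    {fragment_id}_{suffix}.{ext} form (mirrors download_aux_files)."""
    p = PurePosixPath(remote_name)
    suffix = "inklabels" if p.stem.endswith("inklabels") else "mask"
    return f"{fragment_id}_{suffix}{p.suffix}"
```

For example, `standardized_name("Frag5", "PHerc1667Cr1Fr3_inklabels.png")` yields `Frag5_inklabels.png`, matching the layout shown below.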

Download Structure

Downloaded data is organized in train_scrolls/:

train_scrolls/
├── Frag1/
│   ├── layers/
│   │   ├── 15.tif
│   │   ├── 16.tif
│   │   └── ... (through 45.tif)
│   ├── Frag1_mask.png
│   └── Frag1_inklabels.png
├── Frag5/
│   ├── layers/
│   │   ├── 15.tif
│   │   └── ...
│   ├── Frag5_mask.png
│   └── Frag5_inklabels.png
└── 20231210132040/
    ├── layers/
    │   ├── 15.tif
    │   └── ...
    ├── 20231210132040_mask.png
    └── 20231210132040_inklabels.png

Customization

To download additional fragments, edit download.sh:

# Add new fragment
fragments=("Frag1" "Frag2" "Frag3")  # Add to array

# Change layer range
download_layers "$layers_url" "$out_dir" 10 50 extensions1[@]  # Layers 10-50

# Change file extensions to try
extensions=(tif png jpg jpeg)

Authentication

The script uses default public credentials for the Vesuvius Challenge data repository:

  • Username: ...
  • Password: ...

These are publicly available credentials for accessing competition data.

Fragment Directory Structure

Each fragment directory follows this structure:

fragment_id/
├── layers/                      # Volumetric CT scan data
│   ├── 15.tif                  # Individual depth layers
│   ├── 16.tif
│   └── ... (15-45 or more layers)
├── {fragment_id}_inklabels.png # Ground truth ink labels (binary mask)
└── {fragment_id}_mask.png      # Fragment boundary mask

Quick Start

Training with Unified Script

The project includes a unified training script (train.py) with command-line argument support for easy experimentation. Use the provided run.sh script to launch training:

# Make the script executable
chmod +x run.sh

# Run training with default configuration
./run.sh

The run.sh script trains a SWIN Transformer model with the following default configuration:

python train.py \
  --model swin \
  --segment_path ./train_scrolls/ \
  --segments Frag5 s4 \
  --valid_id Frag5 \
  --start_idx 24 \
  --in_chans 16 \
  --valid_chans 16 \
  --size 224 \
  --tile_size 224 \
  --stride_divisor 8 \
  --train_batch_size 2 \
  --valid_batch_size 2 \
  --lr 5e-5 \
  --epochs 40 \
  --scheduler cosine \
  --weight_decay 1e-6 \
  --warmup_factor 10 \
  --norm true \
  --aug fourth \
  --num_workers 8 \
  --seed 0 \
  --max_grad_norm 1.0 \
  --comp_name vesuvius \
  --wandb_project vesuvius \
  --save_top_k -1 \
  --devices -1 \
  --strategy ddp_find_unused_parameters_true
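The configuration above sets --tile_size 224 and --stride_divisor 8. Assuming the tile stride is tile_size // stride_divisor (28 px for these defaults) — an assumption about how --stride_divisor is used — sliding-window tile placement can be sketched as:

```python
def tile_origins(height, width, tile_size=224, stride_divisor=8):
    """Top-left corners of overlapping tiles covering a fragment image.
    Assumes stride = tile_size // stride_divisor (28 px for the defaults)."""
    stride = tile_size // stride_divisor
    ys = range(0, max(height - tile_size, 0) + 1, stride)
    xs = range(0, max(width - tile_size, 0) + 1, stride)
    return [(y, x) for y in ys for x in xs]
```

A smaller stride_divisor means fewer, less overlapping tiles per fragment; a larger one increases overlap (and compute) at inference time.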

Training with Custom Configuration

You can customize training by modifying run.sh or calling train.py directly:

# Example: Train VideoMAE on different fragments
python train.py \
  --model vmae \
  --segments Frag1 Frag5 \
  --valid_id Frag1 \
  --in_chans 24 \
  --size 64 \
  --epochs 50 \
  --lr 1e-4

# Example: Train with higher resolution
python train.py \
  --model swin \
  --size 448 \
  --tile_size 448 \
  --train_batch_size 1

Available Arguments

Key command-line arguments for train.py:

  • Model: --model (choices: swin, vmae, timesformer_hug, resnet)
  • Data: --segment_path (path to training scrolls), --segments (training fragments), --valid_id (validation fragment)
  • Input: --start_idx (first layer), --in_chans (number of channels), --valid_chans (validation channels)
  • Resolution: --size (input size), --tile_size (tile size), --stride_divisor (stride calculation)
  • Training: --train_batch_size, --valid_batch_size, --lr, --min_lr, --epochs, --scheduler, --weight_decay, --warmup_factor
  • Augmentation: --aug (choices: none, shift, fourth, None), --norm (apply normalization)
  • Distributed: --devices (GPU count), --strategy (DDP strategy), --precision (training precision)
  • Output: --comp_name (competition name), --wandb_project (W&B project name)
  • Checkpoint: --checkpoint_path (resume from checkpoint), --save_top_k (save top k models)
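A minimal argparse sketch of this argument surface (flag names are taken from the list above; the defaults and types here are illustrative assumptions, not the actual train.py source):

```python
import argparse

def build_parser():
    """Sketch of train.py's CLI; defaults mirror the run.sh example."""
    p = argparse.ArgumentParser(description="Vesuvius ink-detection training")
    p.add_argument("--model", choices=["swin", "vmae", "timesformer_hug", "resnet"],
                   default="swin")
    p.add_argument("--segment_path", default="./train_scrolls/")
    p.add_argument("--segments", nargs="+", default=["Frag5", "s4"])
    p.add_argument("--valid_id", default="Frag5")
    p.add_argument("--start_idx", type=int, default=24)
    p.add_argument("--in_chans", type=int, default=16)
    p.add_argument("--size", type=int, default=224)
    p.add_argument("--tile_size", type=int, default=224)
    p.add_argument("--stride_divisor", type=int, default=8)
    p.add_argument("--lr", type=float, default=5e-5)
    p.add_argument("--epochs", type=int, default=40)
    return p

# Parse an example command line (the VideoMAE variant shown earlier):
args = build_parser().parse_args(
    ["--model", "vmae", "--segments", "Frag1", "Frag5", "--lr", "1e-4"]
)
```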

Legacy Training Scripts

Individual training scripts are still available; the model-specific scripts live at the repository root, with the VideoMAE wrapper in train_scripts/:

# SWIN Transformer (legacy)
python swin_train.py

# TimeSformer (legacy)
python timesformer_hug_train.py

# 3D ResNet (legacy)
python train_resnet3d.py

# VideoMAE training
python train_scripts/vmae_train.py

Model Architectures

1. SWIN Transformer (models/swin.py)

Primary model - Shifted Window Vision Transformer adapted for volumetric data.

  • Input: 224×224 spatial, 16-24 depth channels
  • Output: Binary segmentation mask (ink vs. no-ink)
  • Features:
    • Hierarchical shifted window attention
    • Variable input channels (8-54)
    • Combined loss: DiceLoss + SoftBCEWithLogitsLoss
  • Training: swin_train.py

2. VideoMAE (models/vmae.py)

Video Masked Autoencoder for self-supervised pretraining.

  • Input: 64×64 or 224×224, 16-24 frames
  • Pretraining: 75-90% mask ratio, pixel reconstruction
  • Fine-tuning: Linear classifier head
  • Training: pretraining/mae.py

3. TimeSformer (models/timesformer_hug.py)

Transformer designed for video/temporal understanding.

  • Variants: HuggingFace and Facebook implementations
  • Features: Divided space-time attention
  • Training: timesformer_hug_train.py

4. 3D ResNet (models/resnetall.py)

ResNet extended to 3D convolutions.

  • Depths: 10, 18, 34, 50, 101, 152, 200 layers
  • Pretrained: r3d101_KM_200ep.pth (Kinetics-400)
  • Training: train_resnet3d.py

Self-Supervised Pretraining

Pretraining on unlabeled scroll data improves downstream performance.

Masked Autoencoder (MAE)

cd pretraining
python mae.py

  • Method: mask 75-90% of patches and reconstruct their pixel values
  • Configuration: 16-channel input, 16-24 frames
  • Loss: L1 pixel-reconstruction loss
  • Checkpoints: e.g. videomae_epoch=063_val_loss=0.3684.ckpt
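The masking step can be illustrated with a small stdlib sketch (`sample_masked_patches` is a hypothetical helper; the real implementation operates on tokenized patch embeddings):

```python
import random

def sample_masked_patches(num_patches, mask_ratio=0.9, seed=0):
    """Choose which patch indices to hide before reconstruction,
    e.g. 176 of 196 patches at a 0.9 mask ratio."""
    rng = random.Random(seed)
    n_mask = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), n_mask))
```

Only the small visible remainder is encoded; the decoder must predict the pixel values of the masked patches, which forces the encoder to learn scroll texture structure without labels.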

Training Infrastructure

PyTorch Lightning Framework

All training scripts use PyTorch Lightning with:

  • Distributed Training: DDP (multi-GPU)
  • Mixed Precision: FP16 for memory efficiency
  • Gradient Clipping: Max norm 1.0
  • Learning Rate Scheduling: Cosine with warmup
  • Checkpointing: Automatic saves with encoded metadata

Weights & Biases Integration

Experiment tracking and hyperparameter logging:

wandb.init(project='vesuvius', name='experiment_name')

View runs at: wandb.ai (requires login)

Checkpoint Naming Convention

Checkpoints encode complete hyperparameter information:

{MODEL}_{FRAGMENTS}_valid={VALID_ID}_size={SIZE}_lr={LR}_in_chans={CHANS}_norm={NORM}_fourth={AUG}_epoch={EPOCH}.ckpt

Example:

SWIN_['frag5','s4']_valid=frag5_size=224_lr=2e-05_in_chans=16_norm=True_epoch=7.ckpt

This enables:

  • Easy checkpoint identification
  • Reproducible experiment tracking
  • Automated checkpoint selection
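Since the filename is the metadata record, the hyperparameters can be recovered from it directly. A hedged sketch (`parse_ckpt_name` is a hypothetical helper, not part of the repository):

```python
import re

def parse_ckpt_name(name):
    """Recover the key=value hyperparameters encoded in a checkpoint filename."""
    stem = name[:-len(".ckpt")] if name.endswith(".ckpt") else name
    pairs = re.findall(r"(\w+)=([^_]+)", stem)
    # \w+ greedily absorbs the leading separator underscore, so strip it
    return {key.lstrip("_"): value for key, value in pairs}
```

Applied to the example name above, this yields valid=frag5, size=224, lr=2e-05, in_chans=16, norm=True, epoch=7.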

Advanced Configuration

Multi-Fragment Training

Train on multiple fragments simultaneously:

segments = ['frag5', 's4', 'rect5']  # Training fragments
valid_id = 'frag5'                    # Hold out for validation

Variable Input Channels

Experiment with different depth ranges:

start_idx = 22      # First layer to use
in_chans = 18       # Total channels (22-39 inclusive)
valid_chans = 16    # Subset for validation (center crop)
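The arithmetic: training uses layers start_idx through start_idx + in_chans - 1, and validation presumably takes a symmetric center crop of valid_chans of them. A sketch under that assumption (`channel_window` is a hypothetical helper):

```python
def channel_window(start_idx, in_chans, valid_chans):
    """Training layer indices and the center-cropped validation subset."""
    train = list(range(start_idx, start_idx + in_chans))
    offset = (in_chans - valid_chans) // 2
    return train, train[offset:offset + valid_chans]

# With the values above: train covers layers 22-39, valid 23-38.
train, valid = channel_window(22, 18, 16)
```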

Resolution Scaling

Different fragments may require different scaling:

frags_ratio1 = ['frag', 're']  # Scale by ratio1
frags_ratio2 = ['s4', '202']   # Scale by ratio2
ratio1 = 2  # Divide by 2
ratio2 = 1  # No scaling
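The lists above suggest fragments are matched to a scaling factor by ID prefix; a sketch under that assumption (`scale_ratio` is a hypothetical helper):

```python
def scale_ratio(fragment_id,
                frags_ratio1=("frag", "re"), ratio1=2,
                frags_ratio2=("s4", "202"), ratio2=1):
    """Pick the downscaling factor for a fragment by ID prefix."""
    fid = fragment_id.lower()
    if fid.startswith(frags_ratio1):   # str.startswith accepts a tuple of prefixes
        return ratio1
    if fid.startswith(frags_ratio2):
        return ratio2
    return 1  # default: no scaling
```

Under this scheme, Frag5 and rect5 would be downscaled by 2, while s4 and the 20231210132040 scroll segment are used at native resolution.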

Output Structure

Model Checkpoints

outputs/models/
└── SWIN_['frag5','s4']_valid=frag5_size=224_lr=2e-05_in_chans=16_norm=True_epoch=7.ckpt

Predictions

wandb/<run>/files/media/images
    └── mask.png

Utilities

Core Functions (utils.py)

# Load volumetric data with mask
read_image_mask(fragment_id, s=22, e=38, rotate=0)

# Split data by fragment
get_train_valid_dataset(segments, valid_id)

# Initialize configuration
cfg_init(CFG, mode='train')

Common Issues and Solutions

CUDA Out of Memory

# Reduce batch size
train_batch_size = 1  # Instead of the default 2

# Reduce spatial resolution
size = 64  # Instead of 224

# Enable gradient checkpointing (in model code)

Large Image Files

Increase PIL limit (already done in training scripts):

import PIL.Image
PIL.Image.MAX_IMAGE_PIXELS = 933120000

Experiment Tracking

Login to Weights & Biases:

wandb login

Or disable:

wandb.init(mode='disabled')

License

[Specify license]

Acknowledgments

This repository is based on and adapted from the villa ink-detection repository, the First Place Grand Prize submission to the Vesuvius Challenge 2023 by Youssef Nader, Luke Farritor, and Julian Schilliger.

  • Vesuvius Challenge organizers.
  • Youssef Nader, Luke Farritor, and Julian Schilliger for their groundbreaking work.
  • AWS resources were provided by the National Infrastructures for Research and Technology GRNET and funded by the EU Recovery and Resiliency Facility.

Contact

For questions or issues, please open a GitHub issue or contact voulgarakisdion@gmail.com.


Project Status: Active Development

Last Updated: 2025

Contributors: Voulgarakis Dionysios, Pavlopoulos John
