SWAG: Surgical Workflow Anticipative Generation

Paper · Project Page

Long-term Surgical Workflow Prediction with Generative-Based Anticipation

Maxence Boels¹ · Yang Liu¹ · Prokar Dasgupta¹ · Alejandro Granados¹ · Sebastien Ourselin¹

¹Surgical and Interventional Engineering, School of Biomedical Engineering and Imaging Sciences, King's College London


📋 Abstract

SWAG is a unified encoder-decoder framework for surgical phase recognition and long-term anticipation that addresses a critical gap in intraoperative decision support. While existing approaches excel at recognizing current surgical phases, they provide limited foresight into future procedural steps. SWAG combines phase recognition and anticipation using a generative approach, predicting sequences of future surgical phases at minute intervals over horizons up to 60 minutes.

📚 Documentation

For contribution guidelines and community standards, see the docs/ directory.

🎯 Key Features

  • Unified Recognition and Anticipation: Jointly addresses surgical phase recognition and long-term workflow prediction
  • Dual Generative Approaches: Implements both single-pass (SP) and autoregressive (AR) decoding methods
  • Prior Knowledge Embedding: Novel embedding approach using class transition probabilities (SP*)
  • Regression-to-Classification (R2C): Framework for converting remaining time predictions to discrete phase sequences
  • Long-horizon Predictions: Extends anticipation from typical 5-minute limits to 20-60 minute horizons
  • Multi-dataset Validation: Evaluated on Cholec80 and AutoLaparo21 datasets
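
The R2C idea can be illustrated with a minimal sketch. Everything here is an assumption for illustration (the function name `r2c_first_occurrence`, the `remaining_min` encoding, and the tie-breaking rule are not the repository's API): it shows one way per-phase remaining-time predictions could be mapped to a discrete phase sequence over the horizon.

```python
import math

def r2c_first_occurrence(remaining_min, current_phase, horizon=30):
    """Map per-phase remaining-time predictions (minutes until each phase
    begins; math.inf if the phase never occurs) to one phase label per
    future minute over the horizon."""
    seq = []
    for t in range(1, horizon + 1):
        # phases whose predicted start time has been reached by minute t
        started = [p for p, r in enumerate(remaining_min) if r <= t]
        if started:
            # keep the most recently started phase (largest start time <= t)
            seq.append(max(started, key=lambda p: remaining_min[p]))
        else:
            seq.append(current_phase)  # no transition predicted yet
    return seq
```

For example, with the current phase ongoing and the next phase predicted to start in 5 minutes, the generated sequence switches labels at minute 5.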

πŸ—οΈ Architecture Overview

Key Components:

  • Vision Encoder: Fine-tuned ViT with AVT approach for spatial-temporal features
  • WSA: Sliding window self-attention (W=20, L=1440 frames)
  • Compression: Global key-pooling (SP) and interval-pooling (AR)
  • Decoders:
    • SP: Single-pass transformer decoder with parallel prediction
    • AR: GPT-2-based autoregressive generation
    • SP*: Enhanced with prior knowledge embeddings
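
As a rough illustration of the prior-knowledge idea behind SP*, class transition probabilities can be estimated from training label sequences and used as fixed per-class prior vectors. The sketch below is an assumption, not the repository's implementation (the numpy formulation and Laplace smoothing are choices made here for illustration):

```python
import numpy as np

def transition_prior(phase_seqs, num_classes=8):
    """Estimate P(next phase | current phase) from training label sequences.
    Row i is a probability vector over the successors of class i and can
    serve as a fixed prior vector for that class (7 phases + EOS = 8)."""
    counts = np.ones((num_classes, num_classes))  # Laplace smoothing
    for seq in phase_seqs:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1.0
    # normalize each row to a probability distribution
    return counts / counts.sum(axis=1, keepdims=True)
```

Each row could then initialize a frozen embedding table (e.g. via `torch.nn.Embedding.from_pretrained`) that the decoder consumes alongside its learned queries.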

πŸ“ Project Structure

SWAG-surgical-workflow-anticipative-generation/
├── train_net.py                # Main training entry point
├── env.yaml                    # Conda environment specification
├── README.md                   # This file
├── LICENSE                     # MIT License
│
├── conf/                       # Hydra configuration files
│   ├── config.yaml            # Main config with all parameters
│   ├── data/                  # Dataset-specific configs
│   ├── model/                 # Model architecture configs
│   ├── opt/                   # Optimizer configs
│   └── train_eval_op/         # Training operation configs
│
├── src/                        # Source code (all Python modules)
│   ├── models/                # Model architectures
│   │   ├── supra.py          # SWAG-SP/SP* implementation
│   │   ├── lstm.py           # LSTM-based AR model
│   │   ├── transformers.py   # Transformer decoder variants
│   │   └── base_model.py     # Base model class
│   │
│   ├── datasets/              # Dataset loaders
│   │   ├── cholec80/         # Cholec80 dataset utilities
│   │   ├── autolaparo21/     # AutoLaparo21 dataset utilities
│   │   └── base_video_dataset.py # Base video dataset class
│   │
│   ├── func/                  # Training and evaluation functions
│   │   ├── train.py          # Main training loop
│   │   └── train_eval_ops.py # Training operations
│   │
│   ├── loss_fn/               # Loss function implementations
│   │   ├── multidim_xentropy.py   # Multi-dimensional cross-entropy
│   │   ├── remaining_time_loss.py # Remaining time regression loss
│   │   ├── mse.py            # Mean squared error
│   │   └── mae.py            # Mean absolute error
│   │
│   └── common/                # Common utilities
│       ├── utils.py          # General utilities
│       ├── transforms.py     # Data transformations
│       ├── sampler.py        # Data samplers
│       └── scheduler.py      # Learning rate schedulers
│
├── scripts/                    # Execution scripts
│   ├── launch.py              # Experiment launcher
│   ├── run_experiments.sh     # Batch experiment runner
│   └── runai.sh               # Cluster deployment script
│
├── experiments/                # Experiment tracking
│   ├── configs/               # Experiment configuration files (formerly expts/)
│   ├── top_runs*.json         # Best experiment results
│   └── run_*.txt              # Experiment logs
│
├── docs/                       # Documentation
│   ├── README files and guides
│   ├── CODE_OF_CONDUCT.md     # Code of conduct
│   ├── CONTRIBUTING.md        # Contribution guidelines
│   └── assets/                # Media files (GIFs, images)
│
├── baselines/                  # Baseline implementations
│   ├── R2A2/                  # R2A2 baseline
│   └── Informer2020/          # Informer baseline
│
└── OUTPUTS/                    # Training outputs (gitignored)
    └── expts/                 # Experiment outputs
        └── {experiment_name}/ # Individual experiment results
            ├── checkpoints/   # Model checkpoints
            ├── logs/          # TensorBoard logs
            └── plots/         # Evaluation plots

📊 Results

Phase Anticipation Performance

| Method | Cholec80 F1 (%) | AutoLaparo21 F1 (%) | Cholec80 SegF1 (%) | AutoLaparo21 SegF1 (%) |
|--------|-----------------|---------------------|--------------------|------------------------|
| SP*    | 32.1            | 41.3                | 29.8               | 34.8                   |
| R2C    | 36.1            | 32.9                | 32.5               | 29.2                   |
| AR     | 27.8            | 29.3                | 25.0               | 23.3                   |

Remaining Time Regression (Cholec80)

| Horizon | wMAE (min) | inMAE (min) | outMAE (min) |
|---------|------------|-------------|--------------|
| 2-min   | 0.32       | 0.54        | 0.09         |
| 3-min   | 0.48       | 0.77        | 0.17         |
| 5-min   | 0.80       | 1.26        | 0.34         |

SWAG outperforms the IIA-Net and Bayesian baselines without requiring additional instrument annotations.
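
For reference, the horizon-clipped metrics can be sketched as below. This follows one common definition from the anticipation literature (ground truth clipped to the horizon; errors split by whether the event falls inside it); the exact weighting used in the paper may differ.

```python
import numpy as np

def anticipation_mae(pred, gt, horizon):
    """inMAE: error on samples where the event occurs within the horizon.
    outMAE: error on samples where it does not (target = clipped horizon).
    wMAE: equal-weight average of the two."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    gt_clipped = np.minimum(gt, horizon)
    inside = gt < horizon
    in_mae = np.abs(pred[inside] - gt_clipped[inside]).mean()
    out_mae = np.abs(pred[~inside] - gt_clipped[~inside]).mean()
    return {"inMAE": in_mae, "outMAE": out_mae,
            "wMAE": 0.5 * (in_mae + out_mae)}
```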

🔧 Installation

Prerequisites

  • Python 3.7+
  • CUDA 11.0+ (for GPU support)
  • Conda (recommended)

Setup

# Clone repository
git clone https://github.com/maxboels/SWAG-surgical-workflow-anticipative-generation.git
cd SWAG-surgical-workflow-anticipative-generation

# Create environment from yaml file
conda env create -f env.yaml
conda activate avt

# The environment includes all necessary dependencies:
# - PyTorch with CUDA support
# - Hydra for configuration management
# - timm for vision transformers
# - faiss-cpu for efficient similarity search
# - and other required packages

Dataset Preparation

Download and prepare the datasets:

  1. Cholec80: Download from CAMMA
  2. AutoLaparo21: Download from AutoLaparo

Extract videos and annotations to:

datasets/
├── cholec80/
│   ├── videos/
│   └── annotations/
└── autolaparo21/
    ├── videos/
    └── annotations/

📦 Datasets

The model is evaluated on two publicly available datasets:

  • Cholec80: 80 cholecystectomy videos with 7 surgical phases

    • Split: 32 train / 8 val / 40 test (4-fold cross-validation available)
    • Average duration: 38 minutes
    • Sampled at 1 fps
  • AutoLaparo21: 21 laparoscopic hysterectomy videos

    • Split: 10 train / 4 val / 7 test
    • Average duration: 66 minutes
    • Sampled at 1 fps

Both datasets use 7 surgical phases + end-of-surgery (EOS) class for anticipation.
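The EOS padding can be sketched minimally as follows; the function name and list-based label encoding are illustrative assumptions, not the repository's dataloader:

```python
def future_phase_targets(phase_per_min, t, horizon=30, eos_class=7):
    """At current minute t, the anticipation target is the phase label at
    each of the next `horizon` minutes; once the video ends, the remaining
    slots are filled with the end-of-surgery (EOS) class."""
    future = phase_per_min[t + 1 : t + 1 + horizon]
    return future + [eos_class] * (horizon - len(future))
```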

Dataset Organization:

datasets/
├── cholec80/
│   ├── videos/          # Video files or extracted frames
│   ├── labels/          # Phase annotations
│   └── splits/          # Train/val/test splits
└── autolaparo21/
    ├── videos/
    ├── labels/
    └── splits/

🚀 Usage

Training

The project uses Hydra for configuration management. The main configuration file is conf/config.yaml; dataset, model, optimizer, and training-operation configs live in subdirectories under conf/.

Train SWAG-SP* (Single-Pass with Prior Knowledge)

# Train on Cholec80
python train_net.py \
    dataset_name=cholec80 \
    model_name=supra \
    conditional_probs_embeddings=true \
    eval_horizons=[30] \
    num_epochs=40

# Train on AutoLaparo21
python train_net.py \
    dataset_name=autolaparo21 \
    model_name=supra \
    conditional_probs_embeddings=true \
    eval_horizons=[30] \
    num_epochs=40

Train SWAG-AR (Autoregressive)

python train_net.py \
    dataset_name=cholec80 \
    model_name=lstm \
    decoder_type=autoregressive \
    eval_horizons=[30]

Train R2C (Regression-to-Classification)

python train_net.py \
    dataset_name=cholec80 \
    model_name=supra \
    decoder_anticipation=regression \
    probs_to_regression_method=first_occurrence

Batch Experiments with Launch Script

For running multiple experiments or hyperparameter sweeps:

# Create an experiment config file in experiments/configs/
# e.g., experiments/configs/my_experiment.txt with Hydra overrides

# Run locally
python scripts/launch.py -c experiments/configs/my_experiment.txt -l

# Run on cluster (SLURM)
python scripts/launch.py -c experiments/configs/my_experiment.txt -p gpu_partition

# Debug mode (single GPU)
python scripts/launch.py -c experiments/configs/my_experiment.txt -l -g

Evaluation

Evaluation metrics are computed during training and logged to TensorBoard:

# View training progress
tensorboard --logdir OUTPUTS/expts/YOUR_EXPERIMENT/local/logs/

Evaluate Saved Checkpoints

# Test mode uses the best checkpoint
python train_net.py \
    dataset_name=cholec80 \
    model_name=supra \
    test_only=true \
    finetune_ckpt=best

Configuration

Key configuration parameters in conf/config.yaml:

  • dataset_name: cholec80 or autolaparo21
  • model_name: supra (SP/SP*), lstm (AR), naive1, naive2
  • eval_horizons: List of anticipation horizons in minutes (e.g., [30])
  • conditional_probs_embeddings: Enable prior knowledge (SP*)
  • num_epochs: Training epochs
  • split_idx: For k-fold cross-validation (1-4)

See conf/config.yaml for all available options.

πŸ“ Citation

If you use this work in your research, please cite:

@article{boels2025swag,
  title={SWAG: long-term surgical workflow prediction with generative-based anticipation},
  author={Boels, Maxence and Liu, Yang and Dasgupta, Prokar and Granados, Alejandro and Ourselin, Sebastien},
  journal={International Journal of Computer Assisted Radiology and Surgery},
  year={2025},
  publisher={Springer},
  doi={10.1007/s11548-025-03452-8}
}

🔬 Related Work

This work builds upon and compares with several state-of-the-art methods:

  • Trans-SVNet: Transformer-based surgical workflow analysis
  • SKiT: Fast key information video transformer for surgical phase recognition
  • LoViT: Long Video Transformer for surgical phase recognition
  • IIA-Net: Instrument interaction anticipation network
  • Action Anticipation: Builds on concepts from video action anticipation literature

Our R2A2 baseline implementation is included in the baselines/R2A2/ directory.

📚 Additional Resources

  • Project Page - Visualizations and supplementary materials
  • Paper - Full technical details
  • See CLEANUP_RECOMMENDATIONS.md for codebase maintenance guidelines

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Acknowledgments

  • Datasets: Cholec80 (Strasbourg University) and AutoLaparo21
  • Vision Transformer implementation based on timm library
  • Transformer architectures adapted from PyTorch

📧 Contact

For questions or collaboration inquiries:
