PyTorch Lightning ⚡ implementation of the HSDT model for removing dark noise from hyperspectral images (HSI) — developed for the FINCH satellite by the University of Toronto Aerospace Team.
- ⚡ Built with PyTorch Lightning for clean, scalable research
- 🔁 Multi-GPU training via Distributed Data Parallel (DDP)
- 🌙 Metadata-driven dark-frame modelling with fixed-pattern noise estimation
- 📊 Expanded evaluation metrics including residual Shannon entropy
- 🧪 Synthetic dark-frame generator for rapid dataset bootstrapping
- ⚙️ Fully configurable with YAML-based CLI interface
- 🧠 New HSDR-ScanFusion blocks blending transformer attention with bidirectional selective scans for efficient long-context reasoning
- 📄 Well-documented, modular, and strongly-typed codebase
My companion notes can be found in Notion.
Looking for deeper architectural or physics context? See the research supplement at `README_research.md`.
We call the upgraded backbone HSDR-ScanFusion (Hierarchical Spectral Denoiser with Residual ScanFusion). It preserves HSDT’s transformer hierarchy while blending in the bidirectional selective state-space scanning strategy proposed in HSIDMamba (arXiv:2404.09697):
- Each encoder/decoder block augments the transformer branch with a Hyperspectral Continuous Scan Block that runs lightweight Mamba-style 1D selective scans over forward/backward spatial directions.
- The bidirectional scan injects long-range spatial-spectral context while keeping near-linear complexity, then feeds a guided spectral self-attention head to recover fine detail.
- Depthwise spatial convolutions provide the scale-preserving residual paths described in the paper, while channel mixing still benefits from the original transformer FFN.
- Toggle or ablate the new path by setting `model.variant` (`"scanfusion"` for the hybrid block, `"baseline"` to recover the legacy HSDT stack). Additional knobs (`scan_kernel`, `scan_hidden_scale`, and `scan_dropout`) expose the state-space hyperparameters straight from the YAML configs.
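To build intuition for what the scan branch contributes, here is a toy numpy sketch of a bidirectional gated scan over a flattened spatial sequence. This is an illustration only: the real Hyperspectral Continuous Scan Block learns input-dependent gates and fuses its output with spectral self-attention, none of which is modelled here, and the constant `decay` is an assumption of this sketch.

```python
import numpy as np

def selective_scan_1d(x, decay):
    """Gated cumulative scan: h[t] = decay[t] * h[t-1] + x[t]."""
    h = np.zeros_like(x)
    acc = np.zeros(x.shape[1:])
    for t in range(x.shape[0]):
        acc = decay[t] * acc + x[t]
        h[t] = acc
    return h

def bidirectional_scan(x, decay=0.9):
    """Average a forward and a backward scan so every position sees
    long-range context from both directions, in linear time."""
    d = np.full(x.shape[0], decay)
    fwd = selective_scan_1d(x, d)
    bwd = selective_scan_1d(x[::-1], d)[::-1]
    return 0.5 * (fwd + bwd)

seq = np.random.rand(16, 8)  # 16 flattened spatial positions, 8 channels
out = bidirectional_scan(seq)
print(out.shape)  # (16, 8)
```

The key property, near-linear cost in sequence length, comes from the single recurrence per direction rather than pairwise attention.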
| Config key | What it does | Default |
|---|---|---|
| `model.variant` | `"scanfusion"` uses the hybrid Selective-Scan + Transformer block; `"baseline"` reverts to vanilla HSDT | `"scanfusion"` |
| `model.scan_kernel` | 1D kernel size for the selective scan along flattened spatial sequences | `5` |
| `model.scan_hidden_scale` | Width multiplier for the hidden state inside the scan module | `2.0` |
| `model.scan_dropout` | Dropout applied after the scan mixing | `0.05` |
Switching the variant is enough to benchmark transformer-only vs. hybrid behaviour—no code edits required.
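For example, toggling back to the baseline from YAML might look like the fragment below (the key layout is assumed to mirror the table above; adjust it to match your `config/train.yaml`):

```yaml
model:
  variant: baseline       # "scanfusion" enables the hybrid block
  scan_kernel: 5
  scan_hidden_scale: 2.0
  scan_dropout: 0.05
```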
The project now couples the hyperspectral denoiser with a learnable dark-frame generator:
- `Darkframe/dark_operator.py` learns spectral–spatial bases conditioned on acquisition metadata (temperature, gain, integration time) and stores a fixed-pattern noise (FPN) buffer. After training, the operator replays the dataset to estimate the residual FPN map automatically, ensuring sensor-specific hot pixels and bias structure are captured.
- `Darkframe/operator_wrapper.py` subtracts the predicted dark map before forwarding to the HSDT network and logs the running FPN RMS so drift can be monitored during finetuning.
- `Darkframe/train_data_dark.py` / `Darkframe/test_data_dark.py` train or evaluate the operator on recorded dark cubes. Evaluation reports mean-squared error and residual entropy (in bits), giving quick feedback on how random the remaining bias appears after subtraction.
- `DataSetGenerator/generate_dark_training_set.py` can synthesize dark cubes with randomised, realistic sensor parameters; optionally it adds those dark frames onto clean datacubes to create paired “dirty” captures with matching metadata.
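The residual-entropy metric can be computed roughly as below. This is a minimal numpy sketch of the idea (histogram entropy of the post-subtraction residual, in bits); the project's actual implementation may bin or normalise differently.

```python
import numpy as np

def residual_entropy_bits(residual: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (bits) of a residual map's intensity histogram.

    Low entropy means structured bias survived subtraction;
    high entropy means the residual looks closer to random noise.
    """
    hist, _ = np.histogram(residual, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

flat = np.zeros((64, 64))                              # perfectly subtracted
noisy = np.random.default_rng(0).normal(size=(64, 64))  # random residual
print(residual_entropy_bits(flat) < residual_entropy_bits(noisy))  # True
```

With 256 bins the metric is capped at 8 bits, so values near the cap indicate an essentially random residual.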
When adapting to a new sensor, load the existing operator checkpoint, call `estimate_fixed_pattern` on a short dark run, and the fixed-pattern map will be updated in place without retraining.
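Conceptually, a fixed-pattern estimate is a per-pixel statistic over many dark frames: temporal noise averages out, and what survives is the sensor-specific pattern. The sketch below shows the idea with a plain per-pixel median; it assumes nothing about the project's actual `estimate_fixed_pattern` implementation.

```python
import numpy as np

def estimate_fpn(dark_frames: np.ndarray) -> np.ndarray:
    """Per-pixel median over a stack of dark frames shaped (frames, H, W).

    The median is robust to outlier frames; hot pixels and column bias
    persist across frames and therefore survive the reduction.
    """
    return np.median(dark_frames, axis=0)

rng = np.random.default_rng(1)
fpn_true = rng.normal(0.0, 0.05, size=(32, 32))             # hidden pattern
stack = fpn_true[None] + rng.normal(0.0, 0.2, size=(200, 32, 32))
fpn_hat = estimate_fpn(stack)
print(float(np.abs(fpn_hat - fpn_true).mean()) < 0.05)  # True
```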
1. Create a virtual environment and run `pip install -r requirements.txt`.
2. Create a `data` folder.
3. Create `raw` and `test` folders inside the `data` folder.
4. Download the hyperspectral images to train on into the `raw` folder.
5. Download the hyperspectral images to test on into the `test` folder.
6. (Optional, paired with step 7) Run `python -m preprocess.main`.
7. (Optional, paired with step 6) Ensure `data: preprocess_data` is set to `true` inside the YAML config file.
Train:

```bash
python main.py fit --config config/train.yaml
```

Resume from a checkpoint:

```bash
python main.py fit --config config/train.yaml --ckpt_path checkpoint/hsdt-epoch10.ckpt
```

Smoke test (run this when you want to check whether your code runs at all):

```bash
python main.py fit --config config/train_local.yaml --trainer.profiler=null --trainer.fast_dev_run=True
```

Batch-size finder:

```bash
python main.py fit --config config/train.yaml --run_batch_size_finder true --batch_size_finder_mode power
```

Learning-rate finder:

```bash
python main.py fit --config config/train.yaml --run_lr_finder true --show_lr_plot true
```

Validate, test, or predict:

```bash
python main.py validate --config config/train.yaml
python main.py test --config config/train.yaml
python main.py predict --config config/train.yaml
```

Help:

```bash
python main.py --help
```

Note that all the individual subcommands also have `--help`.
Launch TensorBoard:

```bash
tensorboard --logdir logs/hsdt_lightning
```

- Organize your data so each sample has `raw.npy` (clean + dark) and `clean.npy`:

  ```
  hsdt-lightning/data/train/sample_0000/raw.npy
  hsdt-lightning/data/train/sample_0000/clean.npy
  hsdt-lightning/data/test/sample_0100/raw.npy
  ...
  ```

- Set `gaussian_noises: []` and list any specific folders in the YAML configs if needed.
- Run the hybrid training:

  ```bash
  cd hsdt-lightning
  python main.py fit --config config/train.yaml
  ```

- Optional: smoke test before the full run:

  ```bash
  python main.py fit --config config/train_local.yaml --trainer.fast_dev_run true
  ```
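To sanity-check the folder layout before a full run, you can write one dummy sample pair with numpy. The cube shape below is illustrative (match your sensor's band count), and the simple `raw = clean + dark` composition is an assumption of this sketch.

```python
import numpy as np
from pathlib import Path

root = Path("hsdt-lightning/data/train/sample_0000")
root.mkdir(parents=True, exist_ok=True)

# Illustrative cube shape: (bands, height, width).
clean = np.random.rand(81, 64, 64).astype(np.float32)
dark = 0.05 * np.random.rand(81, 64, 64).astype(np.float32)

np.save(root / "clean.npy", clean)
np.save(root / "raw.npy", clean + dark)  # raw.npy holds clean + dark
```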
Use these standalone steps (the old `train_pipeline.py` has been removed):

1. Generate synthetic dark cubes (optional if you already have real captures):

   ```bash
   python DataSetGenerator/generate_dark_training_set.py \
       --count 200 \
       --height 256 \
       --width 256 \
       --frames 81 \
       --seed 2024 \
       --output hsdt-lightning/dark_data
   ```

   Add `--clean-dir /path/to/clean_cubes` to produce paired dirty measurements.

2. Train or refresh the physics-driven dark operator:

   ```bash
   python Darkframe/train_data_dark.py \
       --root hsdt-lightning/dark_data \
       --patch-size 64 \
       --stride 64 \
       --epochs 20 \
       --lr 1e-3
   ```

   This outputs `best_dark_operator.pt` and reports residual entropy to sanity-check realism.

3. (Optional) Joint fine-tuning: load the checkpoint into `OperatorWrapper` and run a Lightning fit loop (see `Darkframe/hsdt_darkframe_train.py` for a template) so the denoiser trains on dark-corrected inputs.
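The kind of metadata-conditioned noise model the generator randomises can be pictured with a small numpy sketch. All parameter names and constants here are illustrative assumptions (including the dark-current doubling rate), not the generator's real code.

```python
import numpy as np

def synth_dark_cube(bands=81, h=64, w=64, temp_c=20.0, gain=2.0,
                    integration_s=0.1, rng=None):
    """Toy dark cube: bias + fixed-pattern noise + dark current + read noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    fpn = rng.normal(0.0, 0.01, size=(h, w))               # per-pixel fixed pattern
    dark_current = 0.02 * 2.0 ** ((temp_c - 20.0) / 6.0)   # grows with temperature
    bias = 0.1                                             # constant sensor offset
    cube = bias + fpn[None] + gain * dark_current * integration_s
    read = rng.normal(0.0, 0.005, size=(bands, h, w))      # temporal read noise
    return cube + read

cube = synth_dark_cube()
print(cube.shape)  # (81, 64, 64)
```

Randomising `temp_c`, `gain`, and `integration_s` per cube, and recording them as metadata, is what gives the operator something to condition on.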
If you change model or data module parameters, regenerate the config file:
```bash
python main.py fit --print_config > configs/default.yaml
```

Use an existing config (e.g. `config/train.yaml`) as a template to fill in values.
Overview of the project structure:
```
├── main.py               # Entry point using LightningCLI
├── model.py              # LightningModule for HSDT model
├── data_module.py        # LightningDataModule with transforms
├── dataset.py            # HSI dataset & patching logic
│
├── config/               # YAML configuration files
├── hsdt/                 # HSDT architecture (hsdt/arch.py)
├── metrics/              # Metrics: SSIM (ssim.py), PSNR (psnr.py)
├── Darkframe/            # Dark-operator training + Lightning wrapper
├── DataSetGenerator/     # Synthetic dark-frame + dirty datacube generators
├── preprocess/           # Preprocessing scripts (main.py is entry point)
│
├── data/                 # Input images for training/testing
│   ├── raw/              # Raw training data
│   └── test/             # Testing data
│
├── logs/                 # Lightning logs
│
└── checkpoints/
    ├── best/             # Best-performing checkpoints (highest PSNR/SSIM)
    └── interval/         # Checkpoints saved every 5 epochs
```
- PyTorch Lightning - training-loop abstraction
- PyTorch - deep learning framework
- NumPy & SciPy - simulation and I/O tooling for synthetic dark datasets
- scikit-image - image quality metrics (SSIM & PSNR)
- SciPy - loading/saving `.mat` files

For the complete list, consult `requirements.txt`.