⚡ HSDT-Lightning

PyTorch Lightning ⚡ implementation of the HSDT model for removing dark noise from hyperspectral images (HSI) — developed for the FINCH satellite by the University of Toronto Aerospace Team.

🚀 Features

  • ⚡ Built with PyTorch Lightning for clean, scalable research
  • 🔁 Multi-GPU training via Distributed Data Parallel (DDP)
  • 🌙 Metadata-driven dark-frame modelling with fixed-pattern noise estimation
  • 📊 Expanded evaluation metrics including residual Shannon entropy
  • 🧪 Synthetic dark-frame generator for rapid dataset bootstrapping
  • ⚙️ Fully configurable with YAML-based CLI interface
  • 🧠 New HSDR-ScanFusion blocks blending transformer attention with bidirectional selective scans for efficient long-context reasoning
  • 📄 Well-documented, modular, and strongly-typed codebase

Companion development notes are maintained in Notion.

Looking for deeper architectural or physics context? See the research supplement at README_research.md.

🧩 HSDR-ScanFusion Hybrid Architecture

We call the upgraded backbone HSDR-ScanFusion (Hierarchical Spectral Denoiser with Residual ScanFusion). It preserves HSDT’s transformer hierarchy while blending in the bidirectional selective state-space scanning strategy proposed in HSIDMamba (arXiv:2404.09697):

  • Each encoder/decoder block augments the transformer branch with a Hyperspectral Continuous Scan Block that runs lightweight Mamba-style 1D selective scans over forward/backward spatial directions.
  • The bidirectional scan injects long-range spatial-spectral context while keeping near-linear complexity, then feeds a guided spectral self-attention head to recover fine detail.
  • Depthwise spatial convolutions provide the scale-preserving residual paths described in the paper, while channel mixing still benefits from the original transformer FFN.
  • Toggle or ablate the new path by setting model.variant ("scanfusion" for the hybrid block, "baseline" to recover the legacy HSDT stack). Additional knobs—scan_kernel, scan_hidden_scale, and scan_dropout—expose the state-space hyperparameters straight from the YAML configs.
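To make the block structure above concrete, here is a deliberately simplified sketch of the bidirectional scan idea: depthwise 1D convolutions over the flattened spatial sequence stand in for the Mamba-style selective scans (a real selective scan uses input-dependent state-space parameters, which are omitted here). The class name and layer choices are illustrative, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class BidirectionalScanBlock(nn.Module):
    """Toy stand-in for the Hyperspectral Continuous Scan Block:
    depthwise 1D mixing over the flattened spatial sequence, run in
    forward and backward directions, then fused on a residual path."""

    def __init__(self, channels: int, scan_kernel: int = 5, scan_dropout: float = 0.05):
        super().__init__()
        pad = scan_kernel // 2
        self.fwd = nn.Conv1d(channels, channels, scan_kernel, padding=pad, groups=channels)
        self.bwd = nn.Conv1d(channels, channels, scan_kernel, padding=pad, groups=channels)
        self.mix = nn.Conv1d(channels, channels, 1)  # channel mixing after the scans
        self.drop = nn.Dropout(scan_dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) -> one flattened spatial sequence
        b, c, h, w = x.shape
        seq = x.reshape(b, c, h * w)
        fwd = self.fwd(seq)
        bwd = self.bwd(seq.flip(-1)).flip(-1)  # backward scan, restored to forward order
        out = self.drop(self.mix(fwd + bwd))
        return (seq + out).reshape(b, c, h, w)  # scale-preserving residual path
```

Because the block is shape-preserving, it can slot between transformer stages without changing the surrounding tensor layout.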

Quick Hyperparameter Reference

| Config key | What it does | Default |
| --- | --- | --- |
| `model.variant` | `"scanfusion"` uses the hybrid selective-scan + transformer block; `"baseline"` reverts to vanilla HSDT | `"scanfusion"` |
| `model.scan_kernel` | 1D kernel size for the selective scan along flattened spatial sequences | `5` |
| `model.scan_hidden_scale` | Width multiplier for the hidden state inside the scan module | `2.0` |
| `model.scan_dropout` | Dropout applied after the scan mixing | `0.05` |

Switching the variant is enough to benchmark transformer-only vs. hybrid behaviour—no code edits required.
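As a YAML excerpt, these keys might look as follows (the nesting is inferred from the dotted key paths above and may differ from the actual config layout):

```yaml
model:
  variant: scanfusion      # or "baseline" for the legacy HSDT stack
  scan_kernel: 5
  scan_hidden_scale: 2.0
  scan_dropout: 0.05
```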

🌙 Dark-Frame Modelling Overview

The project now couples the hyperspectral denoiser with a learnable dark-frame generator:

  • Darkframe/dark_operator.py learns spectral–spatial bases conditioned on acquisition metadata (temperature, gain, integration time) and stores a fixed-pattern noise (FPN) buffer. After training, the operator replays the dataset to estimate the residual FPN map automatically, ensuring sensor-specific hot pixels and bias structure are captured.
  • Darkframe/operator_wrapper.py subtracts the predicted dark map before forwarding to the HSDT network and logs the running FPN RMS so drift can be monitored during finetuning.
  • Darkframe/train_data_dark.py / Darkframe/test_data_dark.py train or evaluate the operator on recorded dark cubes. Evaluation reports mean-squared error and residual entropy (in bits), giving quick feedback on how random the remaining bias appears after subtraction.
  • DataSetGenerator/generate_dark_training_set.py can synthesize dark cubes with randomised, realistic sensor parameters; optionally it adds those dark frames onto clean datacubes to create paired “dirty” captures with matching metadata.
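The residual-entropy metric mentioned above can be computed as the Shannon entropy of the residual's value histogram; a minimal NumPy version (the bin count is an assumption, not necessarily what the evaluation script uses) is:

```python
import numpy as np

def residual_entropy_bits(residual: np.ndarray, n_bins: int = 256) -> float:
    """Shannon entropy, in bits, of the residual's value histogram."""
    counts, _ = np.histogram(residual, bins=n_bins)
    probs = counts[counts > 0] / counts.sum()  # drop empty bins, normalise
    return float(-(probs * np.log2(probs)).sum())
```

A constant residual yields 0 bits, while a residual spread evenly across two histogram bins yields exactly 1 bit, so the number gives a quick sense of how structured the leftover bias is.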

When adapting to a new sensor, load the existing operator checkpoint, call estimate_fixed_pattern on a short dark run, and the fixed-pattern map will be updated in place without retraining.
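One common way to estimate such a fixed-pattern map from a short dark run is to average the dark frames (suppressing temporal shot/read noise) and remove the global offset; this sketch illustrates the idea only, and the real `estimate_fixed_pattern` in `Darkframe/dark_operator.py` may differ:

```python
import numpy as np

def estimate_fixed_pattern(dark_frames: np.ndarray) -> np.ndarray:
    """Estimate a fixed-pattern noise map from a stack of dark captures.

    dark_frames: (n_frames, height, width). Averaging across frames
    suppresses temporal noise, leaving per-pixel bias structure such as
    hot pixels and column offsets.
    """
    fpn = dark_frames.mean(axis=0)
    return fpn - fpn.mean()  # zero-mean map; the global offset is handled separately
```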

📦 Installing Dependencies

Create a virtual environment, activate it, and run:

pip install -r requirements.txt

🗂️ Data Preparation

  1. Create a data folder.
  2. Create raw and test inside the data folder.
  3. Download your hyperspectral images to train on into the raw folder.
  4. Download the hyperspectral images you want to test on into the test folder.
  5. Run python -m preprocess.main to preprocess the data manually (alternative to step 6).
  6. Or set data: preprocess_data to true inside the YAML config file so preprocessing runs automatically (alternative to step 5).
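Steps 1–2 amount to:

```shell
mkdir -p data/raw data/test
# place training cubes in data/raw and evaluation cubes in data/test
```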

⚙️ Running the code

🔧 Training

python main.py fit --config config/train.yaml

🔧 Training from a checkpoint

python main.py fit --config config/train.yaml --ckpt_path checkpoint/hsdt-epoch10.ckpt

🔧 Running a smoke test

Run this to quickly verify that the pipeline executes end-to-end without launching a full training run.

python main.py fit --config config/train_local.yaml --trainer.profiler=null --trainer.fast_dev_run=True

🔧 Best batch finder

python main.py fit --config config/train.yaml --run_batch_size_finder true --batch_size_finder_mode power

🔧 Best learning rate finder

python main.py fit --config config/train.yaml --run_lr_finder true --show_lr_plot true

✅ Validation

python main.py validate --config config/train.yaml

🧪 Testing

python main.py test --config config/train.yaml

🔮 Predict

python main.py predict --config config/train.yaml

🆘 For help text

python main.py --help

Each subcommand (fit, validate, test, predict) also accepts --help.

Reading Logs

tensorboard --logdir logs/hsdt_lightning

🧠 Training HSDR-ScanFusion on Dark-Frame Simulations

  1. Organize your data so each sample has raw.npy (clean + dark) and clean.npy:
    hsdt-lightning/data/train/sample_0000/raw.npy
    hsdt-lightning/data/train/sample_0000/clean.npy
    hsdt-lightning/data/test/sample_0100/raw.npy
    ...
    
  2. Set gaussian_noises: [] and list any specific folders in the YAML configs if needed.
  3. Run the hybrid training:
    cd hsdt-lightning
    python main.py fit --config config/train.yaml
  4. Optional: Smoke test before the full run
    python main.py fit --config config/train_local.yaml --trainer.fast_dev_run true

🔁 Data & Operator Workflow

Use these standalone steps (the old train_pipeline.py has been removed):

  1. Generate synthetic dark cubes (optional if you already have real captures):

    python DataSetGenerator/generate_dark_training_set.py \
      --count 200 \
      --height 256 \
      --width 256 \
      --frames 81 \
      --seed 2024 \
      --output hsdt-lightning/dark_data

    Add --clean-dir /path/to/clean_cubes to produce paired dirty measurements.

  2. Train or refresh the physics-driven dark operator:

     python Darkframe/train_data_dark.py \
       --root hsdt-lightning/dark_data \
       --patch-size 64 \
       --stride 64 \
       --epochs 20 \
       --lr 1e-3

    This outputs best_dark_operator.pt and reports residual entropy to sanity-check realism.

  3. (Optional) Joint fine-tuning: load the checkpoint into OperatorWrapper and run a Lightning fit loop (see Darkframe/hsdt_darkframe_train.py for a template) so the denoiser trains on dark-corrected inputs.
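The subtract-then-denoise composition that OperatorWrapper performs can be sketched in plain PyTorch as follows; both sub-modules here are placeholders, and the class name and attribute are hypothetical, not the repository's API:

```python
import torch
import torch.nn as nn

class DarkCorrectedDenoiser(nn.Module):
    """Sketch of the wrapper idea: predict a dark map from the raw cube,
    subtract it, then denoise, while tracking the dark map's RMS."""

    def __init__(self, dark_operator: nn.Module, denoiser: nn.Module):
        super().__init__()
        self.dark_operator = dark_operator
        self.denoiser = denoiser
        self.last_fpn_rms = 0.0

    def forward(self, raw: torch.Tensor) -> torch.Tensor:
        dark = self.dark_operator(raw)
        corrected = raw - dark
        # running-RMS style logging, in the spirit of operator_wrapper.py
        self.last_fpn_rms = dark.pow(2).mean().sqrt().item()
        return self.denoiser(corrected)
```

During joint fine-tuning, gradients flow through both sub-modules, so the denoiser adapts to whatever bias the dark operator fails to remove.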

🛠️ Generating new config file

If you change model or data module parameters, regenerate the config file:

python main.py fit --print_config > config/default.yaml

Use an existing config (e.g. config/train.yaml) as a template to fill in values.

🧾 Project Structure

Overview of the project structure:


├── main.py               # Entry point using LightningCLI
├── model.py              # LightningModule for HSDT model
├── data_module.py        # LightningDataModule with transforms
├── dataset.py            # HSI dataset & patching logic
│
├── config/               # YAML configuration files
├── hsdt/                 # HSDT architecture (`hsdt/arch.py`)
├── metrics/              # Metrics: SSIM (`ssim.py`), PSNR (`psnr.py`)
├── Darkframe/            # Dark-operator training + Lightning wrapper
├── DataSetGenerator/     # Synthetic dark-frame + dirty datacube generators
├── preprocess/           # Preprocessing scripts (`main.py` is entry point)
│
├── data/                 # Input images for training/testing
│   ├── raw/              # Raw training data
│   └── test/             # Testing data
│
├── logs/                 # Lightning logs
│
└── checkpoints/
    ├── best/             # Best-performing checkpoints (highest PSNR/SSIM)
    └── interval/         # Checkpoints saved every 5 epochs

🧰 Technologies Used

For the complete list, consult requirements.txt.
