A lightweight Vision-Mamba network built on State Space Models (SSMs) and a SimpleDecoder, for image reconstruction and restoration tasks such as inpainting, colorization, and denoising.
The model leverages a Patch-based Encoder, MambaBlock (SSM) layers for long-range spatial reasoning, and a transposed-convolutional decoder for high-quality reconstruction.
Training integrates L1, SSIM, and VGG Perceptual losses for better structural and perceptual fidelity.
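The combined objective can be sketched as a weighted sum of the three terms. The snippet below is a minimal illustration, not the project's exact implementation: the SSIM here uses a uniform average-pooling window rather than a Gaussian one, and the perceptual term takes any frozen feature extractor (e.g. truncated VGG16 features) as an argument; the weights `w_l1`, `w_ssim`, and `w_perc` are assumed values.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, window=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM with a uniform window (Gaussian window omitted for brevity)."""
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, 1, pad)
    mu_y = F.avg_pool2d(y, window, 1, pad)
    sigma_x = F.avg_pool2d(x * x, window, 1, pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, window, 1, pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, window, 1, pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).mean()

def combined_loss(pred, target, feat_extractor=None,
                  w_l1=1.0, w_ssim=0.2, w_perc=0.1):
    """L1 + (1 - SSIM) + optional perceptual term; weights are illustrative."""
    loss = w_l1 * F.l1_loss(pred, target) + w_ssim * (1.0 - ssim(pred, target))
    if feat_extractor is not None:  # e.g. a frozen, truncated VGG16 backbone
        loss = loss + w_perc * F.l1_loss(feat_extractor(pred), feat_extractor(target))
    return loss
```

Since SSIM is a similarity in [0, 1], it enters the loss as `1 - ssim` so that all three terms are minimized together.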
- 🧩 Mamba-based Encoder Blocks – SSM-style modeling for efficient long-range spatial dependencies.
- 🎨 Simple Transposed-Conv Decoder – Reconstructs full-resolution RGB images.
- 📏 Multi-Loss Training – Combines L1, SSIM, and Perceptual losses for better color and texture recovery.
- ⚡ Lightweight & Fast – Designed for image colorization, denoising, and inpainting tasks.
- 🧠 PyTorch Implementation – Modular, readable, and easy to extend.
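To make the SSM idea in the encoder blocks concrete, here is a deliberately simplified, didactic stand-in: a diagonal linear recurrence over the token sequence with a residual connection. It omits the selective gating and hardware-aware parallel scan of a real MambaBlock; the class name and hyperparameters are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Didactic diagonal state-space mixing over a (B, L, D) token sequence.

    NOT the project's MambaBlock: no selective (input-dependent) gating and a
    naive Python-loop scan instead of a fused parallel scan.
    """
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.in_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        # Per-channel transition, squashed into (0, 1) for a stable recurrence
        self.log_a = nn.Parameter(torch.zeros(dim))

    def forward(self, x):                    # x: (B, L, D)
        u = self.in_proj(self.norm(x))
        a = torch.sigmoid(self.log_a)        # per-channel decay rate
        h = torch.zeros_like(u[:, 0])        # initial state, shape (B, D)
        outs = []
        for t in range(u.shape[1]):          # sequential scan: h_t = a*h_{t-1} + (1-a)*u_t
            h = a * h + (1 - a) * u[:, t]
            outs.append(h)
        y = torch.stack(outs, dim=1)
        return x + self.out_proj(y)          # residual connection
```

The recurrence gives each channel an exponentially decaying memory over all earlier tokens, which is the mechanism behind the "long-range spatial dependencies" claim above.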
┌────────────┐ ┌────────────────────┐ ┌───────────────────────┐ ┌────────────────────────────┐ ┌───────────────┐
│ Input │ --> │ Patch-based │ --> │ MambaBlock (SSM) │ --> │ Transposed-Conv Decoder │ --> │ Output │
│ Image │ │ Encoder │ │ Layers │ │ (SimpleDecoder) │ │ Image │
└────────────┘ └────────────────────┘ └───────────────────────┘ └────────────────────────────┘ └───────────────┘
─────────────────────────────────────────────────────────────────────────────────────────
↑ Losses applied during training (L1 + SSIM + VGG Perceptual)
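The pipeline above can be wired together in a few lines. In this sketch the patch-based encoder is a strided convolution, a 1x1-conv MLP stands in for the MambaBlock stack (the real model mixes tokens with SSM layers), and the SimpleDecoder is a single transposed convolution; all dimensions and the class name are assumptions.

```python
import torch
import torch.nn as nn

class TinyVisionMamba(nn.Module):
    """Minimal sketch of the Input -> Encoder -> Blocks -> Decoder -> Output flow.

    The `mixer` is a placeholder for the MambaBlock (SSM) stack.
    """
    def __init__(self, dim=64, patch=4):
        super().__init__()
        # Patch-based encoder: non-overlapping patches via a strided conv
        self.encoder = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Stand-in for the SSM layers (per-token MLP, illustration only)
        self.mixer = nn.Sequential(
            nn.Conv2d(dim, dim, 1), nn.GELU(), nn.Conv2d(dim, dim, 1))
        # SimpleDecoder-style upsampling back to full resolution
        self.decoder = nn.ConvTranspose2d(dim, 3, kernel_size=patch, stride=patch)

    def forward(self, x):                     # x: (B, 3, H, W)
        z = self.encoder(x)                   # (B, dim, H/patch, W/patch)
        z = z + self.mixer(z)                 # residual mixing
        return torch.sigmoid(self.decoder(z))  # RGB output in [0, 1]
```

Because the transposed convolution mirrors the encoder's kernel and stride, the output spatial size matches the input whenever H and W are divisible by the patch size.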
This project is licensed under the MIT License — see the LICENSE file for details.