🐍 Vision-Mamba: Lightweight SSM-based Vision Autoencoder

A lightweight Vision-Mamba network built on State Space Models (SSMs) with a SimpleDecoder, for image reconstruction and restoration tasks such as inpainting, colorization, and denoising.

The model combines a patch-based encoder, MambaBlock (SSM) layers for long-range spatial reasoning, and a transposed-convolution decoder for high-quality reconstruction.
Training uses a weighted sum of L1, SSIM, and VGG perceptual losses for better structural and perceptual fidelity.


🚀 Features

  • 🧩 Mamba-based Encoder Blocks – SSM-style layers for efficient modeling of long-range spatial dependencies.
  • 🎨 Simple Transposed-Conv Decoder – Reconstructs full-resolution RGB images.
  • 📏 Multi-Loss Training – Combines L1, SSIM, and VGG perceptual losses for better color and texture recovery.
  • ⚡ Lightweight & Fast – Designed for image colorization, denoising, and inpainting tasks.
  • 🧠 PyTorch Implementation – Modular, readable, and easy to extend.
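To make the "Mamba-based Encoder Blocks" bullet concrete, here is a simplified sketch of such a block. It is not the repo's MambaBlock: the real Mamba uses a selective scan with input-dependent SSM parameters, while this sketch uses a fixed per-channel decay (a plain diagonal linear recurrence) and an explicit Python loop over tokens. Class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class MambaBlockSketch(nn.Module):
    """Simplified Mamba-style token-mixing block (illustrative only)."""
    def __init__(self, dim, expand=2, conv_kernel=4):
        super().__init__()
        inner = dim * expand
        self.norm = nn.LayerNorm(dim)
        self.in_proj = nn.Linear(dim, inner * 2)   # main branch + gate branch
        # Causal depthwise conv: pad left so position t never sees t+1.
        self.conv = nn.Conv1d(inner, inner, conv_kernel,
                              padding=conv_kernel - 1, groups=inner)
        self.decay = nn.Parameter(torch.zeros(inner))  # per-channel state decay
        self.out_proj = nn.Linear(inner, dim)

    def forward(self, x):                          # x: (B, L, D) patch tokens
        B, L, _ = x.shape
        res = x
        x, gate = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        x = self.conv(x.transpose(1, 2))[..., :L].transpose(1, 2)  # trim padding
        x = nn.functional.silu(x)
        # Diagonal linear recurrence: h_t = a * h_{t-1} + x_t.
        # (Real Mamba makes a, B, C input-dependent and uses a fused scan.)
        a = torch.sigmoid(self.decay)              # decay in (0, 1)
        h = torch.zeros(B, x.shape[-1], device=x.device)
        out = []
        for t in range(L):
            h = a * h + x[:, t]
            out.append(h)
        x = torch.stack(out, dim=1) * nn.functional.silu(gate)
        return res + self.out_proj(x)
```

The recurrence is what gives the block linear-time long-range mixing: each token's state summarizes everything before it, without the quadratic attention matrix of a Transformer.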

🖼️ Pipeline Overview

┌────────────┐     ┌────────────────────┐     ┌───────────────────────┐     ┌────────────────────────────┐     ┌───────────────┐
│ Input      │ --> │ Patch-based        │ --> │ MambaBlock (SSM)      │ --> │ Transposed-Conv Decoder    │ --> │ Output        │
│ Image      │     │ Encoder            │     │ Layers                │     │ (SimpleDecoder)            │     │ Image         │
└────────────┘     └────────────────────┘     └───────────────────────┘     └────────────────────────────┘     └───────────────┘

        ─────────────────────────────────────────────────────────────────────────────────────────
                              ↑ Losses applied during training (L1 + SSIM + VGG Perceptual)
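The pipeline above can be sketched end to end as below. This is an assumed, minimal analogue, not the repo's code: the patch embedding is a strided conv, the token-mixing stage uses placeholder per-token MLP blocks where the real model uses MambaBlock (SSM) layers, and the decoder mirrors SimpleDecoder with three stride-2 transposed convolutions to undo the 8×8 patching. All names and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class VisionMambaAESketch(nn.Module):
    """Patch embed -> token-mixing blocks -> transposed-conv decoder."""
    def __init__(self, dim=64, patch=8, depth=4):
        super().__init__()
        # Patch-based encoder: one strided conv turns the image into tokens.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Placeholder per-token MLP blocks; the real model uses SSM MambaBlocks.
        self.blocks = nn.Sequential(*[
            nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 2),
                          nn.GELU(), nn.Linear(dim * 2, dim))
            for _ in range(depth)])
        # SimpleDecoder analogue: three stride-2 upsamples recover 8x resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, dim // 2, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(dim // 2, dim // 4, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(dim // 4, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, img):                        # img: (B, 3, H, W)
        feat = self.patch_embed(img)               # (B, dim, H/8, W/8)
        B, C, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)   # (B, h*w, dim)
        tokens = tokens + self.blocks(tokens)      # residual token mixing
        feat = tokens.transpose(1, 2).reshape(B, C, h, w)
        return self.decoder(feat)                  # (B, 3, H, W) in [0, 1]
```

A forward pass on a 64×64 input returns a 64×64 reconstruction, to which the L1 + SSIM + VGG perceptual losses would be applied during training.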

🧩 License

This project is licensed under the MIT License — see the LICENSE file for details.
