
🌈 SwinVQColor: Hierarchical VQ-VAE with Swin Transformer for Image Colorization

SwinVQColor implements a Swin Transformer-based Vector Quantized Variational Autoencoder (VQ-VAE) designed for perceptually realistic image colorization in the CIE Lab color space.
Given a grayscale (L-channel) image, the model predicts the chrominance (ab) channels using learned discrete embeddings, delivering rich, vibrant, and structured colorizations.


🧠 Motivation

CNN-based colorization models often produce blurry or desaturated outputs due to regression to the mean and limited context modeling.
SwinVQColor overcomes these limitations by integrating:

  • Swin Transformer Encoder: Captures hierarchical, long-range context and spatial structure from the grayscale input.
  • VQ-VAE with EMA Codebook: Learns robust discrete latent color representations, encouraging multimodal and vivid color synthesis.
  • Advanced Loss Functions: Blends pixel, perceptual, vector-quantization, and color fidelity losses, ensuring sharp, visually plausible results.

🧩 Model Overview

Pipeline

```mermaid
flowchart LR
    L["Input L-channel (grayscale)"] --> E[("Swin Transformer<br>Encoder")]
    E --> Z["Latent Code (z_enc)"]
    Z --> QV["VQ-VAE Codebook<br>(EMA Quantization)"]
    QV --> D[("Decoder<br>(CNN/Transformer)")]
    D --> ab["Output ab channels<br>(colorized image)"]
```

Key Components

| Module | Description |
| --- | --- |
| Swin Encoder | Hierarchical Vision Transformer backbone that extracts multi-scale features from the input L channel. |
| VQ-VAE (EMA) | Quantizes features into a discrete space using a codebook with EMA updates for training stability. |
| Decoder | Maps quantized codes to color channels using upsampling, residual, and attention blocks. |
| Losses | Combines pixel, perceptual, VQ, and color consistency losses for the best perceptual quality. |
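The overall training objective is a weighted sum of the four loss terms listed above. The weights below are hypothetical placeholders for illustration; the actual values belong in the training config:

```python
def total_loss(pix, perc, vq, color, w=(1.0, 0.1, 1.0, 0.5)):
    """Weighted sum of the four loss terms.

    w = (pixel, perceptual, VQ, color-consistency) weights.
    These values are illustrative, not the project's defaults.
    """
    return w[0] * pix + w[1] * perc + w[2] * vq + w[3] * color
```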

⚙️ Architecture Details

1️⃣ Swin Transformer Encoder

  • Input: L (grayscale), shape (1×H×W)
  • Backbone: Swin-T/S/B (configurable)
  • Output: Hierarchical latent features z_enc

2️⃣ Vector Quantization (VQ-VAE, EMA)

  • Embedding codebook for discretizing latent space, updated with Exponential Moving Average (EMA)
  • Produces:
    • Quantized embeddings: z_q
    • VQ Loss (vq_loss): codebook + commitment losses
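The EMA codebook update follows the standard VQ-VAE(-2) recipe: assign each encoder vector to its nearest codeword, then update per-code counts and embedding sums as exponential moving averages, with Laplace smoothing to keep rarely used codes alive. A minimal NumPy sketch of one update step (function name and signature are hypothetical, not this repo's API):

```python
import numpy as np

def vq_ema_step(z, codebook, cluster_size, embed_sum, decay=0.99, eps=1e-5):
    """One EMA codebook update.

    z:            (N, D) flattened encoder outputs
    codebook:     (K, D) current codewords
    cluster_size: (K,)   EMA of assignment counts
    embed_sum:    (K, D) EMA of summed assigned vectors
    """
    # Nearest-codeword assignment by squared Euclidean distance
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)      # (N, K)
    idx = d.argmin(1)
    onehot = np.eye(codebook.shape[0])[idx]                        # (N, K)

    # EMA statistics
    cluster_size = decay * cluster_size + (1 - decay) * onehot.sum(0)
    embed_sum = decay * embed_sum + (1 - decay) * onehot.T @ z

    # Laplace smoothing so empty clusters do not collapse to zero
    n = cluster_size.sum()
    smoothed = (cluster_size + eps) / (n + codebook.shape[0] * eps) * n
    codebook = embed_sum / smoothed[:, None]

    z_q = codebook[idx]   # during training, gradients pass via straight-through
    return z_q, codebook, cluster_size, embed_sum
```

With EMA updates, only the commitment term contributes to the gradient-based loss; the codebook itself is updated by the moving averages above.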

3️⃣ Decoder

  • Upsamples z_q to reconstruct ab_pred
  • Utilizes residual and attention layers for fidelity and sharpness
  • Output: ab_pred (shape: 2×H×W)
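Since the decoder only predicts the ab channels, the full Lab image is obtained by stacking the input L channel with `ab_pred`. A minimal sketch (helper name is hypothetical):

```python
import numpy as np

def assemble_lab(L, ab_pred):
    """Stack input L (1, H, W) with predicted ab (2, H, W) into a (3, H, W) Lab image."""
    assert L.shape[1:] == ab_pred.shape[1:], "L and ab must share spatial dims"
    return np.concatenate([L, ab_pred], axis=0)
```

The resulting Lab array can then be converted back to RGB for visualization (e.g. with `skimage.color.lab2rgb` after undoing any channel normalization).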

🚀 Getting Started

1. Install Requirements

```bash
pip install -r requirements.txt
```

2. Prepare Data

  • Images are expected in CIE Lab format; otherwise convert from RGB to obtain the L (input) and ab (target) channels.
  • You may use datasets like ImageNet or COCO.
  • For custom data, update paths in your config (see next step).
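Splitting a Lab image into the (L, ab) training pair might look like the sketch below. The normalization constants are common conventions, not necessarily this repo's; for the RGB-to-Lab step itself, `skimage.color.rgb2lab` is one option:

```python
import numpy as np

def split_lab(lab):
    """Split a (H, W, 3) CIE Lab image into a channel-first (L, ab) training pair.

    L is scaled from [0, 100] to [0, 1]; ab from roughly [-110, 110] to [-1, 1].
    Adjust the constants to match your pipeline.
    """
    L = lab[..., :1] / 100.0
    ab = lab[..., 1:] / 110.0
    return L.transpose(2, 0, 1), ab.transpose(2, 0, 1)
```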

3. Training

Edit training configs in configs/swinvq_color.yaml to fit your data paths and hyperparameters.

```bash
python train.py --config configs/swinvq_color.yaml
```

Example Config Snippet

```yaml
model:
  encoder: swin_t
  codebook_size: 512
  code_dim: 64
...
data:
  train_dir: "path/to/train/images"
  val_dir: "path/to/val/images"
```
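Reading a config of this shape in a training script is a one-liner with PyYAML (`yaml.safe_load`); this is a generic sketch, not necessarily how `train.py` does it:

```python
import yaml

def load_config(path):
    """Load a YAML training config into a nested dict."""
    with open(path) as f:
        return yaml.safe_load(f)
```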

🖼️ Example Results

| Input (L) | Output (ab, pred) | Ground Truth (ab) |
| --- | --- | --- |
| L-channel | Colorized | GT |

📄 License

This project is licensed under the MIT License.
See the LICENSE file for more details.


✨ Acknowledgements

Built with inspiration from the official Swin Transformer, VQ-VAE, and pioneering colorization works.