This project implements an autoregressive transformer model for generating MNIST-like handwritten digits. The model predicts and generates images one patch at a time, and is trained on the MNIST dataset to produce new handwritten digits that follow similar patterns.
- model.py: Contains the transformer model architecture
- data.py: Handles data loading and preprocessing
- train.py: Main training script
- utils.py: Utility functions for visualization and processing
- config.py: Configuration parameters for the model and training
- build_codebook.py: Script for building the image token codebook
- generated_images/: Directory containing generated images
- data/: Directory containing training data
- Model checkpoints:
  - best_pixel_transformer.pth: Best performing model checkpoint
  - final_pixel_transformer.pth: Final model checkpoint
  - pixel_transformer.pth: Latest model checkpoint
- codebook.pkl: Pre-computed codebook for image tokenization
- Image tokenization using patch-based approach with K-means clustering for codebook generation
- Autoregressive transformer architecture
- Training on MNIST handwritten digits dataset
- Generation of new MNIST-like handwritten digits
- Configurable model parameters
- Progress tracking and visualization
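As a sketch of the patch-based tokenization feature, the snippet below builds a K-means codebook from image patches and maps patches to token ids. The function names (`build_codebook`, `tokenize`) and the patch size and vocabulary size are illustrative assumptions; the actual `build_codebook.py` may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_patches(images, patch_size=4):
    """Split (N, 28, 28) images into flattened non-overlapping patches."""
    n, h, w = images.shape
    p = patch_size
    patches = images.reshape(n, h // p, p, w // p, p)
    # Reorder so each (p, p) patch is contiguous, then flatten patches.
    return patches.transpose(0, 1, 3, 2, 4).reshape(-1, p * p)

def build_codebook(images, num_tokens=256, patch_size=4, seed=0):
    """Cluster all patches with K-means; the centroids form the codebook."""
    patches = extract_patches(images, patch_size)
    km = KMeans(n_clusters=num_tokens, n_init=4, random_state=seed).fit(patches)
    return km.cluster_centers_  # shape: (num_tokens, patch_size ** 2)

def tokenize(image, codebook, patch_size=4):
    """Map each patch of one image to its nearest codebook entry's index."""
    patches = extract_patches(image[None], patch_size)
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)  # 49 token ids for a 28x28 image, 4x4 patches

if __name__ == "__main__":
    # Demonstrate on random data standing in for MNIST images.
    rng = np.random.default_rng(0)
    fake_images = rng.random((32, 28, 28)).astype(np.float32)
    cb = build_codebook(fake_images, num_tokens=16)
    tokens = tokenize(fake_images[0], cb)
    print(cb.shape, tokens.shape)  # (16, 16) (49,)
```

Decoding reverses the mapping: each token id is replaced by its centroid patch and the patches are reassembled into a 28x28 image.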
- Create and activate a virtual environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Build the codebook (if not using the pre-computed one):

  ```bash
  python build_codebook.py
  ```

- Run training:

  ```bash
  python train.py
  ```

The model works by:
- Breaking down MNIST images into patches
- Converting patches into tokens using a K-means clustering based codebook
- Using a transformer to predict the next patch in the sequence
- Generating new MNIST-like digits autoregressively
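The next-patch prediction and autoregressive sampling steps above can be sketched as a minimal decoder-only transformer over patch-token sequences. This is an assumption-laden illustration: the class name, layer counts, embedding sizes, and the use of token id 0 as a start token are all hypothetical and need not match the architecture in `model.py`.

```python
import torch
import torch.nn as nn

class PatchTransformer(nn.Module):
    """Minimal causal transformer over patch tokens (illustrative sketch)."""
    def __init__(self, vocab_size=256, seq_len=49, d_model=64, n_head=4, n_layer=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(seq_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head, 4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (B, T) patch-token ids -> (B, T, vocab_size) next-patch logits
        b, t = tokens.shape
        pos = torch.arange(t, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask so each position attends only to earlier patches.
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(tokens.device)
        return self.head(self.blocks(x, mask=mask))

@torch.no_grad()
def generate(model, num_tokens=49, temperature=1.0):
    """Sample a full patch sequence one token at a time."""
    tokens = torch.zeros(1, 1, dtype=torch.long)  # assumed start token id 0
    for _ in range(num_tokens - 1):
        logits = model(tokens)[:, -1] / temperature
        nxt = torch.multinomial(logits.softmax(-1), 1)
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens.squeeze(0)  # token ids; decode via the codebook to an image

if __name__ == "__main__":
    model = PatchTransformer(vocab_size=16).eval()
    seq = generate(model)
    print(seq.shape)  # torch.Size([49])
```

Decoding the sampled token ids back through the codebook (one centroid patch per id) yields the generated 28x28 digit.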
- Python 3.8+
- PyTorch 2.0.0+
- torchvision 0.15.0+
- numpy 1.21.0+
- matplotlib 3.4.0+
- Pillow 8.0.0+
- tqdm 4.65.0+
- einops 0.6.0+
See requirements.txt for full list of dependencies.