An educational distributed deep learning system in which students' computers become part of a compute cluster that trains a GAN (Generative Adversarial Network) to generate images.
This project demonstrates distributed machine learning by:
- Using students' computers as a distributed compute cluster
- Coordinating training through a PostgreSQL database (no complex networking!)
- Training a DCGAN (Deep Convolutional GAN) to generate realistic images
- Teaching distributed systems, parallel training, and GANs simultaneously
Main process (instructor/admin):
- Creates work units (batches of image indices)
- Aggregates gradients from workers
- Applies optimizer steps
- Tracks training progress
Worker process (students/workers):
- Polls database for available work
- Computes gradients on assigned image batches
- Uploads gradients back to database
- Runs continuously until training completes
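The worker loop described above can be sketched as follows. The helper names and the in-memory queue are illustrative assumptions; the real worker polls PostgreSQL for work units instead.

```python
import time

def run_worker(fetch_work, compute_gradients, upload_gradients,
               poll_interval=0.0):
    """Poll for work units until the coordinator signals completion."""
    while True:
        unit = fetch_work()           # ask the coordinator for a batch
        if unit == "DONE":            # training finished; worker exits
            break
        if unit is None:              # nothing available yet; back off
            time.sleep(poll_interval)
            continue
        grads = compute_gradients(unit)   # forward/backward on the batch
        upload_gradients(unit, grads)     # report results back

# Simulated run over two work units followed by a completion marker.
queue = [[0, 1, 2], [3, 4, 5], "DONE"]
uploaded = []
run_worker(
    fetch_work=lambda: queue.pop(0),
    compute_gradients=lambda batch: [i * 0.1 for i in batch],
    upload_gradients=lambda unit, g: uploaded.append((unit, g)),
)
```

Because the worker only ever pulls work and pushes results, it needs no inbound connections, which is why no port forwarding is required.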
PostgreSQL database:
- Stores model weights, gradients, work units
- Acts as communication hub (no port forwarding needed!)
- Tracks worker statistics for monitoring
- Note: the instructor/admin needs to set up a PostgreSQL database that students can reach
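To make the database's role concrete, here is one possible shape for the tables it stores. The table and column names are assumptions, not the project's actual schema, and sqlite3 is used only so the sketch is self-contained; the real system uses PostgreSQL.

```python
import sqlite3

# Illustrative schema: weights, work units, gradients, and worker stats.
SCHEMA = """
CREATE TABLE model_weights (step INTEGER PRIMARY KEY, payload BLOB);
CREATE TABLE work_units    (id INTEGER PRIMARY KEY, image_indices TEXT,
                            status TEXT DEFAULT 'pending', claimed_by TEXT);
CREATE TABLE gradients     (work_unit_id INTEGER, worker_id TEXT,
                            payload BLOB);
CREATE TABLE worker_stats  (worker_id TEXT PRIMARY KEY, units_done INTEGER);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

With this layout, every interaction between coordinator and workers is an ordinary SQL read or write, which is what lets the database act as the communication hub.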
Quick links:
- Getting Started - Introduction and concepts
- Installation Guide - Choose your setup path
- Student Guide - How to participate as a worker
- Instructor Guide - Running the coordinator
- Configuration Reference - All config options
- Architecture - System design details
- FAQ - Frequently asked questions
Choose your installation path:
| Setup Path | Best For | GPU Required | Documentation |
|---|---|---|---|
| Dev Container † | Full development environment | Optional | Setup guide |
| Native Python | Direct local control | Optional | Setup guide |
| Conda | Conda users | Optional | Setup guide |
| Google Colab | Zero installation, free GPU | No (provided) | Setup guide |
| Local Training | Single GPU, no database | Optional | Setup guide |
† Recommended configuration
- New to the project? Start with the Getting Started Guide.
- For students: See the Student Guide for how to participate as a worker.
- For instructors: See the Instructor Guide for running the coordinator and managing training.
- Database-coordinated training: No complex networking, works across firewalls
- Fault tolerant: Workers can disconnect and reconnect; stalled work is automatically reassigned
- Flexible hardware: CPU and GPU workers can participate together
- Educational: Learn distributed systems, GANs, and parallel training
- Distributed systems: Coordination, fault tolerance, atomic operations
- Deep learning: GAN training, gradient aggregation, data parallelism
- Practical skills: PostgreSQL, PyTorch, collaborative computing
This is an educational project! Contributions welcome:
- Bug fixes and improvements
- Additional GAN architectures
- Gradient compression techniques
See the Contributing Guide for more details.
MIT License - See LICENSE file for details