A PyTorch implementation of Stable Diffusion from scratch, inspired by various open-source implementations and designed to be flexible and easy to use.
- Introduction
- Project Structure
- Key Components
- Implementation Details
- Usage
- Training
- Inference
- Contributing
- License
ImageWeaver is a PyTorch implementation of Stable Diffusion, a powerful generative model that combines diffusion-based image synthesis with a text encoder. This project aims to provide a clear, well-documented implementation suitable for both research and practical applications.
ImageWeaver implements the forward diffusion process, which gradually adds Gaussian noise to an image over a series of timesteps until it becomes pure noise. A learned reverse process then lets us start from noise and progressively refine it into an image.
The denoising (reverse) process removes a small amount of noise at each step, guided by a neural network trained to predict the noise that was added during forward diffusion.
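Because the noise schedule is fixed, the noisy sample at any timestep can be drawn in closed form. Below is a minimal sketch of that step, assuming a precomputed cumulative schedule `alpha_bar` (the names here are illustrative, not ImageWeaver's actual API):

```python
import torch

def forward_diffusion(x0: torch.Tensor, t: torch.Tensor, alpha_bar: torch.Tensor):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(ab_t) * x_0, (1 - ab_t) * I)."""
    noise = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)               # broadcast over (B, C, H, W)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise
    return x_t, noise                                  # noise is the training target

# Example schedule: linear betas, as in the original DDPM paper
betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)
```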
ImageWeaver uses a U-Net as the primary neural network architecture. U-Nets are particularly well-suited for image-to-image translation tasks and image denoising.
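As a rough picture of how such a network is assembled, here is a simplified residual block that injects the diffusion timestep embedding between its convolutions, a common pattern in diffusion U-Nets (a sketch only, not ImageWeaver's exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Conv block that mixes a timestep embedding into the feature maps."""
    def __init__(self, channels: int, time_dim: int):
        super().__init__()
        self.norm1 = nn.GroupNorm(8, channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.time_proj = nn.Linear(time_dim, channels)
        self.norm2 = nn.GroupNorm(8, channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x, t_emb):
        h = self.conv1(F.silu(self.norm1(x)))
        h = h + self.time_proj(t_emb)[:, :, None, None]  # broadcast over H and W
        h = self.conv2(F.silu(self.norm2(h)))
        return x + h                                     # residual connection
```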
We employ both self-attention and cross-attention mechanisms. Self-attention lets ImageWeaver relate different parts of the image representation to one another, while cross-attention injects information from the text prompt into the image features.
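The sketch below shows the essential shape of a cross-attention layer: queries come from image features, keys and values from the text embeddings (self-attention is the same computation with `context = x`). It uses PyTorch 2.0's `scaled_dot_product_attention` and is illustrative rather than ImageWeaver's exact module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    def __init__(self, dim: int, context_dim: int, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)          # from image features
        self.to_k = nn.Linear(context_dim, dim, bias=False)  # from text embeddings
        self.to_v = nn.Linear(context_dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x, context):
        B, N, D = x.shape
        h, d = self.heads, D // self.heads
        q = self.to_q(x).view(B, N, h, d).transpose(1, 2)
        k = self.to_k(context).view(B, -1, h, d).transpose(1, 2)
        v = self.to_v(context).view(B, -1, h, d).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)        # requires PyTorch >= 2.0
        return self.to_out(out.transpose(1, 2).reshape(B, N, D))
```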
Our text encoder is responsible for encoding text prompts into embeddings that can be used by ImageWeaver. It's implemented using a Transformer architecture.
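To make the overall shape concrete, here is a minimal transformer text encoder (the default dimensions match CLIP ViT-L/14, the text encoder used by Stable Diffusion v1.x; the class itself is a hypothetical sketch, with positional encoding and attention masking omitted for brevity):

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Token embeddings followed by a stack of transformer encoder layers."""
    def __init__(self, vocab_size=49408, dim=768, layers=12, heads=12):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, token_ids):                        # (B, seq_len) token ids
        return self.encoder(self.token_emb(token_ids))   # (B, seq_len, dim)
```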
ImageWeaver utilizes rotary positional encodings to handle long sequences effectively, improving the model's ability to capture position information.
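One common formulation rotates pairs of feature dimensions by a position-dependent angle before attention is computed; a sketch of the half-split variant is shown below (this illustrates the general technique, not necessarily the exact form used here):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional encoding to x of shape (B, seq_len, dim), dim even."""
    _, seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()    # each of shape (seq_len, half)
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position-dependent angle
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```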
Layer normalization is used throughout ImageWeaver to stabilize training and improve performance.
Our implementation includes several sampling methods, including DDIM (Denoising Diffusion Implicit Models) and PLMS (Pseudo Linear Multi-Step) samplers.
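For intuition, a single deterministic DDIM update (eta = 0) looks roughly like this, reusing the `alpha_bar` schedule from the forward-diffusion sketch above (illustrative, not ImageWeaver's exact sampler):

```python
import torch

@torch.no_grad()
def ddim_step(x_t, eps_pred, t, t_prev, alpha_bar):
    """Move from timestep t to t_prev using the model's noise estimate eps_pred."""
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t_prev]
    # Predict x_0 from the current sample and the estimated noise
    x0_pred = (x_t - (1 - ab_t).sqrt() * eps_pred) / ab_t.sqrt()
    # Step toward t_prev along the deterministic (implicit) trajectory
    return ab_prev.sqrt() * x0_pred + (1 - ab_prev).sqrt() * eps_pred
```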
Training ImageWeaver involves optimizing the loss function that measures the difference between the predicted noise and the actual noise added during the forward diffusion process. Our implementation supports training on custom datasets.
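A single training step therefore looks roughly like the sketch below (the `model(x_t, t, text_emb)` signature is hypothetical; the loss is the standard noise-prediction MSE):

```python
import torch
import torch.nn.functional as F

def training_step(model, x0, text_emb, alpha_bar, optimizer):
    """One DDPM-style step: predict the noise added at a random timestep."""
    t = torch.randint(0, len(alpha_bar), (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * noise  # forward diffusion
    eps_pred = model(x_t, t, text_emb)              # hypothetical model signature
    loss = F.mse_loss(eps_pred, noise)              # noise-prediction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```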
Once trained, ImageWeaver can be used for various tasks:
- Image Generation from Text Prompts
- Image-to-Image Translation
- Fine-tuning on Custom Datasets
Example scripts for these tasks are provided in the `examples` directory.
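As a rough picture of what a text-to-image call might look like (the package, class, and argument names below are purely illustrative; see the examples directory for the actual entry points):

```python
# Hypothetical usage sketch, not ImageWeaver's real API.
from imageweaver import ImageWeaver        # assumed package/class name

model = ImageWeaver.from_checkpoint("data/v1-5-pruned-emaonly.ckpt")
image = model.generate(
    prompt="a watercolor painting of a lighthouse at dawn",
    steps=50,            # number of reverse-diffusion steps
    guidance_scale=7.5,  # strength of the text conditioning
)
image.save("lighthouse.png")
```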
Contributions are welcome! Please see our CONTRIBUTING.md file for guidelines on how to contribute to ImageWeaver.
ImageWeaver is licensed under the MIT License - see the LICENSE file for details.
Special thanks to the following repositories for inspiration and reference:
- https://github.com/CompVis/stable-diffusion/
- https://github.com/divamgupta/stable-diffusion-tensorflow
- https://github.com/kjsman/stable-diffusion-pytorch
- https://github.com/huggingface/diffusers/
To install the required dependencies: `pip install torch torchvision numpy matplotlib`
Download the necessary files from Hugging Face:
- `vocab.json` and `merges.txt` from https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/tokenizer
- `v1-5-pruned-emaonly.ckpt` from https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main
Place these files in the `data` folder of your ImageWeaver project.
ImageWeaver has been tested with several fine-tuned models:
- InkPunk Diffusion: https://huggingface.co/Envvi/Inkpunk-Diffusion/tree/main
- Illustration Diffusion (Hollie Mengert): https://huggingface.co/ogkalu/Illustration-Diffusion/tree/main
Simply download the `.ckpt` file from any fine-tuned SD model (up to v1.5) and place it in the `data` folder.
Thank you for exploring ImageWeaver! Feel free to experiment and push the boundaries of what's possible with this powerful generative model.