This script allows you to train a Stable Diffusion model locally on your own dataset. It is based on the official Hugging Face Diffusers implementation.

Requirements:
- Python 3.7+
- CUDA-capable GPU (recommended)
- At least 16GB of RAM
- Sufficient disk space for your dataset and model checkpoints
- Create a virtual environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

To train on a local dataset:
```bash
python train_stable_diffusion.py \
  --train_data_dir path/to/your/dataset \
  --output_dir path/to/save/model \
  --resolution 512 \
  --train_batch_size 1 \
  --gradient_accumulation_steps 4 \
  --max_train_steps 1000
```

To train on a dataset from Hugging Face:
```bash
python train_stable_diffusion.py \
  --dataset_name "dataset_name" \
  --output_dir path/to/save/model \
  --resolution 512 \
  --train_batch_size 1 \
  --gradient_accumulation_steps 4 \
  --max_train_steps 1000
```

You can use a locally downloaded Stable Diffusion model instead of downloading it from Hugging Face. This is useful if you want to:
- Avoid downloading the model every time you train
- Use a modified version of the base model
- Train without internet access
To use a local model:
```bash
python train_stable_diffusion.py \
  --local_model_path path/to/local/model \
  --train_data_dir path/to/your/dataset \
  --output_dir path/to/save/model \
  --resolution 512 \
  --train_batch_size 1 \
  --gradient_accumulation_steps 4 \
  --max_train_steps 1000
```

The local model directory should contain the following structure:
```
model/
├── tokenizer/
├── text_encoder/
├── vae/
└── unet/
```
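Before launching a long run, it can help to confirm the local model directory actually contains these components. The helper below is purely illustrative (its name is not part of the training script); it just reports which required subdirectories are missing:

```python
from pathlib import Path

def check_local_model_dir(model_dir):
    """Return the required Stable Diffusion subdirectories missing from model_dir.

    Illustrative sanity check only; the training script itself loads
    these components directly via Diffusers.
    """
    required = ["tokenizer", "text_encoder", "vae", "unet"]
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).is_dir()]
```

An empty return value means the directory layout matches the structure shown above.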
Command-line options:

- `--train_data_dir`: Directory containing your training images and text files
- `--output_dir`: Where to save the trained model
- `--local_model_path`: Path to local directory containing the base model files (optional)
- `--resolution`: Input image resolution (default: 512)
- `--train_batch_size`: Batch size per GPU (default: 1)
- `--gradient_accumulation_steps`: Number of steps to accumulate gradients (default: 4)
- `--learning_rate`: Learning rate (default: 1e-6)
- `--max_train_steps`: Total number of training steps
- `--mixed_precision`: Use mixed precision training (`"fp16"` or `"bf16"`)
- `--gradient_checkpointing`: Enable gradient checkpointing to save memory
- `--logging_dir`: Directory to store logs for TensorBoard (default: `"logs"`)
Your dataset should be organized as follows:
```
dataset/
├── image1.jpg
├── image1.txt
├── image2.jpg
├── image2.txt
└── ...
```
Each image must have a corresponding `.txt` file with the same base name containing its text description (caption).
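To check that every image in your dataset has its matching caption file before training, you can use a small helper like the one below. This is an illustrative sketch (the function name and supported extensions are assumptions, not part of the training script):

```python
from pathlib import Path

def find_caption_pairs(dataset_dir):
    """Pair each image with its same-named .txt caption file.

    Returns (pairs, missing), where `missing` lists images that have
    no caption file. Illustrative helper, not part of the script.
    """
    dataset_dir = Path(dataset_dir)
    image_exts = {".jpg", ".jpeg", ".png"}
    pairs, missing = [], []
    for image_path in sorted(dataset_dir.iterdir()):
        if image_path.suffix.lower() not in image_exts:
            continue
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists():
            pairs.append((image_path, caption_path))
        else:
            missing.append(image_path)
    return pairs, missing
```

Running it over your dataset directory and checking that `missing` is empty catches the most common data-layout mistake before a run starts.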
Training tips:

- Start with a small batch size (1-2) and increase it if your GPU memory allows
- Use gradient accumulation to simulate larger batch sizes
- Enable mixed precision training for better memory efficiency
- Monitor the loss during training to ensure it's decreasing
- Save checkpoints regularly to resume training if needed
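Gradient accumulation, mentioned in the tips above, can be pictured as summing the gradients from several small batches and stepping the optimizer only once per "virtual" batch. The framework-free sketch below uses hypothetical names and scalar gradients to show the mechanic; the actual script's training loop differs:

```python
def train_with_accumulation(per_batch_grads, accumulation_steps, lr=0.1, weight=0.0):
    """SGD-style update applied once every `accumulation_steps` batches.

    Illustrative only: gradients are plain numbers here, and the update
    averages them so the result matches one large-batch step.
    """
    accumulated = 0.0
    for step, grad in enumerate(per_batch_grads, start=1):
        accumulated += grad / accumulation_steps  # average over the virtual batch
        if step % accumulation_steps == 0:
            weight -= lr * accumulated  # one optimizer step per virtual batch
            accumulated = 0.0
    return weight
```

With four unit gradients and `accumulation_steps=4`, the loop performs a single update, exactly as one batch of four would.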
If you run into memory issues:
- Reduce the batch size
- Enable gradient checkpointing
- Use mixed precision training
- Reduce the image resolution
- Use gradient accumulation
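Several of these knobs trade off against each other: the optimizer effectively sees a batch of `train_batch_size × gradient_accumulation_steps` (times the number of GPUs), so you can lower per-step memory use without changing the effective batch size. A tiny sketch of that arithmetic (the helper name is hypothetical):

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Effective batch size seen by the optimizer per update."""
    return train_batch_size * gradient_accumulation_steps * num_gpus

# Halving the per-GPU batch size while doubling accumulation keeps the
# effective batch size unchanged, at a lower peak activation memory.
assert effective_batch_size(2, 4) == effective_batch_size(1, 8) == 8
```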
The script logs training progress to TensorBoard. To view the logs:
```bash
tensorboard --logdir path/to/logging_dir
```

To resume from a checkpoint:
```bash
python train_stable_diffusion.py \
  --resume_from_checkpoint path/to/checkpoint \
  --output_dir path/to/save/model
```

This script is based on the Hugging Face Diffusers library and follows its license terms.