Check (HERE) for a complete overview of the project.
- Abdul Azeem Syed
- Ramshankar Bhuvaneswaran
Instructor: Prof. Handan Liu
This project explores distributed data parallelization strategies for text-to-image generation pipelines. By leveraging multiple CPUs and GPUs, we evaluate the feasibility of reducing reliance on GPUs for preprocessing tasks, enabling cost-effective solutions for large-scale image and text embedding generation.
- Evaluate CPU-based parallel preprocessing against GPU-based preprocessing.
- Identify trade-offs in terms of speedup, efficiency, and cost.
- Assess scalability and economic viability in text-to-image pipelines.
While GPUs are powerful, their high cost and limited availability often hinder accessibility. This project investigates whether CPU clusters, using distributed frameworks, can serve as viable alternatives for preprocessing stages in text-to-image generation.
- Dataset Used: MS COCO 2014 (Filtered Subset)
- Number of Images: ~83,000 (filtered for people-centric captions)
- Resolution: ~640×480 pixels
- Data Size: ~13 GB
- VAE Embedding (Image Preprocessing): Converts images into latent embeddings.
- CLIP Embedding (Text Preprocessing): Generates text embeddings and attention masks from captions.
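A minimal sketch of the two preprocessing steps above, assuming Stable Diffusion's public VAE (`stabilityai/sd-vae-ft-mse`), an OpenAI CLIP checkpoint, a 256×256 input size, and the standard SD latent scaling factor; the actual notebooks may use different weights and sizes:

```python
import torch
from diffusers import AutoencoderKL
from transformers import CLIPTokenizer, CLIPTextModel
from torchvision import transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# VAE image encoder (assumed checkpoint).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(device).eval()

# CLIP tokenizer and text encoder (assumed checkpoint).
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()

to_tensor = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # scale pixels to [-1, 1] as the VAE expects
])

@torch.no_grad()
def embed_pair(image_path: str, caption: str):
    # Image -> latent embedding via the VAE encoder.
    pixels = to_tensor(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    latents = vae.encode(pixels).latent_dist.sample() * 0.18215  # SD latent scaling

    # Caption -> text embedding + attention mask via CLIP.
    tokens = tokenizer(caption, padding="max_length", max_length=77,
                       truncation=True, return_tensors="pt").to(device)
    text_emb = text_encoder(**tokens).last_hidden_state
    return latents.cpu(), text_emb.cpu(), tokens.attention_mask.cpu()
```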
- CPU-Based Parallelization:
- Native Multiprocessing
- Joblib
- Dask
- GPU-Based Parallelization:
- Multithreading
- Distributed data loading
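For illustration, the three CPU backends fan the same per-sample function out in a few lines each. This is a toy sketch: `preprocess` here stands in for the real VAE/CLIP embedding work, and the worker counts are arbitrary.

```python
from multiprocessing import Pool
from joblib import Parallel, delayed
import dask.bag as db

def preprocess(caption: str) -> int:
    # Stand-in for the real per-sample embedding work.
    return len(caption.split())

captions = ["a person riding a bike", "two people at a cafe"] * 1000

if __name__ == "__main__":
    # Native multiprocessing: one worker process per CPU core.
    with Pool(processes=4) as pool:
        out_mp = pool.map(preprocess, captions)

    # Joblib: the same fan-out through a higher-level API.
    out_jb = Parallel(n_jobs=4)(delayed(preprocess)(c) for c in captions)

    # Dask: a lazy task graph that scales from one machine to a cluster.
    out_dask = db.from_sequence(captions, npartitions=4).map(preprocess).compute()
```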
- Implements Distributed Data Parallelism (DDP) with U-Net-based architectures.
- Explores mixed precision training for optimization.
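A condensed sketch of the DDP plus mixed precision loop, assuming one process per GPU launched via `torchrun` and a model whose forward pass returns the diffusion loss; see `train3.py` and `train-mixed-p.py` for the full pipelines.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(model, loader, epochs=1):
    dist.init_process_group(backend="nccl")   # one process per GPU (torchrun)
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    model = DDP(model.to(rank), device_ids=[rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()      # loss scaling for fp16

    for _ in range(epochs):
        for latents, text_emb, mask in loader:
            optimizer.zero_grad()
            with torch.cuda.amp.autocast():   # mixed fp16/fp32 forward pass
                loss = model(latents.to(rank), text_emb.to(rank), mask.to(rank))
            scaler.scale(loss).backward()     # scale loss to avoid fp16 underflow
            scaler.step(optimizer)            # unscale gradients, then step
            scaler.update()
    dist.destroy_process_group()
```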
- Propress.ipynb: Filters the MS COCO dataset for people-centric imagery (see the filtering sketch below).
- Main.ipynb: Prepares image and text embeddings using the VAE and CLIP models.
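A hypothetical version of that filter using `pycocotools` and a simple keyword heuristic; the annotation path and keyword list are assumptions, and the notebook's actual criteria may differ.

```python
from pycocotools.coco import COCO

coco = COCO("annotations/captions_train2014.json")  # assumed path
PEOPLE_WORDS = {"person", "people", "man", "woman", "men", "women", "boy", "girl"}

def is_people_centric(img_id: int) -> bool:
    # Keep an image if any of its captions mentions a person-related word.
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    return any(set(ann["caption"].lower().split()) & PEOPLE_WORDS for ann in anns)

people_ids = [i for i in coco.getImgIds() if is_people_centric(i)]
print(f"kept {len(people_ids)} of {len(coco.getImgIds())} images")
```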
Scripts for converting captions into embeddings using various parallelization methods:
- cp-multi.py: Native multiprocessing.
- jb.py: Joblib parallelization.
- multi.py: GPU-based acceleration.
- dasky.py: Dask-based distributed processing.
Scripts for converting images into embeddings with similar methods:
- multi.py: Native multiprocessing.
- job.py: Joblib parallelization.
- g.py: GPU-based acceleration.
- dasky.py: Dask-based distributed processing.
- train3.py: Full training pipeline without mixed precision.
- train-mixed-p.py: Training with mixed precision for faster execution and resource efficiency.
generate.py: Generates images based on user-provided prompts using the trained model.
python generate.py --prompt "your text" --model-path "path/to/model" \
    --guidance-scale 2.0 --num-inference-steps 50

- Time (Wall-clock processing time)
- Speedup (Single CPU/GPU vs. multiple CPUs/GPUs)
- Efficiency (Scaling performance relative to workers)
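Speedup and efficiency above follow the standard definitions S(p) = T(1)/T(p) and E(p) = S(p)/p; a small helper makes this concrete:

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    # S(p) = T(1) / T(p)
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    # E(p) = S(p) / p; 1.0 would be perfect linear scaling
    return speedup(t_serial, t_parallel) / workers

# e.g. a job taking 120 s on 1 CPU and 40 s on 4 CPUs:
# speedup(120, 40) -> 3.0, efficiency(120, 40, 4) -> 0.75
```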
- CPU Parallelization: Limited scalability and efficiency, with diminishing returns beyond 2-4 CPUs.
- GPU Parallelization: Significant speedups with reasonable efficiency, especially for training and CLIP embedding tasks.
- Mixed Precision Training: Achieved a lower final loss and reduced training time by ~10-15%.
- Ramesh, A., et al. (2021). Zero-Shot Text-to-Image Generation. ICML.
- Ho, J., et al. (2020). Denoising Diffusion Probabilistic Models. NeurIPS.
- Radford, A., et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. ICML.
- Paszke, A., et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. NeurIPS.
- Micikevicius, P., et al. (2017). Mixed Precision Training. arXiv:1710.03740.
- Python 3.8+
- PyTorch 1.12+
- CUDA Toolkit (for GPU acceleration)
- Required Libraries:
`joblib`, `dask`, `transformers`, `torchvision`, `scikit-learn`
Contributions are welcome! Feel free to open issues or submit pull requests for improvements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.