A powerful yet beginner-friendly web application that transforms natural language prompts into stunning images using AI.
This project demonstrates how to integrate Stable Diffusion AI models into a Python web application using Flask and Hugging Face Diffusers. Built with simplicity in mind, it provides an intuitive interface for generating high-quality images from text descriptions, making AI-powered image generation accessible to everyone.
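At its core the integration is a single Diffusers pipeline call. The snippet below is a minimal standalone sketch of that idea, not the full `app.py`; it assumes the default model ID used in this project and a local output filename.

```python
# Minimal sketch: load Stable Diffusion v1.5 with Diffusers and render one prompt.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

image = pipe("a beautiful sunset over mountains", num_inference_steps=50).images[0]
image.save("sunset.png")
```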
- ✅ Text-to-Image Generation - Transform natural language prompts into images
- ✅ High-Quality Output - Powered by Stable Diffusion v1.5
- ✅ Image Download - Save generated images locally
- ✅ Responsive Design - Works seamlessly on desktop and mobile
- ✅ Real-time Preview - Instant image display upon generation
- 🔄 Flexible Hardware Support - CPU and GPU compatibility
- 📦 Local Storage - Generated images saved to filesystem
- 🎯 RESTful API - Clean API endpoints for integration
- ⚡ Optimized Pipeline - Efficient model loading and inference
- 🛡️ Error Handling - Robust error management and logging
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | HTML5, CSS3, JavaScript | User interface and interactions |
| Backend | Python 3.8+, Flask | Web server and API endpoints |
| AI Model | Stable Diffusion v1.5 | Text-to-image generation |
| ML Framework | PyTorch, Diffusers | Model inference |
| Storage | Local Filesystem | Generated image storage |
```
Flask >= 2.0.0
torch >= 2.0.0
diffusers >= 0.21.0
transformers >= 4.30.0
accelerate >= 0.20.0
pillow >= 9.0.0
```
Before you begin, ensure you have the following installed:
- Python 3.8 or higher - [Download Python](https://www.python.org/downloads/)
- pip - Python package installer (comes with Python)
- Git - [Download Git](https://git-scm.com/downloads)
- (Optional) CUDA - For GPU acceleration
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8GB | 16GB+ |
| Storage | 10GB free | 20GB+ free |
| GPU | None (CPU works) | NVIDIA GPU with 6GB+ VRAM |
Clone the repository:

```bash
git clone https://github.com/ARUNAGIRINATHAN-K/text-to-image-generator.git
cd text-to-image-generator
```

Create and activate a virtual environment:

```bash
# Windows
python -m venv venv
venv\Scripts\activate
# Linux/Mac
python3 -m venv venv
source venv/bin/activate
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Or install manually:
```bash
pip install flask torch diffusers transformers accelerate pillow
```
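To confirm the install worked and see whether the app will be able to use a GPU, an optional quick check:

```python
# Quick sanity check: PyTorch version and whether a CUDA GPU is visible.
import torch
print(torch.__version__, torch.cuda.is_available())
```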
Create the required directories:

```bash
mkdir generated_images
mkdir static/css static/js
```

- Start the Flask server:

  ```bash
  python app.py
  ```

- Open your browser and navigate to `http://localhost:5000`
- Generate images:
- Enter a descriptive text prompt (e.g., "a beautiful sunset over mountains")
- Click "Generate Image"
- Wait for processing (may take 30-60 seconds on CPU)
- View and download your generated image
✨ "a serene lake surrounded by autumn trees, digital art"
🌆 "futuristic cityscape at night, cyberpunk style"
🐱 "cute cat wearing a wizard hat, watercolor painting"
🏔️ "majestic mountain peak with clouds, photography"
```
text-to-image-generator/
├── app.py # Main Flask application
├── requirements.txt # Python dependencies
├── README.md # Project documentation
├── .gitignore # Git ignore rules
├── generated_images/ # Output directory for images
├── static/
│ ├── css/
│ │ └── style.css # Custom styles
│ └── js/
│ └── script.js # Frontend JavaScript
├── templates/
│ └── index.html # Main HTML template
└── models/ # Model cache directory (auto-created)
```
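For orientation, here is a simplified sketch of how the pieces above fit together in `app.py`. It is illustrative only, not the exact implementation; the route name and response fields follow the API documented below, and serving files from `generated_images/` is left out.

```python
# Illustrative sketch of the core route: accept a JSON prompt,
# run the pipeline, save the image, and return its URL.
import time
from flask import Flask, request, jsonify
from diffusers import StableDiffusionPipeline

app = Flask(__name__)
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cpu")

@app.route("/generate", methods=["POST"])
def generate():
    prompt = (request.get_json() or {}).get("prompt", "").strip()
    if not prompt:
        return jsonify({"success": False, "error": "prompt is required"}), 400
    image = pipe(prompt, num_inference_steps=50).images[0]
    filename = f"image_{int(time.time())}.png"
    image.save(f"generated_images/{filename}")
    return jsonify({"success": True, "image_url": f"/generated_images/{filename}"})

if __name__ == "__main__":
    app.run(port=5000)
```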
Endpoint: POST /generate
Request Body:

```json
{
"prompt": "your text description here"
}
```

Response:

```json
{
"success": true,
"image_url": "/generated_images/image_1234567890.png",
"timestamp": "2025-11-07T10:30:00"
}
```

cURL Example:

```bash
curl -X POST http://localhost:5000/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "a beautiful landscape"}'Create a .env file in the root directory:
Create a `.env` file in the root directory:

```
FLASK_APP=app.py
FLASK_ENV=development
FLASK_DEBUG=1
MODEL_NAME=runwayml/stable-diffusion-v1-5
IMAGE_WIDTH=512
IMAGE_HEIGHT=512
INFERENCE_STEPS=50
```
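One way `app.py` could pick these values up, shown as a sketch; it assumes the variables are exported in the shell or loaded with python-dotenv (not listed in requirements.txt):

```python
# Sketch: read configuration from the environment with sensible defaults.
import os

MODEL_NAME = os.getenv("MODEL_NAME", "runwayml/stable-diffusion-v1-5")
IMAGE_WIDTH = int(os.getenv("IMAGE_WIDTH", "512"))
IMAGE_HEIGHT = int(os.getenv("IMAGE_HEIGHT", "512"))
INFERENCE_STEPS = int(os.getenv("INFERENCE_STEPS", "50"))
```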
Edit `app.py` to adjust:

```python
# Image dimensions
image = pipe(
prompt,
height=512, # Adjust height
width=512, # Adjust width
num_inference_steps=50, # Quality vs speed tradeoff
guidance_scale=7.5 # Prompt adherence
).images[0]
```

- Model ID: `runwayml/stable-diffusion-v1-5`
- Type: Text-to-Image Diffusion Model
- License: CreativeML Open RAIL-M
- Size: ~4GB
- Resolution: 512×512 (default)
Issue: Model downloading is slow
Solution: The first run downloads the ~4GB model and caches it locally; subsequent runs start much faster.
Issue: Out of memory error
Solution: Reduce the image dimensions or move the pipeline to the CPU: `pipe.to("cpu")`
Issue: Port 5000 already in use
Solution: Change the port in `app.py`: `app.run(port=5001)`
Issue: Generated images are low quality
Solution: Increase the number of inference steps in the configuration (50-100 recommended)
- Add multiple model support
- Implement image-to-image generation
- Add batch generation feature
- Create Docker container
- Add prompt suggestions
- Implement user authentication
- Add generation history
- Create mobile app