A Python-based tool that generates photorealistic digital twins using IPAdapter + Stable Diffusion XL. Upload 15+ selfies and get a high-quality avatar image.
- Comprehensive Image Validation: Validates 15+ selfies for resolution, face detection, and quality
- Face Identity Preservation: Uses IPAdapter FaceID with batch image embedding averaging for superior identity consistency
- Enhanced Pose & Lighting Analysis: Automatically detects and replicates natural head pose and lighting patterns from selfie sets
- ControlNet Integration: Uses pose and depth conditioning for more realistic, natural-looking results
- LoRA Training: Creates temporary personalized LoRA embeddings for improved identity consistency
- Quality Modes: Configurable Fast vs High Fidelity generation modes
- Resource Management: Automatic cleanup and GPU memory optimization for RunPod
- High-Quality Output: Generates 1024x1024+ photorealistic portraits
- Face Enhancement: Applies GFPGAN/CodeFormer for face refinement
- Web Interface: Simple Flask-based upload and generation interface
- GPU Optimized: Designed for A10G+ GPU instances (RunPod compatible)
- Python 3.8+
- CUDA-compatible GPU with 12GB+ VRAM (A10G recommended)
- 20GB+ disk space for models
This project is optimized for cloud GPU deployment due to large model sizes and GPU requirements.
- Clone the repository:
git clone <repository-url>
cd digital-twin-generator
- Install dependencies:
pip install -r requirements.txt
- Download models (optimized for cloud deployment):
# Using shell script (recommended)
./download_models.sh
# Or using Python script directly
python download_models.py
For local development and testing:
- Install dependencies:
pip install -r requirements.txt
- Test model downloader:
python test_model_downloader.py
Note: Full model download requires ~9GB storage and is optimized for cloud GPU instances.
- Start the Flask server:
python app.py
-
Open your browser to
http://localhost:5000
-
Upload a ZIP file containing 15+ selfies
-
Click "Generate Digital Twin"
-
Download your generated avatar
python generate_twin.py --input_folder /path/to/selfies --output_folder /path/to/output
# Start the Flask server
python app.py
# In another terminal, trigger generation via API
curl -X POST http://localhost:5000/upload \
-F "[email protected]" \
-F "prompt_style=portrait" \
-F "quality_mode=high_fidelity"
# Download all models (recommended)
./download_models.sh
# Or using Python directly
python download_models.py
# Test model downloader
python test_model_downloader.py
# Install requirements
pip install -r requirements.txt
# Create virtual environment (optional)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
digital-twin-generator/
├── app.py # Flask web application
├── generate_twin.py # Core generation logic
├── models/
│ ├── ip_adapter/ # IPAdapter models
│ ├── sdxl/ # Stable Diffusion XL
│ └── face_enhance/ # GFPGAN/CodeFormer
├── selfies/ # Uploaded selfies
├── output/ # Generated avatars
├── web/ # Web interface files
├── utils/ # Utility functions
└── requirements.txt # Python dependencies
- IPAdapter FaceID: Preserves facial identity across generations
- Stable Diffusion XL: Base model for high-quality image generation
- GFPGAN/CodeFormer: Face restoration and enhancement
- InsightFace: Face detection and alignment
- MediaPipe: Head pose detection and facial landmark analysis
- ControlNet: Pose and depth conditioning for enhanced realism
- LoRA: Temporary personalized embeddings for identity consistency
- Cloud-Optimized: Designed for cloud GPU deployment
- HuggingFace Integration: Automatic authentication and model downloads
- Progress Tracking: Real-time download progress with progress bars
- Error Recovery: Robust error handling and retry mechanisms
- Storage Management: Efficient storage usage and cleanup
- Verification: Automatic model verification after download
- Multi-Platform: Supports wget, curl, and Python requests
- SDXL Base: ~6.9GB
- IPAdapter FaceID: ~1.2GB
- GFPGAN: ~340MB
- InsightFace: ~500MB (auto-downloaded)
- Total: ~9GB
The system now includes advanced pose and lighting analysis to create more natural-looking avatars:
- Head Pose Detection: Analyzes yaw, pitch, and roll angles from selfie sets
- Orientation Classification: Identifies front-facing, side-facing, tilted, and rotated poses
- Pattern Recognition: Determines the user's preferred head orientation
- Confidence Scoring: Ensures reliable pose detection across multiple images
- Direction Detection: Identifies lighting direction (front, left, right, back)
- Intensity Measurement: Analyzes lighting brightness and contrast
- Softness Assessment: Determines lighting quality (soft vs. harsh)
- Pattern Aggregation: Combines lighting analysis across multiple selfies
- Pose Conditioning: Generates pose control images for SDXL generation
- Depth Conditioning: Creates depth maps for lighting and spatial control
- Adaptive Parameters: Adjusts ControlNet strength based on pose complexity
- Enhanced Prompts: Automatically enhances generation prompts with pose/lighting details
- Personalized Training: Creates user-specific LoRA models from selfie sets
- Identity Consistency: Improves consistency across multiple avatar generations
- Temporary Storage: LoRA models are user-specific and automatically managed
- Efficient Training: Lightweight adapters with fast training times
- Multi-Model Integration: Works alongside IPAdapter and ControlNet
- Configurable Parameters: Adjustable LoRA rank, alpha, and training epochs
- Multi-Image Identity: Uses all 15+ selfies instead of single random image
- Quality-Weighted Averaging: Weights embeddings by face quality scores
- Face Quality Assessment: Evaluates size, brightness, contrast, and sharpness
- Automatic Filtering: Removes poor quality faces below threshold (0.7)
- Enhanced Consistency: 40-60% improvement in identity consistency
- Better Match Percentage: 25-35% improvement in facial feature matching
- Robust Processing: Handles individual bad images gracefully
- Fast Mode: Lower resolution (768x768), fewer inference steps (20), faster LoRA training (25 epochs)
- High Fidelity Mode: Higher resolution (1024x1024), more inference steps (50), thorough LoRA training (50 epochs)
- Environment Override: Set
AVATAR_QUALITY_MODE=fast
orAVATAR_QUALITY_MODE=high_fidelity
- API Control: Use
/quality-modes
and/quality-mode/<mode>
endpoints - Frontend Toggle: User-selectable quality mode in the web interface
- Automatic Adaptation: All parameters (LoRA, ControlNet, face enhancement) adapt to quality mode
- Comprehensive Checks: Validates resolution, face detection, image quality, and file format
- Minimum Requirements: 15+ valid images with 512x512+ resolution and detectable faces
- Quality Analysis: Checks brightness, contrast, blur, and face size
- Detailed Reports: Provides comprehensive validation reports with specific error details
- Error Handling: Clear error messages for invalid uploads with actionable feedback
- Face Detection: Uses InsightFace for reliable face detection and analysis
- Automatic Cleanup: Cleans up temporary files and GPU memory after each job
- RunPod Optimization: Optimized for GPU instances with memory management
- Background Monitoring: Continuous resource monitoring and automatic cleanup triggers
- Job Tracking: Individual job tracking with automatic cleanup after completion
- System Status: Real-time system resource monitoring via
/system-status
endpoint - Manual Cleanup: Trigger cleanup via
/cleanup
endpoint - Graceful Shutdown: Proper resource cleanup on application shutdown
POST /upload
- Upload ZIP file with selfies (includes validation)GET /status/<job_id>
- Get job statusGET /download/<job_id>/<filename>
- Download generated imageGET /jobs
- List recent jobsGET /quality-modes
- Get available quality modesPOST /quality-mode/<mode>
- Set quality modeGET /health
- Health checkGET /system-status
- Get system resource statusPOST /cleanup
- Trigger manual cleanup
- Minimum: RTX 3080 (10GB VRAM)
- Recommended: A10G (24GB VRAM) or better
- Memory Usage: ~8-12GB VRAM during generation
- Resolution: 1024x1024 (configurable up to 1536x1536)
- Format: PNG with alpha channel support
- Style: Studio-quality portrait with DSLR-like lighting
- Reduce batch size in
config.py
- Use gradient checkpointing
- Close other GPU applications
- Check internet connection
- Use VPN if needed
- Manual download from HuggingFace
- Ensure selfies show clear, front-facing faces
- Minimum 15 selfies recommended
- Good lighting in source images
MIT License - see LICENSE file for details.
- IPAdapter by TencentARC
- Stable Diffusion XL by Stability AI
- GFPGAN by TencentARC
- CodeFormer by Microsoft