Digital Twin Generator

A Python-based tool that generates photorealistic digital twins using IPAdapter + Stable Diffusion XL. Upload 15+ selfies and get a high-quality avatar image.

Features

  • Comprehensive Image Validation: Validates 15+ selfies for resolution, face detection, and quality
  • Face Identity Preservation: Uses IPAdapter FaceID with batch image embedding averaging for superior identity consistency
  • Enhanced Pose & Lighting Analysis: Automatically detects and replicates natural head pose and lighting patterns from selfie sets
  • ControlNet Integration: Uses pose and depth conditioning for more realistic, natural-looking results
  • LoRA Training: Creates temporary personalized LoRA embeddings for improved identity consistency
  • Quality Modes: Configurable Fast vs High Fidelity generation modes
  • Resource Management: Automatic cleanup and GPU memory optimization for RunPod
  • High-Quality Output: Generates 1024x1024+ photorealistic portraits
  • Face Enhancement: Applies GFPGAN/CodeFormer for face refinement
  • Web Interface: Simple Flask-based upload and generation interface
  • GPU Optimized: Designed for A10G+ GPU instances (RunPod compatible)

Requirements

  • Python 3.8+
  • CUDA-compatible GPU with 12GB+ VRAM (A10G recommended)
  • 20GB+ disk space for models

Installation

Cloud GPU Deployment (Recommended)

This project is optimized for cloud GPU deployment due to large model sizes and GPU requirements.

  1. Clone the repository:
git clone <repository-url>
cd digital-twin-generator
  2. Install dependencies:
pip install -r requirements.txt
  3. Download models (optimized for cloud deployment):
# Using shell script (recommended)
./download_models.sh

# Or using Python script directly
python download_models.py

Local Development (Mac Mini M2)

For local development and testing:

  1. Install dependencies:
pip install -r requirements.txt
  2. Test the model downloader:
python test_model_downloader.py

Note: Full model download requires ~9GB storage and is optimized for cloud GPU instances.

Usage

Web Interface (Recommended)

  1. Start the Flask server:
python app.py
  2. Open your browser to http://localhost:5000

  3. Upload a ZIP file containing 15+ selfies

  4. Click "Generate Digital Twin"

  5. Download your generated avatar

Command Line

Direct Generation

python generate_twin.py --input_folder /path/to/selfies --output_folder /path/to/output

Backend Generation (for testing)

# Start the Flask server
python app.py

# In another terminal, trigger generation via API
curl -X POST http://localhost:5000/upload \
  -F "[email protected]" \
  -F "prompt_style=portrait" \
  -F "quality_mode=high_fidelity"

Model Download Commands

# Download all models (recommended)
./download_models.sh

# Or using Python directly
python download_models.py

# Test model downloader
python test_model_downloader.py

Installation Commands

# Install requirements
pip install -r requirements.txt

# Create virtual environment (optional)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Project Structure

digital-twin-generator/
├── app.py                 # Flask web application
├── generate_twin.py       # Core generation logic
├── models/
│   ├── ip_adapter/        # IPAdapter models
│   ├── sdxl/              # Stable Diffusion XL
│   └── face_enhance/      # GFPGAN/CodeFormer
├── selfies/               # Uploaded selfies
├── output/                # Generated avatars
├── web/                   # Web interface files
├── utils/                 # Utility functions
└── requirements.txt       # Python dependencies

Model Information

Core Models

  • IPAdapter FaceID: Preserves facial identity across generations
  • Stable Diffusion XL: Base model for high-quality image generation
  • GFPGAN/CodeFormer: Face restoration and enhancement
  • InsightFace: Face detection and alignment
  • MediaPipe: Head pose detection and facial landmark analysis
  • ControlNet: Pose and depth conditioning for enhanced realism
  • LoRA: Temporary personalized embeddings for identity consistency

Model Downloader Features

  • Cloud-Optimized: Designed for cloud GPU deployment
  • HuggingFace Integration: Automatic authentication and model downloads
  • Progress Tracking: Real-time download progress with progress bars
  • Error Recovery: Robust error handling and retry mechanisms
  • Storage Management: Efficient storage usage and cleanup
  • Verification: Automatic model verification after download
  • Multi-Platform: Supports wget, curl, and Python requests

Storage Requirements

  • SDXL Base: ~6.9GB
  • IPAdapter FaceID: ~1.2GB
  • GFPGAN: ~340MB
  • InsightFace: ~500MB (auto-downloaded)
  • Total: ~9GB

Enhanced Avatar Generation

The system includes advanced pose and lighting analysis to create more natural-looking avatars:

Pose Analysis

  • Head Pose Detection: Analyzes yaw, pitch, and roll angles from selfie sets
  • Orientation Classification: Identifies front-facing, side-facing, tilted, and rotated poses
  • Pattern Recognition: Determines the user's preferred head orientation
  • Confidence Scoring: Ensures reliable pose detection across multiple images
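
One common way to recover these angles is to feed MediaPipe FaceMesh landmarks into OpenCV's solvePnP; the sketch below illustrates that approach. The landmark indices, generic 3D face model, and camera approximation are standard illustrative choices, not values taken from this repository.

# Sketch: head pose (pitch/yaw/roll) from a single selfie. Assumes
# mediapipe and opencv-python are installed; landmark IDs and the 3D
# reference points are common defaults, not this project's exact values.
import cv2
import numpy as np
import mediapipe as mp

LANDMARK_IDS = [1, 152, 33, 263, 61, 291]   # nose, chin, eye corners, mouth corners
MODEL_POINTS = np.array([                   # matching generic 3D face model (mm)
    (0.0, 0.0, 0.0), (0.0, -63.6, -12.5),
    (-43.3, 32.7, -26.0), (43.3, 32.7, -26.0),
    (-28.9, -28.9, -24.1), (28.9, -28.9, -24.1),
], dtype=np.float64)

def estimate_head_pose(image_path):
    image = cv2.imread(image_path)
    h, w = image.shape[:2]
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True) as mesh:
        result = mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not result.multi_face_landmarks:
        return None                          # no face found in this image
    lm = result.multi_face_landmarks[0].landmark
    points_2d = np.array([(lm[i].x * w, lm[i].y * h) for i in LANDMARK_IDS])
    camera = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)
    ok, rvec, _ = cv2.solvePnP(MODEL_POINTS, points_2d, camera, None)
    rotation, _ = cv2.Rodrigues(rvec)
    angles, *_ = cv2.RQDecomp3x3(rotation)   # Euler angles in degrees
    return {"pitch": angles[0], "yaw": angles[1], "roll": angles[2]}

Averaging these per-image angles across the selfie set yields the preferred-orientation pattern described above.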

Lighting Analysis

  • Direction Detection: Identifies lighting direction (front, left, right, back)
  • Intensity Measurement: Analyzes lighting brightness and contrast
  • Softness Assessment: Determines lighting quality (soft vs. harsh)
  • Pattern Aggregation: Combines lighting analysis across multiple selfies
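
As a rough illustration of how such analysis can work, the sketch below estimates lighting direction from the brightness difference between the two halves of a face crop and uses gradient statistics as a softness proxy. All thresholds are illustrative, not the project's; detecting backlighting would need extra cues (e.g., a dark face against a bright background).

# Sketch: coarse lighting analysis on a BGR face crop (numpy array).
# Thresholds here are illustrative defaults.
import cv2
import numpy as np

def analyze_lighting(face_bgr):
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    h, w = gray.shape
    left, right = gray[:, : w // 2].mean(), gray[:, w // 2 :].mean()
    if abs(left - right) < 10:
        direction = "front"                 # roughly even illumination
    else:
        direction = "left" if left > right else "right"
    intensity = gray.mean() / 255.0         # overall brightness, 0..1
    contrast = gray.std() / 255.0           # spread of tones, 0..1
    # Soft light produces gentle gradients; harsh light produces sharp ones.
    softness = "soft" if cv2.Sobel(gray, cv2.CV_32F, 1, 0).std() < 30 else "harsh"
    return {"direction": direction, "intensity": intensity,
            "contrast": contrast, "softness": softness}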

ControlNet Integration

  • Pose Conditioning: Generates pose control images for SDXL generation
  • Depth Conditioning: Creates depth maps for lighting and spatial control
  • Adaptive Parameters: Adjusts ControlNet strength based on pose complexity
  • Enhanced Prompts: Automatically enhances generation prompts with pose/lighting details
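
A minimal depth-conditioning sketch with diffusers is shown below. The checkpoint IDs are public HuggingFace Hub models, not necessarily the ones this repo downloads, and depth_map is assumed to be a PIL image produced by the depth-analysis step.

# Sketch: depth-conditioned SDXL generation via diffusers. Assumes a CUDA
# GPU and a PIL `depth_map` from the pose/depth step.
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet, torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="studio portrait photo, soft front lighting",
    image=depth_map,                        # conditioning image
    controlnet_conditioning_scale=0.5,      # the "adaptive strength" knob
    num_inference_steps=30,
).images[0]
image.save("controlled_portrait.png")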

LoRA Integration

  • Personalized Training: Creates user-specific LoRA models from selfie sets
  • Identity Consistency: Improves consistency across multiple avatar generations
  • Temporary Storage: LoRA models are user-specific and automatically managed
  • Efficient Training: Lightweight adapters with fast training times
  • Multi-Model Integration: Works alongside IPAdapter and ControlNet
  • Configurable Parameters: Adjustable LoRA rank, alpha, and training epochs
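
The lifecycle of such a per-user adapter might look like the sketch below, reusing the pipe from the previous snippet; the loras/ directory layout is hypothetical.

# Sketch: attach, use, and release a per-user LoRA with diffusers.
# The "loras/user_1234" path is hypothetical.
pipe.load_lora_weights("loras/user_1234")   # attach the personalized adapter
pipe.fuse_lora(lora_scale=0.8)              # optionally bake it in at a chosen strength
image = pipe(prompt="studio portrait photo of the user").images[0]
pipe.unfuse_lora()                          # restore the base weights
pipe.unload_lora_weights()                  # free the adapter before the next job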

IPAdapter Batch Averaging

  • Multi-Image Identity: Uses all 15+ selfies instead of a single randomly chosen image
  • Quality-Weighted Averaging: Weights embeddings by face quality scores
  • Face Quality Assessment: Evaluates size, brightness, contrast, and sharpness
  • Automatic Filtering: Removes poor-quality faces below a quality threshold (0.7)
  • Enhanced Consistency: 40-60% improvement in identity consistency
  • Better Match Percentage: 25-35% improvement in facial feature matching
  • Robust Processing: Handles individual bad images gracefully
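
A minimal sketch of quality-weighted embedding averaging with InsightFace is shown below; score_face is a hypothetical helper standing in for the size/brightness/contrast/sharpness assessment described above.

# Sketch: quality-weighted average of InsightFace identity embeddings.
# `score_face` is a hypothetical 0..1 quality helper; 0.7 mirrors the
# threshold mentioned above.
import numpy as np
from insightface.app import FaceAnalysis

app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0)

def averaged_identity_embedding(images, quality_threshold=0.7):
    embeddings, weights = [], []
    for img in images:                      # BGR numpy arrays
        faces = app.get(img)
        if not faces:
            continue                        # no detectable face: skip gracefully
        face = max(faces, key=lambda f: f.det_score)
        quality = score_face(img, face)     # hypothetical quality assessment
        if quality < quality_threshold:
            continue                        # filter out poor-quality faces
        embeddings.append(face.normed_embedding)
        weights.append(quality)
    if not embeddings:
        raise ValueError("no usable faces in the selfie set")
    avg = np.average(np.stack(embeddings), axis=0, weights=weights)
    return avg / np.linalg.norm(avg)        # re-normalize before IPAdapter use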

Quality Modes

  • Fast Mode: Lower resolution (768x768), fewer inference steps (20), faster LoRA training (25 epochs)
  • High Fidelity Mode: Higher resolution (1024x1024), more inference steps (50), thorough LoRA training (50 epochs)
  • Environment Override: Set AVATAR_QUALITY_MODE=fast or AVATAR_QUALITY_MODE=high_fidelity
  • API Control: Use /quality-modes and /quality-mode/<mode> endpoints
  • Frontend Toggle: User-selectable quality mode in the web interface
  • Automatic Adaptation: All parameters (LoRA, ControlNet, face enhancement) adapt to quality mode
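
A plausible reading of how the mode maps to parameters, using the numbers from the list above (the dict layout itself is illustrative):

# Sketch: mapping the quality mode to generation parameters. Values come
# from the list above; the structure is illustrative.
import os

QUALITY_MODES = {
    "fast":          {"resolution": 768,  "steps": 20, "lora_epochs": 25},
    "high_fidelity": {"resolution": 1024, "steps": 50, "lora_epochs": 50},
}

mode = os.environ.get("AVATAR_QUALITY_MODE", "high_fidelity")
params = QUALITY_MODES[mode]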

Image Validation System

  • Comprehensive Checks: Validates resolution, face detection, image quality, and file format
  • Minimum Requirements: 15+ valid images with 512x512+ resolution and detectable faces
  • Quality Analysis: Checks brightness, contrast, blur, and face size
  • Detailed Reports: Provides comprehensive validation reports with specific error details
  • Error Handling: Clear error messages for invalid uploads with actionable feedback
  • Face Detection: Uses InsightFace for reliable face detection and analysis
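
Per-image checks along these lines might look like the sketch below. The thresholds are illustrative defaults, and app is the InsightFace detector from the earlier averaging sketch.

# Sketch: per-image validation with OpenCV plus InsightFace detection.
import cv2

def validate_selfie(path, min_side=512):
    img = cv2.imread(path)
    if img is None:
        return False, "unreadable or unsupported file format"
    h, w = img.shape[:2]
    if min(h, w) < min_side:
        return False, f"resolution {w}x{h} below the {min_side}px minimum"
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    if not 40 <= gray.mean() <= 220:
        return False, "image too dark or too bright"
    if cv2.Laplacian(gray, cv2.CV_64F).var() < 100:
        return False, "image too blurry"
    if not app.get(img):
        return False, "no detectable face"
    return True, "ok"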

Resource Management

  • Automatic Cleanup: Cleans up temporary files and GPU memory after each job
  • RunPod Optimization: Optimized for GPU instances with memory management
  • Background Monitoring: Continuous resource monitoring and automatic cleanup triggers
  • Job Tracking: Individual job tracking with automatic cleanup after completion
  • System Status: Real-time system resource monitoring via /system-status endpoint
  • Manual Cleanup: Trigger cleanup via /cleanup endpoint
  • Graceful Shutdown: Proper resource cleanup on application shutdown
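
The per-job cleanup described above typically combines filesystem removal with standard PyTorch cache releases, roughly as sketched here (the job-directory layout is hypothetical):

# Sketch: post-job cleanup. `job_dir` is a hypothetical per-job temp
# directory; the CUDA calls are standard PyTorch.
import gc
import shutil
import torch

def cleanup_job(job_dir):
    shutil.rmtree(job_dir, ignore_errors=True)  # drop temp selfies/intermediates
    gc.collect()                                # release Python references first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()                # return cached VRAM to the driver
        torch.cuda.ipc_collect()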

API Endpoints

  • POST /upload - Upload ZIP file with selfies (includes validation)
  • GET /status/<job_id> - Get job status
  • GET /download/<job_id>/<filename> - Download generated image
  • GET /jobs - List recent jobs
  • GET /quality-modes - Get available quality modes
  • POST /quality-mode/<mode> - Set quality mode
  • GET /health - Health check
  • GET /system-status - Get system resource status
  • POST /cleanup - Trigger manual cleanup
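
Driving these endpoints from Python rather than curl might look like the sketch below; the "job_id" and "status" response fields are assumptions based on context, not a documented schema.

# Sketch: upload a selfie ZIP and poll the job until it finishes.
import time
import requests

BASE = "http://localhost:5000"

with open("selfies.zip", "rb") as f:
    job = requests.post(f"{BASE}/upload",
                        files={"file": f},
                        data={"prompt_style": "portrait",
                              "quality_mode": "high_fidelity"}).json()

job_id = job["job_id"]                      # assumed response field
while True:
    status = requests.get(f"{BASE}/status/{job_id}").json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(5)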

GPU Requirements

  • Minimum: RTX 3080 (10GB VRAM, Fast mode)
  • Recommended: A10G (24GB VRAM) or better
  • Memory Usage: ~8-12GB VRAM during generation

Output Quality

  • Resolution: 1024x1024 (configurable up to 1536x1536)
  • Format: PNG with alpha channel support
  • Style: Studio-quality portrait with DSLR-like lighting

Troubleshooting

CUDA Out of Memory

  • Reduce the batch size in config.py
  • Enable gradient checkpointing or the memory-saving switches sketched below
  • Close other GPU applications
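
On tighter GPUs, the standard diffusers memory savers below can also help; whether config.py exposes them is project-specific, and pipe is an already-constructed pipeline.

# Sketch: standard diffusers memory savers on an existing pipeline `pipe`.
pipe.enable_attention_slicing()         # lower peak VRAM at some speed cost
pipe.enable_vae_tiling()                # decode large images in tiles
pipe.enable_model_cpu_offload()         # park idle submodules in system RAM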

Model Download Issues

  • Check your internet connection
  • Use a VPN if downloads are blocked in your region
  • Download models manually from HuggingFace if the scripts fail

Face Detection Issues

  • Ensure selfies show clear, front-facing faces
  • Provide at least 15 selfies
  • Use well-lit source images

License

MIT License - see LICENSE file for details.

Acknowledgments

  • IPAdapter by Tencent AI Lab
  • Stable Diffusion XL by Stability AI
  • GFPGAN by TencentARC
  • CodeFormer by S-Lab, Nanyang Technological University