An advanced implementation of SEESR (Semantic Edge Enhanced Super-Resolution) optimized with SD Turbo for ultra-fast, high-quality super-resolution. Includes optional Real-ESRGAN pre-enhancement and robust color/frequency correction.
- Ultra-fast inference: 1–4 steps vs 20–50 traditional
- Quality preserved: SD Turbo optimizations keep output quality at low step counts
- Memory efficient: Tiled VAE for large images on limited VRAM
- Automatic tagging: RAM model auto-generates guidance from images
- Color correction: Wavelet-based color fix for natural results
- KDS (Kernel Density Steering): Advanced generation control
- Optional Real-ESRGAN "GAN-Embedding" pre-enhancement
- Docker-ready: Pre-configured container with pre-fetched models
- Cross-platform: macOS, Linux, and Windows
- Virtual environment: Isolated, reproducible setup
The project includes a fully updated Dockerfile with:
- Python 3.10 environment
- Pre-downloaded model weights during build
- Automatic environment tests
- CUDA optimizations and memory management
# Quick build with Cog
cog build
# Manual Docker build
./docker/docker_build.sh build
# Full instructions
cat docker/DOCKER_BUILD_GUIDE.md
# Automatic virtual environment setup
./start_seesr.sh setup
# Run tests
./start_seesr.sh test
# Run super-resolution
./start_seesr.sh run input.jpg
.
├── activate_seesr.sh # Activate the local venv
├── cog.yaml # Cog configuration (root)
├── config.yaml # App config
├── predict.py # Shim: re-exports Predictor from cog/predict.py
├── README.md # This file
├── requirements.txt # Python dependencies
├── setup.py # Package metadata (editable install)
├── start_seesr.sh # Helper for setup/run/test
├── TECHNICAL_DOCS.md # Technical docs
├── USAGE_EXAMPLES.md # Extra usage examples
├── test_input.jpg # Sample input image
├── cog/
│   ├── predict.py # Main Predictor (Cog entrypoint)
│   └── README.md
├── deployment/
│   ├── download_models.py # Optional weights prefetch
│   ├── REPLICATE_FINAL_RECOMMENDATION.md
│   ├── REPLICATE_HARDWARE_GUIDE.md
│   └── preset/models/ # Model presets
├── docker/
│   ├── dockerfile # Dockerfile
│   ├── docker_build.sh # Build helper
│   └── README.md
├── models/ # Custom UNet/ControlNet
│   ├── controlnet.py
│   └── unet_2d_condition.py
├── pipelines/
│   └── pipeline_seesr.py # SEESR + SD Turbo pipeline
├── ram/
│   └── models/ram_lora.py # RAM model (auto-tagging)
├── tests/ # Test suite
│   ├── test_complete.py
│   ├── test_docker_env.py
│   ├── test_environment.py
│   └── test_seesr.py
└── utils/
    ├── wavelet_color_fix.py # Wavelet/AdaIN/luminance color fixes
    └── xformers_utils.py # Attention optimization helpers
The easiest way to use SEESR is via the helper script, which creates an isolated virtual environment and installs all dependencies:
# Clone the repository
git clone https://github.com/alexgenovese/cog-super-resolution-SEESR.git
cd cog-super-resolution-SEESR
# Automatic venv + install
./start_seesr.sh setup
This script will:
- Verify system requirements (Python 3.9+)
- Create a dedicated venv (seesr_env)
- Install all required dependencies
- Configure the environment for usage
# Test the model with a sample image
./start_seesr.sh test
# Start a Python shell inside the env
./start_seesr.sh python
# Quick performance benchmark
./start_seesr.sh benchmark
# Manually activate the environment
source activate_seesr.sh
# Main commands:
# ./start_seesr.sh - Setup/run helper
# python tests/test_complete.py - System test
# python predict.py - Predictor shim (imports cog/predict.py)
- Python 3.9+ (auto-checked)
- CUDA 11.8+ (optional for GPU)
- 8–16GB VRAM (recommended for GPU)
- 4GB+ RAM (CPU minimum)
# Install dependencies
pip install -r requirements.txt
# Editable install
pip install -e .
# Install Cog if not present
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
# Build container
cog build
# Test model
cog predict -i image=@test_input.jpg
from predict import Predictor
# Initialize predictor
predictor = Predictor()
predictor.setup()
# Run super-resolution
result = predictor.predict(
image="input.jpg",
scale_factor=4,
num_inference_steps=4, # SD Turbo is optimized for 1-4 steps
cfg_scale=1.0, # SD Turbo works best with CFG=1.0
use_kds=True, # Enable Kernel Density Steering
positive_prompt="high quality, detailed, 8k",
negative_prompt="blur, lowres, artifacts"
)
result = predictor.predict(
image="input.jpg",
user_prompt="beautiful landscape", # Optional user prompt
positive_prompt="masterpiece, best quality", # Positive prompt
negative_prompt="blur, noise, artifacts", # Negative prompt
num_inference_steps=4, # 1–4 for SD Turbo
scale_factor=4, # Upscale factor
cfg_scale=1.0, # SD Turbo CFG
use_kds=True, # Kernel Density Steering
bandwidth=0.1, # KDS bandwidth
num_particles=10, # KDS particles
seed=42, # Reproducibility seed
latent_tiled_size=320, # Diffusion tile size
latent_tiled_overlap=4 # Tile overlap
)
- Inference Steps: 1–4 (vs 20–50 traditional)
- CFG Scale: 1.0 (SD Turbo is tuned for low CFG)
- Scheduler: DDIM with tuned timesteps
- Memory: Tiled VAE for large images
- Inference time: ~5–15s (vs 30–60s traditional)
- VRAM: ~8–10GB (with tiling)
- Quality: High thanks to semantic guidance
- Max resolution: Limited by available VRAM
- Automatic tagging: Generates image tags
- Semantic guidance: Improves quality using tag embeddings
- LoRA integration: Efficient adaptations
- Generation control: Guides diffusion
- Stability: Reduces artifacts and improves consistency
- Configurable: Bandwidth and particles
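To make the bandwidth and particle settings more concrete, here is a minimal one-dimensional sketch of the mode-seeking idea behind kernel density steering: each particle takes a mean-shift step toward the kernel-weighted average of the ensemble. It only illustrates the general technique on plain floats and is not the code in pipelines/pipeline_seesr.py; the names `mean_shift_step` and `step_size` are hypothetical.

```python
import math


def mean_shift_step(particles, bandwidth=0.1, step_size=1.0):
    """Move every particle toward the Gaussian-kernel-weighted mean of the ensemble."""
    shifted = []
    for x in particles:
        # Nearby particles get weights close to 1, distant ones close to 0.
        weights = [math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2)) for y in particles]
        total = sum(weights)  # never zero: each particle weighs itself with 1.0
        weighted_mean = sum(w * y for w, y in zip(weights, particles)) / total
        shifted.append(x + step_size * (weighted_mean - x))
    return shifted


def spread(values):
    """Distance between the largest and smallest value."""
    return max(values) - min(values)


# Four particles near a common mode plus one outlier.
particles = [0.00, 0.05, 0.10, 0.12, 0.90]
steered = mean_shift_step(particles, bandwidth=0.1)
```

With a small bandwidth the four nearby particles contract toward their shared mode while the outlier barely moves; a larger bandwidth lets distant particles influence each other more.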
- Preserves original colors
- Multi-method: Wavelet, AdaIN, and luminance correction
- Automatic: Applied to the output image
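As an illustration of the AdaIN-style correction mentioned above, the sketch below rescales one color channel so that its mean and standard deviation match those of the source image. It works on a flat list of values for a single channel and is an assumption about the general approach, not a copy of utils/wavelet_color_fix.py; `adain_channel` and the sample values are hypothetical.

```python
from statistics import fmean, pstdev


def adain_channel(content, reference, eps=1e-5):
    """Rescale `content` so its mean and standard deviation match `reference`."""
    c_mean, c_std = fmean(content), pstdev(content)
    r_mean, r_std = fmean(reference), pstdev(reference)
    scale = r_std / (c_std + eps)  # eps avoids division by zero on flat channels
    return [(v - c_mean) * scale + r_mean for v in content]


# Red channel of an upscaled result that drifted brighter than the source.
upscaled_red = [0.70, 0.80, 0.90, 1.00]
source_red = [0.20, 0.30, 0.40, 0.50]
corrected = adain_channel(upscaled_red, source_red)
```

The corrected channel keeps the relative structure of the upscaled output but takes on the brightness and contrast statistics of the source, which is what keeps colors close to the original.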
- Training-free enhancement before diffusion
- Improves detail and stability for low-quality inputs
- Skipped automatically when Real-ESRGAN is not available
# Configure custom model paths
import os
os.environ['SEESR_MODEL_PATH'] = '/path/to/custom/seesr'
os.environ['SD_TURBO_PATH'] = '/path/to/custom/sd-turbo'
os.environ['RAM_MODEL_PATH'] = '/path/to/custom/ram'
# Set this to skip heavy downloads during CI/tests (not for production inference)
os.environ['SEESR_TEST_MODE'] = '1'
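For reference, a loader could resolve these variables along the following lines. This is a sketch under assumptions: the helper name `resolve_model_paths` and the placeholder default are hypothetical, and the project's real fallback locations are not documented here.

```python
import os


def resolve_model_paths(env=None, defaults=None):
    """Return the configured path for each model, falling back to `defaults`."""
    env = os.environ if env is None else env
    defaults = defaults or {}
    names = ("SEESR_MODEL_PATH", "SD_TURBO_PATH", "RAM_MODEL_PATH")
    # An unset or empty variable falls through to the default (or None).
    return {name: env.get(name) or defaults.get(name) for name in names}


paths = resolve_model_paths(
    env={"SEESR_MODEL_PATH": "/path/to/custom/seesr"},
    defaults={"SD_TURBO_PATH": "placeholder/sd-turbo"},  # hypothetical default
)
```

Passing `env` explicitly keeps the helper easy to test; in normal use it reads from `os.environ`.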
# For limited VRAM
predictor.validation_pipeline._init_tiled_vae(
encoder_tile_size=512, # Lower for less VRAM
decoder_tile_size=128 # Lower for less VRAM
)
# Enable gradient checkpointing
predictor.unet.enable_gradient_checkpointing()
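To see why smaller tiles lower peak memory, it helps to look at how an image axis can be split into overlapping tiles. The sketch below computes tile start offsets along one axis; it illustrates tiling in general and is not the `_init_tiled_vae` implementation. The function name `tile_starts` and the overlap value of 64 are assumptions.

```python
def tile_starts(length, tile_size, overlap):
    """Return start offsets so tiles of `tile_size` cover [0, length).

    Neighbouring tiles share at least `overlap` units.
    """
    if tile_size <= overlap:
        raise ValueError("tile_size must be larger than overlap")
    if length <= tile_size:
        return [0]  # a single tile already covers the whole axis
    stride = tile_size - overlap
    starts = list(range(0, length - tile_size, stride))
    starts.append(length - tile_size)  # align the last tile with the far edge
    return starts


# A 1024-wide axis processed in 512-wide tiles.
starts = tile_starts(length=1024, tile_size=512, overlap=64)
```

Only one tile needs to be resident at a time, so peak memory scales with the tile size rather than the full image, at the cost of processing the overlapping regions more than once.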
| Method | Time (s) | VRAM (GB) | PSNR (dB) | SSIM |
|---|---|---|---|---|
| SEESR (original) | 45–60 | 12–16 | 28.5 | 0.85 |
| SEESR + SD Turbo | 8–15 | 8–10 | 29.2 | 0.87 |
| SD Turbo fallback | 3–5 | 6–8 | 26.8 | 0.82 |
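As a quick reading aid, the timing ranges reported for original SEESR and SEESR + SD Turbo imply a speedup of roughly 3x to 7.5x. The arithmetic uses only the figures from the table:

```python
# Timing ranges in seconds, taken from the benchmark table.
original_s = (45, 60)
turbo_s = (8, 15)

# Most conservative case: fastest original run vs slowest Turbo run.
min_speedup = original_s[0] / turbo_s[1]
# Most favourable case: slowest original run vs fastest Turbo run.
max_speedup = original_s[1] / turbo_s[0]
```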
- CUDA Out of Memory
  # Reduce the tile size (keep the overlap within the documented 4–16 range)
  latent_tiled_size=256
  latent_tiled_overlap=4
- Models not found
  # Force the download
  python -c "from predict import Predictor; p = Predictor(); p.setup()"
- Low quality output
  # Increase steps if needed
  num_inference_steps=4 # Upper end of the 1-4 range recommended for SD Turbo
  cfg_scale=1.0 # Optimal for SD Turbo
Signature (cog/predict.py):
def predict(
image: Path,
user_prompt: str = "",
positive_prompt: str = "clean, high-resolution, 8k, masterpiece",
negative_prompt: str = "dotted, noise, blur, lowres, oversmooth, bad anatomy, bad hands, cropped",
num_inference_steps: int = 4, # 1–8
scale_factor: int = 4, # 1–6
cfg_scale: float = 1.0, # 0.5–1.5
use_kds: bool = True,
bandwidth: float = 0.1, # 0.1–0.8
num_particles: int = 10, # 1–16
seed: int = 231,
latent_tiled_size: int = 320, # 128–480
latent_tiled_overlap: int = 4, # 4–16
) -> Path
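When calling the predictor from your own code, it can help to check arguments against the ranges noted in the signature before starting a long run. The sketch below is a hypothetical helper, not part of cog/predict.py; the bounds are copied from the comments in the signature, and `check_params` is an assumed name.

```python
# Allowed ranges, copied from the comments in the predict signature.
PARAM_RANGES = {
    "num_inference_steps": (1, 8),
    "scale_factor": (1, 6),
    "cfg_scale": (0.5, 1.5),
    "bandwidth": (0.1, 0.8),
    "num_particles": (1, 16),
    "latent_tiled_size": (128, 480),
    "latent_tiled_overlap": (4, 16),
}


def check_params(**params):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    for name, value in params.items():
        if name not in PARAM_RANGES:
            continue  # prompts, seed, etc. have no documented range
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems


problems = check_params(num_inference_steps=4, latent_tiled_overlap=2, cfg_scale=1.0)
```

Returning a list instead of raising on the first error lets a caller report all out-of-range values at once.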
Hardware:
- GPU: NVIDIA with 8GB+ VRAM recommended (CPU works but slower)
- RAM: 8GB minimum, 16GB+ recommended for large images
- Disk: 15–20GB for models and cache
Model:
- Very small inputs (<256px) may yield suboptimal results
- Scale factors >4× may introduce artifacts
- Tuned for natural photos; results vary for drawings/art
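The first two limitations can be checked before a run. The following sketch is a hypothetical pre-flight helper: it assumes the output size is the input size multiplied by the scale factor and that the 256px threshold applies to the shorter side, neither of which is stated explicitly above.

```python
def preflight(width, height, scale_factor=4):
    """Return the expected output size and any warnings for the given input."""
    warnings = []
    if min(width, height) < 256:  # assumption: threshold applies to the shorter side
        warnings.append("input smaller than 256px may yield suboptimal results")
    if scale_factor > 4:
        warnings.append("scale factors above 4x may introduce artifacts")
    # Assumption: output dimensions are input dimensions times the scale factor.
    return (width * scale_factor, height * scale_factor), warnings


size, warnings = preflight(200, 300, scale_factor=6)
```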
Performance Considerations
- Virtual environments are recommended to keep PyTorch and CUDA versions consistent
- GPU (CUDA): ~5–15s per inference
- CPU: ~2–10 minutes per inference
- Apple Silicon (M1/M2): between the two, using the MPS backend
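To find out which of these backends a machine will use, a small device check along these lines can help. It is a sketch using standard PyTorch availability checks, not the selection logic in cog/predict.py; the function name `pick_device` is hypothetical.

```python
def pick_device():
    """Prefer CUDA, then Apple's MPS backend, then CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
```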
Memory Management
- Tiled VAE for >2K images with <16GB VRAM
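- Gradient checkpointing reduces VRAM at a speed cost; it saves memory only when gradients are computed (fine-tuning), so expect little effect on pure inference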
- Mixed precision (fp16) enabled by default
Common Troubles
CUDA Out of Memory:
# Reduce the diffusion (latent) tile size
latent_tiled_size = 256 # default: 320
# Reduce internal batch if customized
Import Errors:
# Recreate virtual environment
rm -rf seesr_env
./start_seesr.sh setup
Slow Performance:
# Check GPU detection
./start_seesr.sh test
# Force CPU usage if necessary
export CUDA_VISIBLE_DEVICES=""
Models Not Found:
# Models are downloaded automatically
# Ensure internet connectivity on first run
For performance:
- Use NVIDIA GPU with CUDA 11.8+
- Keep inference steps 2–4 for SD Turbo
- Use CFG scale = 1.0
- Enable tiled VAE for large images
For quality:
- Provide precise prompts
- Prefer moderate scale factors (2×–4×)
- Enable KDS for stability
- Try different seeds
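Trying several seeds is easy to script. The sketch below loops over seeds and collects the outputs; `sweep_seeds` is a hypothetical helper, and `fake_predict` is a stand-in so the example runs without model weights. In real use you would pass `predictor.predict` instead.

```python
def sweep_seeds(predict_fn, seeds, **kwargs):
    """Call `predict_fn` once per seed and collect the outputs keyed by seed."""
    return {seed: predict_fn(seed=seed, **kwargs) for seed in seeds}


def fake_predict(image, seed, **_):
    """Stand-in for predictor.predict: returns a made-up output name."""
    return f"{image}.seed{seed}.png"


outputs = sweep_seeds(fake_predict, seeds=[1, 42, 231], image="input.jpg")
```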
For development:
- Always use a virtual environment
- Iterate on small images first
- Monitor memory usage
- Keep a known-good requirements.txt
MIT License. See LICENSE.
Contributions are welcome:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to your branch
- Open a Pull Request
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- SEESR: Based on the Semantic Edge Enhanced Super-Resolution work
- SD Turbo: Stability AI
- RAM: Recognize Anything Model team
- Diffusers: Hugging Face
Note: For CI/tests, you can set SEESR_LITE=1 or SEESR_TEST_MODE=1 to skip heavy downloads. Do not use lite mode for real inference.
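A test suite can guard against accidentally running real inference in lite mode with a check like the one below. This is a sketch that assumes only the value "1" enables either flag; how the project actually parses these variables is not documented here, and `lite_mode_enabled` is a hypothetical name.

```python
import os


def lite_mode_enabled(env=None):
    """Return True when either lite/test flag is set to "1"."""
    env = os.environ if env is None else env
    # Assumption: only the exact string "1" turns a flag on.
    return env.get("SEESR_LITE") == "1" or env.get("SEESR_TEST_MODE") == "1"


ci_env = {"SEESR_TEST_MODE": "1"}
production_env = {}
```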