```bash
pip install uv
uv sync
uv run python lt1.py
```

Use with the official LTX-2 models and the full Gemma text encoder from the main LTX page. This repository is under active development and a lot of features are quite broken, but the basics should work well. If you need some help getting it going, I will try...
This repository is organized as a monorepo with three main packages:
- ltx-core - Core model implementation, inference stack, and utilities
- ltx-pipelines - High-level pipeline implementations for text-to-video, image-to-video, and other generation modes
- ltx-trainer - Training and fine-tuning tools for LoRA, full fine-tuning, and IC-LoRA
Each package includes its own README with comprehensive documentation:
- LTX-Core README - Core model implementation, inference stack, and utilities
- LTX-Pipelines README - High-level pipeline implementations and usage guides
- LTX-Trainer README - Training and fine-tuning documentation with detailed guides
Download from Lightricks/LTX-2 on HuggingFace:
| File | Description |
|---|---|
| ltx-2-19b-dev.safetensors | Main 19B dev checkpoint |
| ltx-2-19b-distilled.safetensors | Distilled model |
| ltx-2-19b-distilled-lora-384.safetensors | Distilled LoRA |
| ltx-2-spatial-upscaler-x2-1.0.safetensors | 2x spatial upscaler |
| Model | Link |
|---|---|
| gemma-3-12b-it-qat-q4_0-unquantized | google/gemma-3-12b-it-qat-q4_0-unquantized |
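If you'd rather script the downloads, here is a minimal sketch using huggingface_hub; it assumes the files sit at the root of each repo, and the `local_dir` targets match the directory layout shown below:

```python
# Sketch: fetch the LTX-2 checkpoints and the full Gemma text encoder.
# Assumption: the filenames above sit at the root of each HuggingFace repo.
from huggingface_hub import hf_hub_download, snapshot_download

# Individual LTX-2 checkpoints (add the optional files you need)
for filename in [
    "ltx-2-19b-dev.safetensors",
    "ltx-2-19b-distilled-lora-384.safetensors",
    "ltx-2-spatial-upscaler-x2-1.0.safetensors",
]:
    hf_hub_download(repo_id="Lightricks/LTX-2", filename=filename, local_dir="weights")

# Gemma is gated: accept the license on HuggingFace and log in
# (e.g. `huggingface-cli login`) before running this.
snapshot_download(
    repo_id="google/gemma-3-12b-it-qat-q4_0-unquantized",
    local_dir="gemma-3-12b-it-qat-q4_0-unquantized",
)
```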
I have created a repository with all the interpolation files in one place here: https://huggingface.co/maybleMyers/interpolate/tree/main
Download from GSean/GIMM-VFI on HuggingFace:
| File | Description |
|---|---|
| gimmvfi_r_arb.pt | GIMM-VFI-R (RAFT-based) |
| gimmvfi_r_arb_lpips.pt | GIMM-VFI-R-P (RAFT + Perceptual) |
| gimmvfi_f_arb.pt | GIMM-VFI-F (FlowFormer-based) |
| gimmvfi_f_arb_lpips.pt | GIMM-VFI-F-P (FlowFormer + Perceptual) |
| flowformer_sintel.pth | FlowFormer optical flow (also on Google Drive) |
| raft-things.pth | RAFT optical flow (also from princeton-vl/RAFT) |
| File | Link |
|---|---|
| bim_vfi.pth | Google Drive |
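Alternatively, the consolidated interpolation repo mentioned above can be mirrored straight into the expected folder. A sketch using huggingface_hub; it assumes maybleMyers/interpolate holds the files listed in these tables:

```python
# Sketch: mirror the consolidated interpolation repo into the folder the
# GUI expects. Assumption: the repo contains the checkpoints listed above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="maybleMyers/interpolate",
    local_dir="GIMM-VFI/pretrained_ckpt",
)
```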
Place VFI checkpoints in `GIMM-VFI/pretrained_ckpt/`
| Model | Link |
|---|---|
| RealESRGAN_x2plus.pth | GitHub Release |
| RealESRGAN_x4plus.pth | GitHub Release |
| 003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN.pth | SwinIR GitHub Release |
| basicvsr_plusplus_reds4.pth | OpenMMLab |
Place upscaler checkpoints in `GIMM-VFI/pretrained_ckpt/`
| Model | Link |
|---|---|
| ZoeDepth (Intel/zoedepth-nyu-kitti) | HuggingFace (auto-downloaded by transformers) |
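No manual download is needed for ZoeDepth; it lands in the HuggingFace cache the first time it is used. A minimal sketch with the transformers pipeline (the input filename is hypothetical):

```python
# Sketch: first use triggers the automatic download of Intel/zoedepth-nyu-kitti.
from transformers import pipeline

depth = pipeline("depth-estimation", model="Intel/zoedepth-nyu-kitti")
result = depth("frame0001.png")    # hypothetical input frame
result["depth"].save("depth.png")  # PIL image of the predicted depth map
```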
The default directory structure expected by lt1.py:
```
ltx/
├── weights/                                      # LTX-2 core models
│   ├── ltx-2-19b-dev.safetensors                 # Main checkpoint
│   ├── ltx-2-19b-distilled.safetensors           # Distilled model (optional)
│   ├── ltx-2-19b-distilled-lora-384.safetensors  # Distilled LoRA
│   └── ltx-2-spatial-upscaler-x2-1.0.safetensors # Spatial upscaler
│
├── gemma-3-12b-it-qat-q4_0-unquantized/          # Text encoder
│   ├── config.json
│   ├── model-00001-of-00005.safetensors
│   ├── model-00002-of-00005.safetensors
│   ├── model-00003-of-00005.safetensors
│   ├── model-00004-of-00005.safetensors
│   ├── model-00005-of-00005.safetensors
│   └── ...
│
├── GIMM-VFI/pretrained_ckpt/                     # Interpolation & upscaler models
│   ├── gimmvfi_r_arb.pt                          # GIMM-VFI-R
│   ├── gimmvfi_r_arb_lpips.pt                    # GIMM-VFI-R-P
│   ├── gimmvfi_f_arb.pt                          # GIMM-VFI-F
│   ├── gimmvfi_f_arb_lpips.pt                    # GIMM-VFI-F-P
│   ├── flowformer_sintel.pth                     # Required for FlowFormer variants
│   ├── raft-things.pth                           # Required for RAFT variants
│   ├── bim_vfi.pth                               # BiM-VFI model
│   ├── RealESRGAN_x2plus.pth                     # 2x upscaler
│   ├── RealESRGAN_x4plus.pth                     # 4x upscaler
│   ├── 003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN.pth # SwinIR 4x
│   └── basicvsr_plusplus_reds4.pth               # BasicVSR++ video upscaler
│
├── lora/                                         # Custom LoRAs (optional)
│   └── your-lora.safetensors
│
└── outputs/                                      # Generated videos
```
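Before launching, you can sanity-check the layout with a short script. This is a sketch, not part of the repo; extend `required` with whichever optional files you actually use:

```python
# Sketch: verify the expected file layout before launching lt1.py.
from pathlib import Path

root = Path(".")  # repository root
required = [
    "weights/ltx-2-19b-dev.safetensors",
    "weights/ltx-2-19b-distilled-lora-384.safetensors",
    "weights/ltx-2-spatial-upscaler-x2-1.0.safetensors",
    "gemma-3-12b-it-qat-q4_0-unquantized/config.json",
    "GIMM-VFI/pretrained_ckpt/gimmvfi_r_arb.pt",
]
missing = [p for p in required if not (root / p).exists()]
print("All required files found." if not missing else f"Missing: {missing}")
```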
All model paths in the GUI can be customized. Use the Save Defaults button to persist your settings:
| Tab | Button | Settings saved |
|---|---|---|
| Generation | Save Defaults | LTX checkpoint, Gemma path, spatial upscaler, VAE, distilled LoRA, LoRA folder, all generation parameters |
Settings are saved to `ui_configs/` as JSON files and automatically loaded on startup.
Note: The Post-Processing tab (interpolation/upscaling) does not have a Save Defaults button. These models use hardcoded paths in `GIMM-VFI/pretrained_ckpt/`. You can override paths per-session using the "Custom Model Path" fields, but they won't persist.
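To see exactly what was persisted, the JSON files can be inspected directly. A minimal sketch; the filenames and keys are whatever the GUI writes:

```python
# Sketch: pretty-print every saved defaults file in ui_configs/.
import json
from pathlib import Path

for cfg in Path("ui_configs").glob("*.json"):
    print(f"--- {cfg.name} ---")
    print(json.dumps(json.loads(cfg.read_text()), indent=2))
```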
CUDA out of memory
- Enable CPU Offloading in Model Settings
- Enable Block Swap for DiT and Text Encoder to reduce VRAM usage
- Reduce DiT Blocks in GPU (try 10-15 for 24GB VRAM)
- Reduce Text Encoder Blocks in GPU (try 4-6)
- Lower resolution or frame count
- Use FP8 quantized checkpoints (`ltx-2-19b-dev-fp8.safetensors`)
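Before tuning block swap, it helps to know how much VRAM is actually free. A minimal check via PyTorch:

```python
# Sketch: report free vs. total VRAM so you can size block-swap settings.
import torch

free, total = torch.cuda.mem_get_info()  # bytes on the current device
print(f"free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB")
```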
CUDA version mismatch / not detected
- Ensure CUDA >= 12.7 is installed
- Check that the PyTorch CUDA version matches the system: `python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"`
- Reinstall PyTorch with the correct CUDA version from pytorch.org
FileNotFoundError: Checkpoint not found
- Verify model paths in the GUI match actual file locations
- Check that models are downloaded completely (not corrupted/partial)
- Use absolute paths if relative paths fail
Error loading Gemma text encoder
- Ensure you have the full unquantized Gemma model, not GGUF format
- Accept the license on HuggingFace before downloading
- Check that the `gemma-3-12b-it-qat-q4_0-unquantized` folder contains `config.json` and the model files
GIMM-VFI / BiM-VFI model errors
- Ensure all checkpoints are in `GIMM-VFI/pretrained_ckpt/`
- For FlowFormer variants, verify `flowformer_sintel.pth` is present
- For RAFT variants, verify `raft-things.pth` is present
Black or corrupted output video
- Check that input image/video dimensions are divisible by 32
- Frame count must be a multiple of 8 plus 1 (e.g., 9, 17, 25, 33...); see the helper below
- Try reducing inference steps or changing the seed
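A small helper that snaps arbitrary values to valid dimensions and frame counts. This is a sketch of the arithmetic above, not part of the repo:

```python
# Sketch: snap inputs to the constraints above.
def snap_dim(x: int) -> int:
    """Round down to a multiple of 32 (minimum 32)."""
    return max(32, (x // 32) * 32)

def snap_frames(n: int) -> int:
    """Round down to a multiple of 8, plus 1 (minimum 9)."""
    return max(9, ((n - 1) // 8) * 8 + 1)

print(snap_dim(1300), snap_frames(100))  # -> 1280 97
```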
Very slow generation
- Enable block swap to trade speed for VRAM
- Disable CPU offloading if you have sufficient VRAM
- Use distilled model for faster inference (fewer steps needed)
Prompt enhancement not working
- Verify Gemma model path is correct
- Check "Enhance Prompt" is enabled
- Gemma requires significant VRAM; enable text encoder block swap
uv sync fails
- Update uv: `pip install -U uv`
- Clear the cache: `uv cache clean`
- Try with a fresh venv: `uv venv && uv sync`
Import errors / missing modules
- Run from the project root directory
- Ensure the virtual environment is activated, or run via `uv run python lt1.py`
- Check Python version >= 3.12
Gradio UI not loading
- Check for port conflicts (default 7860)
- Try specifying a different port in the launch options (see the sketch below)
- Disable any VPN/proxy that might block localhost
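If the default port is taken, Gradio accepts an explicit one at launch. This is a sketch of the standard Gradio API; whether lt1.py exposes it as a flag or needs a small edit is an assumption:

```python
# Sketch: standard Gradio way to bind a non-default port.
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("port test")

demo.launch(server_port=7861)  # default is 7860
```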
Interpolation produces artifacts
- Try a different model variant (RAFT vs FlowFormer)
- Use perceptual variants (-P) for better quality
- Reduce interpolation multiplier for fast motion scenes
Upscaler produces blurry results
- SwinIR-L generally produces sharper results than RealESRGAN
- BasicVSR++ is optimized for video temporal consistency
- Check input video quality; upscalers can't recover lost detail
```bash
# Clear PyTorch cache
rm -rf ~/.cache/torch

# Clear HuggingFace cache (redownloads models)
rm -rf ~/.cache/huggingface

# Check GPU memory usage
nvidia-smi

# Monitor GPU during generation
watch -n 1 nvidia-smi
```