This guide covers common issues and solutions when training with LTX-Video-Trainer.
Memory management is crucial for successful training, especially with larger models like LTXV 13B.
When training with the LTXV 13B model, you must enable gradient checkpointing:

```yaml
optimization:
  enable_gradient_checkpointing: true  # Required for LTXV 13B
```

Note: Gradient checkpointing trades training speed for memory savings. It's essential for training LTXV 13B on consumer GPUs.
Load the text encoder in 8-bit precision to save GPU memory during training:

```yaml
quantization:
  load_text_encoder_in_8bit: true
```

This setting is also available in all data preparation scripts:

```bash
# Dataset preprocessing with 8-bit text encoder
python scripts/preprocess_dataset.py dataset.json \
  --resolution-buckets "768x768x25" \
  --load_text_encoder_in_8bit

# Caption generation with 8-bit quantization
python scripts/caption_videos.py videos/ \
  --output dataset.json \
  --use-8bit
```

Lower the batch size if you encounter out-of-memory errors:
```yaml
data:
  batch_size: 1  # Start with 1 and increase gradually
```

Reduce spatial or temporal dimensions to save memory:

```bash
# Smaller spatial resolution
python scripts/preprocess_dataset.py dataset.json \
  --resolution-buckets "512x512x49"

# Fewer frames
python scripts/preprocess_dataset.py dataset.json \
  --resolution-buckets "768x768x25"  # 25 frames instead of 49
```

Use the low VRAM configuration as a starting point:
```yaml
# Based on configs/ltxv_2b_lora_low_vram.yaml
model_source: "LTXV_2B_0.9.6_DEV"
data:
  batch_size: 1
optimization:
  enable_gradient_checkpointing: true
  optimizer_type: "adamw8bit"  # 8-bit optimizer
quantization:
  load_text_encoder_in_8bit: true
```

Sequence Length Calculation:
sequence_length = (H/32) * (W/32) * ((F-1)/8 + 1)
Where:
- H = Height, W = Width, F = Number of frames
- 32 = VAE spatial downsampling factor
- 8 = VAE temporal downsampling factor
Examples:
- 768x768x25: sequence_length = 24 × 24 × 4 = 2,304
- 768x448x89: sequence_length = 24 × 14 × 12 = 4,032
- 512x512x49: sequence_length = 16 × 16 × 7 = 1,792
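The formula above can be sketched as a small Python helper (illustrative only, not part of the trainer's codebase):

```python
def sequence_length(height: int, width: int, frames: int) -> int:
    """Token sequence length after VAE encoding:
    32x spatial downsampling, 8x temporal downsampling."""
    return (height // 32) * (width // 32) * ((frames - 1) // 8 + 1)

# Matches the examples above:
print(sequence_length(768, 768, 25))  # 2304
print(sequence_length(768, 448, 89))  # 4032
print(sequence_length(512, 512, 49))  # 1792
```

This is handy for checking whether a new resolution bucket will produce a longer (and therefore more memory-hungry) sequence before launching a run.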
Memory Requirements by Model:
- LTXV 2B: ~16-40GB VRAM (depending on resolution and batch size)
- LTXV 13B: ~40GB+ VRAM (requires gradient checkpointing)
Solution: Ensure you're in the correct environment and have installed dependencies:

```bash
# Reinstall if needed
uv sync

# Activate virtual environment
source .venv/bin/activate
```

Optimizations:
- Disable gradient checkpointing (if you have enough VRAM):

  ```yaml
  optimization:
    enable_gradient_checkpointing: false
  ```

- Increase batch size (if memory allows):

  ```yaml
  data:
    batch_size: 2  # Or higher
  ```

- Use compiled models (experimental):

  ```yaml
  optimization:
    use_torch_compile: true
  ```
Solutions:
- Use Image-to-Video Validation Instead of Text-to-Video:
  - For more reliable validation, use image-to-video (first-frame conditioning) rather than text-to-video. This is supported via the `images` field in your validation config (see `ValidationConfig` in `config.py`):

    ```yaml
    validation:
      prompts:
        - "a professional portrait video of a person with blurry bokeh background"
      images:
        - "/path/to/first_frame.png"  # One image per prompt
    ```

  - This approach provides a stronger conditioning signal and typically results in higher quality validation outputs.
- Note on Diffusers Inference Quality:
  - The default inference pipeline in 🤗 Diffusers is suboptimal for LTXV models: it does not include STG (Spatio-Temporal Guidance) or other inference-time tricks that improve video quality.
  - For best results, use validation videos to track training progress, but for actual quality testing, export your LoRA and test it in ComfyUI using the recommended workflow: 👉 ComfyUI-LTXVideo
- Other Tips:
  - Check caption quality: Review and, if needed, manually edit captions for accuracy.
  - Adjust LoRA rank: Try higher values for `lora.rank` (e.g., 32, 64, 128) for more capacity:

    ```yaml
    lora:
      rank: 64
    ```

  - Increase training steps: Train longer if needed:

    ```yaml
    optimization:
      steps: 2000
    ```
Cause: LoRA checkpoints trained with this trainer are saved in Diffusers format, but ComfyUI expects a different format with `diffusion_model` prefixes instead of `transformer` prefixes.
Solution: Convert your checkpoint from Diffusers to ComfyUI format using the conversion script:
```bash
# Convert from Diffusers to ComfyUI format
python scripts/convert_checkpoint.py your_lora.safetensors --to-comfy --output_path your_lora_comfy.safetensors
```

What this does:
- Converts `transformer` prefixes to `diffusion_model` prefixes
- Maintains safetensors format for security
- Creates a new file with `_comfy` suffix (if no output path specified)
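Conceptually, the conversion is just a key rename over the LoRA state dict. A minimal sketch of that idea (illustrative only; use the provided `convert_checkpoint.py` script for real checkpoints):

```python
def to_comfy_keys(state_dict: dict) -> dict:
    """Rename Diffusers-style 'transformer.' key prefixes to
    ComfyUI-style 'diffusion_model.' prefixes."""
    converted = {}
    for key, value in state_dict.items():
        if key.startswith("transformer."):
            key = "diffusion_model." + key[len("transformer."):]
        converted[key] = value
    return converted

# Example with a dummy key (real values would be tensors):
print(to_comfy_keys({"transformer.blocks.0.attn.lora_A.weight": 0}))
# {'diffusion_model.blocks.0.attn.lora_A.weight': 0}
```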
After conversion:
- Load the converted `.safetensors` file in ComfyUI
- The LoRA should now load without errors
For more details on checkpoint conversion, see the Utility Scripts Reference.
Track memory usage during training:
```bash
# Watch GPU memory in real-time
watch -n 1 nvidia-smi

# Log memory usage to file
nvidia-smi --query-gpu=memory.used,memory.total --format=csv --loop=5 > memory_log.csv
```

Decode latents to visualize the pre-processed videos:

```bash
python scripts/decode_latents.py dataset/.precomputed/latents \
  --output-dir debug_output
```

Compare decoded videos with originals to ensure quality.
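After a run, the `memory_log.csv` file above can be summarized to find peak usage. A small helper sketch (a hypothetical utility, assuming the `--query-gpu=memory.used,memory.total --format=csv` output shown above, i.e. a header row followed by lines like `1234 MiB, 24576 MiB`):

```python
import csv

def peak_memory_mib(lines) -> int:
    """Return the peak 'memory.used' value (in MiB) from
    nvidia-smi CSV log lines."""
    peak = 0
    for row in csv.reader(lines):
        if not row or "memory.used" in row[0]:
            continue  # skip blank lines and the header row
        used = int(row[0].strip().split()[0])  # "1234 MiB" -> 1234
        peak = max(peak, used)
    return peak

# Usage:
#   with open("memory_log.csv") as f:
#       print(peak_memory_mib(f))
```

Comparing the peak against your GPU's total VRAM tells you how much headroom is left for a larger batch size or resolution bucket.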
- Test preprocessing with a small subset first
- Verify all video files are accessible
- Check available GPU memory
- Review configuration against hardware capabilities
- Monitor GPU memory usage
- Check loss convergence regularly
- Review validation samples periodically
- Save checkpoints frequently
- Test trained model with diverse prompts
- Convert to ComfyUI format if needed
- Document training parameters and results
- Archive training data and configs
If you're still experiencing issues:
- Check logs: Review console output and log files for error details
- Search issues: Look through GitHub issues for similar problems
- Provide details: When reporting issues, include:
- Hardware specifications (GPU model, VRAM)
- Configuration file used
- Complete error message
- Steps to reproduce the issue
Have questions, want to share your results, or need real-time help? Join our community Discord server to connect with other users and the development team!
- Get troubleshooting help
- Share your training results and workflows
- Stay up to date with announcements and updates
We look forward to seeing you there!