This guide covers the various utility scripts available for preprocessing, conversion, and debugging tasks.
The `scripts/convert_checkpoint.py` script converts LoRA weights between Diffusers and ComfyUI formats.
```bash
# Convert from Diffusers to ComfyUI format
python scripts/convert_checkpoint.py input.safetensors --to-comfy --output_path output.safetensors

# Convert from ComfyUI to Diffusers format
python scripts/convert_checkpoint.py input.safetensors --output_path output.safetensors
```

Key features:
- Bidirectional conversion: Supports both directions (Diffusers ↔ ComfyUI)
- Automatic naming: If no output path is specified, automatically adds a `_comfy` or `_diffusers` suffix
- Safetensors format: Maintains the safetensors format for security
When to use:
- After training a LoRA for use in ComfyUI
- Converting existing ComfyUI LoRAs for use with this trainer
- Preparing models for different inference pipelines
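Under the hood, the conversion amounts to renaming the keys of the LoRA state dict. The sketch below illustrates the idea, assuming the common `lora_A`/`lora_B` (Diffusers) to `lora_down`/`lora_up` (ComfyUI) factor naming and a `transformer.` → `diffusion_model.` prefix swap; the actual script may handle additional prefixes and edge cases.

```python
# Hypothetical sketch of the key renaming behind --to-comfy; the real
# script may cover more naming variants than shown here.
from safetensors.torch import load_file, save_file

def diffusers_to_comfy(path_in: str, path_out: str) -> None:
    state = load_file(path_in)
    renamed = {}
    for key, tensor in state.items():
        # Diffusers names the LoRA factors lora_A/lora_B; ComfyUI uses lora_down/lora_up.
        new_key = key.replace(".lora_A.", ".lora_down.").replace(".lora_B.", ".lora_up.")
        # Diffusers prefixes transformer weights with "transformer.", while
        # ComfyUI expects "diffusion_model.".
        if new_key.startswith("transformer."):
            new_key = "diffusion_model." + new_key[len("transformer."):]
        renamed[new_key] = tensor
    save_file(renamed, path_out)  # stays in safetensors format

diffusers_to_comfy("input.safetensors", "input_comfy.safetensors")
```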
The `scripts/split_scenes.py` script automatically splits long videos into shorter, coherent scenes.
```bash
# Basic scene splitting
python scripts/split_scenes.py input.mp4 output_dir/ --filter-shorter-than 5s
```

Key features:
- Automatic scene detection: Uses PySceneDetect for intelligent splitting
- Multiple algorithms: Content-based, adaptive, threshold, and histogram detection
- Filtering options: Remove scenes shorter than specified duration
- Customizable parameters: Thresholds, window sizes, and detection modes
Common options:
```bash
# See all available options
python scripts/split_scenes.py --help

# Use adaptive detection with custom threshold
python scripts/split_scenes.py video.mp4 scenes/ --detector adaptive --threshold 30.0

# Limit to maximum number of scenes
python scripts/split_scenes.py video.mp4 scenes/ --max-scenes 50
```
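If you need splitting from your own code rather than the CLI, PySceneDetect's high-level API covers the same ground. A minimal sketch, assuming content-based detection and the ≥5 s filter from the example above (parameter values are illustrative, not the script's internals):

```python
# Content-based scene splitting with PySceneDetect's quickstart API.
# ContentDetector's threshold roughly corresponds to the --threshold flag.
from scenedetect import detect, ContentDetector, split_video_ffmpeg

scene_list = detect("input.mp4", ContentDetector(threshold=27.0))

# Mirror --filter-shorter-than 5s: drop scenes under five seconds.
scene_list = [
    (start, end) for start, end in scene_list
    if (end - start).get_seconds() >= 5.0
]

# Cut the source video into one clip per remaining scene.
split_video_ffmpeg("input.mp4", scene_list, output_dir="output_dir/")
```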
The `scripts/caption_videos.py` script generates captions for videos using vision-language models.

```bash
# Generate captions for all videos in a directory
python scripts/caption_videos.py scenes_output_dir/ --output captions.json

# Use 8-bit quantization to reduce VRAM usage
python scripts/caption_videos.py scenes_output_dir/ --output captions.json --use-8bit
```

Key features:
- VLM-powered: Uses Qwen2.5-VL for high-quality captions
- Memory optimization: 8-bit quantization option for limited VRAM
- Batch processing: Processes entire directories of videos
- JSON output: Creates structured dataset files
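To illustrate what an option like `--use-8bit` plausibly does, the sketch below loads a Qwen2.5-VL checkpoint with bitsandbytes 8-bit quantization through `transformers`. The model ID and loading details are assumptions for illustration, not the script's exact code.

```python
# Loading a Qwen2.5-VL captioner in 8-bit to reduce VRAM usage.
import torch
from transformers import (
    AutoProcessor,
    BitsAndBytesConfig,
    Qwen2_5_VLForConditionalGeneration,
)

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # assumed checkpoint, for illustration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # ~half of fp16 VRAM
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```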
The `scripts/preprocess_dataset.py` script processes videos and caches latents for training.
```bash
# Basic preprocessing
python scripts/preprocess_dataset.py dataset.json \
    --resolution-buckets "768x768x25" \
    --caption-column "caption" \
    --video-column "media_path"

# With video decoding for verification
python scripts/preprocess_dataset.py dataset.json \
    --resolution-buckets "768x768x25" \
    --decode-videos
```

For detailed usage, see the Dataset Preparation Guide.
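For orientation, the input is a JSON dataset whose records expose the columns named by `--caption-column` and `--video-column`. A hypothetical minimal layout, assuming a flat list of records (the authoritative schema is in the Dataset Preparation Guide):

```python
# Writing a minimal dataset.json, assuming a flat list of records keyed
# by the column names used above. The paths and captions are placeholders.
import json

records = [
    {"caption": "A slow pan across a foggy mountain lake at dawn.",
     "media_path": "scenes/clip_0001.mp4"},
    {"caption": "Close-up of hands kneading dough on a floured table.",
     "media_path": "scenes/clip_0002.mp4"},
]

with open("dataset.json", "w") as f:
    json.dump(records, f, indent=2)
```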
The `scripts/compute_condition.py` script provides a template for creating the reference videos needed for IC-LoRA training.
This specific example generates reference videos using Canny edge detection.
Note: You can edit the `scripts/compute_condition.py` script to generate other types of reference videos for IC-LoRA training. For example, you might implement colorization, depth maps, segmentation masks, or any custom video transformation by modifying the `compute_condition()` function. This flexibility allows you to tailor the conditioning signal to your specific research or creative needs.
```bash
# Generate Canny edge reference videos
python scripts/compute_condition.py videos_dir/ --output dataset.json
```

Key features:
- Canny edge detection: Creates edge-based reference videos
- In-place editing: Updates existing dataset JSON files
- Customizable: Modify the `compute_condition()` function for different conditions (see the sketch below)
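As a concrete starting point, here is a sketch of a Canny-style `compute_condition()` using OpenCV. The `(F, H, W, 3)` frame layout, function signature, and thresholds are illustrative assumptions; check the script for its actual interface.

```python
# Illustrative Canny edge condition; swap the body for depth, colorization,
# segmentation, etc. to produce a different conditioning signal.
import cv2
import numpy as np

def compute_condition(frames: np.ndarray) -> np.ndarray:
    """Map an (F, H, W, 3) uint8 RGB video to per-frame Canny edge maps."""
    out = []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        edges = cv2.Canny(gray, 100, 200)  # low/high hysteresis thresholds
        # Replicate the single channel so the reference video stays 3-channel.
        out.append(np.stack([edges] * 3, axis=-1))
    return np.stack(out)
```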
The `scripts/decode_latents.py` script decodes precomputed video latents back into video files for visual inspection.
```bash
# Basic usage
python scripts/decode_latents.py /path/to/latents/dir --output-dir /path/to/output
```

The script will:
- Load the VAE model from the specified path
- Process all `.pt` latent files in the input directory
- Decode each latent back into a video using the VAE
- Save the resulting videos as MP4 files in the output directory
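A rough sketch of that loop, assuming the LTX video VAE from `diffusers` and `.pt` files that each hold a latent tensor shaped `(C, F, H, W)`; the script's actual loading and latent-normalization details may differ:

```python
# Decode precomputed latents back to MP4s for visual inspection.
from pathlib import Path

import torch
from diffusers import AutoencoderKLLTXVideo
from diffusers.utils import export_to_video

vae = AutoencoderKLLTXVideo.from_pretrained(
    "Lightricks/LTX-Video", subfolder="vae", torch_dtype=torch.bfloat16
).to("cuda")

out_dir = Path("decoded_samples")
out_dir.mkdir(exist_ok=True)

for pt_file in Path("dataset/.precomputed/latents").glob("*.pt"):
    latents = torch.load(pt_file).to("cuda", torch.bfloat16).unsqueeze(0)
    with torch.no_grad():
        video = vae.decode(latents, return_dict=False)[0]   # (1, 3, F, H, W)
    frames = video[0].permute(1, 2, 3, 0).float().cpu()     # -> (F, H, W, 3)
    frames = ((frames.clamp(-1, 1) + 1) / 2).numpy()        # scale to [0, 1]
    export_to_video(frames, str(out_dir / f"{pt_file.stem}.mp4"), fps=25)
```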
When to use:
- Verify preprocessing quality: Check that your videos were encoded correctly
- Debug training data: Visualize what the model actually sees during training
- Quality assessment: Ensure latent encoding preserves important visual details
Example workflow:
```bash
# After preprocessing your dataset
python scripts/preprocess_dataset.py dataset.json --resolution-buckets "768x768x25"

# Decode some latents to verify quality
python scripts/decode_latents.py dataset/.precomputed/latents --output-dir decoded_samples

# Review the decoded videos to ensure quality
ls decoded_samples/
```

The main training scripts handle single-GPU and multi-GPU (distributed) training.
```bash
# Single-GPU training
python scripts/train.py config.yaml

# Multi-GPU distributed training
python scripts/train_distributed.py config.yaml
```

For detailed usage, see the Training Guide.
The `scripts/run_pipeline.py` script automates the entire workflow from raw videos to trained models.
```bash
python scripts/run_pipeline.py [LORA_BASE_NAME] \
    --resolution-buckets "768x768x49" \
    --config-template configs/ltxv_2b_lora_template.yaml \
    --rank 32
```

For detailed usage, see the automated pipeline section in the Training Guide.
- Start with `--help`: Always check the available options for each script
- Test on small datasets: Verify workflows with a few files before processing large datasets
- Use decode verification: Always decode a few samples to verify preprocessing quality
- Monitor VRAM usage: Use the `--use-8bit` flag when running into memory issues
- Keep backups: Make copies of important dataset files before running conversion scripts