The trainer uses structured Pydantic models for configuration, making it easy to customize training parameters. This guide covers all available configuration options and their usage.
The main configuration class is `LtxvTrainerConfig`, which includes the following sub-configurations:
- `ModelConfig`: Base model and training mode settings
- `LoraConfig`: LoRA training parameters
- `ConditioningConfig`: Video conditioning settings (reference videos, first-frame conditioning)
- `OptimizationConfig`: Learning rate, batch sizes, and scheduler settings
- `AccelerationConfig`: Mixed precision and other optimization settings
- `DataConfig`: Data loading parameters
- `ValidationConfig`: Validation and inference settings
- `CheckpointsConfig`: Checkpoint saving frequency and retention settings
- `HubConfig`: Hugging Face Hub integration settings
- `FlowMatchingConfig`: Timestep sampling parameters
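Because the configuration is a plain Pydantic model, a YAML file can be validated and loaded in a few lines. A minimal sketch, assuming Pydantic v2 and that `LtxvTrainerConfig` is importable from the trainer package (the import path and file name below are hypothetical):

```python
import yaml

from ltxv_trainer.config import LtxvTrainerConfig  # hypothetical import path

with open("configs/my_run.yaml") as f:  # hypothetical config file
    raw = yaml.safe_load(f)

# Pydantic validates every field and raises a descriptive error if the
# YAML doesn't match the schema described in this guide.
config = LtxvTrainerConfig.model_validate(raw)
print(config.model.training_mode)  # e.g. "lora"
```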
Check out our example configurations in the `configs` directory. You can use these as templates for your training runs:
- 📄 LTXV 2B Full Model Fine-tuning Example
- 📄 LTXV 2B LoRA Training Example
- 📄 LTXV 13B LoRA Training Example
- 📄 LTXV 2B LoRA Fine-tuning Example (Low VRAM) - Optimized for GPUs with 24GB VRAM
- 📄 LTXV 13B IC-LoRA Training Example - Video-to-video transformation training
The `model` section controls the base model and training mode settings.

```yaml
model:
  model_source: "LTXV_13B_097_DEV"  # Model version, HuggingFace repo, or local path
  training_mode: "lora"             # "lora" or "full"
  load_checkpoint: null             # Path to checkpoint file/directory to resume from
```

Key parameters:
- `model_source`: Model to use - can be a model version (see `model_loader.py`), a HuggingFace repo ID, or a local path
- `training_mode`: Training approach - either `"lora"` for LoRA training or `"full"` for full-rank model fine-tuning
- `load_checkpoint`: Optional path to a checkpoint to resume training from
The `lora` section holds LoRA-specific fine-tuning parameters (only used when `training_mode: "lora"`).

```yaml
lora:
  rank: 64          # LoRA rank (higher = more parameters, more flexibility)
  alpha: 64         # LoRA alpha scaling factor
  dropout: 0.0      # Dropout probability (0.0-1.0)
  target_modules:   # Modules to apply LoRA to
    - "to_k"
    - "to_q"
    - "to_v"
    - "to_out.0"
```

Key parameters:
- `rank`: LoRA rank - higher values mean more trainable parameters and potentially more flexibility (typical range: 16-128)
- `alpha`: Alpha scaling factor - usually set equal to the rank (see the scaling note below)
- `dropout`: Dropout probability for regularization
- `target_modules`: List of transformer modules to apply LoRA adapters to (entries may include wildcard patterns)
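To see why `alpha` is usually set equal to `rank`, recall the standard LoRA formulation (as used in the `peft` library), where the low-rank update is scaled by `alpha / rank`:

$$W' = W + \frac{\alpha}{r}\,BA$$

With the defaults above (`rank: 64`, `alpha: 64`) the scaling factor is exactly 1.0; increasing `alpha` while holding `rank` fixed strengthens the adapter's contribution without adding parameters.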
The `conditioning` section holds video conditioning settings for specialized training modes.

```yaml
conditioning:
  mode: "none"                                 # "none" or "reference_video"
  first_frame_conditioning_p: 0.1              # Probability of first-frame conditioning
  reference_latents_dir: "reference_latents"   # Directory for reference video latents
```

Key parameters:
- `mode`: Conditioning type - `"none"` for standard training, `"reference_video"` for IC-LoRA
- `first_frame_conditioning_p`: Probability of using the first frame as conditioning (0.0-1.0)
- `reference_latents_dir`: Directory name for reference video latents (IC-LoRA only)
The `optimization` section covers training optimization parameters, including learning rates, batch sizes, and schedulers.

```yaml
optimization:
  learning_rate: 1e-4                    # Learning rate
  steps: 3000                            # Total training steps
  batch_size: 2                          # Batch size per GPU
  gradient_accumulation_steps: 1         # Steps to accumulate gradients
  max_grad_norm: 1.0                     # Gradient clipping threshold
  optimizer_type: "adamw"                # "adamw" or "adamw8bit"
  scheduler_type: "linear"               # Scheduler type
  scheduler_params: {}                   # Additional scheduler parameters
  enable_gradient_checkpointing: false   # Memory optimization at cost of speed
```

Key parameters:
- `learning_rate`: Learning rate for optimization (typical range: 1e-5 to 1e-3)
- `steps`: Total number of training steps
- `batch_size`: Batch size per GPU (reduce if running out of memory)
- `gradient_accumulation_steps`: Accumulate gradients over multiple steps, which increases the effective batch size (see the example below)
- `scheduler_type`: Learning rate scheduler - `"constant"`, `"linear"`, `"cosine"`, `"cosine_with_restarts"`, or `"polynomial"`
- `enable_gradient_checkpointing`: Trade training speed for GPU memory savings (required for LTXV 13B)
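Gradient accumulation lets you train with a larger effective batch than fits in memory: gradients are summed over several forward/backward passes before each optimizer step. A quick sanity check (the `num_gpus` factor and the example values are illustrative, not from the config above):

```python
batch_size = 2                    # per-GPU batch size
gradient_accumulation_steps = 4   # illustrative value
num_gpus = 2                      # 1 for single-GPU training

# The optimizer steps once per accumulation cycle, so each weight update
# effectively sees this many samples:
effective_batch_size = batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # -> 16
```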
The `acceleration` section controls hardware acceleration and compute optimization settings.

```yaml
acceleration:
  mixed_precision_mode: "bf16"          # "no", "fp16", or "bf16"
  quantization: null                    # Quantization options
  load_text_encoder_in_8bit: false      # Load text encoder in 8-bit
  compile_with_inductor: true           # Enable PyTorch compilation
  compilation_mode: "reduce-overhead"   # Compilation optimization mode
```

Key parameters:
- `mixed_precision_mode`: Precision mode - `"bf16"` is recommended for modern GPUs, `"fp16"` for older ones
- `quantization`: Quantization precision for model weights. Options include `null` (no quantization), `"int8-quanto"`, `"int4-quanto"`, `"int2-quanto"`, `"fp8-quanto"`, and `"fp8uz-quanto"`. Use quantization to reduce memory usage, especially for large models or limited hardware.
- `load_text_encoder_in_8bit`: Load the text encoder in 8-bit to save GPU memory
- `compile_with_inductor`: Enable `torch.compile()` for speed improvements (see the sketch below)
- `compilation_mode`: Compilation strategy - `"default"`, `"reduce-overhead"`, or `"max-autotune"`
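For context, these two options presumably map onto a `torch.compile()` call with PyTorch's default Inductor backend (an assumption about the trainer's internals; the toy model below is just a stand-in):

```python
import torch
import torch.nn as nn

model = nn.Linear(64, 64)  # stand-in for the LTXV transformer

# Inductor is torch.compile's default backend; "reduce-overhead" additionally
# uses CUDA graphs to cut per-step kernel launch overhead.
compiled = torch.compile(model, mode="reduce-overhead")
out = compiled(torch.randn(8, 64))
```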
The `data` section configures data loading and processing.

```yaml
data:
  preprocessed_data_root: "path/to/preprocessed/data"  # Path to precomputed dataset directory
  num_dataloader_workers: 2                            # Background data loading workers
```

Key parameters:
- `preprocessed_data_root`: Path to your preprocessed dataset (the directory containing the `.precomputed` directory)
- `num_dataloader_workers`: Number of parallel data loading processes (0 = synchronous loading in the main process)
The `validation` section holds validation and inference settings for monitoring training progress.

```yaml
validation:
  prompts:                     # Validation prompts
    - "A cat playing with a ball"
    - "A dog running in a field"
  negative_prompt: "worst quality, inconsistent motion, blurry, jittery, distorted"
  images: null                 # Optional list of image paths for image-to-video
  reference_videos: null       # Reference video paths (IC-LoRA only)
  video_dims: [704, 480, 161]  # Video dimensions [width, height, frames]
  seed: 42                     # Random seed for reproducibility
  inference_steps: 50          # Number of inference steps
  interval: 100                # Steps between validation runs
  videos_per_prompt: 1         # Videos generated per prompt
  guidance_scale: 3.0          # CFG guidance strength
```

Key parameters:
- `prompts`: List of text prompts for validation video generation
- `images`: List of image paths for image-to-video validation (must match the number of prompts)
- `interval`: Steps between validation runs (set to `null` to disable validation)
- `inference_steps`: Number of denoising steps for validation videos
- `video_dims`: Output video dimensions as `[width, height, frames]` (see the dimension note below)
- `reference_videos`: List of paths to reference videos, required for IC-LoRA validation (must match the number of prompts)
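If validation generation follows the base LTX-Video pipeline's constraints (an assumption worth verifying for your model version), spatial dimensions should be divisible by 32 and the frame count should be of the form 8k + 1, which the default `[704, 480, 161]` satisfies:

```python
def check_video_dims(width: int, height: int, frames: int) -> None:
    # LTX-Video pipelines expect spatial dims divisible by 32
    # and a frame count of the form 8k + 1.
    assert width % 32 == 0 and height % 32 == 0, "width/height must be multiples of 32"
    assert frames % 8 == 1, "frames must be 8k + 1 (e.g. 121, 161, 257)"

check_video_dims(704, 480, 161)  # the default above passes
```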
The `checkpoints` section configures model checkpointing.

```yaml
checkpoints:
  interval: null   # Steps between checkpoint saves (null = disabled)
  keep_last_n: 5   # Number of recent checkpoints to retain
```

Key parameters:
- `interval`: Steps between intermediate checkpoint saves (set to `null` to disable checkpoint saving)
- `keep_last_n`: Number of most recent checkpoints to keep (older ones are deleted)
The `hub` section configures Hugging Face Hub integration for automatic model uploads.

```yaml
hub:
  push_to_hub: false                   # Enable Hub uploading
  hub_model_id: "username/model-name"  # Hub repository ID
```

Key parameters:
- `push_to_hub`: Whether to automatically push trained models to the Hugging Face Hub
- `hub_model_id`: Repository ID in the format `"username/repository-name"`
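The trainer performs the upload itself when `push_to_hub` is enabled; conceptually it amounts to something like the following `huggingface_hub` sketch (the output directory is hypothetical, and a valid token from `huggingface-cli login` is required):

```python
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("username/model-name", exist_ok=True)
api.upload_folder(repo_id="username/model-name", folder_path="outputs/my_run")  # hypothetical path
```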
The `flow_matching` section configures flow matching training, specifically timestep sampling.

```yaml
flow_matching:
  timestep_sampling_mode: "shifted_logit_normal"  # Timestep sampling strategy
  timestep_sampling_params: {}                    # Additional sampling parameters
```

Key parameters:
- `timestep_sampling_mode`: Sampling strategy - `"uniform"` or `"shifted_logit_normal"` (see the sketch below)
- `timestep_sampling_params`: Additional parameters for the chosen sampling strategy
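For intuition, here is a rough sketch of what shifted logit-normal sampling typically looks like in flow-matching trainers. This is illustrative only; the parameter names (`mean`, `std`, `shift`) are assumptions, not necessarily the keys `timestep_sampling_params` accepts:

```python
import torch

def sample_timesteps(batch: int, mean: float = 0.0, std: float = 1.0,
                     shift: float = 1.0) -> torch.Tensor:
    """Draw timesteps t in (0, 1): sigmoid of a Gaussian, then a time shift."""
    t = torch.sigmoid(torch.randn(batch) * std + mean)  # logit-normal
    # A larger `shift` biases samples toward higher (noisier) timesteps.
    return shift * t / (1.0 + (shift - 1.0) * t)

print(sample_timesteps(4, shift=3.0))
```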
Once you've configured your training parameters:
- Set up your dataset using Dataset Preparation
- Choose your training approach in Training Modes
- Start training with the Training Guide