Fix Layerwise Casting #316

a-r-r-o-w · 2025-03-10T23:23:50Z

Fixes #312.

@dorpxam Could you give this branch a try with what you were trying in #312?

Important to note:

- This is "fake" fp8 training, i.e. it's not quantization-aware training, and it might produce worse results than bf16/fp32 training.
- It won't work for training on more than one GPU, because CUDA collective operations are not implemented for torch.float8* types.
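To make the "fake" fp8 caveat concrete: with layerwise casting the weights are only *stored* in float8_e4m3fn and are upcast for compute, so every stored weight has first been rounded to a 3-bit mantissa. The pure-Python sketch below uses a hypothetical helper, `round_to_e4m3fn` (not part of finetrainers or PyTorch), that rounds a value to the nearest float8_e4m3fn number. It assumes round-half-to-even and saturation at the format's largest finite value, 448. Saturation is a simplification here, and a plain PyTorch cast may treat out-of-range values differently.

```python
import math

# float8_e4m3fn: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits and
# no infinities. Largest finite value is 448; smallest normal is 2**-6.
E4M3_MAX = 448.0
E4M3_MIN_NORMAL_EXP = -6
E4M3_MANTISSA_BITS = 3


def round_to_e4m3fn(x: float) -> float:
    """Round x to the nearest float8_e4m3fn value (round-half-to-even).

    Assumption: out-of-range magnitudes saturate to +/-448 in this sketch.
    """
    if x == 0.0 or math.isnan(x):
        return x
    if math.isinf(x):
        return math.copysign(E4M3_MAX, x)
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    # frexp gives a = m * 2**exp with 0.5 <= m < 1, so floor(log2(a)) = exp - 1.
    _, exp = math.frexp(a)
    # Subnormals share the minimum normal exponent.
    e = max(exp - 1, E4M3_MIN_NORMAL_EXP)
    # Spacing between neighbouring representable values at this exponent.
    quantum = 2.0 ** (e - E4M3_MANTISSA_BITS)
    # Python's round() rounds half to even, like IEEE default rounding.
    rounded = round(a / quantum) * quantum
    return sign * min(rounded, E4M3_MAX)


if __name__ == "__main__":
    for w in [0.1, 1.0625, 1.1875, 0.0123, 500.0]:
        q = round_to_e4m3fn(w)
        print(f"{w!r:>8} -> {q!r:<12} relative error {abs(q - w) / abs(w):.3%}")
```

For example, 0.1 is stored as 0.1015625 (about 1.6% off). Rounding error of this kind, applied to every frozen weight, is one plausible reason fake fp8 can trail bf16/fp32 training.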

a-r-r-o-w · 2025-03-10T23:34:05Z

See this thread for the discussion about fake fp8: #184 (comment)

True quantization-aware FP8 training is WIP, but I don't have the bandwidth to complete it at the moment.

dorpxam · 2025-03-11T07:39:42Z

Of course. I'll try it and report back here.

dorpxam · 2025-03-11T08:11:24Z

Tested without --transformer_dtype float8_e4m3fn, with just --layerwise_upcasting_modules transformer: ERROR - An error occurred during training: mat1 and mat2 must have the same dtype, but got Float8_e4m3fn and BFloat16 (same log as the previous one)

Tested with both --transformer_dtype float8_e4m3fn and --layerwise_upcasting_modules transformer; here is the full log (new error):

ERROR:finetrainers:Traceback (most recent call last):
  File "/home/dorpxam/ai/finetrainers/train.py", line 70, in main
    trainer.run()
  File "/home/dorpxam/ai/finetrainers/finetrainers/trainer/sft_trainer/trainer.py", line 97, in run
    raise e
  File "/home/dorpxam/ai/finetrainers/finetrainers/trainer/sft_trainer/trainer.py", line 92, in run
    self._train()
  File "/home/dorpxam/ai/finetrainers/finetrainers/trainer/sft_trainer/trainer.py", line 471, in _train
    pred, target, sigmas = self.model_specification.forward(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dorpxam/ai/finetrainers/finetrainers/models/wan/base_specification.py", line 302, in forward
    mu = self._normalize_latents(mu, latents_mean, latents_std)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dorpxam/ai/finetrainers/finetrainers/models/wan/base_specification.py", line 392, in _normalize_latents
    latents = ((latents.float() - latents_mean) * latents_std).to(latents)
                ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
RuntimeError: Promotion for Float8 Types is not supported, attempted to promote Float and Float8_e4m3fn
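Two different failures appear in this report. The "mat1 and mat2 must have the same dtype" error is the kind raised when a tensor still stored in float8_e4m3fn reaches a matmul whose other operand is bf16. Layerwise casting is meant to prevent that: each layer's weights are upcast to the compute dtype right before its forward and cast back to the storage dtype right after. The traceback is a separate problem. PyTorch will not implicitly promote between a float8 dtype and another dtype, so `latents.float() - latents_mean` fails when `latents_mean` is Float8_e4m3fn, and every operand has to be cast explicitly. The toy sketch below models only the first case, the pre-/post-forward ordering, in pure Python. `ToyTensor`, `toy_dot` and `ToyLayerwiseCastedLinear` are hypothetical names, dtypes are plain string tags, and nothing here claims to identify the actual root cause of #312.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyTensor:
    """Stand-in for a tensor: plain floats tagged with a dtype name."""

    values: List[float]
    dtype: str


def toy_dot(x: ToyTensor, w: ToyTensor) -> ToyTensor:
    # Mimics a matmul kernel that refuses mixed-dtype operands.
    if x.dtype != w.dtype:
        raise RuntimeError(
            f"mat1 and mat2 must have the same dtype, but got {x.dtype} and {w.dtype}"
        )
    return ToyTensor([sum(a * b for a, b in zip(x.values, w.values))], x.dtype)


class ToyLayerwiseCastedLinear:
    """Hypothetical layer: weight stored in one dtype, used in another."""

    def __init__(self, weight: List[float], storage_dtype: str = "float8_e4m3fn",
                 compute_dtype: str = "bfloat16", enabled: bool = True) -> None:
        self.weight = ToyTensor(list(weight), storage_dtype)
        self.storage_dtype = storage_dtype
        self.compute_dtype = compute_dtype
        self.enabled = enabled

    def _pre_forward(self) -> None:
        # Upcast right before use. float8_e4m3fn -> bfloat16 is exact, so
        # retagging the dtype is enough to model the cast here.
        self.weight.dtype = self.compute_dtype

    def _post_forward(self) -> None:
        # Return the frozen weight to low-precision storage once the layer is done.
        self.weight.dtype = self.storage_dtype

    def __call__(self, x: ToyTensor) -> ToyTensor:
        if not self.enabled:
            # No hook: the fp8-stored weight meets a bf16 input and fails.
            return toy_dot(x, self.weight)
        self._pre_forward()
        try:
            return toy_dot(x, self.weight)
        finally:
            self._post_forward()  # runs even if the forward raises


if __name__ == "__main__":
    x = ToyTensor([3.0, 4.0], "bfloat16")
    print(ToyLayerwiseCastedLinear([1.0, 2.0])(x))
    try:
        ToyLayerwiseCastedLinear([1.0, 2.0], enabled=False)(x)
    except RuntimeError as err:
        print("without upcasting:", err)
```

The `try`/`finally` in this sketch restores storage precision even when a forward pass fails, so a crashed step does not leave weights sitting in the compute dtype.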

a-r-r-o-w · 2025-03-12T22:36:47Z

Okay, it seems like not all models work out of the box with the fix. I'll test each supported model one by one by running the scripts in examples/ and update this thread.

a-r-r-o-w · 2025-03-12T22:39:08Z

Running every example with the following settings appended (and modifying the script to run with DDP_1, i.e. single-GPU replication):

  --layerwise_upcasting_modules transformer
  --layerwise_upcasting_storage_dtype float8_e4m3fn

--transformer_dtype must remain torch.bfloat16; do not change it.
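The flag rules stated in this thread (keep `--transformer_dtype` at bf16, put fp8 only in `--layerwise_upcasting_storage_dtype`, and stay on a single GPU) can be encoded as a quick pre-flight check. This is a hypothetical sketch, not finetrainers' actual argument parser. The flag names come from this thread, but the defaults, the accepted `bf16`/`bfloat16` spellings, and the rule that the GPU count is the product of `--pp_degree`, `--dp_degree`, `--dp_shards`, `--cp_degree` and `--tp_degree` are all assumptions.

```python
import argparse
from math import prod
from typing import List

PARALLEL_FLAGS = ("--pp_degree", "--dp_degree", "--dp_shards", "--cp_degree", "--tp_degree")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names are from this thread; defaults and
    # accepted values are assumptions, not finetrainers' real CLI.
    parser = argparse.ArgumentParser()
    parser.add_argument("--parallel_backend", default="ptd")
    parser.add_argument("--transformer_dtype", default="bf16")
    parser.add_argument("--layerwise_upcasting_modules", nargs="*", default=[])
    parser.add_argument("--layerwise_upcasting_storage_dtype", default="float8_e4m3fn")
    for flag in PARALLEL_FLAGS:
        parser.add_argument(flag, type=int, default=1)
    return parser


def check_fake_fp8_setup(argv: List[str]) -> List[str]:
    """Return a list of problems with a fake-fp8 (layerwise casting) setup."""
    args = build_parser().parse_args(argv)
    problems: List[str] = []
    if "transformer" not in args.layerwise_upcasting_modules:
        return problems  # layerwise casting not requested for the transformer
    if args.transformer_dtype not in ("bf16", "bfloat16"):
        problems.append(
            "--transformer_dtype must stay bf16; put float8 in "
            "--layerwise_upcasting_storage_dtype instead"
        )
    # Assumption: the number of GPUs is the product of all parallel degrees.
    world_size = prod(getattr(args, flag.lstrip("-")) for flag in PARALLEL_FLAGS)
    if world_size > 1 and args.layerwise_upcasting_storage_dtype.startswith("float8"):
        problems.append(
            f"world size {world_size} > 1: float8 collectives are not implemented"
        )
    return problems


if __name__ == "__main__":
    ddp_1 = "--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1"
    flags = "--layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn"
    print(check_fake_fp8_setup(f"{ddp_1} {flags}".split()))
```

With the DDP_1 preset plus the two appended flags, the check reports no problems. Passing `--transformer_dtype float8_e4m3fn` or a multi-GPU preset such as `--dp_degree 2` is flagged.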

a-r-r-o-w · 2025-03-12T22:40:30Z

CogView4 is working.

Logs

(nightly-venv) (nightly-venv) aryan@hf-dgx-01:/raid/aryan/cogvideox-distillation$ ./examples/training/sft/cogview4/raider_white_tarot/train.sh 
+ export WANDB_MODE=offline
+ WANDB_MODE=offline
+ export NCCL_P2P_DISABLE=1
+ NCCL_P2P_DISABLE=1
+ export TORCH_NCCL_ENABLE_MONITORING=0
+ TORCH_NCCL_ENABLE_MONITORING=0
+ export FINETRAINERS_LOG_LEVEL=DEBUG
+ FINETRAINERS_LOG_LEVEL=DEBUG
+ BACKEND=ptd
+ NUM_GPUS=1
+ CUDA_VISIBLE_DEVICES=3
+ TRAINING_DATASET_CONFIG=examples/training/sft/cogview4/raider_white_tarot/training.json
+ VALIDATION_DATASET_FILE=examples/training/sft/cogview4/raider_white_tarot/validation.json
+ DDP_1='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 4 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ FSDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ FSDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 4 --cp_degree 1 --tp_degree 1'
+ HSDP_2_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ parallel_cmd=($DDP_1)
+ model_cmd=(--model_name "cogview4" --pretrained_model_name_or_path "THUDM/CogView4-6B")
+ dataset_cmd=(--dataset_config $TRAINING_DATASET_CONFIG --dataset_shuffle_buffer_size 32 --enable_precomputation --precomputation_items 120 --precomputation_once)
+ dataloader_cmd=(--dataloader_num_workers 0)
+ diffusion_cmd=(--flow_weighting_scheme "logit_normal")
+ training_cmd=(--training_type "lora" --seed 42 --batch_size 1 --train_steps 5000 --rank 32 --lora_alpha 32 --target_modules "transformer_blocks.*(to_q|to_k|to_v|to_out.0)" --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 1000 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn)
+ optimizer_cmd=(--optimizer "adamw" --lr 3e-5 --lr_scheduler "constant_with_warmup" --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0)
+ validation_cmd=(--validation_dataset_file "$VALIDATION_DATASET_FILE" --validation_steps 500)
+ miscellaneous_cmd=(--tracker_name "finetrainers-cogview4" --output_dir "/raid/aryan/cogview4" --init_timeout 600 --nccl_timeout 600 --report_to "wandb")
+ '[' ptd == accelerate ']'
+ '[' ptd == ptd ']'
+ export CUDA_VISIBLE_DEVICES=3
+ CUDA_VISIBLE_DEVICES=3
+ torchrun --standalone --nnodes=1 --nproc_per_node=1 --rdzv_backend c10d --rdzv_endpoint=localhost:0 train.py --parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name cogview4 --pretrained_model_name_or_path THUDM/CogView4-6B --dataset_config examples/training/sft/cogview4/raider_white_tarot/training.json --dataset_shuffle_buffer_size 32 --enable_precomputation --precomputation_items 120 --precomputation_once --dataloader_num_workers 0 --flow_weighting_scheme logit_normal --training_type lora --seed 42 --batch_size 1 --train_steps 5000 --rank 32 --lora_alpha 32 --target_modules 'transformer_blocks.*(to_q|to_k|to_v|to_out.0)' --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 1000 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --optimizer adamw --lr 3e-5 --lr_scheduler constant_with_warmup --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0 --validation_dataset_file examples/training/sft/cogview4/raider_white_tarot/validation.json --validation_steps 500 --tracker_name finetrainers-cogview4 --output_dir /raid/aryan/cogview4 --init_timeout 600 --nccl_timeout 600 --report_to wandb
2025-03-12 23:38:21,811 - finetrainers - DEBUG - Successfully imported bitsandbytes version 0.43.3
2025-03-12 23:38:21,817 - finetrainers - DEBUG - Remaining unparsed arguments: []
2025-03-12 23:38:22,405 - finetrainers - INFO - Initialized parallel state with:
  - World size: 1
  - Pipeline parallel degree: 1
  - Data parallel degree: 1
  - Context parallel degree: 1
  - Tensor parallel degree: 1
  - Data parallel shards: 1

2025-03-12 23:38:22,431 - finetrainers - DEBUG - Device mesh: DeviceMesh('cuda', 0)
2025-03-12 23:38:22,431 - finetrainers - DEBUG - Enabling determinism: {'global_rank': 0, 'seed': 42}
2025-03-12 23:38:22,434 - finetrainers - INFO - Initializing models
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 61.94it/s]
2025-03-12 23:38:22,930 - finetrainers - INFO - Initializing trainable parameters
2025-03-12 23:38:22,930 - finetrainers - INFO - Finetuning transformer with PEFT parameters
2025-03-12 23:38:25,277 - finetrainers - INFO - Initializing optimizer and lr scheduler
2025-03-12 23:38:25,282 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_optimizer completed!
2025-03-12 23:38:25,283 - finetrainers - INFO - Initialized FineTrainers
2025-03-12 23:38:25,283 - finetrainers - INFO - Initializing trackers: ['wandb']. Logging to log_dir='logs'
wandb: Tracking run with wandb version 0.17.7
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
2025-03-12 23:38:26,412 - finetrainers - INFO - WandB logging enabled
2025-03-12 23:38:26,412 - finetrainers - INFO - Initializing dataset and dataloader
2025-03-12 23:38:26,412 - finetrainers - INFO - Training configured to use 3 datasets
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 79/79 [00:00<00:00, 144820.81it/s]
2025-03-12 23:38:27,223 - finetrainers - INFO - Initialized dataset: multimodalart/1920-raider-waite-tarot-public-domain
2025-03-12 23:38:27,223 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataset completed!
2025-03-12 23:38:27,224 - finetrainers - INFO - Initializing IterableDatasetPreprocessingWrapper for the dataset with the following configuration:
  - Dataset Type: image
  - ID Token: TRTCRD
  - Image Resolution Buckets: [[1280, 720]]
  - Video Resolution Buckets: None
  - Reshape Mode: bicubic
  - Remove Common LLM Caption Prefixes: True

Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 79/79 [00:00<00:00, 121284.78it/s]
2025-03-12 23:38:27,857 - finetrainers - INFO - Initialized dataset: multimodalart/1920-raider-waite-tarot-public-domain
2025-03-12 23:38:27,857 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataset completed!
2025-03-12 23:38:27,858 - finetrainers - INFO - Initializing IterableDatasetPreprocessingWrapper for the dataset with the following configuration:
  - Dataset Type: image
  - ID Token: TRTCRD
  - Image Resolution Buckets: [[512, 512]]
  - Video Resolution Buckets: None
  - Reshape Mode: center_crop
  - Remove Common LLM Caption Prefixes: True

Resolving data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 79/79 [00:00<00:00, 68602.49it/s]
2025-03-12 23:38:28,879 - finetrainers - INFO - Initialized dataset: multimodalart/1920-raider-waite-tarot-public-domain
2025-03-12 23:38:28,879 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataset completed!
2025-03-12 23:38:28,880 - finetrainers - INFO - Initializing IterableDatasetPreprocessingWrapper for the dataset with the following configuration:
  - Dataset Type: image
  - ID Token: TRTCRD
  - Image Resolution Buckets: [[768, 768]]
  - Video Resolution Buckets: None
  - Reshape Mode: center_crop
  - Remove Common LLM Caption Prefixes: True

2025-03-12 23:38:28,880 - finetrainers - INFO - Initializing IterableCombinedDataset with the following configuration:
  - Number of Datasets: 3
  - Buffer Size: 32
  - Shuffle: True

2025-03-12 23:38:28,880 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataloader completed!
2025-03-12 23:38:28,880 - finetrainers - INFO - Checkpointing enabled. Checkpoints will be stored in '/raid/aryan/cogview4'
2025-03-12 23:38:28,880 - finetrainers - INFO - Starting training
2025-03-12 23:38:28,881 - finetrainers - INFO - Memory before training start: {
    "memory_allocated": 6.719,
    "memory_reserved": 6.732,
    "max_memory_allocated": 6.719,
    "max_memory_reserved": 6.732
}
2025-03-12 23:38:28,881 - finetrainers - INFO - Training configuration: {
    "trainable parameters": 29360128,
    "train steps": 5000,
    "per-replica batch size": 1,
    "global batch size": 1,
    "gradient accumulation steps": 1
}
Training steps:   0%|          | 0/5000 [00:00<?, ?it/s]
2025-03-12 23:38:28,895 - finetrainers - DEBUG - Deleting files: []
2025-03-12 23:38:28,895 - finetrainers - INFO - Precomputed condition & latent data exhausted. Loading & preprocessing new data.
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 12681.19it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  4.23it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00,  4.26it/s]
2025-03-12 23:38:33,453 - finetrainers - INFO - Starting IterableCombinedDataset with 3 datasets
2025-03-12 23:38:33,454 - finetrainers - INFO - Starting IterableDatasetPreprocessingWrapper for the dataset
Filling buffer from data iterator 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 29.67it/s]
2025-03-12 23:38:33,791 - finetrainers - INFO - Starting IterableDatasetPreprocessingWrapper for the dataset
Filling buffer from data iterator 1: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 36.11it/s]
2025-03-12 23:38:34,069 - finetrainers - INFO - Starting IterableDatasetPreprocessingWrapper for the dataset
Filling buffer from data iterator 2: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 35.96it/s]
Processing data on rank 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [00:10<00:00, 11.63it/s]
Processing data on rank 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [00:14<00:00,  8.11it/s]
2025-03-12 23:38:59,486 - finetrainers - DEBUG - Starting training step (1/5000)███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [00:14<00:00,  9.07it/s]
Training steps:   0%|                                                                                                                                                                                      | 1/5000 [00:31<44:07:24, 31.78s/it, grad_norm=0.0556, global_avg_loss=1.46, global_max_loss=1.46]2025-03-12 23:39:00,661 - finetrainers - DEBUG - Starting training step (2/5000)
Training steps:   0%|                                                                                                                                                                                      | 2/5000 [00:32<18:38:48, 13.43s/it, grad_norm=0.0347, global_avg_loss=1.23, global_max_loss=1.23]2025-03-12 23:39:01,251 - finetrainers - DEBUG - Starting training step (3/5000)
Training steps:   0%|                                                                                                                                                                                       | 3/5000 [00:32<10:30:19,  7.57s/it, grad_norm=0.124, global_avg_loss=1.25, global_max_loss=1.25]2025-03-12 23:39:01,843 - finetrainers - DEBUG - Starting training step (4/5000)
Training steps:   0%|▏                                                                                                                                                                                       | 4/5000 [00:34<7:10:25,  5.17s/it, grad_norm=0.244, global_avg_loss=1.16, global_max_loss=1.16]2025-03-12 23:39:03,334 - finetrainers - DEBUG - Starting training step (5/5000)
Training steps:   0%|▏                                                                                                                                                                                       | 5/5000 [00:35<4:52:51,  3.52s/it, grad_norm=0.434, global_avg_loss=1.26, global_max_loss=1.26]2025-03-12 23:39:03,923 - finetrainers - DEBUG - Starting training step (6/5000)
Training steps:   0%|▏                                                                                                                                                                                    | 6/5000 [00:35<3:30:04,  2.52s/it, grad_norm=0.0376, global_avg_loss=0.872, global_max_loss=0.872]2025-03-12 23:39:04,518 - finetrainers - DEBUG - Starting training step (7/5000)
Training steps:   0%|▎                                                                                                                                                                                       | 7/5000 [00:36<2:37:23,  1.89s/it, grad_norm=0.121, global_avg_loss=1.21, global_max_loss=1.21]2025-03-12 23:39:05,107 - finetrainers - DEBUG - Starting training step (8/5000)
Training steps:   0%|▎                                                                                                                                                                                       | 8/5000 [00:37<2:26:53,  1.77s/it, grad_norm=0.034, global_avg_loss=1.33, global_max_loss=1.33]2025-03-12 23:39:06,603 - finetrainers - DEBUG - Starting training step (9/5000)
Training steps:   0%|▎                                                                                                                                                                                      | 9/5000 [00:38<1:56:22,  1.40s/it, grad_norm=0.0419, global_avg_loss=1.26, global_max_loss=1.26]2025-03-12 23:39:07,196 - finetrainers - DEBUG - Starting training step (10/5000)

a-r-r-o-w · 2025-03-12T22:45:02Z

CogVideoX is working.

Logs

(nightly-venv) (nightly-venv) aryan@hf-dgx-01:/raid/aryan/cogvideox-distillation$ ./examples/training/sft/cogvideox/crush_smol_lora/train.sh 
+ export WANDB_MODE=offline
+ WANDB_MODE=offline
+ export NCCL_P2P_DISABLE=1
+ NCCL_P2P_DISABLE=1
+ export TORCH_NCCL_ENABLE_MONITORING=0
+ TORCH_NCCL_ENABLE_MONITORING=0
+ export FINETRAINERS_LOG_LEVEL=DEBUG
+ FINETRAINERS_LOG_LEVEL=DEBUG
+ BACKEND=ptd
+ NUM_GPUS=1
+ CUDA_VISIBLE_DEVICES=2
+ TRAINING_DATASET_CONFIG=examples/training/sft/cogvideox/crush_smol_lora/training.json
+ VALIDATION_DATASET_FILE=examples/training/sft/cogvideox/crush_smol_lora/validation.json
+ DDP_1='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 4 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ FSDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ FSDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 4 --cp_degree 1 --tp_degree 1'
+ HSDP_2_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ parallel_cmd=($DDP_1)
+ model_cmd=(--model_name "cogvideox" --pretrained_model_name_or_path "THUDM/CogVideoX1.5-5B")
+ dataset_cmd=(--dataset_config $TRAINING_DATASET_CONFIG)
+ dataloader_cmd=(--dataloader_num_workers 0)
+ diffusion_cmd=(--flow_weighting_scheme "logit_normal")
+ training_cmd=(--training_type "lora" --seed 42 --batch_size 1 --train_steps 3000 --rank 32 --lora_alpha 32 --target_modules "(transformer_blocks|single_transformer_blocks).*(to_q|to_k|to_v|to_out.0)" --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 1000 --checkpointing_limit 2 --enable_slicing --enable_tiling)
+ optimizer_cmd=(--optimizer "adamw" --lr 5e-5 --lr_scheduler "constant_with_warmup" --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0 --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn)
+ validation_cmd=(--validation_dataset_file "$VALIDATION_DATASET_FILE" --validation_steps 500)
+ miscellaneous_cmd=(--tracker_name "finetrainers-cogvideox" --output_dir "/raid/aryan/cogvideox" --init_timeout 600 --nccl_timeout 600 --report_to "wandb")
+ '[' ptd == accelerate ']'
+ '[' ptd == ptd ']'
+ export CUDA_VISIBLE_DEVICES=2
+ CUDA_VISIBLE_DEVICES=2
+ torchrun --standalone --nnodes=1 --nproc_per_node=1 --rdzv_backend c10d --rdzv_endpoint=localhost:0 train.py --parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name cogvideox --pretrained_model_name_or_path THUDM/CogVideoX1.5-5B --dataset_config examples/training/sft/cogvideox/crush_smol_lora/training.json --dataloader_num_workers 0 --flow_weighting_scheme logit_normal --training_type lora --seed 42 --batch_size 1 --train_steps 3000 --rank 32 --lora_alpha 32 --target_modules '(transformer_blocks|single_transformer_blocks).*(to_q|to_k|to_v|to_out.0)' --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 1000 --checkpointing_limit 2 --enable_slicing --enable_tiling --optimizer adamw --lr 5e-5 --lr_scheduler constant_with_warmup --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0 --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --validation_dataset_file examples/training/sft/cogvideox/crush_smol_lora/validation.json --validation_steps 500 --tracker_name finetrainers-cogvideox --output_dir /raid/aryan/cogvideox --init_timeout 600 --nccl_timeout 600 --report_to wandb
2025-03-12 23:42:20,409 - finetrainers - DEBUG - Successfully imported bitsandbytes version 0.43.3
2025-03-12 23:42:20,415 - finetrainers - DEBUG - Remaining unparsed arguments: []
model_index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 411/411 [00:00<00:00, 3.69MB/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 849/849 [00:00<00:00, 10.0MB/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 871/871 [00:00<00:00, 9.25MB/s]
2025-03-12 23:42:27,642 - finetrainers - INFO - Initialized parallel state with:
  - World size: 1
  - Pipeline parallel degree: 1
  - Data parallel degree: 1
  - Context parallel degree: 1
  - Tensor parallel degree: 1
  - Data parallel shards: 1

2025-03-12 23:42:27,671 - finetrainers - DEBUG - Device mesh: DeviceMesh('cuda', 0)
2025-03-12 23:42:27,671 - finetrainers - DEBUG - Enabling determinism: {'global_rank': 0, 'seed': 42}
2025-03-12 23:42:27,673 - finetrainers - INFO - Initializing models
(…)ion_pytorch_model.safetensors.index.json: 103kB [00:00, 124MB/s]
(…)pytorch_model-00001-of-00003.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.98G/4.98G [00:07<00:00, 659MB/s]
(…)pytorch_model-00002-of-00003.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.95G/4.95G [00:07<00:00, 676MB/s]
(…)pytorch_model-00003-of-00003.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 1.22G/1.22G [00:03<00:00, 329MB/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 26.66it/s]
scheduler_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 482/482 [00:00<00:00, 3.87MB/s]
2025-03-12 23:42:48,858 - finetrainers - INFO - Initializing trainable parameters
2025-03-12 23:42:48,858 - finetrainers - INFO - Finetuning transformer with PEFT parameters
2025-03-12 23:42:50,863 - finetrainers - INFO - Initializing optimizer and lr scheduler
2025-03-12 23:42:50,871 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_optimizer completed!
2025-03-12 23:42:50,872 - finetrainers - INFO - Initialized FineTrainers
2025-03-12 23:42:50,872 - finetrainers - INFO - Initializing trackers: ['wandb']. Logging to log_dir='logs'
wandb: Tracking run with wandb version 0.17.7
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
2025-03-12 23:42:52,012 - finetrainers - INFO - WandB logging enabled
2025-03-12 23:42:52,012 - finetrainers - INFO - Initializing dataset and dataloader
2025-03-12 23:42:52,012 - finetrainers - INFO - Training configured to use 1 datasets
2025-03-12 23:42:52,513 - finetrainers - INFO - Downloading dataset finetrainers/crush-smol from the HF Hub
2025-03-12 23:42:52,673 - finetrainers - INFO - Initialized dataset: finetrainers/crush-smol
2025-03-12 23:42:52,673 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataset completed!
2025-03-12 23:42:52,674 - finetrainers - INFO - Initializing IterableDatasetPreprocessingWrapper for the dataset with the following configuration:
  - Dataset Type: video
  - ID Token: PIKA_CRUSH
  - Image Resolution Buckets: None
  - Video Resolution Buckets: [[81, 480, 768]]
  - Reshape Mode: bicubic
  - Remove Common LLM Caption Prefixes: True

2025-03-12 23:42:52,674 - finetrainers - INFO - Initializing IterableCombinedDataset with the following configuration:
  - Number of Datasets: 1
  - Buffer Size: 1
  - Shuffle: True

2025-03-12 23:42:52,674 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataloader completed!
2025-03-12 23:42:52,674 - finetrainers - INFO - Checkpointing enabled. Checkpoints will be stored in '/raid/aryan/cogvideox'
2025-03-12 23:42:52,674 - finetrainers - INFO - Starting training
2025-03-12 23:42:52,675 - finetrainers - INFO - Memory before training start: {
    "memory_allocated": 6.067,
    "memory_reserved": 6.25,
    "max_memory_allocated": 6.067,
    "max_memory_reserved": 6.25
}
2025-03-12 23:42:52,675 - finetrainers - INFO - Training configuration: {
    "trainable parameters": 33030144,
    "train steps": 3000,
    "per-replica batch size": 1,
    "global batch size": 1,
    "gradient accumulation steps": 1
}
Training steps:   0%|          | 0/3000 [00:00<?, ?it/s]
2025-03-12 23:42:52,693 - finetrainers - INFO - Precomputation disabled. Loading in-memory data loaders. All components will be loaded on GPUs.
tokenizer_config.json: 20.6kB [00:00, 71.3MB/s]
spiece.model: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 18.0MB/s]
added_tokens.json: 2.59kB [00:00, 12.6MB/s]
special_tokens_map.json: 2.54kB [00:00, 12.8MB/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 781/781 [00:00<00:00, 5.08MB/s]
model.safetensors.index.json: 19.9kB [00:00, 72.7MB/s]
model-00001-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.99G/4.99G [00:08<00:00, 608MB/s]
model-00002-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 5.00G/5.00G [00:07<00:00, 627MB/s]
model-00003-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.87G/4.87G [00:07<00:00, 632MB/s]
model-00004-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.19G/4.19G [00:07<00:00, 540MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:32<00:00,  8.13s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.19it/s]
diffusion_pytorch_model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 862M/862M [00:02<00:00, 305MB/s]
2025-03-12 23:43:40,070 - finetrainers - INFO - Starting IterableCombinedDataset with 1 datasets██████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                                            | 632M/862M [00:02<00:00, 830MB/s]
                                                                                                                                                                                                                                                                                                            2025-03-12 23:43:40,071 - finetrainers - INFO - Starting IterableDatasetPreprocessingWrapper for the dataset                                                                                                                                                                            | 0/1 [00:00<?, ?it/s]
Filling buffer from data iterator 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.71s/it]
2025-03-12 23:43:47,854 - finetrainers - DEBUG - Starting training step (1/3000)███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.71s/it]
Training steps:   0%|                                                                                                                                                                                      | 1/3000 [01:04<53:25:38, 64.13s/it, grad_norm=0.0116, global_avg_loss=0.18, global_max_loss=0.18]2025-03-12 23:44:03,210 - finetrainers - DEBUG - Starting training step (2/3000)
Training steps:   0%|                                                                                                                                                                                      | 2/3000 [01:19<29:26:24, 35.35s/it, grad_norm=0.02, global_avg_loss=0.151, global_max_loss=0.151]2025-03-12 23:44:19,148 - finetrainers - DEBUG - Starting training step (3/3000)
Training steps:   0%|▏                                                                                                                                                                                   | 3/3000 [01:35<22:04:39, 26.52s/it, grad_norm=0.0215, global_avg_loss=0.152, global_max_loss=0.152]2025-03-12 23:44:34,310 - finetrainers - DEBUG - Starting training step (4/3000)

a-r-r-o-w · 2025-03-12T22:51:39Z

LTXVideo is working.

Logs

(nightly-venv) (nightly-venv) aryan@hf-dgx-01:/raid/aryan/cogvideox-distillation$ ./examples/training/sft/ltx_video/crush_smol_lora/train.sh 
+ export WANDB_MODE=offline
+ WANDB_MODE=offline
+ export NCCL_P2P_DISABLE=1
+ NCCL_P2P_DISABLE=1
+ export TORCH_NCCL_ENABLE_MONITORING=0
+ TORCH_NCCL_ENABLE_MONITORING=0
+ export FINETRAINERS_LOG_LEVEL=DEBUG
+ FINETRAINERS_LOG_LEVEL=DEBUG
+ BACKEND=ptd
+ NUM_GPUS=1
+ CUDA_VISIBLE_DEVICES=3
+ TRAINING_DATASET_CONFIG=examples/training/sft/ltx_video/crush_smol_lora/training.json
+ VALIDATION_DATASET_FILE=examples/training/sft/ltx_video/crush_smol_lora/validation.json
+ DDP_1='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 4 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ FSDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ FSDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 4 --cp_degree 1 --tp_degree 1'
+ HSDP_2_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ parallel_cmd=($DDP_1)
+ model_cmd=(--model_name "ltx_video" --pretrained_model_name_or_path "a-r-r-o-w/LTX-Video-diffusers")
+ dataset_cmd=(--dataset_config $TRAINING_DATASET_CONFIG)
+ dataloader_cmd=(--dataloader_num_workers 0)
+ diffusion_cmd=(--flow_weighting_scheme "logit_normal")
+ training_cmd=(--training_type "lora" --seed 42 --batch_size 1 --train_steps 5000 --rank 32 --lora_alpha 32 --target_modules "(transformer_blocks|single_transformer_blocks).*(to_q|to_k|to_v|to_out.0)" --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 1000 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn)
+ optimizer_cmd=(--optimizer "adamw" --lr 5e-5 --lr_scheduler "constant_with_warmup" --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0)
+ validation_cmd=(--validation_dataset_file "$VALIDATION_DATASET_FILE" --validation_steps 500)
+ miscellaneous_cmd=(--tracker_name "finetrainers-ltxvideo" --output_dir "/raid/aryan/ltx-video" --init_timeout 600 --nccl_timeout 600 --report_to "wandb")
+ '[' ptd == accelerate ']'
+ '[' ptd == ptd ']'
+ export CUDA_VISIBLE_DEVICES=3
+ CUDA_VISIBLE_DEVICES=3
+ torchrun --standalone --nnodes=1 --nproc_per_node=1 --rdzv_backend c10d --rdzv_endpoint=localhost:0 train.py --parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name ltx_video --pretrained_model_name_or_path a-r-r-o-w/LTX-Video-diffusers --dataset_config examples/training/sft/ltx_video/crush_smol_lora/training.json --dataloader_num_workers 0 --flow_weighting_scheme logit_normal --training_type lora --seed 42 --batch_size 1 --train_steps 5000 --rank 32 --lora_alpha 32 --target_modules '(transformer_blocks|single_transformer_blocks).*(to_q|to_k|to_v|to_out.0)' --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 1000 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --optimizer adamw --lr 5e-5 --lr_scheduler constant_with_warmup --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0 --validation_dataset_file examples/training/sft/ltx_video/crush_smol_lora/validation.json --validation_steps 500 --tracker_name finetrainers-ltxvideo --output_dir /raid/aryan/ltx-video --init_timeout 600 --nccl_timeout 600 --report_to wandb
2025-03-12 23:49:32,027 - finetrainers - DEBUG - Successfully imported bitsandbytes version 0.43.3
2025-03-12 23:49:32,033 - finetrainers - DEBUG - Remaining unparsed arguments: []
model_index.json: 100%|██████████| 412/412 [00:00<00:00, 3.85MB/s]
config.json: 100%|██████████| 500/500 [00:00<00:00, 4.02MB/s]
config.json: 100%|██████████| 502/502 [00:00<00:00, 4.28MB/s]
2025-03-12 23:49:39,932 - finetrainers - INFO - Initialized parallel state with:
  - World size: 1
  - Pipeline parallel degree: 1
  - Data parallel degree: 1
  - Context parallel degree: 1
  - Tensor parallel degree: 1
  - Data parallel shards: 1

2025-03-12 23:49:39,959 - finetrainers - DEBUG - Device mesh: DeviceMesh('cuda', 0)
2025-03-12 23:49:39,959 - finetrainers - DEBUG - Enabling determinism: {'global_rank': 0, 'seed': 42}
2025-03-12 23:49:39,963 - finetrainers - INFO - Initializing models
(…)ion_pytorch_model.safetensors.index.json: 72.1kB [00:00, 98.1MB/s]
(…)pytorch_model-00001-of-00002.safetensors: 100%|█████████▉| 4.94G/4.94G [00:08<00:00, 600MB/s]
(…)pytorch_model-00002-of-00002.safetensors: 100%|█████████▉| 2.75G/2.75G [00:06<00:00, 441MB/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  5.30it/s]
2025-03-12 23:49:56,536 - finetrainers - INFO - Initializing trainable parameters
2025-03-12 23:49:56,536 - finetrainers - INFO - Finetuning transformer with PEFT parameters
2025-03-12 23:49:57,733 - finetrainers - INFO - Initializing optimizer and lr scheduler
2025-03-12 23:49:57,740 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_optimizer completed!
2025-03-12 23:49:57,741 - finetrainers - INFO - Initialized FineTrainers
2025-03-12 23:49:57,741 - finetrainers - INFO - Initializing trackers: ['wandb']. Logging to log_dir='logs'
wandb: Tracking run with wandb version 0.17.7
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
2025-03-12 23:49:58,888 - finetrainers - INFO - WandB logging enabled
2025-03-12 23:49:58,888 - finetrainers - INFO - Initializing dataset and dataloader
2025-03-12 23:49:58,888 - finetrainers - INFO - Training configured to use 1 datasets
2025-03-12 23:49:59,209 - finetrainers - INFO - Downloading dataset finetrainers/crush-smol from the HF Hub
2025-03-12 23:49:59,368 - finetrainers - INFO - Initialized dataset: finetrainers/crush-smol
2025-03-12 23:49:59,369 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataset completed!
2025-03-12 23:49:59,369 - finetrainers - INFO - Initializing IterableDatasetPreprocessingWrapper for the dataset with the following configuration:
  - Dataset Type: video
  - ID Token: PIKA_CRUSH
  - Image Resolution Buckets: None
  - Video Resolution Buckets: [[49, 512, 768]]
  - Reshape Mode: bicubic
  - Remove Common LLM Caption Prefixes: True

2025-03-12 23:49:59,369 - finetrainers - INFO - Initializing IterableCombinedDataset with the following configuration:
  - Number of Datasets: 1
  - Buffer Size: 1
  - Shuffle: True

2025-03-12 23:49:59,370 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataloader completed!
2025-03-12 23:49:59,370 - finetrainers - INFO - Checkpointing enabled. Checkpoints will be stored in '/raid/aryan/ltx-video'
2025-03-12 23:49:59,370 - finetrainers - INFO - Starting training
2025-03-12 23:49:59,371 - finetrainers - INFO - Memory before training start: {
    "memory_allocated": 1.902,
    "memory_reserved": 1.902,
    "max_memory_allocated": 1.902,
    "max_memory_reserved": 1.902
}
2025-03-12 23:49:59,371 - finetrainers - INFO - Training configuration: {
    "trainable parameters": 29360128,
    "train steps": 5000,
    "per-replica batch size": 1,
    "global batch size": 1,
    "gradient accumulation steps": 1
}
Training steps:   0%|          | 0/5000 [00:00<?, ?it/s]
2025-03-12 23:49:59,390 - finetrainers - INFO - Precomputation disabled. Loading in-memory data loaders. All components will be loaded on GPUs.
tokenizer_config.json: 20.6kB [00:00, 61.0MB/s]
spiece.model: 100%|██████████| 792k/792k [00:00<00:00, 18.0MB/s]
added_tokens.json: 2.59kB [00:00, 12.1MB/s]
special_tokens_map.json: 2.54kB [00:00, 13.2MB/s]
config.json: 100%|██████████| 781/781 [00:00<00:00, 5.06MB/s]
model.safetensors.index.json: 19.9kB [00:00, 69.3MB/s]
model-00001-of-00004.safetensors: 100%|█████████▉| 4.99G/4.99G [00:08<00:00, 596MB/s]
model-00002-of-00004.safetensors: 100%|█████████▉| 5.00G/5.00G [00:08<00:00, 561MB/s]
model-00003-of-00004.safetensors: 100%|█████████▉| 4.87G/4.87G [00:08<00:00, 541MB/s]
model-00004-of-00004.safetensors: 100%|█████████▉| 4.19G/4.19G [00:07<00:00, 566MB/s]
Downloading shards: 100%|██████████| 4/4 [00:34<00:00,  8.66s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00,  2.73it/s]
diffusion_pytorch_model.safetensors: 100%|█████████▉| 1.68G/1.68G [00:04<00:00, 371MB/s]
2025-03-12 23:50:50,313 - finetrainers - INFO - Starting IterableCombinedDataset with 1 datasets
2025-03-12 23:50:50,314 - finetrainers - INFO - Starting IterableDatasetPreprocessingWrapper for the dataset
Filling buffer from data iterator 0: 100%|██████████| 1/1 [00:01<00:00,  1.55s/it]
2025-03-12 23:50:52,833 - finetrainers - DEBUG - Starting training step (1/5000)
Training steps:   0%|          | 1/5000 [00:54<75:13:33, 54.17s/it, grad_norm=0.00524, global_avg_loss=0.443, global_max_loss=0.443]
2025-03-12 23:50:54,745 - finetrainers - DEBUG - Starting training step (2/5000)
Training steps:   0%|          | 2/5000 [00:55<32:25:45, 23.36s/it, grad_norm=0.00645, global_avg_loss=0.353, global_max_loss=0.353]
2025-03-12 23:50:57,175 - finetrainers - DEBUG - Starting training step (3/5000)
Training steps:   0%|          | 3/5000 [00:58<19:09:36, 13.80s/it, grad_norm=0.00638, global_avg_loss=0.562, global_max_loss=0.562]
2025-03-12 23:50:58,857 - finetrainers - DEBUG - Starting training step (4/5000)
Training steps:   0%|▏         | 4/5000 [01:00<12:31:00,  9.02s/it, grad_norm=0.00822, global_avg_loss=0.487, global_max_loss=0.487]
2025-03-12 23:51:00,895 - finetrainers - DEBUG - Starting training step (5/5000)
Training steps:   0%|▏         | 5/5000 [01:02<9:01:05,  6.50s/it, grad_norm=0.00516, global_avg_loss=0.436, global_max_loss=0.436]
2025-03-12 23:51:03,063 - finetrainers - DEBUG - Starting training step (6/5000)
Training steps:   0%|▏         | 6/5000 [01:04<6:58:23,  5.03s/it, grad_norm=0.00502, global_avg_loss=0.388, global_max_loss=0.388]
2025-03-12 23:51:04,718 - finetrainers - DEBUG - Starting training step (7/5000)

a-r-r-o-w · 2025-03-12T22:52:27Z

Oh, I just realized I'm not testing validation and checkpointing yet 😭 oops

a-r-r-o-w · 2025-03-12T22:59:46Z

Okay, LTXVideo is working with validation as well 💯 I'm going to assume CogView4/CogVideoX also work: if the forward pass runs during training, logically there's no reason for it to fail during validation (I'm lazy after a full day of work, sorry :p)

Logs

(nightly-venv) (nightly-venv) aryan@hf-dgx-01:/raid/aryan/cogvideox-distillation$ ./examples/training/sft/ltx_video/crush_smol_lora/train.sh 
+ export WANDB_MODE=offline
+ WANDB_MODE=offline
+ export NCCL_P2P_DISABLE=1
+ NCCL_P2P_DISABLE=1
+ export TORCH_NCCL_ENABLE_MONITORING=0
+ TORCH_NCCL_ENABLE_MONITORING=0
+ export FINETRAINERS_LOG_LEVEL=DEBUG
+ FINETRAINERS_LOG_LEVEL=DEBUG
+ BACKEND=ptd
+ NUM_GPUS=1
+ CUDA_VISIBLE_DEVICES=3
+ TRAINING_DATASET_CONFIG=examples/training/sft/ltx_video/crush_smol_lora/training.json
+ VALIDATION_DATASET_FILE=examples/training/sft/ltx_video/crush_smol_lora/validation.json
+ DDP_1='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 4 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ FSDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ FSDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 4 --cp_degree 1 --tp_degree 1'
+ HSDP_2_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ parallel_cmd=($DDP_1)
+ model_cmd=(--model_name "ltx_video" --pretrained_model_name_or_path "a-r-r-o-w/LTX-Video-diffusers")
+ dataset_cmd=(--dataset_config $TRAINING_DATASET_CONFIG)
+ dataloader_cmd=(--dataloader_num_workers 0)
+ diffusion_cmd=(--flow_weighting_scheme "logit_normal")
+ training_cmd=(--training_type "lora" --seed 42 --batch_size 1 --train_steps 5000 --rank 32 --lora_alpha 32 --target_modules "(transformer_blocks|single_transformer_blocks).*(to_q|to_k|to_v|to_out.0)" --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 1000 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn)
+ optimizer_cmd=(--optimizer "adamw" --lr 5e-5 --lr_scheduler "constant_with_warmup" --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0)
+ validation_cmd=(--validation_dataset_file "$VALIDATION_DATASET_FILE" --validation_steps 2)
+ miscellaneous_cmd=(--tracker_name "finetrainers-ltxvideo" --output_dir "/raid/aryan/ltx-video" --init_timeout 600 --nccl_timeout 600 --report_to "wandb")
+ '[' ptd == accelerate ']'
+ '[' ptd == ptd ']'
+ export CUDA_VISIBLE_DEVICES=3
+ CUDA_VISIBLE_DEVICES=3
+ torchrun --standalone --nnodes=1 --nproc_per_node=1 --rdzv_backend c10d --rdzv_endpoint=localhost:0 train.py --parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name ltx_video --pretrained_model_name_or_path a-r-r-o-w/LTX-Video-diffusers --dataset_config examples/training/sft/ltx_video/crush_smol_lora/training.json --dataloader_num_workers 0 --flow_weighting_scheme logit_normal --training_type lora --seed 42 --batch_size 1 --train_steps 5000 --rank 32 --lora_alpha 32 --target_modules '(transformer_blocks|single_transformer_blocks).*(to_q|to_k|to_v|to_out.0)' --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 1000 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --optimizer adamw --lr 5e-5 --lr_scheduler constant_with_warmup --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0 --validation_dataset_file examples/training/sft/ltx_video/crush_smol_lora/validation.json --validation_steps 2 --tracker_name finetrainers-ltxvideo --output_dir /raid/aryan/ltx-video --init_timeout 600 --nccl_timeout 600 --report_to wandb
2025-03-12 23:58:28,687 - finetrainers - DEBUG - Successfully imported bitsandbytes version 0.43.3
2025-03-12 23:58:28,693 - finetrainers - DEBUG - Remaining unparsed arguments: []
2025-03-12 23:58:29,525 - finetrainers - INFO - Initialized parallel state with:
  - World size: 1
  - Pipeline parallel degree: 1
  - Data parallel degree: 1
  - Context parallel degree: 1
  - Tensor parallel degree: 1
  - Data parallel shards: 1

2025-03-12 23:58:29,551 - finetrainers - DEBUG - Device mesh: DeviceMesh('cuda', 0)
2025-03-12 23:58:29,551 - finetrainers - DEBUG - Enabling determinism: {'global_rank': 0, 'seed': 42}
2025-03-12 23:58:29,554 - finetrainers - INFO - Initializing models
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  4.82it/s]
2025-03-12 23:58:30,618 - finetrainers - INFO - Initializing trainable parameters
2025-03-12 23:58:30,618 - finetrainers - INFO - Finetuning transformer with PEFT parameters
2025-03-12 23:58:31,810 - finetrainers - INFO - Initializing optimizer and lr scheduler
2025-03-12 23:58:31,818 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_optimizer completed!
2025-03-12 23:58:31,819 - finetrainers - INFO - Initialized FineTrainers
2025-03-12 23:58:31,819 - finetrainers - INFO - Initializing trackers: ['wandb']. Logging to log_dir='logs'
wandb: Tracking run with wandb version 0.17.7
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
2025-03-12 23:58:32,957 - finetrainers - INFO - WandB logging enabled
2025-03-12 23:58:32,957 - finetrainers - INFO - Initializing dataset and dataloader
2025-03-12 23:58:32,957 - finetrainers - INFO - Training configured to use 1 datasets
2025-03-12 23:58:33,310 - finetrainers - INFO - Downloading dataset finetrainers/crush-smol from the HF Hub
2025-03-12 23:58:33,500 - finetrainers - INFO - Initialized dataset: finetrainers/crush-smol
2025-03-12 23:58:33,501 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataset completed!
2025-03-12 23:58:33,501 - finetrainers - INFO - Initializing IterableDatasetPreprocessingWrapper for the dataset with the following configuration:
  - Dataset Type: video
  - ID Token: PIKA_CRUSH
  - Image Resolution Buckets: None
  - Video Resolution Buckets: [[49, 512, 768]]
  - Reshape Mode: bicubic
  - Remove Common LLM Caption Prefixes: True

2025-03-12 23:58:33,501 - finetrainers - INFO - Initializing IterableCombinedDataset with the following configuration:
  - Number of Datasets: 1
  - Buffer Size: 1
  - Shuffle: True

2025-03-12 23:58:33,502 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataloader completed!
2025-03-12 23:58:33,502 - finetrainers - INFO - Checkpointing enabled. Checkpoints will be stored in '/raid/aryan/ltx-video'
2025-03-12 23:58:33,502 - finetrainers - INFO - Starting training
2025-03-12 23:58:33,503 - finetrainers - INFO - Memory before training start: {
    "memory_allocated": 1.902,
    "memory_reserved": 1.902,
    "max_memory_allocated": 1.902,
    "max_memory_reserved": 1.902
}
2025-03-12 23:58:33,503 - finetrainers - INFO - Training configuration: {
    "trainable parameters": 29360128,
    "train steps": 5000,
    "per-replica batch size": 1,
    "global batch size": 1,
    "gradient accumulation steps": 1
}
Training steps:   0%|          | 0/5000 [00:00<?, ?it/s]
2025-03-12 23:58:33,521 - finetrainers - INFO - Precomputation disabled. Loading in-memory data loaders. All components will be loaded on GPUs.
Downloading shards: 100%|██████████| 4/4 [00:00<00:00, 12758.34it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00,  2.69it/s]
2025-03-12 23:58:40,953 - finetrainers - INFO - Starting IterableCombinedDataset with 1 datasets
2025-03-12 23:58:40,954 - finetrainers - INFO - Starting IterableDatasetPreprocessingWrapper for the dataset
Filling buffer from data iterator 0: 100%|██████████| 1/1 [00:01<00:00,  1.53s/it]
2025-03-12 23:58:43,446 - finetrainers - DEBUG - Starting training step (1/5000)
Training steps:   0%|          | 1/5000 [00:10<14:47:27, 10.65s/it, grad_norm=0.00523, global_avg_loss=0.443, global_max_loss=0.443]
2025-03-12 23:58:45,338 - finetrainers - DEBUG - Starting training step (2/5000)
Training steps:   0%|          | 2/5000 [00:12<7:32:02,  5.43s/it, grad_norm=0.00645, global_avg_loss=0.353, global_max_loss=0.353]
2025-03-12 23:58:45,926 - finetrainers - INFO - Starting validation
2025-03-12 23:58:46,207 - finetrainers - INFO - Memory before validation start: {
    "memory_allocated": 11.817,
    "memory_reserved": 14.297,
    "max_memory_allocated": 13.289,
    "max_memory_reserved": 14.297
}
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 113.33it/s]
2025-03-12 23:58:46,643 - finetrainers - DEBUG - Validating validation_data=[{'caption': 'PIKA_CRUSH A red toy car is being crushed by a large hydraulic press, which is flattening objects as if they were under a hydraulic press.', 'num_inference_steps': 50, 'height': 512, 'width': 768, 'num_frames': 49, 'frame_rate': 25, 'prompt': 'PIKA_CRUSH A red toy car is being crushed by a large hydraulic press, which is flattening objects as if they were under a hydraulic press.'}] on rank=0.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:14<00:00,  3.49it/s]
2025-03-12 23:59:01,832 - finetrainers - DEBUG - Saving video from rank=0 to /raid/aryan/ltx-video/validation-2-0-2-PIKA_CRUSH-A-red-toy-car--1741820341.mp4
2025-03-12 23:59:02,254 - finetrainers - DEBUG - Validating validation_data=[{'caption': 'PIKA_CRUSH A green cube is being compressed by a hydraulic press, which flattens the object as if it were under a hydraulic press. The press is shown in action, with the cube being squeezed into a smaller shape.', 'num_inference_steps': 50, 'height': 512, 'width': 768, 'num_frames': 49, 'frame_rate': 25, 'prompt': 'PIKA_CRUSH A green cube is being compressed by a hydraulic press, which flattens the object as if it were under a hydraulic press. The press is shown in action, with the cube being squeezed into a smaller shape.'}] on rank=0.

 62%|██████████

a-r-r-o-w · 2025-03-12T23:03:25Z

Wan fails. Looking into it 🕵️‍♂️

Logs

(nightly-venv) (nightly-venv) aryan@hf-dgx-01:/raid/aryan/cogvideox-distillation$ ./examples/training/sft/wan/crush_smol_lora/train.sh 
+ export WANDB_MODE=offline
+ WANDB_MODE=offline
+ export NCCL_P2P_DISABLE=1
+ NCCL_P2P_DISABLE=1
+ export TORCH_NCCL_ENABLE_MONITORING=0
+ TORCH_NCCL_ENABLE_MONITORING=0
+ export FINETRAINERS_LOG_LEVEL=DEBUG
+ FINETRAINERS_LOG_LEVEL=DEBUG
+ BACKEND=ptd
+ NUM_GPUS=1
+ CUDA_VISIBLE_DEVICES=3
+ TRAINING_DATASET_CONFIG=examples/training/sft/wan/crush_smol_lora/training.json
+ VALIDATION_DATASET_FILE=examples/training/sft/wan/crush_smol_lora/validation.json
+ DDP_1='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 4 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ FSDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ FSDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 4 --cp_degree 1 --tp_degree 1'
+ HSDP_2_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ parallel_cmd=($DDP_1)
+ model_cmd=(--model_name "wan" --pretrained_model_name_or_path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers")
+ dataset_cmd=(--dataset_config $TRAINING_DATASET_CONFIG)
+ dataloader_cmd=(--dataloader_num_workers 0)
+ diffusion_cmd=(--flow_weighting_scheme "logit_normal")
+ training_cmd=(--training_type "lora" --seed 42 --batch_size 1 --train_steps 3000 --rank 32 --lora_alpha 32 --target_modules "blocks.*(to_q|to_k|to_v|to_out.0)" --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 500 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn)
+ optimizer_cmd=(--optimizer "adamw" --lr 5e-5 --lr_scheduler "constant_with_warmup" --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0)
+ validation_cmd=(--validation_dataset_file "$VALIDATION_DATASET_FILE" --validation_steps 2)
+ miscellaneous_cmd=(--tracker_name "finetrainers-wan" --output_dir "/raid/aryan/wan" --init_timeout 600 --nccl_timeout 600 --report_to "wandb")
+ '[' ptd == accelerate ']'
+ '[' ptd == ptd ']'
+ export CUDA_VISIBLE_DEVICES=3
+ CUDA_VISIBLE_DEVICES=3
+ torchrun --standalone --nnodes=1 --nproc_per_node=1 --rdzv_backend c10d --rdzv_endpoint=localhost:0 train.py --parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name wan --pretrained_model_name_or_path Wan-AI/Wan2.1-T2V-1.3B-Diffusers --dataset_config examples/training/sft/wan/crush_smol_lora/training.json --dataloader_num_workers 0 --flow_weighting_scheme logit_normal --training_type lora --seed 42 --batch_size 1 --train_steps 3000 --rank 32 --lora_alpha 32 --target_modules 'blocks.*(to_q|to_k|to_v|to_out.0)' --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 500 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --optimizer adamw --lr 5e-5 --lr_scheduler constant_with_warmup --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0 --validation_dataset_file examples/training/sft/wan/crush_smol_lora/validation.json --validation_steps 2 --tracker_name finetrainers-wan --output_dir /raid/aryan/wan --init_timeout 600 --nccl_timeout 600 --report_to wandb
2025-03-13 00:01:04,583 - finetrainers - DEBUG - Successfully imported bitsandbytes version 0.43.3
2025-03-13 00:01:04,590 - finetrainers - DEBUG - Remaining unparsed arguments: []
2025-03-13 00:01:05,362 - finetrainers - INFO - Initialized parallel state with:
  - World size: 1
  - Pipeline parallel degree: 1
  - Data parallel degree: 1
  - Context parallel degree: 1
  - Tensor parallel degree: 1
  - Data parallel shards: 1

2025-03-13 00:01:05,387 - finetrainers - DEBUG - Device mesh: DeviceMesh('cuda', 0)
2025-03-13 00:01:05,388 - finetrainers - DEBUG - Enabling determinism: {'global_rank': 0, 'seed': 42}
2025-03-13 00:01:05,390 - finetrainers - INFO - Initializing models
(…)ion_pytorch_model.safetensors.index.json: 73.3kB [00:00, 154MB/s]
(…)pytorch_model-00001-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 5.00G/5.00G [00:05<00:00, 979MB/s]
(…)pytorch_model-00002-of-00002.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 677M/677M [00:01<00:00, 550MB/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  8.05it/s]
2025-03-13 00:01:13,613 - finetrainers - INFO - Initializing trainable parameters
2025-03-13 00:01:13,613 - finetrainers - INFO - Finetuning transformer with PEFT parameters
2025-03-13 00:01:14,797 - finetrainers - INFO - Initializing optimizer and lr scheduler
2025-03-13 00:01:14,806 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_optimizer completed!
2025-03-13 00:01:14,806 - finetrainers - INFO - Initialized FineTrainers
2025-03-13 00:01:14,807 - finetrainers - INFO - Initializing trackers: ['wandb']. Logging to log_dir='logs'
wandb: Tracking run with wandb version 0.17.7
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
2025-03-13 00:01:15,948 - finetrainers - INFO - WandB logging enabled
2025-03-13 00:01:15,949 - finetrainers - INFO - Initializing dataset and dataloader
2025-03-13 00:01:15,949 - finetrainers - INFO - Training configured to use 1 datasets
2025-03-13 00:01:16,335 - finetrainers - INFO - Downloading dataset finetrainers/crush-smol from the HF Hub
2025-03-13 00:01:16,511 - finetrainers - INFO - Initialized dataset: finetrainers/crush-smol
2025-03-13 00:01:16,512 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataset completed!
2025-03-13 00:01:16,512 - finetrainers - INFO - Initializing IterableDatasetPreprocessingWrapper for the dataset with the following configuration:
  - Dataset Type: video
  - ID Token: PIKA_CRUSH
  - Image Resolution Buckets: None
  - Video Resolution Buckets: [[49, 480, 832]]
  - Reshape Mode: bicubic
  - Remove Common LLM Caption Prefixes: True

2025-03-13 00:01:16,512 - finetrainers - INFO - Initializing IterableCombinedDataset with the following configuration:
  - Number of Datasets: 1
  - Buffer Size: 1
  - Shuffle: True

2025-03-13 00:01:16,512 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataloader completed!
2025-03-13 00:01:16,513 - finetrainers - INFO - Checkpointing enabled. Checkpoints will be stored in '/raid/aryan/wan'
2025-03-13 00:01:16,513 - finetrainers - INFO - Starting training
2025-03-13 00:01:16,513 - finetrainers - INFO - Memory before training start: {
    "memory_allocated": 1.462,
    "memory_reserved": 1.527,
    "max_memory_allocated": 1.462,
    "max_memory_reserved": 1.527
}
2025-03-13 00:01:16,514 - finetrainers - INFO - Training configuration: {
    "trainable parameters": 23592960,
    "train steps": 3000,
    "per-replica batch size": 1,
    "global batch size": 1,
    "gradient accumulation steps": 1
}
Training steps:   0%|          | 0/3000 [00:00<?, ?it/s]
2025-03-13 00:01:16,534 - finetrainers - INFO - Precomputation disabled. Loading in-memory data loaders. All components will be loaded on GPUs.
tokenizer_config.json: 61.8kB [00:00, 25.3MB/s]
spiece.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.55M/4.55M [00:00<00:00, 50.9MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16.8M/16.8M [00:00<00:00, 39.7MB/s]
special_tokens_map.json: 7.08kB [00:00, 24.0MB/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 854/854 [00:00<00:00, 5.15MB/s]
model.safetensors.index.json: 22.5kB [00:00, 65.0MB/s]
model-00001-of-00005.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.97G/4.97G [00:06<00:00, 711MB/s]
model-00002-of-00005.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.90G/4.90G [00:04<00:00, 983MB/s]
model-00003-of-00005.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.97G/4.97G [00:05<00:00, 990MB/s]
model-00004-of-00005.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 5.00G/5.00G [00:07<00:00, 712MB/s]
model-00005-of-00005.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 2.89G/2.89G [00:03<00:00, 896MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:29<00:00,  5.82s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00,  2.66it/s]
diffusion_pytorch_model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 508M/508M [00:01<00:00, 406MB/s]
2025-03-13 00:02:22,774 - finetrainers - INFO - Starting IterableCombinedDataset with 1 datasets
2025-03-13 00:02:22,774 - finetrainers - INFO - Starting IterableDatasetPreprocessingWrapper for the dataset
Filling buffer from data iterator 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.72s/it]
2025-03-13 00:02:26,543 - finetrainers - DEBUG - Starting training step (1/3000)
2025-03-13 00:02:26,560 - finetrainers - ERROR - Error during training: mat1 and mat2 must have the same dtype, but got Float8_e4m3fn and BFloat16
wandb:                                                                                
wandb: You can sync this run to the cloud by running:
wandb: wandb sync logs/wandb/offline-run-20250313_000115-ww7ryrbk
wandb: Find logs at: logs/wandb/offline-run-20250313_000115-ww7ryrbk/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
2025-03-13 00:02:28,318 - finetrainers - ERROR - An error occurred during training: mat1 and mat2 must have the same dtype, but got Float8_e4m3fn and BFloat16
2025-03-13 00:02:28,320 - finetrainers - ERROR - Traceback (most recent call last):
  File "/raid/aryan/cogvideox-distillation/train.py", line 70, in main
    trainer.run()
  File "/raid/aryan/cogvideox-distillation/finetrainers/trainer/sft_trainer/trainer.py", line 97, in run
    raise e
  File "/raid/aryan/cogvideox-distillation/finetrainers/trainer/sft_trainer/trainer.py", line 92, in run
    self._train()
  File "/raid/aryan/cogvideox-distillation/finetrainers/trainer/sft_trainer/trainer.py", line 471, in _train
    pred, target, sigmas = self.model_specification.forward(
  File "/raid/aryan/cogvideox-distillation/finetrainers/models/wan/base_specification.py", line 316, in forward
    pred = transformer(
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aryan/work/diffusers/src/diffusers/models/transformers/transformer_wan.py", line 424, in forward
    temb, timestep_proj, encoder_hidden_states, encoder_hidden_states_image = self.condition_embedder(
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aryan/work/diffusers/src/diffusers/models/transformers/transformer_wan.py", line 156, in forward
    temb = self.time_embedder(timestep).type_as(encoder_hidden_states)
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aryan/work/diffusers/src/diffusers/models/embeddings.py", line 1305, in forward
    sample = self.linear_1(sample)
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aryan/work/diffusers/src/diffusers/hooks/hooks.py", line 148, in new_forward
    output = function_reference.forward(*args, **kwargs)
  File "/raid/aryan/nightly-venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype, but got Float8_e4m3fn and BFloat16

/raid/aryan/nightly-venv/lib/python3.10/site-packages/wandb/sdk/wandb_run.py:2378: UserWarning: Run (ww7ryrbk) is finished. The call to `_console_raw_callback` will be ignored. Please make sure that you are using an active run.
  lambda data: self._console_raw_callback("stderr", data),
Training steps:   0%|                                                                                                                                                                                                                                                               | 0/3000 [01:11<?, ?it/s]
+ echo -ne '-------------------- Finished executing script --------------------\n\n'
-------------------- Finished executing script --------------------

a-r-r-o-w · 2025-03-12T23:22:07Z

Wan is working now.

Logs

(nightly-venv) (nightly-venv) aryan@hf-dgx-01:/raid/aryan/cogvideox-distillation$ ./examples/training/sft/wan/crush_smol_lora/train.sh 
+ export WANDB_MODE=offline
+ WANDB_MODE=offline
+ export NCCL_P2P_DISABLE=1
+ NCCL_P2P_DISABLE=1
+ export TORCH_NCCL_ENABLE_MONITORING=0
+ TORCH_NCCL_ENABLE_MONITORING=0
+ export FINETRAINERS_LOG_LEVEL=DEBUG
+ FINETRAINERS_LOG_LEVEL=DEBUG
+ BACKEND=ptd
+ NUM_GPUS=1
+ CUDA_VISIBLE_DEVICES=3
+ TRAINING_DATASET_CONFIG=examples/training/sft/wan/crush_smol_lora/training.json
+ VALIDATION_DATASET_FILE=examples/training/sft/wan/crush_smol_lora/validation.json
+ DDP_1='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 4 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ FSDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ FSDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 4 --cp_degree 1 --tp_degree 1'
+ HSDP_2_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ parallel_cmd=($DDP_1)
+ model_cmd=(--model_name "wan" --pretrained_model_name_or_path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers")
+ dataset_cmd=(--dataset_config $TRAINING_DATASET_CONFIG)
+ dataloader_cmd=(--dataloader_num_workers 0)
+ diffusion_cmd=(--flow_weighting_scheme "logit_normal")
+ training_cmd=(--training_type "lora" --seed 42 --batch_size 1 --train_steps 3000 --rank 32 --lora_alpha 32 --target_modules "blocks.*(to_q|to_k|to_v|to_out.0)" --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 500 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn)
+ optimizer_cmd=(--optimizer "adamw" --lr 5e-5 --lr_scheduler "constant_with_warmup" --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0)
+ validation_cmd=(--validation_dataset_file "$VALIDATION_DATASET_FILE" --validation_steps 2)
+ miscellaneous_cmd=(--tracker_name "finetrainers-wan" --output_dir "/raid/aryan/wan" --init_timeout 600 --nccl_timeout 600 --report_to "wandb")
+ '[' ptd == accelerate ']'
+ '[' ptd == ptd ']'
+ export CUDA_VISIBLE_DEVICES=3
+ CUDA_VISIBLE_DEVICES=3
+ torchrun --standalone --nnodes=1 --nproc_per_node=1 --rdzv_backend c10d --rdzv_endpoint=localhost:0 train.py --parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name wan --pretrained_model_name_or_path Wan-AI/Wan2.1-T2V-1.3B-Diffusers --dataset_config examples/training/sft/wan/crush_smol_lora/training.json --dataloader_num_workers 0 --flow_weighting_scheme logit_normal --training_type lora --seed 42 --batch_size 1 --train_steps 3000 --rank 32 --lora_alpha 32 --target_modules 'blocks.*(to_q|to_k|to_v|to_out.0)' --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 500 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --optimizer adamw --lr 5e-5 --lr_scheduler constant_with_warmup --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0 --validation_dataset_file examples/training/sft/wan/crush_smol_lora/validation.json --validation_steps 2 --tracker_name finetrainers-wan --output_dir /raid/aryan/wan --init_timeout 600 --nccl_timeout 600 --report_to wandb
2025-03-13 00:17:50,233 - finetrainers - DEBUG - Successfully imported bitsandbytes version 0.43.3
2025-03-13 00:17:50,240 - finetrainers - DEBUG - Remaining unparsed arguments: []
2025-03-13 00:17:50,778 - finetrainers - INFO - Initialized parallel state with:
  - World size: 1
  - Pipeline parallel degree: 1
  - Data parallel degree: 1
  - Context parallel degree: 1
  - Tensor parallel degree: 1
  - Data parallel shards: 1

2025-03-13 00:17:50,803 - finetrainers - DEBUG - Device mesh: DeviceMesh('cuda', 0)
2025-03-13 00:17:50,803 - finetrainers - DEBUG - Enabling determinism: {'global_rank': 0, 'seed': 42}
2025-03-13 00:17:50,806 - finetrainers - INFO - Initializing models
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.19it/s]
2025-03-13 00:17:51,650 - finetrainers - INFO - Initializing trainable parameters
2025-03-13 00:17:51,650 - finetrainers - INFO - Finetuning transformer with PEFT parameters
2025-03-13 00:17:52,809 - finetrainers - INFO - Initializing optimizer and lr scheduler
2025-03-13 00:17:52,817 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_optimizer completed!
2025-03-13 00:17:52,818 - finetrainers - INFO - Initialized FineTrainers
2025-03-13 00:17:52,818 - finetrainers - INFO - Initializing trackers: ['wandb']. Logging to log_dir='logs'
wandb: Tracking run with wandb version 0.17.7
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
2025-03-13 00:17:53,948 - finetrainers - INFO - WandB logging enabled
2025-03-13 00:17:53,948 - finetrainers - INFO - Initializing dataset and dataloader
2025-03-13 00:17:53,948 - finetrainers - INFO - Training configured to use 1 datasets
2025-03-13 00:17:54,215 - finetrainers - INFO - Downloading dataset finetrainers/crush-smol from the HF Hub
2025-03-13 00:17:54,365 - finetrainers - INFO - Initialized dataset: finetrainers/crush-smol
2025-03-13 00:17:54,365 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataset completed!
2025-03-13 00:17:54,365 - finetrainers - INFO - Initializing IterableDatasetPreprocessingWrapper for the dataset with the following configuration:
  - Dataset Type: video
  - ID Token: PIKA_CRUSH
  - Image Resolution Buckets: None
  - Video Resolution Buckets: [[49, 480, 832]]
  - Reshape Mode: bicubic
  - Remove Common LLM Caption Prefixes: True

2025-03-13 00:17:54,366 - finetrainers - INFO - Initializing IterableCombinedDataset with the following configuration:
  - Number of Datasets: 1
  - Buffer Size: 1
  - Shuffle: True

2025-03-13 00:17:54,366 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataloader completed!
2025-03-13 00:17:54,366 - finetrainers - INFO - Checkpointing enabled. Checkpoints will be stored in '/raid/aryan/wan'
2025-03-13 00:17:54,366 - finetrainers - INFO - Starting training
2025-03-13 00:17:54,367 - finetrainers - INFO - Memory before training start: {
    "memory_allocated": 1.462,
    "memory_reserved": 1.527,
    "max_memory_allocated": 1.462,
    "max_memory_reserved": 1.527
}
2025-03-13 00:17:54,367 - finetrainers - INFO - Training configuration: {
    "trainable parameters": 23592960,
    "train steps": 3000,
    "per-replica batch size": 1,
    "global batch size": 1,
    "gradient accumulation steps": 1
}
Training steps:   0%|          | 0/3000 [00:00<?, ?it/s]
2025-03-13 00:17:54,387 - finetrainers - INFO - Precomputation disabled. Loading in-memory data loaders. All components will be loaded on GPUs.
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 12985.46it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00,  2.75it/s]
2025-03-13 00:18:26,192 - finetrainers - INFO - Starting IterableCombinedDataset with 1 datasets
2025-03-13 00:18:26,192 - finetrainers - INFO - Starting IterableDatasetPreprocessingWrapper for the dataset
Filling buffer from data iterator 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.58s/it]
2025-03-13 00:18:29,870 - finetrainers - DEBUG - Starting training step (1/3000)
Training steps:   0%|          | 1/3000 [00:40<33:22:25, 40.06s/it, grad_norm=0.0875, global_avg_loss=0.235, global_max_loss=0.235]
2025-03-13 00:18:36,678 - finetrainers - DEBUG - Starting training step (2/3000)
Training steps:   0%|          | 2/3000 [00:47<17:08:14, 20.58s/it, grad_norm=0.0283, global_avg_loss=0.116, global_max_loss=0.116]
2025-03-13 00:18:41,371 - finetrainers - INFO - Starting validation
2025-03-13 00:18:41,667 - finetrainers - INFO - Memory before validation start: {
    "memory_allocated": 12.504,
    "memory_reserved": 18.188,
    "max_memory_allocated": 16.987,
    "max_memory_reserved": 18.188
}
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 188.74it/s]
2025-03-13 00:18:41,997 - finetrainers - DEBUG - Validating validation_data=[{'caption': 'PIKA_CRUSH A red toy car is being crushed by a large hydraulic press, which is flattening objects as if they were under a hydraulic press.', 'num_inference_steps': 50, 'height': 480, 'width': 832, 'num_frames': 49, 'prompt': 'PIKA_CRUSH A red toy car is being crushed by a large hydraulic press, which is flattening objects as if they were under a hydraulic press.'}] on rank=0.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [01:58<00:00,  2.38s/it]
2025-03-13 00:20:43,963 - finetrainers - DEBUG - Saving video from rank=0 to /raid/aryan/wan/validation-2-0-2-PIKA_CRUSH-A-red-toy-car--1741821643.mp4
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2025-03-13 00:20:44,542 - finetrainers - DEBUG - Validating validation_data=[{'caption': 'PIKA_CRUSH A green cube is being compressed by a hydraulic press, which flattens the object as if it were under a hydraulic press. The press is shown in action, with the cube being squeezed into a smaller shape.', 'num_inference_steps': 50, 'height': 480, 'width': 832, 'num_frames': 49, 'prompt': 'PIKA_CRUSH A green cube is being compressed by a hydraulic press, which flattens the object as if it were under a hydraulic press. The press is shown in action, with the cube being squeezed into a smaller shape.'}] on rank=0.

 10%|██████████

a-r-r-o-w · 2025-03-12T23:29:55Z

HunyuanVideo is working.

Logs

(nightly-venv) (nightly-venv) aryan@hf-dgx-01:/raid/aryan/cogvideox-distillation$ ./examples/training/sft/hunyuan_video/modal_labs_dissolve/train.sh 
+ export WANDB_MODE=offline
+ WANDB_MODE=offline
+ export NCCL_P2P_DISABLE=1
+ NCCL_P2P_DISABLE=1
+ export TORCH_NCCL_ENABLE_MONITORING=0
+ TORCH_NCCL_ENABLE_MONITORING=0
+ export FINETRAINERS_LOG_LEVEL=DEBUG
+ FINETRAINERS_LOG_LEVEL=DEBUG
+ BACKEND=ptd
+ NUM_GPUS=1
+ CUDA_VISIBLE_DEVICES=3
+ TRAINING_DATASET_CONFIG=examples/training/sft/hunyuan_video/modal_labs_dissolve/training.json
+ VALIDATION_DATASET_FILE=examples/training/sft/hunyuan_video/modal_labs_dissolve/validation.json
+ DDP_1='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ DDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 4 --dp_shards 1 --cp_degree 1 --tp_degree 1'
+ FSDP_2='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ FSDP_4='--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 4 --cp_degree 1 --tp_degree 1'
+ HSDP_2_2='--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ HSDP_4_2='--parallel_backend ptd --pp_degree 1 --dp_degree 4 --dp_shards 2 --cp_degree 1 --tp_degree 1'
+ parallel_cmd=($DDP_1)
+ model_cmd=(--model_name "hunyuan_video" --pretrained_model_name_or_path "hunyuanvideo-community/HunyuanVideo")
+ dataset_cmd=(--dataset_config $TRAINING_DATASET_CONFIG)
+ dataloader_cmd=(--dataloader_num_workers 0)
+ diffusion_cmd=(--flow_weighting_scheme "logit_normal")
+ training_cmd=(--training_type "lora" --seed 42 --batch_size 1 --train_steps 3000 --rank 32 --lora_alpha 32 --target_modules "(transformer_blocks|single_transformer_blocks).*(to_q|to_k|to_v|to_out.0|add_q_proj|add_k_proj|add_v_proj|to_add_out)" --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 500 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn)
+ optimizer_cmd=(--optimizer "adamw" --lr 3e-5 --lr_scheduler "constant_with_warmup" --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0)
+ validation_cmd=(--validation_dataset_file "$VALIDATION_DATASET_FILE" --validation_steps 2)
+ miscellaneous_cmd=(--tracker_name "finetrainers-hunyuanvideo" --output_dir "/raid/aryan/hunyuanvideo" --init_timeout 600 --nccl_timeout 600 --report_to "wandb")
+ '[' ptd == accelerate ']'
+ '[' ptd == ptd ']'
+ export CUDA_VISIBLE_DEVICES=3
+ CUDA_VISIBLE_DEVICES=3
+ torchrun --standalone --nnodes=1 --nproc_per_node=1 --rdzv_backend c10d --rdzv_endpoint=localhost:0 train.py --parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1 --model_name hunyuan_video --pretrained_model_name_or_path hunyuanvideo-community/HunyuanVideo --dataset_config examples/training/sft/hunyuan_video/modal_labs_dissolve/training.json --dataloader_num_workers 0 --flow_weighting_scheme logit_normal --training_type lora --seed 42 --batch_size 1 --train_steps 3000 --rank 32 --lora_alpha 32 --target_modules '(transformer_blocks|single_transformer_blocks).*(to_q|to_k|to_v|to_out.0|add_q_proj|add_k_proj|add_v_proj|to_add_out)' --gradient_accumulation_steps 1 --gradient_checkpointing --checkpointing_steps 500 --checkpointing_limit 2 --enable_slicing --enable_tiling --layerwise_upcasting_modules transformer --layerwise_upcasting_storage_dtype float8_e4m3fn --optimizer adamw --lr 3e-5 --lr_scheduler constant_with_warmup --lr_warmup_steps 1000 --lr_num_cycles 1 --beta1 0.9 --beta2 0.99 --weight_decay 1e-4 --epsilon 1e-8 --max_grad_norm 1.0 --validation_dataset_file examples/training/sft/hunyuan_video/modal_labs_dissolve/validation.json --validation_steps 2 --tracker_name finetrainers-hunyuanvideo --output_dir /raid/aryan/hunyuanvideo --init_timeout 600 --nccl_timeout 600 --report_to wandb
2025-03-13 00:26:34,406 - finetrainers - DEBUG - Successfully imported bitsandbytes version 0.43.3
2025-03-13 00:26:34,413 - finetrainers - DEBUG - Remaining unparsed arguments: []
2025-03-13 00:26:34,996 - finetrainers - INFO - Initialized parallel state with:
  - World size: 1
  - Pipeline parallel degree: 1
  - Data parallel degree: 1
  - Context parallel degree: 1
  - Tensor parallel degree: 1
  - Data parallel shards: 1

2025-03-13 00:26:35,023 - finetrainers - DEBUG - Device mesh: DeviceMesh('cuda', 0)
2025-03-13 00:26:35,023 - finetrainers - DEBUG - Enabling determinism: {'global_rank': 0, 'seed': 42}
2025-03-13 00:26:35,025 - finetrainers - INFO - Initializing models
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 38.67it/s]
2025-03-13 00:26:35,700 - finetrainers - INFO - Initializing trainable parameters
2025-03-13 00:26:35,701 - finetrainers - INFO - Finetuning transformer with PEFT parameters
2025-03-13 00:26:40,096 - finetrainers - INFO - Initializing optimizer and lr scheduler
2025-03-13 00:26:40,107 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_optimizer completed!
2025-03-13 00:26:40,108 - finetrainers - INFO - Initialized FineTrainers
2025-03-13 00:26:40,108 - finetrainers - INFO - Initializing trackers: ['wandb']. Logging to log_dir='logs'
wandb: Tracking run with wandb version 0.17.7
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
2025-03-13 00:26:41,477 - finetrainers - INFO - WandB logging enabled
2025-03-13 00:26:41,477 - finetrainers - INFO - Initializing dataset and dataloader
2025-03-13 00:26:41,477 - finetrainers - INFO - Training configured to use 2 datasets
2025-03-13 00:26:42,149 - finetrainers - INFO - Downloading dataset modal-labs/dissolve from the HF Hub
2025-03-13 00:26:42,352 - finetrainers - INFO - Initialized dataset: modal-labs/dissolve
2025-03-13 00:26:42,353 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataset completed!
2025-03-13 00:26:42,353 - finetrainers - INFO - Initializing IterableDatasetPreprocessingWrapper for the dataset with the following configuration:
  - Dataset Type: video
  - ID Token: MODAL_DISSOLVE
  - Image Resolution Buckets: None
  - Video Resolution Buckets: [[49, 480, 768]]
  - Reshape Mode: bicubic
  - Remove Common LLM Caption Prefixes: True

2025-03-13 00:26:42,614 - finetrainers - INFO - Downloading dataset modal-labs/dissolve from the HF Hub
2025-03-13 00:26:42,754 - finetrainers - INFO - Initialized dataset: modal-labs/dissolve
2025-03-13 00:26:42,755 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataset completed!
2025-03-13 00:26:42,755 - finetrainers - INFO - Initializing IterableDatasetPreprocessingWrapper for the dataset with the following configuration:
  - Dataset Type: video
  - ID Token: MODAL_DISSOLVE
  - Image Resolution Buckets: None
  - Video Resolution Buckets: [[81, 480, 768]]
  - Reshape Mode: bicubic
  - Remove Common LLM Caption Prefixes: True

2025-03-13 00:26:42,755 - finetrainers - INFO - Initializing IterableCombinedDataset with the following configuration:
  - Number of Datasets: 2
  - Buffer Size: 1
  - Shuffle: True

2025-03-13 00:26:42,755 - finetrainers - DEBUG - PytorchDTensorParallelBackend::prepare_dataloader completed!
2025-03-13 00:26:42,755 - finetrainers - INFO - Checkpointing enabled. Checkpoints will be stored in '/raid/aryan/hunyuanvideo'
2025-03-13 00:26:42,756 - finetrainers - INFO - Starting training
2025-03-13 00:26:42,756 - finetrainers - INFO - Memory before training start: {
    "memory_allocated": 15.656,
    "memory_reserved": 15.951,
    "max_memory_allocated": 15.656,
    "max_memory_reserved": 15.951
}
2025-03-13 00:26:42,757 - finetrainers - INFO - Training configuration: {
    "trainable parameters": 55050240,
    "train steps": 3000,
    "per-replica batch size": 1,
    "global batch size": 1,
    "gradient accumulation steps": 1
}
Training steps:   0%|          | 0/3000 [00:00<?, ?it/s]
2025-03-13 00:26:42,784 - finetrainers - INFO - Precomputation disabled. Loading in-memory data loaders. All components will be loaded on GPUs.
tokenizer_config.json: 51.7kB [00:00, 127MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17.2M/17.2M [00:00<00:00, 68.5MB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 577/577 [00:00<00:00, 3.95MB/s]
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 736/736 [00:00<00:00, 4.96MB/s]
vocab.json: 1.06MB [00:00, 16.2MB/s]
merges.txt: 525kB [00:00, 11.1MB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 588/588 [00:00<00:00, 3.16MB/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 766/766 [00:00<00:00, 4.95MB/s]
model.safetensors.index.json: 22.2kB [00:00, 85.8MB/s]
model-00001-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.98G/4.98G [00:07<00:00, 702MB/s]
model-00002-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 5.00G/5.00G [00:07<00:00, 629MB/s]
model-00003-of-00004.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.92G/4.92G [00:08<00:00, 562MB/s]
model-00004-of-00004.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 117M/117M [00:00<00:00, 126MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:25<00:00,  6.35s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.82it/s]
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 646/646 [00:00<00:00, 4.42MB/s]
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 246M/246M [00:00<00:00, 268MB/s]
diffusion_pytorch_model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 986M/986M [00:03<00:00, 298MB/s]
2025-03-13 00:27:24,474 - finetrainers - INFO - Starting IterableCombinedDataset with 2 datasets
2025-03-13 00:27:24,475 - finetrainers - INFO - Starting IterableDatasetPreprocessingWrapper for the dataset
Filling buffer from data iterator 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.44it/s]
2025-03-13 00:27:25,172 - finetrainers - INFO - Starting IterableDatasetPreprocessingWrapper for the dataset
Filling buffer from data iterator 1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.26it/s]
2025-03-13 00:27:34,171 - finetrainers - DEBUG - Starting training step (1/3000)
Training steps:   0%|          | 1/3000 [01:45<88:06:36, 105.77s/it, grad_norm=0.00072, global_avg_loss=0.0436, global_max_loss=0.0436]
2025-03-13 00:28:33,818 - finetrainers - DEBUG - Starting training step (2/3000)
Training steps:   0%|          | 2/3000 [02:15<50:46:33, 60.97s/it, grad_norm=0.00474, global_avg_loss=0.0582, global_max_loss=0.0582]
2025-03-13 00:28:58,141 - finetrainers - INFO - Starting validation
Generating train split: 8 examples [00:00, 1046.48 examples/s]
2025-03-13 00:28:58,442 - finetrainers - INFO - Memory before validation start: {
    "memory_allocated": 30.829,
    "memory_reserved": 54.729,
    "max_memory_allocated": 46.403,
    "max_memory_reserved": 54.729
}
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39.0/39.0 [00:00<00:00, 226kB/s]
scheduler_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 419/419 [00:00<00:00, 3.10MB/s]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 153.78it/s]
2025-03-13 00:28:59,793 - finetrainers - DEBUG - Validating validation_data=[{'caption': 'MODAL_DISSOLVE A meticulously detailed, antique-style vase, featuring mottled beige and brown hues and two small handles, sits centrally on a dark brown circular pedestal.  The vase, seemingly made of clay or porcelain, begins to dissolve from the bottom up.  The disintegration process is rapid but not explosive, with a cloud of fine, light tan dust forming and rising in a swirling, almost ethereal column that expands outwards before slowly descending. The dust particles are individually visible as they float, and the overall effect is one of delicate disintegration rather than shattering.  Finally, only the empty pedestal and the intricately patterned marble floor remain.', 'num_inference_steps': 30, 'height': 480, 'width': 768, 'num_frames': 49, 'prompt': 'MODAL_DISSOLVE A meticulously detailed, antique-style vase, featuring mottled beige and brown hues and two small handles, sits centrally on a dark brown circular pedestal.  The vase, seemingly made of clay or porcelain, begins to dissolve from the bottom up.  The disintegration process is rapid but not explosive, with a cloud of fine, light tan dust forming and rising in a swirling, almost ethereal column that expands outwards before slowly descending. The dust particles are individually visible as they float, and the overall effect is one of delicate disintegration rather than shattering.  Finally, only the empty pedestal and the intricately patterned marble floor remain.'}] on rank=0.
Token indices sequence length is longer than the specified maximum sequence length for this model (138 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['dust forming and rising in a swirling , almost ethereal column that expands outwards before slowly descending . the dust particles are individually visible as they float , and the overall effect is one of delicate disintegration rather than shattering . finally , only the empty pedestal and the intricately patterned marble floor remain .']

 13%|███████████████████████████████████▎
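The traced `train.sh` above defines several `--parallel_backend ptd` presets but selects `DDP_1` with `NUM_GPUS=1`. A quick way to see how many GPUs each preset implies is to multiply its parallel degrees. This is a sketch under an assumption: that the ptd backend's world size is the product of `pp_degree`, `dp_degree`, `dp_shards`, `cp_degree` and `tp_degree` (consistent with the "World size: 1" log line for the all-ones preset, but not confirmed by this thread). The helper names are hypothetical. Since the PR description notes that the fake-fp8 path does not work on more than one GPU, only a preset that multiplies out to 1 is relevant for these runs.

```python
import shlex
from math import prod

DEGREE_FLAGS = ("--pp_degree", "--dp_degree", "--dp_shards", "--cp_degree", "--tp_degree")


def parse_degrees(preset: str) -> dict:
    """Pull the integer value that follows each degree flag in a preset string."""
    tokens = shlex.split(preset)
    return {flag: int(tokens[tokens.index(flag) + 1]) for flag in DEGREE_FLAGS}


def required_gpus(preset: str) -> int:
    """ASSUMPTION: world size is the product of all parallel degrees."""
    return prod(parse_degrees(preset).values())


# Preset strings copied from the traced train.sh.
presets = {
    "DDP_1": "--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1",
    "DDP_2": "--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 1 --cp_degree 1 --tp_degree 1",
    "DDP_4": "--parallel_backend ptd --pp_degree 1 --dp_degree 4 --dp_shards 1 --cp_degree 1 --tp_degree 1",
    "FSDP_2": "--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 2 --cp_degree 1 --tp_degree 1",
    "FSDP_4": "--parallel_backend ptd --pp_degree 1 --dp_degree 1 --dp_shards 4 --cp_degree 1 --tp_degree 1",
    "HSDP_2_2": "--parallel_backend ptd --pp_degree 1 --dp_degree 2 --dp_shards 2 --cp_degree 1 --tp_degree 1",
    "HSDP_4_2": "--parallel_backend ptd --pp_degree 1 --dp_degree 4 --dp_shards 2 --cp_degree 1 --tp_degree 1",
}

NUM_GPUS = 1  # as set in the traced script
for name, preset in presets.items():
    n = required_gpus(preset)
    note = "matches NUM_GPUS" if n == NUM_GPUS else f"would need NUM_GPUS={n}"
    print(f"{name}: {n} GPU(s), {note}")
```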

a-r-r-o-w · 2025-03-12T23:32:09Z

I think it's safe to say it should now work without problems. I'll make sure to test every newly added model for compatibility with layerwise upcasting.

Thanks for your patience!
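As the PR description puts it, this is "fake" fp8: `--layerwise_upcasting_storage_dtype float8_e4m3fn` changes how the transformer weights are *stored*, and each layer's weights are cast back up to the compute dtype (bf16 is assumed here) before the math runs. The dependency-free sketch below illustrates that storage round trip. `to_e4m3fn` is a hypothetical helper written for this illustration, not finetrainers or PyTorch code. It rounds to the nearest float8_e4m3fn value (3 mantissa bits, smallest normal 2**-6, largest finite 448), and it clamps out-of-range inputs, which is a simplification and may not match a given library's overflow behavior. Python floats (float64) stand in for the higher-precision compute dtype.

```python
import math

E4M3FN_MAX = 448.0           # largest finite float8_e4m3fn magnitude
E4M3FN_MIN_NORMAL_EXP = -6   # exponent of the smallest normal value (bias 7)
MANTISSA_BITS = 3


def to_e4m3fn(x: float) -> float:
    """Round x to the nearest float8_e4m3fn-representable value (as a Python float).

    Round-half-to-even on the 3-bit mantissa grid; values below 2**-6 fall on
    the fixed subnormal grid with spacing 2**-9. Out-of-range inputs are
    clamped to +/-448 (a simplification made by this sketch).
    """
    if math.isnan(x) or x == 0.0:
        return x
    sign = math.copysign(1.0, x)
    a = abs(x)
    if math.isinf(a):
        return sign * E4M3FN_MAX
    _, e = math.frexp(a)                      # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3FN_MIN_NORMAL_EXP)   # floor(log2(a)), floored at the subnormal range
    step = 2.0 ** (exp - MANTISSA_BITS)       # spacing of representable values at this exponent
    q = round(a / step) * step                # round() on floats is round-half-to-even
    return sign * min(q, E4M3FN_MAX)


def dot(weights, x):
    """Toy 'layer': a dot product evaluated in Python floats (float64)."""
    return sum(w * xi for w, xi in zip(weights, x))


# Storage: keep only the fp8-rounded weights (1 byte each in a real fp8 tensor).
weights = [0.1, -3.3, 0.001, 1.7]
stored = [to_e4m3fn(w) for w in weights]

# Compute: the stored values are used as ordinary higher-precision floats,
# so only the storage step lost precision; the arithmetic itself is not fp8.
x = [1.0, 0.5, 2.0, -1.0]
print(stored)  # [0.1015625, -3.25, 0.001953125, 1.75]
print("full-precision weights:", dot(weights, x))
print("fp8-stored weights:    ", dot(stored, x))
```

The gap between the two dot products is the error introduced purely by fp8 storage, which is why this mode can train worse than bf16/fp32 even though no fp8 matmuls are involved.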

dorpxam · 2025-03-13T08:05:59Z

You're right, it's better to take time. Thank you so much.

dorpxam · 2025-03-13T08:56:34Z

Oh, I misread the parameter change before starting the test. OK, changing it right now.

Using:

  --layerwise_upcasting_modules transformer
  --layerwise_upcasting_storage_dtype float8_e4m3fn
  --enable_model_cpu_offload

Now:

Precomputation speed: less than 2 minutes, versus 25 minutes before
VRAM consumption: no change, still 33 GB, the same as without --layerwise_upcasting_modules transformer
Training speed: blocked at step 2.

INFO:finetrainers:Precomputed condition & latent data exhausted. Loading & preprocessing new data.
Downloading shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 2917.57it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.63it/s]
Filling buffer from data iterator 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:20<00:00,  1.16it/s]
Processing data on rank 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:21<00:00,  1.13it/s]
Processing data on rank 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [01:34<00:00,  3.95s/it]
2025-03-13 09:49:36,279 - finetrainers - DEBUG - Starting training step (1/2400)
DEBUG:finetrainers:Starting training step (1/2400)
Training steps:   0%|          | 1/2400 [02:38<105:49:04, 158.79s/it, grad_norm=0.0157, global_avg_loss=0.0992, global_max_loss=0.0992]
2025-03-13 09:49:56,467 - finetrainers - DEBUG - Starting training step (2/2400)
DEBUG:finetrainers:Starting training step (2/2400)

Maybe I'm doing something wrong?

For information:

WanTraining from spacepxl (https://github.com/spacepxl/WanTraining) runs at 4 sec/it with 10/12 GB of VRAM, but I can't train with it: using the default parameters, I get pure color noise, and I don't know why.
I will test musubi-tuner from kohya-ss (https://github.com/kohya-ss/musubi-tuner) for comparison and see if I can run it.

Let me know if you need some specific test from me.

a-r-r-o-w · 2025-03-13T22:09:27Z

@dorpxam Precomputation is probably faster because you need to explicitly set --enable_precomputation (I believe it will become slow again once you add it). If you don't set this flag, then all models (text encoder, VAE, transformer) are loaded onto the GPU, which uses more memory.
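A back-of-envelope way to see why keeping every component on the GPU costs memory, and what fp8 storage can save, is to count weight bytes per dtype. The parameter counts below are hypothetical placeholders, not measured sizes of the Wan or HunyuanVideo components. The estimate covers weights only: activations, gradients, LoRA parameters and optimizer state are ignored, so a measured VRAM figure can move much less than this suggests.

```python
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float8_e4m3fn": 1}


def weight_gib(num_params: int, dtype: str) -> float:
    """GiB needed to hold `num_params` weights stored in `dtype` (weights only)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 2**30


# Hypothetical parameter counts, for illustration only; substitute real numbers.
components = {
    "transformer": (13_000_000_000, "float8_e4m3fn"),  # fp8 storage via layerwise upcasting
    "text_encoder": (7_000_000_000, "bfloat16"),
    "vae": (250_000_000, "bfloat16"),
}

# Without precomputation, every component is resident on the GPU at once.
all_on_gpu = sum(weight_gib(n, dt) for n, dt in components.values())
# ASSUMPTION: with precomputed embeddings/latents, the text encoder and VAE can
# be moved off the GPU, leaving only the transformer resident during training.
transformer_only = weight_gib(*components["transformer"])

print(f"all components on GPU: {all_on_gpu:.1f} GiB")
print(f"transformer only:      {transformer_only:.1f} GiB")
print(f"transformer in bf16:   {weight_gib(13_000_000_000, 'bfloat16'):.1f} GiB")
```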

To see the actual sec/it, I would recommend waiting 10-20 steps before taking a reading because, as you can see, the first step does extra work: resolution preprocessing, prefilling buffers for shuffling data, etc.
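One way to get that steady-state reading from a DEBUG log instead of the tqdm bar is to diff the timestamps of the "Starting training step (n/N)" lines and skip the first few steps. The helper names below are hypothetical and the regex only assumes the log format shown in this thread. For example, in the log above tqdm reports 158.79 s/it after step 1, while the gap between the step 1 and step 2 start lines is about 20 s, because the first reading includes the one-off setup work.

```python
import re
from datetime import datetime

# Matches timestamped finetrainers lines such as
# "2025-03-13 09:49:36,279 - finetrainers - DEBUG - Starting training step (1/2400)"
STEP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - finetrainers - DEBUG - "
    r"Starting training step \((\d+)/(\d+)\)"
)


def step_start_times(log_text: str) -> dict:
    """Map step number -> wall-clock time at which that step started."""
    times = {}
    for stamp, step, _total in STEP_RE.findall(log_text):
        # Keep the first occurrence if a step line appears more than once.
        times.setdefault(int(step), datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"))
    return times


def steady_state_sec_per_it(log_text: str, warmup_steps: int = 10) -> float:
    """Mean seconds per step, ignoring steps 1..warmup_steps."""
    times = step_start_times(log_text)
    steps = sorted(s for s in times if s > warmup_steps)
    if len(steps) < 2:
        raise ValueError("need at least two step-start lines after the warmup steps")
    first, last = steps[0], steps[-1]
    # Start of `first` to start of `last` spans (last - first) complete steps.
    return (times[last] - times[first]).total_seconds() / (last - first)


def remaining_hours(sec_per_it: float, done: int, total: int) -> float:
    """Naive time-to-finish estimate from a steady-state rate."""
    return sec_per_it * (total - done) / 3600


log = """\
2025-03-13 09:49:36,279 - finetrainers - DEBUG - Starting training step (1/2400)
2025-03-13 09:49:56,467 - finetrainers - DEBUG - Starting training step (2/2400)
"""
# Only two steps are available here, so no warmup steps are skipped.
rate = steady_state_sec_per_it(log, warmup_steps=0)
print(f"{rate:.3f} s/it, about {remaining_hours(rate, 2, 2400):.1f} h left")
```

On a real log you would keep the default `warmup_steps=10` (or more), in line with the 10-20 step advice above.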

The training is blocked at step 2 because of a bug. I fixed it in #320, so could you try the exact same settings as mentioned in your comment with that PR? I believe it should fix any problems you're having. Might be nice to also try with/without --enable_precomputation and share your findings 🤗

update

de928b5

patch

b34bcb8

Merge branch 'main' into fix/layerwise-upcasting

e629777

a-r-r-o-w merged commit 226c98d into main Mar 12, 2025
1 check passed

a-r-r-o-w deleted the fix/layerwise-upcasting branch March 12, 2025 23:32

This was referenced Mar 14, 2025

none get_lr_scheduler_state() #324

Closed

Patch WanTimeTextImageEmbedding forward only with fp8 #327

Merged
