We highly welcome contributions from the community and actively contribute to the open-source community. The following works have already been adapted for CogVideoX, and we invite everyone to use them:

+ [CogVideoX-Fun](https://github.com/aigc-apps/CogVideoX-Fun): CogVideoX-Fun is a modified pipeline based on the CogVideoX architecture, supporting flexible resolutions and multiple launch methods.
+ [CogStudio](https://github.com/pinokiofactory/cogstudio): A separate repository for CogVideo's Gradio Web UI, which provides a more fully featured Web UI.
+ [Xorbits Inference](https://github.com/xorbitsai/inference): A powerful and comprehensive distributed inference framework, allowing you to easily deploy your own models or the latest cutting-edge open-source models with just one click.
+ [AutoDL Space](https://www.codewithgpu.com/i/THUDM/CogVideo/CogVideoX-5b-demo): A one-click deployment Huggingface Space image provided by community members.
+ [Interior Design Fine-Tuning Model](https://huggingface.co/collections/bertjiazheng/koolcogvideox-66e4762f53287b7f39f8f3ba): A fine-tuned model based on CogVideoX, specifically designed for interior design.
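For reference, the base CogVideoX models these community projects build on can be run directly with the `diffusers` library. The following is a minimal, illustrative inference sketch; the prompt and sampling settings are placeholders, and the memory-saving calls are the same techniques exposed as `--enable_tiling` / `--enable_slicing` in the fine-tuning command below:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the base CogVideoX pipeline from Hugging Face.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)

# Memory-saving options: offload idle submodules to CPU and run the VAE
# in slices/tiles instead of decoding the full latent at once.
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

# Placeholder prompt and settings, for illustration only.
video = pipe(
    prompt="A panda playing a guitar in a bamboo forest",
    num_frames=49,
    guidance_scale=6.0,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```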
The LoRA fine-tuning launch command, with every argument annotated. The arguments are collected in a bash array so that each flag can carry an inline comment without breaking the line continuation of the final command (a trailing `# comment` after a `\` would otherwise terminate it):

```shell
# Fine-tuning arguments for train_cogvideox_lora.py.
args=(
  --gradient_checkpointing                        # Enable gradient checkpointing to reduce memory usage
  --pretrained_model_name_or_path "$MODEL_PATH"   # Path to the pretrained model, specified by $MODEL_PATH
  --cache_dir "$CACHE_PATH"                       # Cache directory for model files, specified by $CACHE_PATH
  --enable_tiling                                 # Enable VAE tiling to process videos in chunks and save memory
  --enable_slicing                                # Enable VAE slicing to further reduce memory usage
  --instance_data_root "$DATASET_PATH"            # Dataset path, specified by $DATASET_PATH
  --caption_column prompts.txt                    # File (prompts.txt) listing the video descriptions used in training
  --video_column videos.txt                       # File (videos.txt) listing the video paths used in training
  --validation_prompt ""                          # Prompt used for generating validation videos during training
  --validation_prompt_separator :::               # Use ::: as the separator between validation prompts
  --num_validation_videos 1                       # Generate 1 validation video per validation round
  --validation_epochs 100                         # Run validation every 100 training epochs
  --seed 42                                       # Set the random seed to 42 for reproducibility
  --rank 128                                      # Set the LoRA rank to 128
  --lora_alpha 64                                 # Set LoRA alpha to 64; the LoRA update is scaled by alpha / rank
  --mixed_precision bf16                          # Use bf16 mixed precision for training to save memory
  --output_dir "$OUTPUT_PATH"                     # Output directory for the model, specified by $OUTPUT_PATH
  --height 480                                    # Resize input videos to a height of 480 pixels
  --width 720                                     # Resize input videos to a width of 720 pixels
  --fps 8                                         # Process input videos at 8 frames per second
  --max_num_frames 49                             # Truncate input videos to at most 49 frames
  --skip_frames_start 0                           # Skip 0 frames at the start of each video
  --skip_frames_end 0                             # Skip 0 frames at the end of each video
  --train_batch_size 4                            # Set the per-device training batch size to 4
  --num_train_epochs 30                           # Train for a total of 30 epochs
  --checkpointing_steps 1000                      # Save a checkpoint every 1000 steps
  --gradient_accumulation_steps 1                 # Accumulate gradients for 1 step, i.e., update after every batch
  --learning_rate 1e-3                            # Set the initial learning rate to 1e-3
  --lr_scheduler cosine_with_restarts             # Use a cosine learning-rate schedule with restarts
  --lr_warmup_steps 200                           # Warm up the learning rate over the first 200 steps
  --lr_num_cycles 1                               # Set the number of learning-rate cycles to 1
  --optimizer AdamW                               # Use the AdamW optimizer
  --adam_beta1 0.9                                # Set the AdamW beta1 parameter to 0.9
  --adam_beta2 0.95                               # Set the AdamW beta2 parameter to 0.95
  --max_grad_norm 1.0                             # Clip gradients to a maximum norm of 1.0
  --allow_tf32                                    # Enable TF32 matmuls to speed up training on Ampere+ GPUs
  --report_to wandb                               # Log and monitor training with Weights & Biases (wandb)
)

# Launch multi-GPU training with Accelerate, using the config file
# accelerate_config_machine_single.yaml.
accelerate launch --config_file accelerate_config_machine_single.yaml --multi_gpu \
  train_cogvideox_lora.py "${args[@]}"
```
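Once training completes, the saved adapter can be loaded back into the inference pipeline. Below is a minimal sketch, assuming the LoRA weights were written to the `--output_dir` above; the path and adapter name are placeholders, and `load_lora_weights` / `set_adapters` are the standard diffusers LoRA entry points:

```python
import torch
from diffusers import CogVideoXPipeline

# Load the base model that was fine-tuned.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.bfloat16)

# Load the adapter produced by train_cogvideox_lora.py ($OUTPUT_PATH above).
pipe.load_lora_weights("path/to/output_dir", adapter_name="cogvideox-lora")

# Optionally rescale the adapter's contribution at inference time.
pipe.set_adapters(["cogvideox-lora"], [1.0])
```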
## Running the Script to Start Fine-tuning

Single Node (One GPU or Multi GPU) fine-tuning:

```shell
bash finetune_single_rank.sh
```

Multi-Node fine-tuning:

```shell
bash finetune_multi_rank.sh # Needs to be run on each node
```
… but regular fine-tuning without such tokens also works.

The original repository used `lora_alpha` set to 1. We found this value ineffective across multiple runs, likely due to differences in the backend and training setup. Our recommendation is to set `lora_alpha` equal to rank or rank // 2.
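To see why `lora_alpha = 1` is nearly a no-op at high rank: LoRA scales its learned update by `lora_alpha / rank` before adding it to the frozen weight, so with `rank = 128` an alpha of 1 shrinks the update by more than two orders of magnitude. A small numerical sketch (the layer shapes are arbitrary illustrations, not CogVideoX's):

```python
import torch

rank, lora_alpha = 128, 64      # values from the fine-tuning command above
d_in, d_out = 512, 512          # arbitrary layer shapes, for illustration only

W = torch.randn(d_out, d_in)    # frozen base weight
A = torch.randn(rank, d_in)     # stand-in for the trained LoRA down-projection
B = torch.randn(d_out, rank)    # stand-in for the trained LoRA up-projection

# LoRA's effective weight: W + (lora_alpha / rank) * (B @ A)
scaling = lora_alpha / rank
W_eff = W + scaling * (B @ A)

print(scaling)                  # 0.5 with alpha = rank // 2, as recommended
print(1 / rank)                 # ~0.008 with alpha = 1, hence the negligible effect
```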