
Commit cc1d2e7

Multi-GPU parallel encoding support for training videos. (#6)
* Multi-GPU parallel encoding support for training videos.
* revert
* make style
* update

Co-authored-by: Aryan <[email protected]>
1 parent 3a519f5 commit cc1d2e7

File tree

5 files changed (+236 −65 lines)


.gitignore

Lines changed: 3 additions & 0 deletions
```diff
@@ -3,6 +3,9 @@ __pycache__/
 *.py[cod]
 *$py.class
 
+# JetBrains
+.idea
+
 # C extensions
 *.so
 
```

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -402,4 +402,4 @@ With `train_batch_size = 4`:
 - [x] Make 5B lora finetuning work in under 24GB
 
 > [!IMPORTANT]
-> Since our goal is to make the scripts as memory-friendly as possible we don't guarantee multi-GPU training.
+> Since our goal is to make the scripts as memory-friendly as possible we don't guarantee multi-GPU training.
```

README_zh.md

Lines changed: 95 additions & 0 deletions
# CogVideoX Factory

## Introduction

This is a repository for fine-tuning CogVideoX.

## Dataset Preparation

Create two files: one containing prompts separated by newlines, and another containing paths to video data separated by newlines (the paths to the video files must be relative to the path you pass as `--data_root`). Let's walk through an example to understand this better!

Suppose you set `--data_root` to `/dataset`, and that this directory contains the files `prompts.txt` and `videos.txt`.

The `prompts.txt` file should contain prompts separated by newlines:

```
A black and white animated sequence featuring a rabbit named Rabbity Ribfried and an anthropomorphic goat, showcasing their evolving interaction in a musical, playful environment.
A black and white animated sequence set on a ship's deck, featuring a bulldog character named Bully Bulldoger, showcasing exaggerated facial expressions and body language. The character progresses from confident to focused, then to tense and distressed, displaying a range of emotions as it overcomes challenges. The ship's interior remains static in the background, with simple details such as a bell and an open door. The character's dynamic movements and changing expressions drive the story, with no camera movement, keeping the viewer focused on its evolving reactions and physical gestures.
...
```

The `videos.txt` file should contain paths to video files separated by newlines. Note that the paths should be relative to the `--data_root` directory.

```bash
videos/00000.mp4
videos/00001.mp4
...
```
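If the videos already live under a `videos/` directory, `videos.txt` can be generated instead of written by hand. A minimal sketch, assuming the layout above and `.mp4` files:

```shell
# From the dataset root: list every .mp4 under videos/ as a relative path,
# sorted so the ordering is stable across runs.
find videos -name '*.mp4' | sort > videos.txt
```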
Overall, your dataset should look like this if you run the `tree` command at the dataset root:

```bash
/dataset
├── prompts.txt
├── videos.txt
├── videos
    ├── videos/00000.mp4
    ├── videos/00001.mp4
    ├── ...
```
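Before launching a run, it can be worth sanity-checking the two files against each other. A minimal sketch (not a script shipped with the repo; the file names follow the example above):

```python
from pathlib import Path

def check_dataset(data_root: str) -> None:
    root = Path(data_root)
    prompts = (root / "prompts.txt").read_text().splitlines()
    videos = (root / "videos.txt").read_text().splitlines()
    # Each prompt must pair with exactly one video path.
    assert len(prompts) == len(videos), (
        f"{len(prompts)} prompts but {len(videos)} video paths"
    )
    # Every listed video must exist relative to --data_root.
    missing = [v for v in videos if not (root / v).is_file()]
    assert not missing, f"missing video files: {missing}"
```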
When using this format, `--caption_column` must be `prompts.txt` and `--video_column` must be `videos.txt`. If your data is instead stored in a CSV file, you can also set `--dataset_file` to the path of the CSV, with `--caption_column` and `--video_column` set to the actual column names in the CSV file.
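For the CSV route, a minimal sketch of what such a file could look like; the file name `dataset.csv` and the column names `caption` and `video` are hypothetical (a run would then pass `--caption_column caption --video_column video`):

```python
import csv

# Hypothetical layout for illustration; use whatever column names your CSV has.
with open("dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["caption", "video"])
    writer.writeheader()
    writer.writerow({
        "caption": "A black and white animated sequence...",
        "video": "videos/00000.mp4",
    })
```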
As an example, let's use this [Disney dataset](https://huggingface.co/datasets/Wild-Heart/Disney-VideoGeneration-Dataset) for fine-tuning. To download it, you can use the 🤗 Hugging Face CLI.

```bash
huggingface-cli download --repo-type dataset Wild-Heart/Disney-VideoGeneration-Dataset --local-dir video-dataset-disney
```
## Training

TODO

Take a look at `training/*.sh`.

Note: not tested on MPS.
## 内存需求
58+
59+
训练支持并验证的内存优化包括:
60+
61+
- 来自 [TorchAO](https://github.com/pytorch/ao)`CPUOffloadOptimizer`
62+
- 来自 [bitsandbytes](https://huggingface.co/docs/bitsandbytes/optimizers) 的低位优化器。
63+
64+
### LoRA 微调
65+
66+
<details>
67+
<summary> AdamW </summary>
68+
69+
With `train_batch_size = 1`:
70+
71+
| model | lora rank | gradient_checkpointing | memory_before_training | memory_before_validation | memory_after_validation | memory_after_testing |
72+
|:------------------:|:---------:|:----------------------:|:----------------------:|:------------------------:|:-----------------------:|:--------------------:|
73+
| THUDM/CogVideoX-2b | 16 | False | 12.945 | 43.764 | 46.918 | 24.234 |
74+
| THUDM/CogVideoX-2b | 16 | True | 12.945 | 12.945 | 21.121 | 24.234 |
75+
| THUDM/CogVideoX-2b | 64 | False | 13.035 | 44.314 | 47.469 | 24.469 |
76+
| THUDM/CogVideoX-2b | 64 | True | 13.036 | 13.035 | 21.564 | 24.500 |
77+
| THUDM/CogVideoX-2b | 256 | False | 13.095 | 45.826 | 48.990 | 25.543 |
78+
| THUDM/CogVideoX-2b | 256 | True | 13.094 | 13.095 | 22.344 | 25.537 |
79+
| THUDM/CogVideoX-5b | 16 | True | 19.742 | 19.742 | 28.746 | 38.123 |
80+
| THUDM/CogVideoX-5b | 64 | True | 20.006 | 20.818 | 30.338 | 38.738 |
81+
| THUDM/CogVideoX-5b | 256 | True | 20.771 | 22.119 | 31.939 | 41.537 |
82+
83+
With `train_batch_size = 4`:
84+
85+
| model | lora rank | gradient_checkpointing | memory_before_training | memory_before_validation | memory_after_validation | memory_after_testing |
86+
|:------------------:|:---------:|:----------------------:|:----------------------:|:------------------------:|:-----------------------:|:--------------------:|
87+
| THUDM/CogVideoX-2b | 16 | True | 12.945 | 21.803 | 21.814 | 24.322 |
88+
| THUDM/CogVideoX-2b | 64 | True | 13.035 | 22.254 | 22.254 | 24.572 |
89+
| THUDM/CogVideoX-2b | 256 | True | 13.094 | 22.020 | 22.033 | 25.574 |
90+
| THUDM/CogVideoX-5b | 16 | True | 19.742 | 46.492 | 46.492 | 38.197 |
91+
| THUDM/CogVideoX-5b | 64 | True | 20.006 | 47.805 | 47.805 | 39.365 |
92+
| THUDM/CogVideoX-5b | 256 | True | 20.771 | 47.268 | 47.332 | 41.008 |
93+
94+
> [!NOTE]
95+
>

prepare_dataset.sh

Lines changed: 18 additions & 15 deletions
```diff
@@ -2,11 +2,13 @@
 
 MODEL_ID="THUDM/CogVideoX-2b"
 
+NUM_GPUS=8
+
 # For more details on the expected data format, please refer to the README.
-DATA_ROOT="/raid/aryan/video-dataset-tom-and-jerry" # This needs to be the path to the base directory where your videos are located.
+DATA_ROOT="/path/to/my/datasets/video-dataset" # This needs to be the path to the base directory where your videos are located.
 CAPTION_COLUMN="prompts.txt"
 VIDEO_COLUMN="videos.txt"
-OUTPUT_DIR="/raid/aryan/video-dataset-tom-and-jerry-encoded"
+OUTPUT_DIR="/path/to/my/datasets/preprocessed-dataset"
 HEIGHT=480
 WIDTH=720
 MAX_NUM_FRAMES=49
@@ -17,19 +19,20 @@ DTYPE=fp32
 
 # To create a folder-style dataset structure without pre-encoding videos and captions'
 CMD_WITHOUT_PRE_ENCODING="\
-python3 training/prepare_dataset.py \
-  --model_id $MODEL_ID \
-  --data_root $DATA_ROOT \
-  --caption_column $CAPTION_COLUMN \
-  --video_column $VIDEO_COLUMN \
-  --output_dir $OUTPUT_DIR \
-  --height $HEIGHT \
-  --width $WIDTH \
-  --max_num_frames $MAX_NUM_FRAMES \
-  --max_sequence_length $MAX_SEQUENCE_LENGTH \
-  --target_fps $TARGET_FPS \
-  --batch_size $BATCH_SIZE \
-  --dtype $DTYPE
+torchrun --nproc_per_node=$NUM_GPUS \
+  training/prepare_dataset.py \
+  --model_id $MODEL_ID \
+  --data_root $DATA_ROOT \
+  --caption_column $CAPTION_COLUMN \
+  --video_column $VIDEO_COLUMN \
+  --output_dir $OUTPUT_DIR \
+  --height $HEIGHT \
+  --width $WIDTH \
+  --max_num_frames $MAX_NUM_FRAMES \
+  --max_sequence_length $MAX_SEQUENCE_LENGTH \
+  --target_fps $TARGET_FPS \
+  --batch_size $BATCH_SIZE \
+  --dtype $DTYPE
 "
 
 CMD_WITH_PRE_ENCODING="$CMD_WITHOUT_PRE_ENCODING --save_tensors"
```
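This change hard-codes `NUM_GPUS=8`. One way to derive it at runtime instead, sketched under the assumption that `nvidia-smi` is on the PATH when NVIDIA GPUs are present, with a single-process fallback otherwise:

```shell
# Count visible NVIDIA GPUs if nvidia-smi is available; otherwise fall back to 1.
if command -v nvidia-smi > /dev/null 2>&1; then
  NUM_GPUS=$(nvidia-smi -L | wc -l)
else
  NUM_GPUS=1
fi
echo "launching with --nproc_per_node=$NUM_GPUS"
```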
