
Commit 13083b5

.github page
1 parent 353e4fc commit 13083b5

6 files changed: +133, -25 lines changed
Bug report issue template (new file)

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve CogVideoX-Factory / 提交一个 Bug 问题报告来帮助我们改进 CogVideoX-Factory 开源框架
body:
  - type: textarea
    id: system-info
    attributes:
      label: System Info / 系统信息
      description: Your operating environment / 您的运行环境信息
      placeholder: Includes CUDA version, Diffusers version, Python version, operating system, hardware information (if you suspect a hardware problem)... / 包括 CUDA 版本、Diffusers 版本、Python 版本、操作系统、硬件信息(如果您怀疑是硬件方面的问题)...
    validations:
      required: true

  - type: checkboxes
    id: information-scripts-examples
    attributes:
      label: Information / 问题信息
      description: 'The problem arises when using: / 问题出现在'
      options:
        - label: "The official example scripts / 官方的示例脚本"
        - label: "My own modified scripts / 我自己修改的脚本和任务"

  - type: textarea
    id: reproduction
    validations:
      required: true
    attributes:
      label: Reproduction / 复现过程
      description: |
        Please provide a code example that reproduces the problem you encountered, preferably a minimal reproducible example.
        If you have code snippets, error messages, or stack traces, please provide them here as well.
        Please format your code correctly using code tags. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
        Do not use screenshots, as they are difficult to read and (more importantly) do not allow others to copy and paste your code.

        请提供能重现您遇到的问题的代码示例,最好是最小复现单元。
        如果您有代码片段、错误信息、堆栈跟踪,也请在此提供。
        请使用代码标签正确格式化您的代码。请参见 https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
        请勿使用截图,因为截图难以阅读,而且(更重要的是)不允许他人复制粘贴您的代码。
      placeholder: |
        Steps to reproduce the behavior / 复现 Bug 的步骤:

        1.
        2.
        3.

  - type: textarea
    id: expected-behavior
    validations:
      required: true
    attributes:
      label: Expected behavior / 期待表现
      description: "A clear and concise description of what you would expect to happen. / 简单描述您期望发生的事情。"
Feature request issue template (new file)

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
name: "\U0001F680 Feature request"
description: Submit a request for a new CogVideoX-Factory feature / 提交一个新的 CogVideoX-Factory 开源项目的功能建议
labels: [ "feature" ]
body:
  - type: textarea
    id: feature-request
    validations:
      required: true
    attributes:
      label: Feature request / 功能建议
      description: |
        A brief description of the proposed feature. Links to corresponding papers and code are welcome.
        对功能建议的简述。最好提供对应的论文和代码链接。

  - type: textarea
    id: motivation
    validations:
      required: true
    attributes:
      label: Motivation / 动机
      description: |
        Your motivation for making the suggestion. If it is related to another GitHub issue, link to it here.
        您提出建议的动机。如果该动机与另一个 GitHub 问题有关,请在此处提供对应的链接。

  - type: textarea
    id: contribution
    validations:
      required: true
    attributes:
      label: Your contribution / 您的贡献
      description: |
        A link to your PR, or to anything else you can help with.
        您的 PR 链接或者其他您能提供帮助的链接。

README.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
# CogVideoX Factory 🧪

+[中文阅读](./README_zh.md)
+
Fine-tune the Cog family of video models for custom video generation under 24GB of GPU memory ⚡️📼

<table align="center">

README_zh.md

Lines changed: 38 additions & 16 deletions
@@ -1,12 +1,19 @@
# CogVideoX Factory 🧪

+[Read this in English](./README.md)
+
Fine-tune the Cog family of video models for custom video generation under 24GB of GPU memory ⚡️📼

-TODO: add a table of interesting video results
+<table align="center">
+<tr>
+  <td align="center"><video src="https://github.com/user-attachments/assets/aad07161-87cb-4784-9e6b-16d06581e3e5">Your browser does not support the video tag.</video></td>
+</tr>
+</table>

## Quick Start

-Make sure the required dependencies are installed: `pip install -r requirements.txt`
+Clone this repository and make sure all dependencies are installed: `pip install -r requirements.txt`

Then download the dataset:

@@ -15,25 +22,39 @@ TODO: add a table of interesting video results
huggingface-cli download --repo-type dataset Wild-Heart/Disney-VideoGeneration-Dataset --local-dir video-dataset-disney
```

-Then launch text-to-video LoRA fine-tuning:
+Then launch text-to-video LoRA fine-tuning (modify the hyperparameters, dataset root directory, and other configuration options to suit your needs):

```bash
-TODO
-```
+# LoRA fine-tuning of the CogVideoX text-to-video models
+./train_text_to_video_lora.sh

-We can now run inference with the trained model:
+# Full fine-tuning of the CogVideoX text-to-video models
+./train_text_to_video_sft.sh

-```python
-TODO
+# LoRA fine-tuning of the CogVideoX image-to-video models
+./train_image_to_video_lora.sh
```

-We can also fine-tune the 5B variant with LoRA:
+Assuming your LoRA has been saved and pushed to the HF Hub under `my-awesome-name/my-awesome-lora`, we can now run inference with the fine-tuned model:
+
+```diff
+import torch
+from diffusers import CogVideoXPipeline
+from diffusers.utils import export_to_video

-```python
-TODO
+pipe = CogVideoXPipeline.from_pretrained(
+    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
+).to("cuda")
++ pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="cogvideox-lora")
++ pipe.set_adapters(["cogvideox-lora"], [1.0])
+
+video = pipe("<my-awesome-prompt>").frames[0]
+export_to_video(video, "output.mp4", fps=8)
```

-In the sections below, we provide details on more options designed to make fine-tuning video models as accessible as possible.
+**Note:** For image-to-video fine-tuning, you must install diffusers from [this branch](https://github.com/huggingface/diffusers/pull/9482), which adds LoRA-loading support for CogVideoX image-to-video, until it is merged.
+
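For completeness, image-to-video inference with a LoRA might look like the following minimal sketch. It assumes the diffusers branch from the note above is installed, reuses the hypothetical `my-awesome-name/my-awesome-lora` placeholder from the snippet above, and uses the `THUDM/CogVideoX-5b-I2V` checkpoint:

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
).to("cuda")
# Load the hypothetical fine-tuned LoRA, mirroring the text-to-video example above.
pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="cogvideox-lora")
pipe.set_adapters(["cogvideox-lora"], [1.0])

image = load_image("frame.png")  # e.g. a first frame extracted with ffmpeg
video = pipe(image=image, prompt="<my-awesome-prompt>").frames[0]
export_to_video(video, "output.mp4", fps=8)
```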
+In the sections below, we provide details on the additional options explored in this repository. They all aim to make fine-tuning video models as accessible as possible by keeping memory requirements as low as possible.

## Dataset Preparation

@@ -83,9 +104,9 @@ TODO: add a section on creating and using precomputed embeddings
We provide training scripts for text-to-video and image-to-video generation compatible with the [Cog family of models](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce).

Take a look at the `*.sh` files.

-Note: untested on MPS
+Note: this code has not been tested on MPS; we recommend testing on Linux with CUDA.

## Memory Requirements

@@ -101,7 +122,8 @@ TODO: add a section on creating and using precomputed embeddings
Supported and verified memory-optimization options for training include:

- `CPUOffloadOptimizer` from [`torchao`](https://github.com/pytorch/ao). You can read about its capabilities and limitations [here](https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim#optimizer-cpu-offload). In short, it lets you store trainable parameters and gradients on the CPU. The optimizer step then runs on the CPU, which requires a fast CPU optimizer such as `torch.optim.AdamW(fused=True)`, or applying `torch.compile` to the optimizer step. It is also recommended not to compile the model for training. Gradient clipping and accumulation are not yet supported. (A construction sketch follows this list.)
-- Low-bit optimizers from [`bitsandbytes`](https://huggingface.co/docs/bitsandbytes/optimizers). TODO: test and make [`torchao`](https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim) work
+- Low-bit optimizers from [`bitsandbytes`](https://huggingface.co/docs/bitsandbytes/optimizers).
+- TODO: test and make [`torchao`](https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim) work
- DeepSpeed Zero2: Since we rely on `accelerate`, follow [this guide](https://huggingface.co/docs/accelerate/en/usage_guides/deepspeed) to configure `accelerate` and enable DeepSpeed Zero2 optimizations.
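As a rough illustration of the first option, here is a minimal sketch of how the CPU-offloaded optimizer might be constructed, following the `torchao` prototype API linked above. `transformer` stands in for the trainable model, and the exact signature should be treated as an assumption, not a guarantee:

```python
import torch
# Prototype API; see the torchao link above for capabilities and limitations.
from torchao.prototype.low_bit_optim import CPUOffloadOptimizer

# `transformer` is a placeholder for the trainable CogVideoX transformer.
optimizer = CPUOffloadOptimizer(
    transformer.parameters(),
    torch.optim.AdamW,  # optimizer class whose step runs on the CPU
    fused=True,         # a fast (fused) CPU optimizer step is required
    lr=1e-4,
)
```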

> [!IMPORTANT]
@@ -114,6 +136,6 @@ TODO: add a section on creating and using precomputed embeddings
> [!NOTE]
> The memory requirements for image-to-video LoRA fine-tuning are similar to those for text-to-video with `THUDM/CogVideoX-5b`, so they are not reported explicitly.
>
-> Additionally, to prepare test images for I2V fine-tuning, you can generate them on the fly by modifying the script, or extract some frames from your training data with:
+> I2V training fine-tunes on the first frame of each video. To prepare test images for I2V fine-tuning, you can generate them on the fly by modifying the script, or extract some frames from your training data with:
> `ffmpeg -i input.mp4 -frames:v 1 frame.png`
> or provide a valid and accessible image URL.
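If you would rather extract a first frame for every training video than one file at a time, a small helper along these lines may be convenient. This is a hedged sketch: it assumes the dataset layout described in this README, with video paths listed relative to the dataset root in `videos.txt`, and `ffmpeg` available on your `PATH`:

```python
import subprocess
from pathlib import Path

data_root = Path("/path/to/my/datasets/disney-dataset")  # placeholder path
out_dir = data_root / "first_frames"
out_dir.mkdir(exist_ok=True)

for line in (data_root / "videos.txt").read_text().splitlines():
    if not line.strip():
        continue
    video = data_root / line.strip()
    # Keep only the first decoded frame, mirroring the ffmpeg one-liner above.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video), "-frames:v", "1", str(out_dir / f"{video.stem}.png")],
        check=True,
    )
```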

assets/contribute.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Contributions are welcome

This project is at a very early stage.

train_text_to_video_lora.sh

Lines changed: 5 additions & 9 deletions
@@ -4,7 +4,7 @@ export WANDB_MODE="offline"
export NCCL_P2P_DISABLE=1
export TORCH_NCCL_ENABLE_MONITORING=0

-GPU_IDS="0,1,2,3,4,5,6,7"
+GPU_IDS="0"

# Training Configurations
# Experiment with as many hyperparameters as you want!
@@ -19,26 +19,22 @@ ACCELERATE_CONFIG_FILE="accelerate_configs/uncompiled_1.yaml"
# Absolute path to where the data is located. Make sure to have read the README for how to prepare data.
# This example assumes you downloaded an already prepared dataset from HF CLI as follows:
# huggingface-cli download --repo-type dataset Wild-Heart/Disney-VideoGeneration-Dataset --local-dir /path/to/my/datasets/disney-dataset
-
-DATA_ROOT="/share/home/zyx/disney_cogvideox-encoded-multi"
-CAPTION_COLUMN="prompts.txt"
+DATA_ROOT="/path/to/my/datasets/disney-dataset"
+CAPTION_COLUMN="prompt.txt"
VIDEO_COLUMN="videos.txt"
-MODEL_PATH="/share/official_pretrains/hf_home/CogVideoX-5b"
-

# Launch experiments with different hyperparameters
for learning_rate in "${LEARNING_RATES[@]}"; do
  for lr_schedule in "${LR_SCHEDULES[@]}"; do
    for optimizer in "${OPTIMIZERS[@]}"; do
      for steps in "${MAX_TRAIN_STEPS[@]}"; do
-        output_dir="cogvideox-lora__optimizer_${optimizer}__steps_${steps}__lr-schedule_${lr_schedule}__learning-rate_${learning_rate}/"
+        output_dir="/path/to/my/models/cogvideox-lora__optimizer_${optimizer}__steps_${steps}__lr-schedule_${lr_schedule}__learning-rate_${learning_rate}/"

        cmd="accelerate launch --config_file $ACCELERATE_CONFIG_FILE --gpu_ids $GPU_IDS training/cogvideox_text_to_video_lora.py \
-          --pretrained_model_name_or_path $MODEL_PATH \
+          --pretrained_model_name_or_path THUDM/CogVideoX-5b \
          --data_root $DATA_ROOT \
          --caption_column $CAPTION_COLUMN \
          --video_column $VIDEO_COLUMN \
-          --load_tensors \
          --id_token BW_STYLE \
          --height_buckets 480 \
          --width_buckets 720 \
