
Commit 13083b5

.github page
1 parent 353e4fc commit 13083b5

6 files changed: +133, -25 lines changed
Bug report issue template (new file)

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve CogVideoX-Factory / 提交一个 Bug 问题报告来帮助我们改进 CogVideoX-Factory 开源框架
body:
  - type: textarea
    id: system-info
    attributes:
      label: System Info / 系统信息
      description: Your operating environment / 您的运行环境信息
      placeholder: Includes CUDA version, Diffusers version, Python version, operating system, hardware information (if you suspect a hardware problem)... / 包括 CUDA 版本、Diffusers 版本、Python 版本、操作系统、硬件信息(如果您怀疑是硬件方面的问题)...
    validations:
      required: true

  - type: checkboxes
    id: information-scripts-examples
    attributes:
      label: Information / 问题信息
      description: 'The problem arises when using: / 问题出现在'
      options:
        - label: "The official example scripts / 官方的示例脚本"
        - label: "My own modified scripts / 我自己修改的脚本和任务"

  - type: textarea
    id: reproduction
    validations:
      required: true
    attributes:
      label: Reproduction / 复现过程
      description: |
        Please provide a code example that reproduces the problem you encountered, preferably a minimal reproducible example.
        If you have code snippets, error messages, or stack traces, please provide them here as well.
        Please format your code correctly using code tags. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
        Do not use screenshots, as they are difficult to read and (more importantly) do not allow others to copy and paste your code.

        请提供能重现您遇到的问题的代码示例,最好是最小复现单元。
        如果您有代码片段、错误信息、堆栈跟踪,也请在此提供。
        请使用代码标签正确格式化您的代码。请参见 https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
        请勿使用截图,因为截图难以阅读,而且(更重要的是)不允许他人复制粘贴您的代码。
      placeholder: |
        Steps to reproduce the behavior / 复现 Bug 的步骤:

        1.
        2.
        3.

  - type: textarea
    id: expected-behavior
    validations:
      required: true
    attributes:
      label: Expected behavior / 期待表现
      description: "A clear and concise description of what you would expect to happen. / 简单描述您期望发生的事情。"
Feature request issue template (new file)

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
name: "\U0001F680 Feature request"
description: Submit a request for a new CogVideoX-Factory feature / 提交一个新的 CogVideoX-Factory 开源项目的功能建议
labels: [ "feature" ]
body:
  - type: textarea
    id: feature-request
    validations:
      required: true
    attributes:
      label: Feature request / 功能建议
      description: |
        A brief description of the proposed feature. Links to corresponding papers and code are welcome.
        对功能建议的简述。最好提供对应的论文和代码链接。

  - type: textarea
    id: motivation
    validations:
      required: true
    attributes:
      label: Motivation / 动机
      description: |
        Your motivation for making the suggestion. If it is related to another GitHub issue, link to it here.
        您提出建议的动机。如果该动机与另一个 GitHub 问题有关,请在此处提供对应的链接。

  - type: textarea
    id: contribution
    validations:
      required: true
    attributes:
      label: Your contribution / 您的贡献
      description: |
        A link to your PR, or to anything else you can help with.
        您的 PR 链接或者其他您能提供帮助的链接。

README.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
# CogVideoX Factory 🧪

+[中文阅读](./README_zh.md)
+
Fine-tune the Cog family of video models for custom video generation under 24GB of GPU memory ⚡️📼

<table align="center">

README_zh.md

Lines changed: 38 additions & 16 deletions
@@ -1,12 +1,19 @@
# CogVideoX Factory 🧪

+[Read this in English](./README.md)
+
Fine-tune the Cog family of video models for custom video generation under 24GB of GPU memory ⚡️📼

-TODO: add a table of interesting video results
+<table align="center">
+<tr>
+  <td align="center"><video src="https://github.com/user-attachments/assets/aad07161-87cb-4784-9e6b-16d06581e3e5">Your browser does not support the video tag.</video></td>
+</tr>
+</table>

## Quick Start

-Make sure the required dependencies are installed: `pip install -r requirements.txt`
+Clone this repository and make sure all dependencies are installed: `pip install -r requirements.txt`

Then download the dataset:

@@ -15,25 +22,39 @@ TODO: add a table of interesting video results
huggingface-cli download --repo-type dataset Wild-Heart/Disney-VideoGeneration-Dataset --local-dir video-dataset-disney
```

-Then launch text-to-video LoRA fine-tuning:
+Then launch text-to-video LoRA fine-tuning (modify the hyperparameters, dataset root directory, and other configuration options to suit your needs):

```bash
-TODO
-```
+# LoRA fine-tuning of the CogVideoX text-to-video models
+./train_text_to_video_lora.sh

-We can now run inference with the trained model:
+# Full fine-tuning of the CogVideoX text-to-video models
+./train_text_to_video_sft.sh

-```python
-TODO
+# LoRA fine-tuning of the CogVideoX image-to-video models
+./train_image_to_video_lora.sh
```

-We can also fine-tune the 5B variant with LoRA:
+Assuming your LoRA has been saved and pushed to the HF Hub under `my-awesome-name/my-awesome-lora`, we can now run inference with the fine-tuned model:
+
+```diff
+import torch
+from diffusers import CogVideoXPipeline
+from diffusers.utils import export_to_video

-```python
-TODO
+pipe = CogVideoXPipeline.from_pretrained(
+    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
+).to("cuda")
++ pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="cogvideox-lora")
++ pipe.set_adapters(["cogvideox-lora"], [1.0])
+
+video = pipe("<my-awesome-prompt>").frames[0]
+export_to_video(video, "output.mp4", fps=8)
```

-In the sections below, we provide details on more options designed to make fine-tuning video models as accessible as possible.
+**Note:** For image-to-video fine-tuning, you must install diffusers from [this branch](https://github.com/huggingface/diffusers/pull/9482), which adds LoRA-loading support for CogVideoX image-to-video, until it is merged.
+
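For completeness, image-to-video inference with a LoRA might look like the following minimal sketch. It assumes the diffusers branch from the note above is installed, reuses the hypothetical `my-awesome-name/my-awesome-lora` placeholder from the snippet above, and uses the `THUDM/CogVideoX-5b-I2V` checkpoint:

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
).to("cuda")
# Load the hypothetical fine-tuned LoRA, mirroring the text-to-video example above.
pipe.load_lora_weights("my-awesome-name/my-awesome-lora", adapter_name="cogvideox-lora")
pipe.set_adapters(["cogvideox-lora"], [1.0])

image = load_image("frame.png")  # e.g. a first frame extracted with ffmpeg
video = pipe(image=image, prompt="<my-awesome-prompt>").frames[0]
export_to_video(video, "output.mp4", fps=8)
```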
+In the sections below, we provide details on the additional options explored in this repository. They all aim to make fine-tuning video models as accessible as possible by keeping memory requirements as low as possible.

## Dataset Preparation

@@ -83,9 +104,9 @@ TODO: add a section on creating and using precomputed embeddings
We provide training scripts for text-to-video and image-to-video generation compatible with the [Cog family of models](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce).

Take a look at the `*.sh` files.

-Note: untested on MPS
+Note: this code has not been tested on MPS; we recommend testing on Linux with CUDA.

## Memory Requirements

@@ -101,7 +122,8 @@ TODO: add a section on creating and using precomputed embeddings
Supported and verified memory-optimization options for training include:

- `CPUOffloadOptimizer` from [`torchao`](https://github.com/pytorch/ao). You can read about its capabilities and limitations [here](https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim#optimizer-cpu-offload). In short, it lets you store trainable parameters and gradients on the CPU. The optimizer step then runs on the CPU, which requires a fast CPU optimizer such as `torch.optim.AdamW(fused=True)`, or applying `torch.compile` to the optimizer step. It is also recommended not to compile the model for training. Gradient clipping and accumulation are not yet supported. (A construction sketch follows this list.)
-- Low-bit optimizers from [`bitsandbytes`](https://huggingface.co/docs/bitsandbytes/optimizers). TODO: test and make [`torchao`](https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim) work
+- Low-bit optimizers from [`bitsandbytes`](https://huggingface.co/docs/bitsandbytes/optimizers).
+- TODO: test and make [`torchao`](https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim) work
- DeepSpeed Zero2: Since we rely on `accelerate`, follow [this guide](https://huggingface.co/docs/accelerate/en/usage_guides/deepspeed) to configure `accelerate` and enable DeepSpeed Zero2 optimizations.
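As a rough illustration of the first option, here is a minimal sketch of how the CPU-offloaded optimizer might be constructed, following the `torchao` prototype API linked above. `transformer` stands in for the trainable model, and the exact signature should be treated as an assumption, not a guarantee:

```python
import torch
# Prototype API; see the torchao link above for capabilities and limitations.
from torchao.prototype.low_bit_optim import CPUOffloadOptimizer

# `transformer` is a placeholder for the trainable CogVideoX transformer.
optimizer = CPUOffloadOptimizer(
    transformer.parameters(),
    torch.optim.AdamW,  # optimizer class whose step runs on the CPU
    fused=True,         # a fast (fused) CPU optimizer step is required
    lr=1e-4,
)
```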

> [!IMPORTANT]
@@ -114,6 +136,6 @@ TODO: add a section on creating and using precomputed embeddings
> [!NOTE]
> The memory requirements for image-to-video LoRA fine-tuning are similar to those for text-to-video with `THUDM/CogVideoX-5b`, so they are not reported explicitly.
>
-> Additionally, to prepare test images for I2V fine-tuning, you can generate them on the fly by modifying the script, or extract some frames from your training data with:
+> I2V training fine-tunes on the first frame of each video. To prepare test images for I2V fine-tuning, you can generate them on the fly by modifying the script, or extract some frames from your training data with:
> `ffmpeg -i input.mp4 -frames:v 1 frame.png`
> or provide a valid and accessible image URL.
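If you would rather extract a first frame for every training video than one file at a time, a small helper along these lines may be convenient. This is a hedged sketch: it assumes the dataset layout described in this README, with video paths listed relative to the dataset root in `videos.txt`, and `ffmpeg` available on your `PATH`:

```python
import subprocess
from pathlib import Path

data_root = Path("/path/to/my/datasets/disney-dataset")  # placeholder path
out_dir = data_root / "first_frames"
out_dir.mkdir(exist_ok=True)

for line in (data_root / "videos.txt").read_text().splitlines():
    if not line.strip():
        continue
    video = data_root / line.strip()
    # Keep only the first decoded frame, mirroring the ffmpeg one-liner above.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video), "-frames:v", "1", str(out_dir / f"{video.stem}.png")],
        check=True,
    )
```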

assets/contribute.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Contributions are welcome

This project is at a very early stage.

train_text_to_video_lora.sh

Lines changed: 5 additions & 9 deletions
@@ -4,7 +4,7 @@ export WANDB_MODE="offline"
export NCCL_P2P_DISABLE=1
export TORCH_NCCL_ENABLE_MONITORING=0

-GPU_IDS="0,1,2,3,4,5,6,7"
+GPU_IDS="0"

# Training Configurations
# Experiment with as many hyperparameters as you want!
@@ -19,26 +19,22 @@ ACCELERATE_CONFIG_FILE="accelerate_configs/uncompiled_1.yaml"
# Absolute path to where the data is located. Make sure to have read the README for how to prepare data.
# This example assumes you downloaded an already prepared dataset from HF CLI as follows:
# huggingface-cli download --repo-type dataset Wild-Heart/Disney-VideoGeneration-Dataset --local-dir /path/to/my/datasets/disney-dataset
-
-DATA_ROOT="/share/home/zyx/disney_cogvideox-encoded-multi"
-CAPTION_COLUMN="prompts.txt"
+DATA_ROOT="/path/to/my/datasets/disney-dataset"
+CAPTION_COLUMN="prompt.txt"
VIDEO_COLUMN="videos.txt"
-MODEL_PATH="/share/official_pretrains/hf_home/CogVideoX-5b"
-

# Launch experiments with different hyperparameters
for learning_rate in "${LEARNING_RATES[@]}"; do
  for lr_schedule in "${LR_SCHEDULES[@]}"; do
    for optimizer in "${OPTIMIZERS[@]}"; do
      for steps in "${MAX_TRAIN_STEPS[@]}"; do
-        output_dir="cogvideox-lora__optimizer_${optimizer}__steps_${steps}__lr-schedule_${lr_schedule}__learning-rate_${learning_rate}/"
+        output_dir="/path/to/my/models/cogvideox-lora__optimizer_${optimizer}__steps_${steps}__lr-schedule_${lr_schedule}__learning-rate_${learning_rate}/"

        cmd="accelerate launch --config_file $ACCELERATE_CONFIG_FILE --gpu_ids $GPU_IDS training/cogvideox_text_to_video_lora.py \
-          --pretrained_model_name_or_path $MODEL_PATH \
+          --pretrained_model_name_or_path THUDM/CogVideoX-5b \
          --data_root $DATA_ROOT \
          --caption_column $CAPTION_COLUMN \
          --video_column $VIDEO_COLUMN \
-          --load_tensors \
          --id_token BW_STYLE \
          --height_buckets 480 \
          --width_buckets 720 \
