## Project Updates
- 🔥🔥 **News**: ```2024/11/15```: We released the `CogVideoX1.5` model in the diffusers version. Only minor parameter
  adjustments are needed to continue using previous code.
- 🔥 **News**: ```2024/11/08```: We have released the CogVideoX1.5 model. CogVideoX1.5 is an upgraded version of the
  open-source model CogVideoX.
  The CogVideoX1.5-5B series supports 10-second videos with higher resolution, and CogVideoX1.5-5B-I2V supports video
  generation at any resolution.
  The SAT code has already been updated, while the diffusers version is still under adaptation. Download the SAT
  version code [here](https://huggingface.co/THUDM/CogVideoX1.5-5B-SAT).
- 🔥 **News**: ```2024/10/13```: A more cost-effective fine-tuning framework for `CogVideoX-5B` that works with a single
  model [CogVLM2-Caption](https://huggingface.co/THUDM/cogvlm2-llama3-caption), used in the training process of
  CogVideoX to convert video data into text descriptions, has been open-sourced. You are welcome to download and use it.
- 🔥 ```2024/8/27```: We have open-sourced a larger model in the CogVideoX series, **CogVideoX-5B**. We have
  significantly optimized the model's inference performance, greatly lowering the inference threshold.
  You can run **CogVideoX-2B** on older GPUs like the `GTX 1080TI`, and **CogVideoX-5B** on desktop GPUs like the
  `RTX 3060`. Please strictly follow the [requirements](requirements.txt) to update and install dependencies, and refer
  to [cli_demo](inference/cli_demo.py) for inference code; a minimal inference sketch also appears after this update
  list. Additionally, the open-source license for the **CogVideoX-2B** model has been changed to the
  **Apache 2.0 License**.
- 🔥 ```2024/8/6```: We have open-sourced **3D Causal VAE**, used for **CogVideoX-2B**, which can reconstruct videos
  with almost no loss.
- 🔥 ```2024/8/6```: We have open-sourced the first model of the CogVideoX series video generation models, **CogVideoX-2B
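For reference, here is a minimal text-to-video sketch in the spirit of the updates above. It assumes a recent
`diffusers` release that ships `CogVideoXPipeline`; the model ID, prompt, and sampling settings are illustrative
rather than prescriptive, and [cli_demo](inference/cli_demo.py) remains the maintained script.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the 2B model in fp16 (the 5B model is typically run in bf16).
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)

# Memory-saving switches that make consumer GPUs viable: offload submodules to the CPU
# between forward passes and decode the VAE in slices/tiles instead of all at once.
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

video = pipe(
    prompt="A golden retriever surfing a small wave at sunset",
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

export_to_video(video, "output.mp4", fps=8)
```

Sequential CPU offload and tiled VAE decoding are the main levers that bring VRAM usage down to desktop-GPU levels;
they trade some speed for memory.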
  used to quantize the text encoder, transformer, and VAE modules to reduce the memory requirements of CogVideoX. This
  allows the model to run on a free T4 Colab or on GPUs with smaller memory! Also, note that TorchAO quantization is
  fully compatible with `torch.compile`, which can significantly improve inference speed. FP8 precision must be used on
  devices with NVIDIA H100 and above, which requires source installation of the `torch` and `torchao` Python packages.
  CUDA 12.4 is recommended. A quantization sketch follows these notes.
+ The inference speed tests also used the above memory optimization scheme. Without memory optimization, inference
  speed increases by about 10%. Only the `diffusers` version of the model supports quantization.
+ The model only supports English input; prompts in other languages can be translated into English (for example,
  during prompt refinement with a large language model) before use.
+ The memory usage of model fine-tuning is tested in an `8 * H100` environment, and the program automatically
  uses `Zero 2` optimization. If a specific number of GPUs is marked in the table, that number or more GPUs must be
  used for fine-tuning.
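As a rough illustration of the quantization note above, the following sketch applies int8 weight-only quantization
with `torchao` to the three memory-heavy modules. The exact quantization scheme, package versions, and memory savings
depend on your setup; the FP8 path is not shown and additionally requires an H100-class GPU with source-installed
`torch` and `torchao`.

```python
import torch
from diffusers import CogVideoXPipeline
from torchao.quantization import quantize_, int8_weight_only

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# Quantize the three memory-heavy modules in place, as described in the note above.
quantize_(pipe.text_encoder, int8_weight_only())
quantize_(pipe.transformer, int8_weight_only())
quantize_(pipe.vae, int8_weight_only())

pipe.to("cuda")

# TorchAO quantization stays compatible with torch.compile, which can recover some speed.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)

video = pipe(
    prompt="A lighthouse on a rocky cliff during a storm",
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
```

Because TorchAO quantization composes with `torch.compile` (as noted above), compiling the transformer can offset part
of the speed cost of running in a reduced-memory configuration.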
## Friendly Links
+ [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio): DiffSynth Studio is a diffusion engine. It has
  restructured the architecture, including text encoders, UNet, VAE, etc., enhancing computational performance while
  maintaining compatibility with open-source community models. The framework has been adapted for CogVideoX.
+ [CogVideoX-Controlnet](https://github.com/TheDenk/cogvideox-controlnet): A simple ControlNet module for the CogVideoX
  model.
+ [VideoTuna](https://github.com/VideoVerses/VideoTuna): VideoTuna is the first repo to integrate multiple AI video
  generation models for text-to-video, image-to-video, and text-to-image generation.