
Commit 27af518

readme i.
1 parent 0937e92 commit 27af518

File tree: 1 file changed (+48 −6)

README.md

Lines changed: 48 additions & 6 deletions
@@ -1,5 +1,41 @@
# CogVideoX Factory 🧪

Fine-tune the Cog family of video models for custom video generation under 24GB of GPU memory ⚡️📼

TODO: Add table with fun video results

## Quickstart

Make sure the requirements are installed: `pip install -r requirements.txt`.

Then download a dataset:

```bash
# install `huggingface_hub`
huggingface-cli download \
  --repo-type dataset Wild-Heart/Disney-VideoGeneration-Dataset \
  --local-dir video-dataset-disney
```
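
If you prefer to script the download, a minimal equivalent sketch using `huggingface_hub` directly (the target directory simply mirrors the CLI example above):

```python
from huggingface_hub import snapshot_download

# Download the example dataset to the same local folder used elsewhere in this README
snapshot_download(
    repo_id="Wild-Heart/Disney-VideoGeneration-Dataset",
    repo_type="dataset",
    local_dir="video-dataset-disney",
)
```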

Then launch LoRA fine-tuning for text-to-video:

```bash
TODO
```
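
A rough sketch of what such a launch could look like; the script name and all flags below are illustrative assumptions rather than this repository's actual interface (see `training/*.sh` for the real invocations):

```bash
# Illustrative only: the script path and flags below are assumptions, not the repository's actual CLI
accelerate launch training/cogvideox_text_to_video_lora.py \
  --pretrained_model_name_or_path THUDM/CogVideoX-2b \
  --data_root video-dataset-disney \
  --output_dir cogvideox-lora \
  --rank 64 \
  --train_batch_size 1 \
  --max_train_steps 1000
```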

We can now use the trained model for inference:

```python
TODO
```
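
A minimal sketch of what this might look like with `diffusers`, assuming a diffusers version with CogVideoX LoRA support; the LoRA directory name is a placeholder carried over from the launch sketch above:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the base text-to-video model and attach the trained LoRA (the path is a placeholder)
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("cogvideox-lora")

# Tiled VAE decoding reduces peak memory during the final decode step
pipe.vae.enable_tiling()

video = pipe(
    prompt="A panda playing a guitar in a bamboo forest",
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```

The 5B checkpoint (`THUDM/CogVideoX-5b`) can be loaded the same way; `pipe.enable_model_cpu_offload()` is the usual lever for fitting it on smaller GPUs.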

We can also fine-tune the 5B variant with LoRA:

```python
TODO
```

Below, we provide additional sections detailing more of the options available in this repository, all aimed at making fine-tuning video models as accessible as possible.

## Dataset Preparation

@@ -43,9 +79,11 @@ As an example, let's use [this](https://huggingface.co/datasets/Wild-Heart/Disne
huggingface-cli download --repo-type dataset Wild-Heart/Disney-VideoGeneration-Dataset --local-dir video-dataset-disney
```

TODO: Add a section on creating and using precomputed embeddings.

## Training

We provide training scripts for both text-to-video and image-to-video generation, compatible with the [Cog family of models](https://huggingface.co/collections/THUDM/cogvideo-66c08e62f1685a3ade464cce).

Take a look at `training/*.sh`

@@ -63,9 +101,10 @@ Note: Untested on MPS
</table>

Supported and verified memory optimizations for training include:

- `CPUOffloadOptimizer` from [`torchao`](https://github.com/pytorch/ao). You can read about its capabilities and limitations [here](https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim#optimizer-cpu-offload). In short, it allows you to use the CPU for storing trainable parameters and gradients. This results in the optimizer step happening on the CPU, which requires a fast CPU optimizer, such as `torch.optim.AdamW(fused=True)`, or applying `torch.compile` on the optimizer step (see the construction sketch after this list). Additionally, it is recommended not to `torch.compile` your model for training. Gradient clipping and accumulation are not supported yet either.
- Low-bit optimizers from [`bitsandbytes`](https://huggingface.co/docs/bitsandbytes/optimizers). TODO: test and make the [`torchao`](https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim) ones work.
- DeepSpeed Zero2: Since we rely on `accelerate`, follow [this guide](https://huggingface.co/docs/accelerate/en/usage_guides/deepspeed) to configure your `accelerate` installation to enable training with DeepSpeed Zero2 optimizations.
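
A minimal sketch of how either optimizer might be instantiated; the stand-in model, hyperparameters, and flags are placeholders rather than what this repository's scripts actually do:

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb
from torchao.prototype.low_bit_optim import CPUOffloadOptimizer

# Stand-in for the video transformer; in practice this would be the CogVideoX model
# with only the LoRA parameters left trainable
model = nn.Linear(1024, 1024).cuda()
params_to_optimize = [p for p in model.parameters() if p.requires_grad]

# Option 1: keep optimizer state (and optionally gradients) on the CPU; the optimizer step
# runs on the CPU, so a fast fused CPU optimizer is recommended
optimizer = CPUOffloadOptimizer(
    params_to_optimize,
    torch.optim.AdamW,
    fused=True,
    offload_gradients=True,
    lr=1e-4,
)

# Option 2: an 8-bit AdamW from bitsandbytes, which stores optimizer state in 8 bits on the GPU
optimizer = bnb.optim.AdamW8bit(params_to_optimize, lr=1e-4)
```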

> [!IMPORTANT]
> The memory requirements are reported after running `training/prepare_dataset.py`, which converts the videos and captions to latents and embeddings. During training, we directly load the latents and embeddings, and do not require the VAE or the T5 text encoder. However, if you perform validation/testing, these must be loaded and will increase the amount of required memory. Not performing validation/testing saves a significant amount of memory, which can be used to focus solely on training if you're on smaller-VRAM GPUs.
@@ -250,8 +289,11 @@ ValueError: Expected a cuda device, but got: cpu

- [ ] Make scripts compatible with DDP
- [ ] Make scripts compatible with FSDP
- [x] Make scripts compatible with DeepSpeed
- [x] Test scripts with memory-efficient optimizer from bitsandbytes
- [x] Test scripts with CPUOffloadOptimizer, etc.
- [ ] Test scripts with torchao quantization, and low-bit memory optimizers, etc.
- [x] Make 5B LoRA fine-tuning work in under 24GB

> [!IMPORTANT]
> Since our goal is to make the scripts as memory-friendly as possible, we don't guarantee multi-GPU training.
