Supported and verified memory optimizations for training include:
> [!IMPORTANT]
> The memory requirements are reported after running `training/prepare_dataset.py`, which converts the videos and captions to latents and embeddings. During training, we load the latents and embeddings directly and do not require the VAE or the T5 text encoder. However, if you perform validation/testing, these must be loaded and will increase the amount of required memory. Skipping validation/testing therefore saves a significant amount of memory, which can be helpful if you want to focus solely on training on smaller-VRAM GPUs.
>
> If you choose to run validation/testing, you can save some memory on lower-VRAM GPUs by specifying `--enable_model_cpu_offloading`.
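Model CPU offloading keeps each sub-model in CPU memory and moves it onto the GPU only while it is actually running, trading some speed for a lower peak-VRAM footprint. The snippet below is a hypothetical, minimal sketch of that mechanism in plain PyTorch; it is not this repository's implementation, and the `OffloadWrapper` name is invented for illustration:

```python
import torch
import torch.nn as nn


class OffloadWrapper(nn.Module):
    """Keep a sub-model off the compute device; move it there just for forward."""

    def __init__(self, module: nn.Module, device: str):
        super().__init__()
        self.module = module
        self.device = device

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.module.to(self.device)      # load the sub-model just-in-time
        out = self.module(x.to(self.device))
        self.module.to("cpu")            # release device memory for the next sub-model
        return out


# Using device="cpu" here only demonstrates the mechanics; on a real setup
# you would pass device="cuda" so each sub-model occupies VRAM only briefly.
layer = OffloadWrapper(nn.Linear(4, 4), device="cpu")
y = layer(torch.randn(2, 4))
```

Offloading one sub-model at a time is why the flag helps most when the text encoder, transformer, and VAE never need to be resident on the GPU simultaneously.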
### LoRA finetuning
### Full finetuning
> [!NOTE]
> Trying to run full finetuning without gradient checkpointing OOMs even on an A100 (80 GB), so the memory measurements have not been specified.
>
> `memory_after_validation` is indicative of the peak memory required for training. This is because, apart from the activations, parameters, and gradients stored for training, you also need to load the VAE and text encoder into memory and spend some memory performing inference. To reduce the total memory required for training, you can choose to skip validation/testing in the training script.
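Peak-memory numbers like `memory_after_validation` can be captured with PyTorch's built-in CUDA memory statistics. A minimal sketch, assuming a CUDA device is available (the helper name `report_peak_memory` is illustrative, not part of this repository):

```python
import torch


def report_peak_memory(tag: str) -> float:
    """Print and return the peak GPU memory (GB) allocated since the last reset."""
    if not torch.cuda.is_available():
        return 0.0  # no GPU: nothing to measure
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    torch.cuda.reset_peak_memory_stats()  # start a fresh window for the next phase
    print(f"{tag}: {peak_gb:.2f} GB")
    return peak_gb


# Calling this right after the validation loop captures the combined
# training + VAE/text-encoder inference peak described above.
peak = report_peak_memory("memory_after_validation")
```

Resetting the peak counter between phases (after dataset preparation, after a training epoch, after validation) is what makes the per-phase numbers comparable.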
- [x] Make scripts compatible with DDP
- [ ] Make scripts compatible with FSDP
- [x] Make scripts compatible with DeepSpeed
- [x] Test scripts with memory-efficient optimizer from bitsandbytes