🤗🤗🤗 VideoTuna is a useful codebase for text-to-video applications.

🌟 VideoTuna is the first repo that integrates multiple AI video generation models including `text-to-video (T2V)`, `image-to-video (I2V)`, `text-to-image (T2I)`, and `video-to-video (V2V)` generation for model inference and finetuning (to the best of our knowledge).

🌟 VideoTuna is the first repo that provides comprehensive pipelines in video generation, from fine-tuning to pre-training, continuous training, and post-training (alignment) (to the best of our knowledge).

🌟 An Emotion Control I2V model will be released soon.

## Features
🌟 **All-in-one framework:** Inference and fine-tune up-to-date video generation models.

🌟 **Pre-training:** Build your own foundational text-to-video model.

🌟 **Continuous training:** Keep improving your model with new data.

🌟 **Domain-specific fine-tuning:** Adapt models to your specific scenario.

🌟 **Concept-specific fine-tuning:** Teach your models with unique concepts.

🌟 **Enhanced language understanding:** Improve model comprehension through continuous training.

🌟 **Post-processing:** Enhance videos with a video-to-video enhancement model.

🌟 **Post-training/Human preference alignment:** Post-training with RLHF for more attractive results.

## 🔆 Updates
Video VAE+ can accurately compress and reconstruct the input videos with fine details.
The Hunyuan model uses `flash-attn` to reduce memory usage and speed up inference. If it is not installed, the model will run in normal mode. Install `flash-attn` via:
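The exact command used by this repo is not shown here; a common way to install `flash-attn` (an assumption, not necessarily the repo's documented command) is:

```shell
# Standard flash-attn installation from PyPI (assumed; check the repo docs for the exact command)
pip install flash-attn --no-build-isolation
```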
Please follow [docs/CHECKPOINTS.md](https://github.com/VideoVerses/VideoTuna/blob/main/docs/CHECKPOINTS.md) to download model checkpoints.
After downloading, the model checkpoints should be placed as shown in the [Checkpoint Structure](https://github.com/VideoVerses/VideoTuna/blob/main/docs/CHECKPOINTS.md#checkpoint-orgnization-structure).
First, convert the VC2 checkpoint; we make minor modifications to the keys of the checkpoint's state dict. The converted checkpoint will be automatically saved at `checkpoints/videocrafter/t2v_v2_512/model_converted.ckpt`.
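For intuition, the conversion amounts to loading the original state dict, renaming its keys, and saving the result to the path above. The sketch below is a hypothetical illustration only: the source path and the key mapping are made up, and the repo ships its own conversion command.

```python
# Hypothetical sketch of a state-dict key conversion; the actual mapping and
# entry point are provided by the VideoTuna repo, not this snippet.
import torch

SRC = "checkpoints/videocrafter/t2v_v2_512/model.ckpt"            # assumed original VC2 checkpoint path
DST = "checkpoints/videocrafter/t2v_v2_512/model_converted.ckpt"  # path stated in the docs

ckpt = torch.load(SRC, map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

# Rename keys to match the expected module layout (this prefix mapping is illustrative).
converted = {key.replace("model.diffusion_model.", "diffusion_model."): value
             for key, value in state_dict.items()}

torch.save({"state_dict": converted}, DST)
```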
Second, run this command to start training on a single GPU. The training results will be automatically saved at `results/train/${CURRENT_TIME}_${EXPNAME}`.
```shell
poetry run train-videocrafter-v2
```

#### 2. VideoCrafter2 LoRA Fine-tuning
We support LoRA fine-tuning so the model can learn new concepts/characters/styles (see the conceptual sketch after this list).
- Example config file: `configs/001_videocrafter2/vc2_t2v_lora.yaml`
- Train a LoRA based on VideoCrafter2: `bash shscripts/train_videocrafter_lora.sh`
- Run inference with the trained model: `bash shscripts/inference_vc2_t2v_320x512_lora.sh`
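For readers unfamiliar with LoRA, the following is a minimal conceptual sketch of a LoRA adapter in PyTorch: the pretrained weight is frozen and a trainable low-rank update is added on top. This is illustrative only and is not VideoTuna's implementation; the scripts above handle the actual fine-tuning.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (B @ A), scaled by alpha / r."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weights frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init so training starts at the base model
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```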
#### 3. Open-Sora Fine-tuning
We support Open-Sora fine-tuning; you can simply run the following commands:
If you want to build your own dataset, please organize your data as `inputs/t2i/`:

```
owndata/
├── img1.jpg
├── img2.jpg
├── img3.jpg
├── ...
├── prompt1.txt # prompt of img1.jpg
├── prompt2.txt # prompt of img2.jpg
├── prompt3.txt # prompt of img3.jpg
├── ...
```
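To make the naming convention concrete, here is a hypothetical sketch of how such a folder could be read, pairing each `imgN.jpg` with its `promptN.txt`. The helper name and logic are assumptions for illustration and are not part of VideoTuna's data pipeline.

```python
from pathlib import Path
from PIL import Image

def load_image_prompt_pairs(root: str):
    """Pair every imgN.jpg in `root` with the matching promptN.txt (hypothetical helper)."""
    root = Path(root)
    pairs = []
    for img_path in sorted(root.glob("img*.jpg")):
        prompt_path = root / img_path.name.replace("img", "prompt").replace(".jpg", ".txt")
        prompt = prompt_path.read_text().strip() if prompt_path.exists() else ""
        pairs.append((Image.open(img_path), prompt))
    return pairs

# Example: load_image_prompt_pairs("owndata") -> [(<PIL.Image>, "prompt for img1"), ...]
```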
<!-- Please check [configs/train/003_vc2_lora_ft/README.md](configs/train/003_vc2_lora_ft/README.md) for details. -->
We support VBench evaluation to evaluate the T2V generation performance.
Please check [eval/README.md](docs/evaluation.md) for details.
<!-- ### 6. Alignment
We support video alignment post-training to align human preference for video diffusion models. Please check [configs/train/004_rlhf_vc2/README.md](configs/train/004_rlhf_vc2/README.md) for details. -->
# Contribute

## Git hooks

Git hooks are handled with the [pre-commit](https://pre-commit.com) library.

### Hooks installation

Run the following commands to install hooks on `commit`. They will check formatting, linting, and types.

```shell
poetry run pre-commit install
poetry run pre-commit install --hook-type commit-msg
```

### Running the hooks without committing

```shell
poetry run pre-commit run --all-files
```

## Acknowledgement
We thank the following repos for sharing their awesome models and code!