Commit be459a7

Merge pull request #34 from samidarko/pre-commit-hooks
chore: pre-commit hook setup
2 parents: c7d397f + d969873

File tree: 176 files changed (+499 additions, −413 deletions)

.dockerignore

Lines changed: 1 addition & 1 deletion

@@ -7,4 +7,4 @@ hooks
 junit.xml
 coverage.xml
 docker
-docs
+docs

(The removed and re-added `docs` lines look identical because the change is invisible: presumably the trailing newline added by the `end-of-file-fixer` hook introduced in this commit.)

.pre-commit-config.yaml

Lines changed: 47 additions & 0 deletions

@@ -0,0 +1,47 @@
+# See https://pre-commit.com for more information
+# See https://pre-commit.com/hooks.html for more hooks
+fail_fast: false
+repos:
+  - repo: local
+    hooks:
+      - id: format-checking
+        name: format checking
+        entry: poetry run format-check
+        pass_filenames: false
+        language: system
+        stages: [pre-commit]
+      # - id: linting
+      #   name: linting
+      #   entry: poetry run lint
+      #   pass_filenames: false
+      #   language: system
+      #   stages: [commit]
+      # - id: type-checking
+      #   name: type checking
+      #   entry: poetry run type-check
+      #   pass_filenames: false
+      #   language: system
+      #   stages: [commit]
+      # - id: unit-tests
+      #   name: unit tests
+      #   entry: poetry run test
+      #   pass_filenames: false
+      #   language: system
+      #   stages: [commit]
+  - repo: https://github.com/commitizen-tools/commitizen
+    rev: v2.28.0
+    hooks:
+      - id: commitizen
+        stages:
+          - commit-msg
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.3.0
+    hooks:
+      - id: check-merge-conflict
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-added-large-files
+      - id: detect-private-key
+      - id: check-case-conflict
+      - id: mixed-line-ending
+      - id: detect-private-key
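The `repo: local` hook above delegates to a Poetry script, so `poetry run format-check` is assumed to resolve to a script declared in the project's `pyproject.toml` (that file is not shown in this diff). Note also that the commented-out hooks still use the legacy `stages: [commit]` spelling, while the active hook uses the newer `stages: [pre-commit]`. As a minimal sketch of exercising individual hooks once they are installed, using the hook ids declared above:

```shell
# Run only the local format-checking hook, by its id, over the whole tree
# rather than just staged files.
poetry run pre-commit run format-checking --all-files

# Dry-run the commit-msg stage (the commitizen hook) against a message file;
# .git/COMMIT_EDITMSG is where git keeps the last message it was given.
poetry run pre-commit run --hook-stage commit-msg --commit-msg-filename .git/COMMIT_EDITMSG
```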

README.md

Lines changed: 50 additions & 30 deletions

Most of the paired -/+ lines below are whitespace-only fixes (trailing whitespace stripped by the new hooks); the substantive change is the new "Contribute" section at the end of the file.
@@ -7,21 +7,21 @@
 ![Version](https://img.shields.io/badge/version-0.1.0-blue) ![visitors](https://visitor-badge.laobi.icu/badge?page_id=VideoVerses.VideoTuna&left_color=green&right_color=red) [![](https://dcbadge.limes.pink/api/server/AammaaR2?style=flat)](https://discord.gg/AammaaR2) <a href='https://github.com/user-attachments/assets/a48d57a3-4d89-482c-8181-e0bce4f750fd'><img src='https://badges.aleen42.com/src/wechat.svg'></a> [![Homepage](https://img.shields.io/badge/Homepage-VideoTuna-orange)](https://videoverses.github.io/videotuna/) [![GitHub](https://img.shields.io/github/stars/VideoVerses/VideoTuna?style=social)](https://github.com/VideoVerses/VideoTuna)
 
 
-🤗🤗🤗 Videotuna is a useful codebase for text-to-video applications.
-🌟 VideoTuna is the first repo that integrates multiple AI video generation models including `text-to-video (T2V)`, `image-to-video (I2V)`, `text-to-image (T2I)`, and `video-to-video (V2V)` generation for model inference and finetuning (to the best of our knowledge).
-🌟 VideoTuna is the first repo that provides comprehensive pipelines in video generation, from fine-tuning to pre-training, continuous training, and post-training (alignment) (to the best of our knowledge).
-🌟 An Emotion Control I2V model will be released soon.
+🤗🤗🤗 Videotuna is a useful codebase for text-to-video applications.
+🌟 VideoTuna is the first repo that integrates multiple AI video generation models including `text-to-video (T2V)`, `image-to-video (I2V)`, `text-to-image (T2I)`, and `video-to-video (V2V)` generation for model inference and finetuning (to the best of our knowledge).
+🌟 VideoTuna is the first repo that provides comprehensive pipelines in video generation, from fine-tuning to pre-training, continuous training, and post-training (alignment) (to the best of our knowledge).
+🌟 An Emotion Control I2V model will be released soon.
 
 
 ## Features
-🌟 **All-in-one framework:** Inference and fine-tune up-to-date video generation models.
-🌟 **Pre-training:** Build your own foundational text-to-video model.
-🌟 **Continuous training:** Keep improving your model with new data.
-🌟 **Domain-specific fine-tuning:** Adapt models to your specific scenario.
-🌟 **Concept-specific fine-tuning:** Teach your models with unique concepts.
-🌟 **Enhanced language understanding:** Improve model comprehension through continuous training.
-🌟 **Post-processing:** Enhance the videos with video-to-video enhancement model.
-🌟 **Post-training/Human preference alignment:** Post-training with RLHF for more attractive results.
+🌟 **All-in-one framework:** Inference and fine-tune up-to-date video generation models.
+🌟 **Pre-training:** Build your own foundational text-to-video model.
+🌟 **Continuous training:** Keep improving your model with new data.
+🌟 **Domain-specific fine-tuning:** Adapt models to your specific scenario.
+🌟 **Concept-specific fine-tuning:** Teach your models with unique concepts.
+🌟 **Enhanced language understanding:** Improve model comprehension through continuous training.
+🌟 **Post-processing:** Enhance the videos with video-to-video enhancement model.
+🌟 **Post-training/Human preference alignment:** Post-training with RLHF for more attractive results.
 
 
 ## 🔆 Updates
@@ -54,16 +54,16 @@
 Video VAE+ can accurately compress and reconstruct the input videos with fine details.
 
 <table class="center">
-
+
 <tr>
 <td style="text-align:center;" width="320">Ground Truth</td>
 <td style="text-align:center;" width="320">Reconstruction</td>
 </tr>
 <tr>
 <td><a href="https://github.com/user-attachments/assets/0efcbf80-0074-4421-810f-79a1f1733ed3"><img src="https://github.com/user-attachments/assets/0efcbf80-0074-4421-810f-79a1f1733ed3" width="320"></a></td>
 <td><a href="https://github.com/user-attachments/assets/4adf29f2-d413-49b1-bccc-48adfd64a4da"><img src="https://github.com/user-attachments/assets/4adf29f2-d413-49b1-bccc-48adfd64a4da" width="320"></a></td>
-</tr>
-
+</tr>
+
 </table>
 
 ### Emotion Control I2V
@@ -210,7 +210,7 @@ VideoTuna/
 ├── data       # data processing scripts and dataset files
 ├── docs       # documentations
 ├── eval       # evaluation scripts
-├── inputs     # input examples for testing
+├── inputs     # input examples for testing
 ├── scripts    # train and inference python scripts
 ├── shsripts   # train and inference shell scripts
 ├── src        # model-related source code
@@ -283,7 +283,7 @@ poetry run pip install "modelscope[cv]" -f https://modelscope.oss-cn-beijing.ali
 
 Hunyuan model uses it to reduce memory usage and speed up inference. If it is not installed, the model will run in normal mode. Install the `flash-attn` via:
 ``` shell
-poetry run install-flash-attn
+poetry run install-flash-attn
 ```
 
 #### (3) If you use MacOS
@@ -339,7 +339,7 @@ docker compose run -it --remove-orphans videotuna bash
 
 ### 2.Prepare checkpoints
 
-Please follow [docs/CHECKPOINTS.md](https://github.com/VideoVerses/VideoTuna/blob/main/docs/CHECKPOINTS.md) to download model checkpoints.
+Please follow [docs/CHECKPOINTS.md](https://github.com/VideoVerses/VideoTuna/blob/main/docs/CHECKPOINTS.md) to download model checkpoints.
 After downloading, the model checkpoints should be placed as [Checkpoint Structure](https://github.com/VideoVerses/VideoTuna/blob/main/docs/CHECKPOINTS.md#checkpoint-orgnization-structure).
 
 ### 3.Inference state-of-the-art T2V/I2V/T2I models
@@ -395,22 +395,22 @@ Before started, we assume you have finished the following two preliminary steps:
 ll checkpoints/stablediffusion/v2-1_512-ema/model.ckpt
 ```
 
-First, run this command to convert the VC2 checkpoint as we make minor modifications on the keys of the state dict of the checkpoint. The converted checkpoint will be automatically save at `checkpoints/videocrafter/t2v_v2_512/model_converted.ckpt`.
+First, run this command to convert the VC2 checkpoint as we make minor modifications on the keys of the state dict of the checkpoint. The converted checkpoint will be automatically save at `checkpoints/videocrafter/t2v_v2_512/model_converted.ckpt`.
 ```
 python tools/convert_checkpoint.py --input_path checkpoints/videocrafter/t2v_v2_512/model.ckpt
 ```
 
-Second, run this command to start training on the single GPU. The training results will be automatically saved at `results/train/${CURRENT_TIME}_${EXPNAME}`
+Second, run this command to start training on the single GPU. The training results will be automatically saved at `results/train/${CURRENT_TIME}_${EXPNAME}`
 ```
 poetry run train-videocrafter-v2
 ```
 
 #### 2. VideoCrafter2 Lora Fine-tuning
 
-We support lora finetuning to make the model to learn new concepts/characters/styles.
-- Example config file: `configs/001_videocrafter2/vc2_t2v_lora.yaml`
-- Training lora based on VideoCrafter2: `bash shscripts/train_videocrafter_lora.sh`
-- Inference the trained models: `bash shscripts/inference_vc2_t2v_320x512_lora.sh`
+We support lora finetuning to make the model to learn new concepts/characters/styles.
+- Example config file: `configs/001_videocrafter2/vc2_t2v_lora.yaml`
+- Training lora based on VideoCrafter2: `bash shscripts/train_videocrafter_lora.sh`
+- Inference the trained models: `bash shscripts/inference_vc2_t2v_320x512_lora.sh`
 
 #### 3. Open-Sora Fine-tuning
 We support open-sora finetuning, you can simply run the following commands:
@@ -432,22 +432,22 @@ If you want to build your own dataset, please organize your data as `inputs/t2i/
 ```
 owndata/
 ├── img1.jpg
-├── img2.jpg
-├── img3.jpg
+├── img2.jpg
+├── img3.jpg
 ├── ...
 ├── prompt1.txt # prompt of img1.jpg
 ├── prompt2.txt # prompt of img2.jpg
 ├── prompt3.txt # prompt of img3.jpg
 ├── ...
-```
+```
 
 <!-- Please check [configs/train/003_vc2_lora_ft/README.md](configs/train/003_vc2_lora_ft/README.md) for details. -->
-<!--
+<!--
 
 (1) Prepare data
 
 
-(2) Finetune
+(2) Finetune
 ```
 bash configs/train/000_videocrafter2ft/run.sh
 ``` -->
@@ -456,12 +456,32 @@ bash configs/train/000_videocrafter2ft/run.sh
 
 
 ### 5. Evaluation
-We support VBench evaluation to evaluate the T2V generation performance.
+We support VBench evaluation to evaluate the T2V generation performance.
 Please check [eval/README.md](docs/evaluation.md) for details.
 
 <!-- ### 6. Alignment
 We support video alignment post-training to align human perference for video diffusion models. Please check [configs/train/004_rlhf_vc2/README.md](configs/train/004_rlhf_vc2/README.md) for details. -->
 
+# Contribute
+
+## Git hooks
+
+Git hooks are handled with [pre-commit](https://pre-commit.com) library.
+
+### Hooks installation
+
+Run the following command to install hooks on `commit`. They will check formatting, linting and types.
+
+```shell
+poetry run pre-commit install
+poetry run pre-commit install --hook-type commit-msg
+```
+
+### Running the hooks without commiting
+
+```shell
+poetry run pre-commit run --all-files
+```
 
 ## Acknowledgement
 We thank the following repos for sharing their awesome models and codes!
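Because the commitizen hook is installed at the `commit-msg` stage, commit messages must follow the Conventional Commits format that commitizen checks by default. A quick illustration with hypothetical messages:

```shell
# Accepted: a recognised type ("chore"), an optional scope, a colon, a subject.
git commit -m "chore: pre-commit hook setup"

# Rejected by the commitizen commit-msg hook: no conventional type prefix.
git commit -m "updated some files"
```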

configs/000_videocrafter/vc1_i2v_512.yaml

Lines changed: 2 additions & 2 deletions

@@ -22,7 +22,7 @@ model:
 
   diffusion_scheduler_config:
     target: videotuna.base.diffusion_schedulers.LDMScheduler
-    params:
+    params:
       timesteps: 1000
       linear_start: 0.00085
       linear_end: 0.012

@@ -87,4 +87,4 @@ model:
   img_cond_stage_config:
     target: videotuna.lvdm.modules.encoders.condition.FrozenOpenCLIPImageEmbedderV2
     params:
-      freeze: true
+      freeze: true

configs/000_videocrafter/vc1_t2v_1024.yaml

Lines changed: 2 additions & 2 deletions

@@ -21,11 +21,11 @@ model:
 
   diffusion_scheduler_config:
     target: videotuna.base.diffusion_schedulers.LDMScheduler
-    params:
+    params:
       timesteps: 1000
       linear_start: 0.00085
       linear_end: 0.012
-
+
   unet_config:
     target: videotuna.lvdm.modules.networks.openaimodel3d.UNetModel
     params:

configs/001_videocrafter2/vc2_t2v_320x512.yaml

Lines changed: 1 addition & 2 deletions

@@ -24,7 +24,7 @@ model:
 
   diffusion_scheduler_config:
     target: videotuna.base.diffusion_schedulers.LDMScheduler
-    params:
+    params:
       timesteps: 1000
       linear_start: 0.00085
       linear_end: 0.012

@@ -137,4 +137,3 @@ lightning:
       save_weights_only: False
       every_n_epochs: 300
       every_n_train_steps: null
-

configs/001_videocrafter2/vc2_t2v_lora.yaml

Lines changed: 2 additions & 3 deletions

@@ -5,7 +5,7 @@ model:
   target: videotuna.base.ddpm3d.LVDMFlow
   params:
     lora_args:
-      # lora_ckpt: "/path/to/lora.ckpt" # no need for the first-time training, only used for resume training.
+      # lora_ckpt: "/path/to/lora.ckpt" # no need for the first-time training, only used for resume training.
       target_modules: ["to_q", "to_k", "to_v"]
       lora_rank: 4
       lora_alpha: 1

@@ -30,7 +30,7 @@ model:
 
   diffusion_scheduler_config:
     target: videotuna.base.diffusion_schedulers.LDMScheduler
-    params:
+    params:
       timesteps: 1000
       linear_start: 0.00085
       linear_end: 0.012

@@ -145,4 +145,3 @@ lightning:
       save_weights_only: False
       # every_n_epochs: 300
       every_n_train_steps: 10
-

configs/002_dynamicrafter/dc_i2v_1024.yaml

Lines changed: 2 additions & 3 deletions

@@ -24,7 +24,7 @@ model:
 
   diffusion_scheduler_config:
     target: videotuna.base.diffusion_schedulers.LDMScheduler
-    params:
+    params:
       timesteps: 1000
       linear_start: 0.00085
       linear_end: 0.012

@@ -96,7 +96,7 @@ model:
     target: videotuna.lvdm.modules.encoders.condition.FrozenOpenCLIPImageEmbedderV2
     params:
       freeze: true
-
+
   image_proj_stage_config:
     target: videotuna.lvdm.modules.encoders.ip_resampler.Resampler
     params:

@@ -169,4 +169,3 @@ lightning:
       filename: "{epoch:06}-{step:09}"
       save_weights_only: True
       every_n_train_steps: 10000
-

configs/003_opensora/opensorav10_256x256.yaml

Lines changed: 5 additions & 5 deletions

@@ -11,7 +11,7 @@ model:
   # cond_stage_trainable: false
   cond_stage_trainable: true
   conditioning_key: crossattn_stdit
-  image_size: # TO CHECK
+  image_size: # TO CHECK
    - 32
    - 32
   channels: 4

@@ -26,7 +26,7 @@ model:
 
   diffusion_scheduler_config:
     target: videotuna.base.iddpm3d.OpenSoraScheduler
-    params:
+    params:
       timesteps: 1000
       linear_start: 0.00085
       linear_end: 0.012

@@ -42,7 +42,7 @@ model:
   input_size:
    - 16
    - 32
-   - 32
+   - 32
   first_stage_config:
     target: videotuna.lvdm.opensoravae.VideoAutoencoderKL
     params:

@@ -53,7 +53,7 @@ model:
     params:
       from_pretrained: "DeepFloyd/t5-v1_1-xxl"
       model_max_length: 120
-      shardformer: False # TODO
+      shardformer: False # TODO
 
 data:
   target: videotuna.data.lightning_data.DataModuleFromConfig

@@ -107,4 +107,4 @@ lightning:
   # target: pytorch_lightning.callbacks.ModelCheckpoint
   # params:
   #   every_n_epochs: 1
-  #   filename: "{epoch:04}-{step:06}"
+  #   filename: "{epoch:04}-{step:06}"
