# AnimateDiff Fine-tuning and Inference

SWIFT supports fine-tuning and inference for AnimateDiff. Two approaches are currently available: full-parameter fine-tuning and LoRA fine-tuning.

First, clone and install SWIFT:

```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install ".[aigc]"
```

## Full-Parameter Training

### Training Results

Full-parameter fine-tuning can reproduce the results of the [official animatediff-motion-adapter-v1-5-2 model](https://www.modelscope.cn/models/Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2/summary). It requires a fairly large number of short videos; the ModelScope reproduction used a subset of the official dataset, [WebVid 2.5M](https://maxbain.com/webvid-dataset/). The training results are shown below:

```text
Prompt: masterpiece, bestquality, highlydetailed, ultradetailed, girl, walking, on the street, flowers
```

![image.png](../resources/1.gif)

![image.png](../resources/2.gif)

```text
Prompt: masterpiece, bestquality, highlydetailed, ultradetailed, beautiful house, mountain, snow top
```

![image.png](../resources/3.gif)

Generation results from training on the 2.5M subset can still be unstable; using the 10M dataset gives more stable results.

### Training Command

```shell
# This file is in swift/examples/pytorch/animatediff/scripts/full
# Experimental environment: A100 * 4
# 200GB GPU memory in total
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun --nproc_per_node=4 animatediff_sft.py \
  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
  --csv_path /mnt/workspace/yzhao/tastelikefeet/webvid/results_2M_train.csv \
  --video_folder /mnt/workspace/yzhao/tastelikefeet/webvid/videos2 \
  --sft_type full \
  --lr_scheduler_type constant \
  --trainable_modules .*motion_modules.* \
  --batch_size 4 \
  --eval_steps 100 \
  --gradient_accumulation_steps 16 \
```

We trained on 4 x A100 GPUs, using about 200GB of GPU memory in total, for roughly 40 hours (effective batch size: 4 batch size x 16 gradient accumulation steps x 4 GPUs = 256). The data format is as follows:

```text
--csv_path points to a csv file with the following format:
name,contentUrl
Travel blogger shoot a story on top of mountains. young man holds camera in forest.,stock-footage-travel-blogger-shoot-a-story-on-top-of-mountains-young-man-holds-camera-in-forest.mp4
```

The name field is the prompt for the short video; contentUrl is the file name of the video.

```text
--video_folder points to a directory containing all video files referenced by contentUrl in the csv file
```
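
Before launching training, it can help to verify that every contentUrl in the csv actually resolves to a file under --video_folder. A minimal sketch (assuming pandas is installed; the column names come from the format above, and the two paths are placeholders for your own):

```python
import os
import pandas as pd

csv_path = 'results_2M_train.csv'   # your --csv_path
video_folder = 'videos2'            # your --video_folder

df = pd.read_csv(csv_path)
# 'name' holds the prompt, 'contentUrl' the video file name (see format above)
missing = [f for f in df['contentUrl']
           if not os.path.exists(os.path.join(video_folder, f))]
print(f'{len(df)} rows, {len(missing)} missing video files')
```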

Inference with the full-parameter model works as follows:

```shell
# This file is in swift/examples/pytorch/animatediff/scripts/full
# Experimental environment: A100
# 18GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_infer.py \
  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
  --sft_type full \
  --ckpt_dir /output/path/like/checkpoints/iter-xxx \
  --eval_human true \
```

Pass the output folder produced by training as --ckpt_dir. Generated GIFs are written to ./generated by default (see output_path in the inference parameters below).

## LoRA Training

### Training Command

Full-parameter training trains the entire Motion-Adapter structure from scratch. Alternatively, you can fine-tune an existing model with a small number of videos by running the command below:

```shell
# This file is in swift/examples/pytorch/animatediff/scripts/lora
# Experimental environment: A100
# 20GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_sft.py \
  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
  --csv_path /mnt/workspace/yzhao/tastelikefeet/webvid/results_2M_train.csv \
  --video_folder /mnt/workspace/yzhao/tastelikefeet/webvid/videos2 \
  --motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
  --sft_type lora \
  --lr_scheduler_type constant \
  --trainable_modules .*motion_modules.* \
  --batch_size 1 \
  --eval_steps 200 \
  --dataset_sample_size 10000 \
  --gradient_accumulation_steps 16 \
```

The video data parameters are the same as above.
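
Conceptually, the LoRA settings here (lora_rank, lora_alpha, lora_dropout_p, trainable_modules; see the parameter list below) map onto the familiar peft-style LoRA configuration. SWIFT ships its own tuner implementation, so the sketch below is only an illustration of how the arguments correspond, not SWIFT's actual code; note that peft treats a string target_modules as a regex, matching the --trainable_modules pattern:

```python
from peft import LoraConfig

# Hypothetical mapping of the CLI arguments onto a peft LoraConfig:
lora_config = LoraConfig(
    r=8,                                  # --lora_rank
    lora_alpha=32,                        # --lora_alpha
    lora_dropout=0.05,                    # --lora_dropout_p
    target_modules='.*motion_modules.*',  # --trainable_modules (regex match)
)
```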

The inference command is as follows:

```shell
# This file is in swift/examples/pytorch/animatediff/scripts/lora
# Experimental environment: A100
# 18GB GPU memory
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python animatediff_infer.py \
  --model_id_or_path wyj123456/Realistic_Vision_V5.1_noVAE \
  --motion_adapter_id_or_path Shanghai_AI_Laboratory/animatediff-motion-adapter-v1-5-2 \
  --sft_type lora \
  --ckpt_dir /output/path/like/checkpoints/iter-xxx \
  --eval_human true \
```

Pass the output folder produced by training as --ckpt_dir.

## Parameter List

The parameters supported for training and inference, with their meanings, are listed below:

### Training Parameters

```text
motion_adapter_id_or_path: Optional[str] = None # Model id or local path of the motion adapter; set this to continue training from an existing official model
motion_adapter_revision: Optional[str] = None # Model revision of the motion adapter; only used when motion_adapter_id_or_path is a model id

model_id_or_path: str = None # Model id or local path of the SD base model
model_revision: str = None # Revision of the SD base model; only used when model_id_or_path is a model id

dataset_sample_size: int = None # Number of dataset rows used for training; the default means training on the full dataset

sft_type: str = field(
    default='lora', metadata={'choices': ['lora', 'full']}) # Training method: LoRA or full-parameter

output_dir: str = 'output' # Output folder
ddp_backend: str = field(
    default='nccl', metadata={'choices': ['nccl', 'gloo', 'mpi', 'ccl']}) # DDP backend when training with DDP

seed: int = 42 # Random seed

lora_rank: int = 8 # LoRA parameter
lora_alpha: int = 32 # LoRA parameter
lora_dropout_p: float = 0.05 # LoRA parameter

gradient_checkpointing: bool = False # Whether to enable gradient checkpointing, off by default. Note: the current diffusers version has an issue and does not support setting this to True
batch_size: int = 1 # Batch size
num_train_epochs: int = 1 # Number of epochs
# if max_steps >= 0, override num_train_epochs
learning_rate: Optional[float] = None # Learning rate
weight_decay: float = 0.01 # AdamW parameter
gradient_accumulation_steps: int = 16 # Gradient accumulation steps
max_grad_norm: float = 1. # Gradient clipping norm
lr_scheduler_type: str = 'cosine' # Type of lr_scheduler
warmup_ratio: float = 0.05 # Warmup proportion (0 disables warmup)

eval_steps: int = 50 # Interval between eval steps
save_steps: Optional[int] = None # Interval between save steps
dataloader_num_workers: int = 1 # Number of dataloader workers

push_to_hub: bool = False # Whether to push to the model hub
# 'user_name/repo_name' or 'repo_name'
hub_model_id: Optional[str] = None # Model hub id
hub_private_repo: bool = True
push_hub_strategy: str = field( # Push strategy: push only the last checkpoint, or every checkpoint
    default='push_best',
    metadata={'choices': ['push_last', 'all_checkpoints']})
# None: use env var `MODELSCOPE_API_TOKEN`
hub_token: Optional[str] = field( # Model hub token
    default=None,
    metadata={
        'help':
        'SDK token can be found in https://modelscope.cn/my/myaccesstoken'
    })

ignore_args_error: bool = False # True: notebook compatibility

text_dropout_rate: float = 0.1 # Drop a proportion of the text prompts to improve model robustness

validation_prompts_path: str = field( # Prompt file used during evaluation; defaults to swift/aigc/configs/validation.txt
    default=None,
    metadata={
        'help':
        'The validation prompts file path, use aigc/configs/validation.txt if None'
    })

trainable_modules: str = field( # Trainable modules; the default value is recommended
    default='.*motion_modules.*',
    metadata={
        'help':
        'The trainable modules, by default, the .*motion_modules.* will be trained'
    })

mixed_precision: bool = True # Mixed-precision training

enable_xformers_memory_efficient_attention: bool = True # Use xformers

num_inference_steps: int = 25 # Denoising steps when generating validation videos
guidance_scale: float = 8. # Classifier-free guidance scale
sample_size: int = 256 # Resolution of sampled frames
sample_stride: int = 4 # Stride between sampled frames when drawing training clips
sample_n_frames: int = 16 # Number of frames sampled per training clip

csv_path: str = None # Input dataset (csv file)
video_folder: str = None # Input dataset (video folder)

motion_num_attention_heads: int = 8 # Motion adapter parameter
motion_max_seq_length: int = 32 # Motion adapter parameter
num_train_timesteps: int = 1000 # Inference pipeline parameter
beta_start: float = 0.00085 # Inference pipeline parameter
beta_end: float = 0.012 # Inference pipeline parameter
beta_schedule: str = 'linear' # Inference pipeline parameter
steps_offset: int = 1 # Inference pipeline parameter
clip_sample: bool = False # Inference pipeline parameter

use_wandb: bool = False # Whether to use wandb
```
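
sample_stride and sample_n_frames together determine how a training clip is drawn from each video. The sketch below shows the conventional AnimateDiff-style sampling these parameters suggest; it is an illustration under that assumption, not necessarily SWIFT's exact implementation:

```python
import numpy as np

def sample_frame_indices(video_length: int,
                         sample_n_frames: int = 16,
                         sample_stride: int = 4) -> np.ndarray:
    """Draw `sample_n_frames` frame indices spaced `sample_stride` apart,
    starting at a random offset so each epoch sees different clips."""
    clip_length = min(video_length, (sample_n_frames - 1) * sample_stride + 1)
    start = np.random.randint(0, video_length - clip_length + 1)
    return np.linspace(start, start + clip_length - 1,
                       sample_n_frames, dtype=int)

print(sample_frame_indices(200))  # e.g. 16 indices spanning 61 frames
```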

### Inference Parameters

```text
motion_adapter_id_or_path: Optional[str] = None # Model id or local path of the motion adapter; set this to build on an existing official model
motion_adapter_revision: Optional[str] = None # Model revision of the motion adapter; only used when motion_adapter_id_or_path is a model id

model_id_or_path: str = None # Model id or local path of the SD base model
model_revision: str = None # Revision of the SD base model; only used when model_id_or_path is a model id

sft_type: str = field(
    default='lora', metadata={'choices': ['lora', 'full']}) # Training method used: LoRA or full-parameter

ckpt_dir: Optional[str] = field(
    default=None, metadata={'help': '/path/to/your/vx_xxx/checkpoint-xxx'}) # Output folder from training
eval_human: bool = False # Whether to evaluate with human-entered prompts; False: evaluate with val_dataset

seed: int = 42 # Random seed

# other
ignore_args_error: bool = False # True: notebook compatibility

validation_prompts_path: str = None # Prompt file used when eval_human=False, one prompt per line

output_path: str = './generated' # Output directory for generated GIFs

enable_xformers_memory_efficient_attention: bool = True # Use xformers

num_inference_steps: int = 25 # Denoising steps when generating videos
guidance_scale: float = 8. # Classifier-free guidance scale
sample_size: int = 256 # Resolution of sampled frames
sample_stride: int = 4 # Stride between sampled frames (training-time parameter)
sample_n_frames: int = 16 # Number of frames per generated clip

motion_num_attention_heads: int = 8 # Motion adapter parameter
motion_max_seq_length: int = 32 # Motion adapter parameter
num_train_timesteps: int = 1000 # Inference pipeline parameter
beta_start: float = 0.00085 # Inference pipeline parameter
beta_end: float = 0.012 # Inference pipeline parameter
beta_schedule: str = 'linear' # Inference pipeline parameter
steps_offset: int = 1 # Inference pipeline parameter
clip_sample: bool = False # Inference pipeline parameter
```
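
When eval_human is false, inference reads prompts from validation_prompts_path, one per line. For example, a prompt file reusing the prompts shown earlier would look like this (file name is hypothetical):

```text
masterpiece, bestquality, highlydetailed, ultradetailed, girl, walking, on the street, flowers
masterpiece, bestquality, highlydetailed, ultradetailed, beautiful house, mountain, snow top
```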