Question regarding the noise scheduler and training objectives

Thank you for open-sourcing the code for Wan video! The quality is truly amazing. I have had a lot of fun using the model to generate all kinds of videos, and they are all high quality.

I have one question about training the model, specifically the noise schedule. The technical report states that Wan is trained with the rectified flow objective:

$x_t = t x_1 + (1-t) x_0$

Thus the ground-truth velocity is $v_t = \frac{dx_t}{dt} = x_1 - x_0$, and the model is trained to predict this velocity given the context, the timestep, and $x_t$.
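To make my reading of the report concrete, here is a minimal pure-Python sketch of that interpolation and its velocity target. The function names are mine (hypothetical, not from any Wan or DiffSynth code), and I am assuming $x_1$ is the clean latent and $x_0$ is Gaussian noise, which is how I understood the report.

```python
import random

def interpolate(x0, x1, t):
    """Rectified-flow interpolation: x_t = t * x1 + (1 - t) * x0 (elementwise)."""
    return [t * b + (1.0 - t) * a for a, b in zip(x0, x1)]

def velocity_target(x0, x1):
    """Ground-truth velocity v_t = x1 - x0; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]

random.seed(0)
x1 = [0.5, -1.0, 2.0]                       # stand-in for a clean latent (assumption)
x0 = [random.gauss(0.0, 1.0) for _ in x1]   # stand-in for Gaussian noise (assumption)
t = 0.3

x_t = interpolate(x0, x1, t)
v_t = velocity_target(x0, x1)

# Sanity check: a finite difference of x_t in t should match x1 - x0.
eps = 1e-6
x_t_next = interpolate(x0, x1, t + eps)
finite_diff = [(b - a) / eps for a, b in zip(x_t, x_t_next)]
assert all(abs(f - v) < 1e-4 for f, v in zip(finite_diff, v_t))
```

In this form the regression target is the same for every timestep; only the model input $x_t$ changes with $t$.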

But when I tried to train the TI2V-5B model, I found that the FlowMatchScheduler has a different implementation. For instance, see `add_noise` and `training_target` here: https://github.com/modelscope/DiffSynth-Studio/blob/main/diffsynth/schedulers/flow_match.py#L94-L105
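For comparison, here is a sketch of how I currently understand the difference. It is an assumption on my part, not a copy of the linked lines: I am assuming the scheduler mixes with a noise level `sigma` as `(1 - sigma) * data + sigma * noise` and regresses `noise - data`. The helper names below are hypothetical. If that reading is right, the two forms would match under `sigma = 1 - t`, with the target negated.

```python
def add_noise_sigma(data, noise, sigma):
    """Assumed sigma-style mixing: (1 - sigma) * data + sigma * noise."""
    return [(1.0 - sigma) * d + sigma * n for d, n in zip(data, noise)]

def target_sigma(data, noise):
    """Assumed sigma-style target: noise - data."""
    return [n - d for d, n in zip(data, noise)]

data = [0.5, -1.0, 2.0]    # plays the role of x_1 in the report's notation (assumption)
noise = [0.1, 0.7, -0.3]   # plays the role of x_0 in the report's notation (assumption)
t = 0.25
sigma = 1.0 - t            # assumed relation between the two time conventions

# Report-style quantities, written out explicitly.
x_t = [t * d + (1.0 - t) * n for d, n in zip(data, noise)]
v_t = [d - n for d, n in zip(data, noise)]

x_sigma = add_noise_sigma(data, noise, sigma)
tgt = target_sigma(data, noise)

# Same noisy sample, opposite-sign target, under the assumptions above.
assert all(abs(a - b) < 1e-12 for a, b in zip(x_t, x_sigma))
assert all(abs(a + b) < 1e-12 for a, b in zip(v_t, tgt))
```

This only checks the algebra under my assumptions. It says nothing about how timesteps are sampled or weighted, which is the part I am least sure about.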

So I am wondering: is this the same scheduler that was used to train the model released in the [Wan 2.2 repo](https://github.com/Wan-Video/Wan2.2)?

Thank you so much! 

Question regarding the noise scheduler and training objectives #1014
