Training setting

Hello, I have some questions about the training settings.

1. In the inference code, there is a line that says:

conditional_latents_mask = mask_token.repeat(bsz_cfg, num_frames-2, 1, latent_h, latent_w)

It seems the batch is doubled for CFG, but the unconditional half is not set to 0; the same mask values as in the conditional half are repeated. Is there a specific reason for this approach? Was the model trained entirely with conditioning, without any separate unconditional training?
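To make the question concrete, here is a toy sketch of how I understand CFG in this setup. It is not the Framer code: `toy_denoiser` is a hypothetical stand-in for the UNet, and I am assuming the batch is laid out as [unconditional, conditional] and that only the image conditioning is zeroed in the unconditional half while the mask is shared.

```python
# Toy illustration of classifier-free guidance (CFG); not the Framer code.
# Assumption: the two halves are combined as uncond + scale * (cond - uncond).

def toy_denoiser(latent, image_cond, mask):
    # Hypothetical stand-in for the UNet: a fixed linear map of its inputs.
    return 0.5 * latent + 0.3 * image_cond + 0.2 * mask

def cfg_combine(uncond, cond, scale):
    # Standard CFG combination of the two batch halves.
    return uncond + scale * (cond - uncond)

latent, mask, scale = 1.0, 1.0, 3.0

# The mask is repeated unchanged in both halves; only image_cond is dropped
# (set to 0.0) in the unconditional half. This layout is my assumption.
uncond = toy_denoiser(latent, image_cond=0.0, mask=mask)
cond = toy_denoiser(latent, image_cond=2.0, mask=mask)
guided = cfg_combine(uncond, cond, scale)

# Guidance direction with a shared mask vs. a mask that is also zeroed.
shared_mask_direction = cond - uncond
zeroed_mask_direction = cond - toy_denoiser(latent, image_cond=0.0, mask=0.0)
```

With a shared mask, the mask term cancels in `cond - uncond`, so guidance only amplifies the image conditioning. If the mask were zeroed in the unconditional half, the guidance direction would also include the mask's contribution, which is the case I expected to see.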

2. Also, in the original SVD Xtend code, a learning rate of 1e-5 is typically used, but the Framer paper mentions using a learning rate of 1e-4. Is there a specific reason for this difference?

3. The SVD pretrained model used here generates 25 frames at a resolution of 1024x576, but isn’t there also a model that generates 14 frames at 512x320? The frame setting seems closer to the latter; is there a reason for choosing the former model?


Training setting #7

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Training setting #7

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions