Training setting #7

@yongjinColinChoi

Hello, I have some questions about the training settings.

  1. In the inference code, there is a line that says:

conditional_latents_mask = mask_token.repeat(bsz_cfg, num_frames-2, 1, latent_h, latent_w)

It looks like the batch is doubled for CFG, but instead of using 0 for the unconditional half, the same values as the conditional half are repeated. Is there a specific reason for this approach? Was the model trained entirely with conditional inputs, without any separate unconditional training?

  2. Also, in the original SVD Xtend code, a learning rate of 1e-5 is typically used, but the Framer paper mentions using a learning rate of 1e-4. Is there a specific reason for this difference?

  3. The SVD pretrained model used here generates 25 frames at a resolution of 1024x576, but isn’t there also a model that generates 14 frames at 512x320? The frame setting seems closer to the latter; is there a reason for choosing the former model?
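
To make question 1 concrete, here is a minimal sketch of what I mean. It uses NumPy's `np.tile` as a stand-in for `torch.Tensor.repeat` (the two behave the same way for this call). The shape of `mask_token`, its value, the sizes, and the assumption that index 0 of the CFG batch is the unconditional half are all my own guesses for illustration, not taken from the repository.

```python
import numpy as np

def build_mask(mask_token, bsz_cfg, num_frames, latent_h, latent_w):
    # np.tile stands in for torch.Tensor.repeat: each axis of mask_token
    # is repeated the given number of times.
    return np.tile(mask_token, (bsz_cfg, num_frames - 2, 1, latent_h, latent_w))

# Hypothetical sizes and values, chosen small for illustration only.
channels = 4
mask_token = np.full((1, 1, channels, 1, 1), 0.5, dtype=np.float32)
bsz_cfg, num_frames, latent_h, latent_w = 2, 14, 4, 6

# What the inference line produces: both CFG halves hold the same mask token.
repeated = build_mask(mask_token, bsz_cfg, num_frames, latent_h, latent_w)
same_in_both = np.array_equal(repeated[0], repeated[1])

# What I expected instead: zeros in the unconditional half.
# Assumption: index 0 is the unconditional half of the CFG batch.
zeroed = repeated.copy()
zeroed[0] = 0.0

print(repeated.shape, same_in_both)
```

With the repeated version, the mask input is identical in both halves, so any difference between the two CFG predictions would have to come from the other conditioning inputs.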
