[Feature] Ask about training arguments #617

@yjhong89

Description

Motivation

Questions regarding the training arguments in this script: https://github.com/hao-ai-lab/FastVideo/blob/main/examples/training/finetune/wan_i2v_14b_480p/crush_smol/finetune_i2v.slurm

batch size

  • There are two batch-size arguments: train_batch_size and train_sp_batch_size.
    • What is the difference between them?
  • As far as I understand, the dataloader fetches train_batch_size samples per GPU (FSDP), and the batch needs to be the same across devices in the same SP group. Does train_sp_batch_size then mean the batch size shared by the devices of one SP group? (See the sketch below.)
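
To make the question concrete, here is a small arithmetic sketch of my current reading; all values are made up, and the relationship between the two flags is exactly what I am unsure about:

```python
# Sketch of my current understanding -- not FastVideo's actual code.
num_gpus = 8             # hypothetical world size
sp_size = 4              # hypothetical sequence-parallel (SP) group size
train_batch_size = 2     # samples each dataloader fetches per GPU (?)
train_sp_batch_size = 2  # samples processed jointly by one SP group (?)

num_sp_groups = num_gpus // sp_size  # 2 SP groups of 4 GPUs each

# If all GPUs in an SP group must see the same data, which of these
# is the effective global batch per optimizer step?
global_batch_if_sp = num_sp_groups * train_sp_batch_size  # -> 4
global_batch_if_dp = num_gpus * train_batch_size          # -> 16
print(global_batch_if_sp, global_batch_if_dp)
```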

num_height / num_width

  • Are these two values used for sampling the validation dataset?

parallel

  • What does tp size control here?

Precision

  • There are 3 precision-related arguments: mixed_precision, allow_tf32, dit_precision
    • What is the role of each of them?
    • mixed_precision is set to bf16 while dit_precision is set to fp32. Doesn't this cause a conflict? (A sketch of how I imagine they might coexist follows this list.)
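
My guess (purely an assumption on my part) is that the two settings are complementary rather than conflicting: the DiT parameters would stay in fp32 (dit_precision) while the forward pass runs under bf16 autocast (mixed_precision), and allow_tf32 would only toggle TF32 tensor-core math for fp32 CUDA matmuls. A minimal PyTorch sketch of that pattern, not FastVideo's actual code:

```python
import torch
import torch.nn as nn

# Assumption: allow_tf32 flips PyTorch's global TF32 switch for fp32
# CUDA matmuls; it does not change any parameter dtypes.
torch.backends.cuda.matmul.allow_tf32 = True

model = nn.Linear(64, 64)  # stand-in for the DiT; params stay fp32
x = torch.randn(2, 64)

# Assumption: mixed_precision=bf16 means autocast computes in bf16
# while the fp32 (dit_precision) master weights are left untouched.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(model.weight.dtype)  # torch.float32  -> dit_precision
print(y.dtype)             # torch.bfloat16 -> mixed_precision
```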

Miscellaneous

  • Are not_apply_cfg_solver, multi_phased_distill_scheduler, and ema_start_step only relevant for distillation?

Thanks!

Related resources

No response
