Description
Motivation
I have some questions regarding the training arguments here: https://github.com/hao-ai-lab/FastVideo/blob/main/examples/training/finetune/wan_i2v_14b_480p/crush_smol/finetune_i2v.slurm
batch size
- There are 2 batch size arguments: train_batch_size and train_sp_batch_size.
- What is the difference between these two?
- As far as I understand, the dataloader fetches train_batch_size samples per GPU (FSDP), and a batch needs to be identical across the devices in the same SP group. Does train_sp_batch_size then mean the batch size shared within one SP group? (See the toy sketch after this list.)
- Reference: FastVideo/fastvideo/utils/communications.py, line 281 in 65ed588:
  def sp_parallel_dataloader_wrapper(dataloader, device, train_batch_size, sp_size, train_sp_batch_size):
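To make my understanding concrete, here is a toy sketch of my mental model (purely hypothetical names, not the actual sp_parallel_dataloader_wrapper):

```python
# Toy sketch of my assumption, NOT FastVideo's implementation:
# each rank's dataloader yields train_batch_size samples, and the wrapper
# re-slices them into chunks of train_sp_batch_size so that all sp_size ranks
# in one SP group work on the same chunk (each handling a slice of the sequence).
def toy_sp_wrapper(dataloader, train_batch_size, sp_size, train_sp_batch_size):
    assert train_batch_size % train_sp_batch_size == 0
    for batch in dataloader:  # assume len(batch) == train_batch_size
        for i in range(0, train_batch_size, train_sp_batch_size):
            # every rank in the SP group would receive this same chunk
            yield batch[i:i + train_sp_batch_size]
```

For example, with train_batch_size=4 and train_sp_batch_size=2, each fetched batch would become 2 chunks shared across the SP group. Is that roughly what happens?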
 
num_height / num_width
- Are these two values used for sampling the validation dataset?
parallel
- Why is tp_size fixed to 1?
- Reference: FastVideo/examples/training/finetune/wan_i2v_14b_480p/crush_smol/finetune_i2v.slurm, line 69 in 65ed588:
  --tp_size 1
 
Precision
- There are 3 precision-related arguments: mixed_precision, allow_tf32, dit_precision.
- What is the role of each of them?
- mixed_precision is set to bf16 while dit_precision is set to fp32. Doesn't this raise a conflict? (My current assumption is sketched below.)
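For context, my current assumption (possibly wrong; this is plain PyTorch, not FastVideo's code) is that the three flags act at different levels and therefore might not conflict:

```python
import torch

# My guess at how the three flags could coexist (assumption only):
# - allow_tf32:      lets fp32 matmuls/convs use TF32 tensor cores
# - dit_precision:   dtype the DiT weights are kept in (fp32 master weights)
# - mixed_precision: dtype used for forward/backward compute via autocast (bf16)
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# requires a CUDA device
model = torch.nn.Linear(64, 64).cuda().to(torch.float32)  # "dit_precision fp32"

x = torch.randn(8, 64, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):  # "mixed_precision bf16"
    y = model(x)  # compute runs in bf16, parameters stay fp32

print(y.dtype)                          # torch.bfloat16 activations
print(next(model.parameters()).dtype)   # torch.float32 weights
```

Is this the intended interaction, or does dit_precision mean something else here?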
 
Miscellaneous
- not_apply_cfg_solver
- multi_phased_distill_scheduler
- ema_start_step
- Are these for distillation?
tp size
- tp_size is set to num_gpus for inference: --tp-size $num_gpus
- but it is set to 1 for training: --tp_size 1
- Why the difference?
Thanks!
Related resources
No response