Fine-tunes the MicroVerse video generation model on microscopic biological simulation data, supporting both LoRA and full-parameter training with automatic checkpoint resume.

Quick Start

1. Installation

cd train
pip install -r requirements.txt

2. Prepare Dataset

Create a directory with your videos and a metadata.json file:

data/your_dataset/
├── metadata.json
└── videos/
    ├── video_001.mp4
    ├── video_002.mp4
    └── ...

The metadata.json should be a JSON array:

[
  {
    "id": "video_001",
    "video": "videos/video_001.mp4",
    "prompt": "A detailed description of the microscopic process..."
  }
]

Fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| id | string | No | Unique identifier for the sample |
| video | string | Yes | Relative path to the video file |
| prompt | string | Yes | Text description of the microscopic process |

Video requirements:

  • Format: MP4 (H.264 recommended)
  • Resolution: Any (will be resized to training resolution)
  • Duration: 2–10 seconds recommended
  • FPS: Any (frames will be sampled automatically)
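Before training, it can be worth sanity-checking the dataset layout against the schema above. The following is a minimal sketch, assuming exactly the fields documented here; `validate_dataset` is a hypothetical helper, not part of this repository:

```python
import json
from pathlib import Path

# Hypothetical helper (not part of the repo): check a dataset directory
# against the metadata.json schema described above.
def validate_dataset(base_path):
    base = Path(base_path)
    entries = json.loads((base / "metadata.json").read_text())
    assert isinstance(entries, list), "metadata.json must be a JSON array"
    problems = []
    for i, entry in enumerate(entries):
        for field in ("video", "prompt"):           # required fields
            if field not in entry:
                problems.append(f"entry {i}: missing '{field}'")
        video = entry.get("video")
        if video and not (base / video).is_file():  # path is relative to the dataset dir
            problems.append(f"entry {i}: file not found: {video}")
    return problems

# Example usage: an empty list means the dataset looks ready.
# print(validate_dataset("data/your_dataset") or "dataset OK")
```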

3. Training

LoRA Training (Recommended for getting started)

# 1.3B model — works on 1× 24 GB GPU
bash scripts/train_lora_1.3B.sh

# 14B model — works on 2× 24 GB GPUs
bash scripts/train_lora_14B.sh

Full Parameter Training

# 1.3B model — requires 2× 40 GB GPUs
bash scripts/train_full_1.3B.sh

# 14B model — requires 2× 80 GB GPUs with DeepSpeed
bash scripts/train_full_14B.sh

Edit the scripts to set:

  • --dataset_base_path — path to your dataset directory
  • --dataset_metadata_path — path to your metadata.json
  • --output_path — where to save checkpoints
  • --learning_rate — learning rate (1e-4 for LoRA, 1e-5 for full)
  • --num_epochs — number of training epochs
  • --save_steps — checkpoint save interval

4. Inference

Single video generation

python inference.py \
  --model_size 1.3B \
  --mode lora \
  --checkpoint ./outputs/lora_1.3B \
  --prompt "DNA replication inside a eukaryotic cell nucleus" \
  --output ./results/output.mp4

Batch generation from JSON

python inference.py \
  --model_size 14B \
  --mode full \
  --checkpoint ./outputs/full_14B \
  --metadata ./data/test_samples.json \
  --output ./results/batch_output/
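The format of the batch metadata file is not spelled out above; a reasonable assumption is that it mirrors the training metadata (a JSON array of objects with "id" and "prompt"), but check inference.py for the fields it actually reads. A sketch for generating such a file, with illustrative placeholder prompts:

```python
import json
import os

# Assumption: the --metadata file for batch inference uses the same schema
# as training metadata ("id" and "prompt" per entry). The prompts below are
# illustrative placeholders, not real test samples.
samples = [
    {"id": "sample_001", "prompt": "Mitotic spindle formation during metaphase"},
    {"id": "sample_002", "prompt": "A macrophage engulfing a bacterium"},
]

os.makedirs("data", exist_ok=True)
with open("data/test_samples.json", "w") as f:
    json.dump(samples, f, indent=2)
```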

Training Configurations

GPU Requirements

| Mode | Model | Min GPUs | Min VRAM per GPU | Accelerate Config |
|------|-------|----------|------------------|-------------------|
| LoRA | 1.3B | 1 | 24 GB | accelerate_1gpu.yaml |
| LoRA | 14B | 2 | 24 GB | accelerate_2gpu.yaml |
| Full | 1.3B | 2 | 40 GB | accelerate_2gpu.yaml |
| Full | 14B | 2 | 80 GB | accelerate_deepspeed_2gpu.yaml |

LoRA vs Full Fine-tuning

| Aspect | LoRA | Full |
|--------|------|------|
| VRAM | Low | High |
| Speed | Fast | Slow |
| Checkpoint size | ~100–500 MB | ~3–27 GB |
| Quality | Good | Best |
| Recommended LR | 1e-4 | 1e-5 |
| Recommended epochs | 3–5 | 1–2 |

Key Training Arguments

| Argument | Default | Description |
|----------|---------|-------------|
| --learning_rate | 1e-4 | Learning rate |
| --num_epochs | 1 | Number of training epochs |
| --height | 480 | Video height (pixels) |
| --width | 832 | Video width (pixels) |
| --num_frames | 81 | Number of frames per video |
| --save_steps | None | Save checkpoint every N steps |
| --lora_rank | 32 | LoRA rank |
| --lora_target_modules | q,k,v,o,ffn.0,ffn.2 | LoRA target modules |
| --use_gradient_checkpointing_offload | False | Offload gradient checkpoints to CPU |
| --dataset_repeat | 1 | Repeat dataset N times per epoch |

Training automatically resumes from the latest checkpoint if interrupted. Checkpoints are saved as step-{N}.safetensors in the output directory.
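Given that naming convention, locating the newest checkpoint is straightforward. A rough sketch of the idea; `find_latest_checkpoint` is an illustrative helper, not the repository's actual resume code:

```python
import re
from pathlib import Path

# Illustrative helper (not the repo's actual resume logic): pick the
# checkpoint with the highest step number under the step-{N}.safetensors
# naming convention described above.
def find_latest_checkpoint(output_dir):
    pattern = re.compile(r"step-(\d+)\.safetensors$")
    best_step, best_path = -1, None
    for path in Path(output_dir).glob("step-*.safetensors"):
        match = pattern.search(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path  # None if no checkpoint exists yet
```

Note the comparison on the parsed integer rather than the filename string, so step-2000 correctly sorts after step-500.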

Project Structure

train/
├── train.py                 # Training entry point
├── inference.py             # Inference entry point
├── requirements.txt         # Python dependencies
├── configs/                 # Accelerate/DeepSpeed configs
│   ├── accelerate_1gpu.yaml
│   ├── accelerate_2gpu.yaml
│   ├── accelerate_deepspeed_2gpu.yaml
│   └── accelerate_deepspeed_4gpu.yaml
├── scripts/                 # Ready-to-use shell scripts
│   ├── train_lora_1.3B.sh
│   ├── train_lora_14B.sh
│   ├── train_full_1.3B.sh
│   ├── train_full_14B.sh
│   ├── infer_lora_1.3B.sh
│   ├── infer_lora_14B.sh
│   ├── infer_full_1.3B.sh
│   ├── infer_full_14B.sh
│   └── infer_batch.sh
├── data/                    # Dataset directory
│   └── example/
│       ├── metadata.json    # Example metadata
│       └── videos/
└── diffsynth/               # Core training/inference engine

Acknowledgments

This training framework builds upon Wan2.1 and is adapted for MicroVerse microscale biological simulation.