Official implementation of Value Gradient Guidance for Flow Matching Alignment (VGG-Flow), NeurIPS 2025.
VGG-Flow is an efficient and robust RL finetuning method for flow-matching models.
This repository currently provides:
- SD3 LoRA finetuning with VGG-Flow.
- Multiple reward models (`aesthetic_score`, `pickscore`, `imagereward`, `hpscore`).
- Config-driven training with command-line overrides.
Repository layout:
- `train_vggflow.py`: main training entrypoint.
- `config/default_config.py`: base config values.
- `config/*.py`: reward-specific experiment configs.
- `lib/`: model, flow matching, reward, and training modules.
- `run.sh`: example launch command with many overrides.
- Python >= 3.8
- CUDA (tested with 12.4)
- PyTorch + TorchVision
Install dependencies:
```bash
pip install -r requirements.txt
```

By default, training loads `stabilityai/stable-diffusion-3-medium-diffusers` via Diffusers.
Make sure your environment has access to the required Hugging Face model weights.
Before training, review config/default_config.py and update values as needed.
Important fields:
- `config.logging.wandb_key`: replace `"PLACEHOLDER"` if using Weights & Biases.
- `config.logging.use_wandb`: set to `False` to disable W&B logging.
- `config.logging.wandb_dir`: local W&B output directory.
- `config.saving.output_dir`: checkpoint/output directory.
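The training entrypoint reads these values from a nested config object. A minimal sketch of the shape `config/default_config.py` suggests (field names come from this README; the use of `SimpleNamespace`, and the default values shown, are assumptions for illustration — the real file may use `ml_collections.ConfigDict`):

```python
# Hypothetical sketch of the config layout; check config/default_config.py
# for the actual structure and defaults.
from types import SimpleNamespace

def get_config():
    config = SimpleNamespace()
    config.logging = SimpleNamespace(
        wandb_key="PLACEHOLDER",  # replace if using Weights & Biases
        use_wandb=True,           # set to False to disable W&B logging
        wandb_dir="./wandb",      # assumed default; verify in the real file
    )
    config.saving = SimpleNamespace(
        output_dir="./outputs",   # assumed default; verify in the real file
    )
    return config

cfg = get_config()
cfg.logging.use_wandb = False  # disable W&B, as described above
```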
Reward presets:
- `config/aesthetic.py`
- `config/pickscore.py`
- `config/imagereward.py`
- `config/hpsv2.py`
Example: 2-GPU single-node training with the aesthetic reward preset:
```bash
torchrun --standalone --nproc_per_node=2 train_vggflow.py \
    --config=config/aesthetic.py \
    --seed=1 \
    --exp_name=exp_aesthetic
```

Single-GPU run:
```bash
torchrun --standalone --nproc_per_node=1 train_vggflow.py \
    --config=config/aesthetic.py \
    --seed=1 \
    --exp_name=exp_aesthetic
```

Override any config value from the command line, for example:
```bash
torchrun --standalone --nproc_per_node=2 train_vggflow.py \
    --config=config/aesthetic.py \
    --config.model.reward_scale=1e4 \
    --config.sampling.num_steps=20 \
    --config.training.lr=1e-3 \
    --seed=1 \
    --exp_name=exp_custom
```

Key config fields:
- `config.model.reward_scale`: reward strength; larger values push harder toward the reward objective.
- `config.model.timestep_fraction`: fraction of trajectory transitions used for updates.
- `config.sampling.num_steps`: number of Euler sampling steps.
- `config.training.lr`: optimizer learning rate.
- `config.training.batch_size` + `config.training.gradient_accumulation_steps`: effective optimization batch size.
- `config.model.unet_reg_scale`: regularization strength for preserving base behavior.
- Checkpoints are saved under `config.saving.output_dir/<reward>_vggflow_<exp_name>_seed<seed>/checkpoint_epoch*`.
- Training stats are written to `.../result.json` (compressed pickle format).
- If enabled, metrics and sample images are also logged to W&B.
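Since `result.json` is described as a compressed pickle rather than plain JSON, something along these lines should read it back — gzip is an assumption here, and `load_results` / `latest_checkpoint` are hypothetical helpers, not functions from this repo:

```python
import glob
import gzip
import os
import pickle

def load_results(run_dir: str):
    """Load training stats, assuming gzip-compressed pickle (unverified)."""
    with gzip.open(os.path.join(run_dir, "result.json"), "rb") as f:
        return pickle.load(f)

def latest_checkpoint(run_dir: str):
    """Return the newest checkpoint_epoch* path by modification time, or None."""
    ckpts = glob.glob(os.path.join(run_dir, "checkpoint_epoch*"))
    return max(ckpts, key=os.path.getmtime) if ckpts else None
```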
If you find this work useful, please cite:
```bibtex
@inproceedings{liu2025vggflow,
  title={Value Gradient Guidance for Flow Matching Alignment},
  author={Liu, Zhen and Xiao, Tim Z. and Liu, Weiyang and Domingo-Enrich, Carles and Zhang, Dinghuai},
  booktitle={NeurIPS},
  year={2025},
}
```