# SoFlow: Solution Flow Models for One-Step Generative Modeling

This is the official PyTorch implementation of SoFlow.

Tianze Luo, Haotian Yuan, Zhuang Liu

Princeton University

arXiv: https://arxiv.org/abs/2512.15657
The code structure of this repository is straightforward:

- `latent_dataset.py`: DDP code for pre-extracting VAE latents for ImageNet-256x256 conditional generation using SD-VAE.
- `dit.py`: the Diffusion Transformer implementation from the DiT repository, with slight modifications to adapt to our model.
- `augmentation.py` and `unet.py`: the data augmentation and U-Net for unconditional CIFAR-10 generation used in the EDM repository, with slight modifications to adapt to our model.
- `models.py`: our model's implementation, including loss computation.
- `train.py` and `inference.py`: our DDP training and inference code.
- `evaluator.py`: the standard evaluation code provided in the ADM repository, slightly modified for more convenient usage.
Our checkpoints are available at: https://huggingface.co/zlab-princeton/SoFlow.
## Training and Inference Environment

```bash
conda create -n soflow python=3.10 -y
conda activate soflow
pip install -r requirements_soflow.txt
```

## Evaluation Environment

```bash
conda create -n soflow_eval python=3.10 -y
conda activate soflow_eval
pip install -r requirements_soflow_eval.txt
```

## Dataset Preparation
For ImageNet training, `latent_dataset.py` processes the raw ImageNet dataset from `--data-path` and saves it to `--save-path` using DDP. Specifically, different processes each write the dataset into separate HDF5 files, which are finally merged into a single file named `imagenet_latent.hdf5`.
```bash
conda activate soflow
torchrun --nproc-per-node=8 latent_dataset.py --data-path ./imagenet --save-path ./imagenet_latent --image-size 256 --device-batch-size 256 --num-workers 4 --seed 42
```

For CIFAR-10 training, no data processing is required; you can run the training process directly.
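The shard-then-merge pattern described above can be sketched as follows. This is a simplified illustration of the idea, not the repository's actual code: the real script extracts SD-VAE latents and writes HDF5 shards, while here plain files stand in for the shards.

```python
import os

def shard_indices(num_samples: int, rank: int, world_size: int) -> list[int]:
    """Round-robin sharding: every world_size-th sample goes to this rank."""
    return list(range(rank, num_samples, world_size))

def merge_shards(shard_paths: list[str], merged_path: str) -> None:
    """Concatenate per-rank shard files into a single merged file."""
    with open(merged_path, "wb") as out:
        for path in shard_paths:
            with open(path, "rb") as f:
                out.write(f.read())

# Example: 10 samples split across 4 ranks; together the shards cover everything.
shards = [shard_indices(10, r, 4) for r in range(4)]
print(shards[0])  # [0, 4, 8]
```

The key property is that the shards are disjoint and jointly cover the whole dataset, so the merged file is a complete copy regardless of how many ranks wrote it.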
## Training

To ensure ease of use, we have integrated all hyperparameters into a YAML file. Detailed explanations of the hyperparameters are provided inside `imagenet.yaml` and `cifar.yaml`. To launch a new training process, simply run:
```bash
conda activate soflow
torchrun --nproc-per-node=8 train.py --config your_config.yaml
```

Some implementation details:
- **Directory Management**: If `working_dir` does not exist, the training command will create it and copy the training config YAML file into it. After training, `working_dir` will contain a `config.yaml` file, a `log.txt` file, and three folders: `ckpts`, `evals`, and `figs`.
- **Resuming Training**: If the training command is run with an existing `working_dir`, the program will automatically load the latest checkpoint from `working_dir/ckpts` to continue the training process.
- **Visualization**: Every `eval_demo_steps` steps, the program will automatically generate an image mesh with shape `eval_demo_shape` using `eval_NFE` inference steps for visualization.
- **Checkpoints & Evaluation**: Every `eval_step` steps, the program will automatically save checkpoints to `ckpts` and save 50,000 inference images (with `eval_NFE` inference steps) to a `.npz` file in `evals` for evaluation.
## Some Training Tips

The schedule function can be configured via `l_init_ratio`, or you can simply use a constant schedule.
Additionally, when using a non-constant schedule, the configured total number of steps affects performance at any given step, because the schedule decays more slowly as the total steps increase. For example, comparing two models both trained for 200k iterations, the one scheduled for 400k total steps will outperform the one scheduled for 800k total steps.
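This effect can be seen with a simple decaying schedule. The linear decay below is purely illustrative; the actual schedule shape and the role of `l_init_ratio` are defined in the repository's configs.

```python
def schedule_value(step: int, total_steps: int, l_init_ratio: float = 1.0) -> float:
    """Linearly decay from l_init_ratio to 0 over total_steps (illustrative only)."""
    return l_init_ratio * max(0.0, 1.0 - step / total_steps)

# At 200k iterations, the 800k-step schedule has decayed less than the 400k one:
print(schedule_value(200_000, 400_000))  # 0.5
print(schedule_value(200_000, 800_000))  # 0.75
```

At the same training step, a longer planned horizon leaves the schedule higher, which is why the mid-training behavior of two otherwise identical runs can differ.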
## Evaluation

You can download our checkpoints and their training configs, along with the CIFAR-10 and ImageNet 256x256 reference files, by cloning the repository:
```bash
git clone https://huggingface.co/zlab-princeton/SoFlow
```

- The reference file for the ImageNet 256x256 dataset is downloaded from the ADM repository.
- The reference file for the CIFAR-10 dataset is the training set of 50,000 images, following previous works.
For ImageNet checkpoints, using the commands below will achieve a 1-NFE / 2-NFE FID-50K of 2.9617 / 2.6606.
```bash
conda activate soflow
torchrun --nproc-per-node=8 inference.py --config ./SoFlow/XL-2-cond/config.yaml --ckpt-steps 1200000 --eval-NFE 1 --eval-batch-size 125 --seed 42
torchrun --nproc-per-node=8 inference.py --config ./SoFlow/XL-2-cond/config.yaml --ckpt-steps 1200000 --eval-NFE 2 --eval-batch-size 125 --seed 42
conda activate soflow_eval
python evaluator.py --ref_batch ./SoFlow/Ref/VIRTUAL_imagenet256_labeled.npz --sample_batch_dir ./SoFlow/XL-2-cond/evals
```

Performance for other ImageNet 256x256 models:
| Models | Train Epochs | 1-NFE FID-50K | 2-NFE FID-50K |
|---|---|---|---|
| SoFlow-B/4 (uncond) | 80 | 58.5646 | 58.2240 |
| SoFlow-B/4 (cond) | 80 | 11.5897 | 8.2212 |
| SoFlow-B/2 (cond) | 240 | 4.8491 | 4.2435 |
| SoFlow-M/2 (cond) | 240 | 3.7329 | 3.4229 |
| SoFlow-L/2 (cond) | 240 | 3.2007 | 2.8995 |
| SoFlow-XL/2 (cond) | 240 | 2.9617 | 2.6606 |
The results in this table are achieved with standard DiT architectures, modified only slightly by adding another time embedding.
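The "another time embedding" modification can be sketched as follows. This is a NumPy illustration of the idea, not the repository's actual code: it assumes the standard sinusoidal timestep embedding from DiT/ADM, and that the second embedding (presumably for the target time of the solution map) is summed into the same conditioning vector.

```python
import numpy as np

def timestep_embedding(t: np.ndarray, dim: int, max_period: float = 10000.0) -> np.ndarray:
    """Sinusoidal timestep embedding in the style of DiT/ADM."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = t[:, None] * freqs[None, :]  # (batch, half)
    return np.concatenate([np.cos(args), np.sin(args)], axis=-1)  # (batch, dim)

# Condition on two times instead of one by summing their embeddings.
t = np.array([0.9, 0.5])  # e.g. current time
s = np.array([0.0, 0.0])  # e.g. target time (hypothetical roles)
cond = timestep_embedding(t, 256) + timestep_embedding(s, 256)
print(cond.shape)  # (2, 256)
```

Summing keeps the conditioning vector the same size as in the original DiT, so the rest of the architecture is unchanged.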
For CIFAR-10, using the commands below will achieve a 1-NFE / 2-NFE FID-50K of 2.8600 / 2.2827.
```bash
conda activate soflow
torchrun --nproc-per-node=8 inference.py --config ./SoFlow/UNet-uncond/config.yaml --ckpt-steps 800000 --eval-NFE 1 --eval-batch-size 125 --seed 42
torchrun --nproc-per-node=8 inference.py --config ./SoFlow/UNet-uncond/config.yaml --ckpt-steps 800000 --eval-NFE 2 --eval-batch-size 125 --seed 42
conda activate soflow_eval
python evaluator.py --ref_batch ./SoFlow/Ref/cifar10.npz --sample_batch_dir ./SoFlow/UNet-uncond/evals
```

Our evaluator automatically evaluates all `.npz` files in `--sample_batch_dir` and saves the results to `eval_results.txt`.
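The batch-evaluation behavior can be sketched with a small helper like the one below. This is a hypothetical illustration, not the actual `evaluator.py`: `compute_fid` stands in for the real FID computation against the reference batch.

```python
import glob
import os

def evaluate_all(sample_batch_dir: str, compute_fid) -> str:
    """Evaluate every .npz file in the directory; write results to eval_results.txt."""
    results_path = os.path.join(sample_batch_dir, "eval_results.txt")
    with open(results_path, "w") as f:
        for npz_path in sorted(glob.glob(os.path.join(sample_batch_dir, "*.npz"))):
            fid = compute_fid(npz_path)  # stand-in for the real FID computation
            f.write(f"{os.path.basename(npz_path)}: FID {fid:.4f}\n")
    return results_path
```

Scanning the whole directory means one command evaluates every checkpoint's samples produced during training, which is convenient when `evals` accumulates `.npz` files from periodic evaluations.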
## Citation

```bibtex
@article{luo2025soflow,
  title={SoFlow: Solution Flow Models for One-Step Generative Modeling},
  author={Luo, Tianze and Yuan, Haotian and Liu, Zhuang},
  journal={arXiv preprint arXiv:2512.15657},
  year={2025}
}
```
