- [x] Upload inference code and corresponding model weights.
- [x] Upload data collection, model training, and evaluation code.
- [ ] Upload the latest version of the paper.
- [ ] Check for hidden bugs in the codebase.
This section will guide you through setting up the environment required to run and develop with this project.
We recommend using Anaconda or Miniconda to manage your Python environments, but you can also use venv
or other tools.
# Create a new environment with Python 3.8+
conda create -n golden-noise python=3.8 -y
conda activate golden-noise
# Create a new virtual environment
python -m venv golden-noise-env
# Activate the environment (Windows)
golden-noise-env\Scripts\activate
# Activate the environment (Linux/MacOS)
source golden-noise-env/bin/activate
Please follow the official PyTorch installation guide to install the correct version for your CUDA driver. For example:
# Example: install PyTorch with CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
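After installing, you can quickly confirm that the build matches your CUDA driver; a minimal check (nothing project-specific is assumed here):

```python
# Sanity check: PyTorch is importable and CUDA is visible.
import torch

print(torch.__version__)           # e.g., 2.x.x+cu118
print(torch.cuda.is_available())   # True if the CUDA build matches your driver
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```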
Install the required Python packages:
pip install diffusers pillow numpy timm einops
Note: the `PIL` module is provided by the `pillow` package, and `argparse` ships with the Python standard library, so neither needs to be installed by name. If you encounter issues with package versions, please refer to the requirements in this README or open an issue.
The `data/` folder provides several resources for both training and evaluation:
- `drawbench.csv`, `HPD_prompt.csv`, `pickscore.csv`: Three test sets containing prompts and evaluation data for benchmarking model performance.
- `pickscore_train_prompts.json`: A prompt dataset used for collecting training data, containing prompts for generating training pairs.
- `train_60000.json`: An example training dataset, containing prompts and seeds, demonstrating the format and structure for model training.
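If you want to peek at these files before training, here is a minimal sketch (the top-level JSON structure and CSV column names are assumptions; check the files for the real schema):

```python
# Inspect the bundled datasets. The structure of the JSON and the CSV
# column names are assumptions here; print them rather than relying on them.
import csv
import json

with open("data/train_60000.json") as f:
    train_data = json.load(f)  # assumed to be a list of records
print(len(train_data), "training records; first record:", train_data[0])

with open("data/drawbench.csv", newline="") as f:
    rows = list(csv.DictReader(f))
print("drawbench columns:", list(rows[0].keys()))
```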
To prepare training data, use the script `training/dataset_collection.py`. This script reads prompts and seeds from the selected prompt dataset (e.g., `train_60000.json`), generates corresponding source noise and target noise for each prompt, and saves the results into `.npz` files for efficient training.
Notice: Some folders cannot be downloaded directly from GitHub due to repository file-size and policy restrictions. We provide external download links for these resources. Please download them manually and place them in the `training/` directory as instructed. Download link: google drive
Training scripts are located in the `training/` directory.
- `main.py`: Entry point for model training.
- `dataset_collection.py`: Collects and processes training data by generating source and target noise pairs from prompt datasets. Run this script before training to prepare your `.npz` training files.
python training/dataset_collection.py --prompt_dataset data/train_60000.json --output_dir data/train_npz/
This will generate `.npz` files containing the source and target noise for each prompt in the specified dataset.
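To verify what was written, you can open one of the generated files; a minimal sketch (the file name and array keys are illustrative — the authoritative names are set in `dataset_collection.py`):

```python
# Inspect one generated noise-pair file. Key names depend on
# dataset_collection.py, so we print them instead of assuming them.
import numpy as np

data = np.load("data/train_npz/0.npz")  # hypothetical file name
for key in data.files:
    print(key, data[key].shape, data[key].dtype)
```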
python training/main.py --pipeline=SDXL --model=svd_unet+unet --train=True --pick=True --all-file=True --discard=True --test=True --postfix=_test --evaluate=False
- `--ddp`: Whether to use Distributed Data Parallel (DDP) training. Default: False.
- `--pipeline`: Model pipeline to use. Options: 'SDXL', 'SD2.1', 'DS-turbo', 'DiT'. Default: 'SDXL'.
- `--model`: Model architecture. Options: 'unet', 'vit', 'svd_unet', 'svd_unet+unet', 'e_unet', 'svd_unet+unet+dit'. Default: 'svd_unet+unet'.
- `--benchmark-type`: Benchmark type. Options: 'pick', 'draw'. Default: 'pick'.
- `--train`: Whether to run training. Default: False.
- `--test`: Whether to run testing/inference. Default: False.
- `--postfix`: Postfix for output files and checkpoints. Default: '_hps_sdxl_step_10_random_noise'.
- `--acculumate-steps`: Number of gradient accumulation steps. Default: 64.
- `--pick`: Whether to use PickScore for filtering or evaluation. Default: False.
- `--do-classifier-free-guidance`: Whether to use classifier-free guidance during generation. Default: True.
- `--inference-step`: Number of inference steps for the diffusion process. Default: 10.
- `--size`: Image size (height and width). Default: 1024.
- `--RatioT`: Ratio parameter for training (custom use). Default: 1.0.
- `--guidance-scale`: Guidance scale for classifier-free guidance. Default: 5.5.
- `--guidance-rescale`: Rescale factor for guidance. Default: 0.0.
- `--all-file`: Whether to use all files in the dataset. Default: False.
- `--epochs`: Number of training epochs. Default: 30.
- `--batch-size`: Batch size for training. Default: 64.
- `--num-workers`: Number of worker processes for data loading. Default: 16.
- `--metric-version`: Metric for evaluation. Options: 'PickScore', 'HPS v2', 'AES', 'ImageReward'. Default: 'PickScore'.
- `--prompt-path`: Path to the prompt JSON file for training. Default: './sdxl_step_10_training_seed.json'.
- `--data-dir`: Directory containing the training data (noise pairs). Default: './datasets/noise_pairs_SDXL_10_pick_total/'.
- `--pretrained-path`: Path to pretrained model weights. Default: './training/checkpoints'.
- `--save-ckpt-path`: Path to save model checkpoints. Default: './training/checkpoints/SDXL-10/svd_unet+unet'.
- `--discard`: Whether to discard bad samples during training. Default: False.
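Note: with the defaults above (`--batch-size 64`, `--acculumate-steps 64`), the effective batch size is 64 × 64 = 4096 samples per optimizer update, assuming the script steps the optimizer once per accumulation cycle.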
- Download the `metric` and `reward_model` folders from Google Drive and place them under the `training/` directory.
- Download link: google drive
- Final layout example:

training/
  metric/
  reward_model/
  utils/
  solver/
  ...
- The script will create a JSONL file that records each `prompt` and its corresponding `random_seed` (see the reading sketch just below). This JSONL will be used later by `main.py` during training as `--prompt-path`.
- Saved data (`.npz`) will contain the paired latents/noise as defined in the code. You can edit `cal_score` to collect more metrics. The storage structure follows the logic in the code.
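A minimal sketch for reading that JSONL (the file path matches the example training command below; the `prompt`/`random_seed` keys follow the description above):

```python
# Read the JSONL produced by dataset_collection.py: one JSON object per line.
import json

with open("training/SDXL_step_10_training.json") as f:
    records = [json.loads(line) for line in f if line.strip()]
print(records[0]["prompt"], records[0]["random_seed"])
```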
Multi-GPU data collection and index management:
- If different GPUs generate different index ranges, adjust the sample index to avoid collisions.
- Example:
  - GPU 0 generates indices [0, 10000): do not change `idx`.
  - GPU 1 generates indices [10000, 20000): set `idx = idx + 10000` before saving.
- After collection, merge all JSONL files into a single file for training (see the merge sketch below). The `.npz` files do not require merging or further processing.
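A minimal merge sketch, assuming each GPU wrote its own JSONL shard (the shard naming is hypothetical):

```python
# Merge per-GPU JSONL shards into the single prompt file used by main.py.
# The shard file names are hypothetical; the .npz files need no merging
# because the per-rank index offsets already make their names disjoint.
import glob

shards = sorted(glob.glob("training/prompt_shard_gpu*.jsonl"))
with open("training/SDXL_step_10_training.json", "w") as out:
    for shard in shards:
        with open(shard) as f:
            for line in f:
                line = line.strip()
                if line:
                    out.write(line + "\n")
```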
- Recommended baseline: `--model=svd_unet+unet`.
- Use `--pick True` as the typical setting. To filter bad samples, add `--discard True`. The filtering logic is implemented in `training/utils/utils.py` via `load_pick_discard_prompt`; modify it if needed.
- If you have multiple `.npz` folders, set `--all-file True`; otherwise keep it `False`.
- Set `--prompt-path` to the JSONL created by `dataset_collection.py` (if you produced multiple JSONLs, merge them into one file first).
Example command:
python training/main.py \
--pipeline SDXL \
--model svd_unet+unet \
--pick True \
--discard True \
--all-file True \
--prompt-path ./training/SDXL_step_10_training.json \
--data-dir ./training/datasets/noise_pairs_SDXL_10_pick_total/ \
--epochs 30 \
--batch-size 64
We provide evaluation code in `training/metric/cal_metric.py` for quantitative assessment of generated images.
Usage Notes:
- You need to prepare the corresponding prompt file and the generated images for evaluation.
- The image folder should follow this structure:
images/
  origin/
    0.png
    1.png
    ...
  optim/
    0.png
    1.png
    ...
- `origin/` contains images generated by the baseline model, and `optim/` contains images generated by the optimized (e.g., NPNet) model. The file names should correspond to the prompt order (see the sanity check below).
- Before running the evaluation, make sure you have downloaded the required weights for the reward model you want to use (e.g., PickScore, HPSv2, ImageReward, etc.).
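As mentioned above, a small sanity check that the two folders are aligned before computing metrics:

```python
# Check that origin/ and optim/ contain the same image file names,
# so that scores are compared on matching prompts.
import os

origin = set(os.listdir("images/origin"))
optim = set(os.listdir("images/optim"))
if origin == optim:
    print(f"OK: {len(origin)} matching image pairs")
else:
    print("Only in origin/:", sorted(origin - optim))
    print("Only in optim/:", sorted(optim - origin))
```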
Example command:
python training/metric/cal_metric.py --prompt_file data/your_prompts_file --image_folder images/ --metric PickScore
Replace `--metric` with your desired evaluation metric and adjust paths as needed.
This guide provides instructions on how to use NPNet, a noise prompt network that transforms random Gaussian noise into golden noise by adding a small, desirable perturbation derived from the text prompt, boosting the overall quality and semantic faithfulness of the synthesized images.
Here we provide inference code that supports different models, such as Stable Diffusion XL, DreamShaper-xl-v2-turbo, and Hunyuan-DiT.
Besides, you can apply the NPNet checkpoint trained on SDXL to models such as SDXL-Lightning, LCM, and PCM. The visualizations of these three models are shown below:
We directly reuse the SDXL checkpoint for SDXL-Lightning, LCM, and PCM, and evaluate them on the GenEval dataset:
| Model | Method | PickScore↑ | HPSv2↑ | AES↑ | ImageReward↑ | CLIPScore↑ |
|---|---|---|---|---|---|---|
| SDXL-Lightning (4-step) | standard | 22.85 | 29.12 | 5.65 | 59.02 | 0.8093 |
| SDXL-Lightning (4-step) | ours | 23.03 | 29.71 | 5.71 | 72.67 | 0.8150 |
| LCM (4-step) | standard | 22.30 | 26.52 | 5.49 | 33.21 | 0.8050 |
| LCM (4-step) | ours | 22.38 | 26.83 | 5.55 | 37.08 | 0.8123 |
| PCM (8-step) | standard | 22.05 | 26.98 | 5.52 | 23.28 | 0.8031 |
| PCM (8-step) | ours | 22.22 | 27.59 | 5.56 | 35.01 | 0.8175 |
The results demonstrate the effectiveness of our NPNet for few-step image generation.
To use the NPNet pipeline, run the `npnet_pipeline.py` script with appropriate command-line arguments. Below are the available options:
- `--pipeline`: Select the model pipeline (`SDXL`, `DreamShaper`, `DiT`). Default: `SDXL`.
- `--prompt`: The text prompt from which the image is generated. Default: "A banana on the left of an apple."
- `--inference-step`: Number of inference steps for the diffusion process. Default: 50.
- `--cfg`: Classifier-free guidance scale. Default: 5.5.
- `--pretrained-path`: Path to the pretrained model weights. Default: a path specified in the script.
- `--size`: The size (height and width) of the generated image. Default: 1024.
Run the script from the command line by navigating to the directory containing `npnet_pipeline.py` and executing:
python npnet_pipeline.py --pipeline SDXL --prompt "A banana on the left of an apple." --size 1024
This command will generate an image based on the prompt "A banana on the left of an apple." using the Stable Diffusion XL model with an image size of 1024x1024 pixels.
The script will save two images:
- A standard image generated by the diffusion model.
- A golden image generated by the diffusion model with the NPNet.
Both images will be saved in the current directory with names based on the model and prompt.
We provide the pre-trained NPNet weights for Stable Diffusion XL, DreamShaper-xl-v2-turbo, and Hunyuan-DiT via google drive.
If you find our code useful for your research, please cite our paper.
@inproceedings{zhou2025golden,
title={Golden Noise for Diffusion Models: A Learning Framework},
author={Zikai Zhou and Shitong Shao and Lichen Bai and Shufei Zhang and Zhiqiang Xu and Bo Han and Zeke Xie},
booktitle={International Conference on Computer Vision},
year={2025},
}
We thank the community and contributors for their invaluable support in developing NPNet. We thank @DataCTE for building a ComfyUI integration of the NPNet inference code: ComfyUI. We thank @asagi4 for building another ComfyUI integration: ComfyUI.