GitHub - xtudbxk/FreCaS: The public source code of "FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling"

⭐ Update

2025.02.24: Accepted to ICLR 2025. We have updated the code to support the SD21 and SD3 pre-trained models.

FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling (ICLR2025)

Zhengqiang Zhang¹,² | Ruihuang Li¹,² | Lei Zhang¹,²

¹ The Hong Kong Polytechnic University, ² The PolyU-OPPO Joint Innovation Lab

ABSTRACT While image generation with diffusion models has achieved great success, generating images of higher resolution than the training size remains a challenging task due to the high computational cost. Current methods typically perform the entire sampling process at full resolution and process all frequency components simultaneously, contradicting the inherent coarse-to-fine nature of latent diffusion models and wasting computation on premature high-frequency details at early diffusion stages. To address this issue, we introduce an efficient Frequency-aware Cascaded Sampling framework, FreCaS for short, for higher-resolution image generation. FreCaS decomposes the sampling process into cascaded stages with gradually increasing resolutions, progressively expanding the frequency bands and refining the corresponding details. We propose an innovative frequency-aware classifier-free guidance (FA-CFG) strategy that assigns different guidance strengths to different frequency components, directing the diffusion model to add new details in the expanded frequency domain of each stage. Additionally, we fuse the cross-attention maps of the previous and current stages to avoid synthesizing unfaithful layouts. Experiments demonstrate that FreCaS significantly outperforms state-of-the-art methods in image quality and generation speed. In particular, FreCaS is about 2.86× and 6.07× faster than ScaleCrafter and DemoFusion, respectively, in generating a 2048×2048 image using a pre-trained SDXL model, and achieves FID_b improvements of 11.6 and 3.7, respectively. FreCaS can be easily extended to more complex models such as SD3.

FreCaS && FA-CFG

(a) The overall framework of FreCaS. The entire T-step sampling process is divided into N+1 stages of increasing resolutions and expanding frequency bands. FreCaS starts the sampling process at the training size and obtains the last latent $z^{s_0}_L$ at that stage. Then, FreCaS continues the sampling from the first latent $z^{s_1}_F$ at the next stage with a larger resolution and expanded frequency domain. This procedure is repeated until the final latent $z^{s_N}_0$ at stage N is obtained. A decoder is then used to generate the final image.

(b) FA-CFG strategy. FreCaS uses the FA-CFG strategy to direct the diffusion model to focus on generating the content of the expanded frequency bands. We separate the original denoising scores into low-frequency and high-frequency components and assign a higher CFG strength to the high-frequency part. The two parts are then combined to obtain the final denoising score.
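
The idea in (b) can be illustrated with a minimal NumPy sketch. This is not the repository's implementation: it assumes the low-frequency part is obtained by average-pooling followed by nearest-neighbour upsampling, and that the guidance acts on the difference between the conditional and unconditional predictions. The function names and the `factor` parameter are hypothetical.

```python
import numpy as np


def split_frequencies(x, factor=2):
    """Split a (C, H, W) array into low- and high-frequency parts.

    Assumption: "low frequency" means average-pool by `factor`, then
    nearest-neighbour upsample back; the high-frequency part is the residual.
    H and W must be divisible by `factor`.
    """
    c, h, w = x.shape
    pooled = x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))
    low = np.repeat(np.repeat(pooled, factor, axis=1), factor, axis=2)
    return low, x - low


def fa_cfg(eps_uncond, eps_cond, w_low, w_high, factor=2):
    """Combine two noise predictions with per-band guidance strengths."""
    delta = eps_cond - eps_uncond              # the usual CFG direction
    low, high = split_frequencies(delta, factor)
    return eps_uncond + w_low * low + w_high * high


# Toy latents; real predictions would come from the diffusion model.
rng = np.random.default_rng(0)
eps_u = rng.standard_normal((4, 8, 8))
eps_c = rng.standard_normal((4, 8, 8))
guided = fa_cfg(eps_u, eps_c, w_low=7.5, w_high=25.0)
```

Because the two bands sum back to the full difference, setting `w_low == w_high` in this sketch reduces it to ordinary classifier-free guidance.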

HOW TO USE

Download pre-trained models

Please download the pre-trained models from Hugging Face.

The model of SD21: https://huggingface.co/stabilityai/stable-diffusion-2-1

The model of SDXL: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0

The model of SD3: https://huggingface.co/stabilityai/stable-diffusion-3-medium

Then set the paths in "main.py" to the locations of your downloaded models:

    pipeline_paths = {
        "sd21": "/home/notebook/data/sharedgroup/RG_YLab/aigc_share_group_data/LowLevelLLM/models/diffusion_models/stable-diffusion-2-1-base",
        "sdxl": "/home/notebook/data/group/LowLevelLLM/models/diffusion_models/stable-diffusion-xl-base-1.0",
        "sd3": "/home/notebook/data/group/LowLevelLLM/models/diffusion_models/stable-diffusion-3-medium-diffusers",
    }
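
Before running inference, it can help to confirm that every configured path points at a usable model directory. The helper below is a hypothetical sketch and is not part of the repository. It assumes the models are stored in the diffusers directory layout, where a `model_index.json` file sits at the root of each pipeline folder.

```python
from pathlib import Path


def check_pipeline_paths(pipeline_paths):
    """Return {name: problem} for every entry that does not look usable.

    Assumption: each path is a diffusers-format pipeline directory,
    identified by a `model_index.json` file at its root.
    """
    problems = {}
    for name, path in pipeline_paths.items():
        root = Path(path)
        if not root.is_dir():
            problems[name] = "directory not found"
        elif not (root / "model_index.json").is_file():
            problems[name] = "model_index.json missing"
    return problems
```

Calling `check_pipeline_paths(pipeline_paths)` with the dictionary from "main.py" returns an empty dictionary when every entry passes both checks.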

Inference

Please use the following command to generate higher-resolution images:

python3 main.py 
    --name [sd21/sdxl/sd3] # the name of the pre-trained model
    --output xxx # output directory
    --prompts ["xxx"|prompts.lst] # a prompt string or a prompt file
    --images-per-prompt xx # number of images to generate per prompt
    --gs xxx # the guidance scale for CFG in the first stage
    
    # settings for FreCaS
    --tsize [[h1,w1],[h2,w2]] # the size of each stage
    --msp_endtimes t1 t2 # the end timestep of each stage
    --msp_steps step1 step2 # the number of inference steps for each stage
    
    # settings for FA-CFG
    --facfg_weight wh wl # the guidance strengths of the high-/low-frequency components, respectively
    
    # settings for CA-maps reuse
    --camap_weight wc # the weight w_c for CA-map reuse

For example, to generate 2048x2048 images with the SDXL model:
`python3 main.py --gs 7.5 --prompts prompts.lst --tsize [[1024,1024],[2048,2048]] --msp_endtimes 200 0 --msp_steps 40 10 --msp_gamma 1.5 --name sdxl --images-per-prompt 1 --facfg_weight 25.0 7.5 --camap_weight 0.8 --output results`

You can adjust the number of steps and the end timestep of each stage, the guidance strengths w_h and w_l of the frequency components, or the CA-map reuse weight w_c to seek a better tradeoff between quality and efficiency.
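
When experimenting with these settings, the three per-stage arguments have to stay in step with each other. The sketch below is a hypothetical helper, not code from "main.py". It assumes `--tsize` can be read as a JSON list of `[height, width]` pairs, that stage sizes never shrink, and that end timesteps strictly decrease from one stage to the next, as they do in the example command.

```python
import json
from dataclasses import dataclass


@dataclass
class Stage:
    height: int
    width: int
    end_timestep: int
    steps: int


def build_stages(tsize, endtimes, steps):
    """Pair up per-stage sizes, end timesteps, and step counts, with sanity checks."""
    sizes = json.loads(tsize)  # assumption: the value is JSON-compatible
    if not (len(sizes) == len(endtimes) == len(steps)):
        raise ValueError("tsize, msp_endtimes and msp_steps need one entry per stage")
    stages = [Stage(h, w, t, s) for (h, w), t, s in zip(sizes, endtimes, steps)]
    for prev, cur in zip(stages, stages[1:]):
        if cur.height < prev.height or cur.width < prev.width:
            raise ValueError("stage resolutions should not decrease")
        if cur.end_timestep >= prev.end_timestep:
            raise ValueError("end timesteps should decrease from stage to stage")
    return stages


# Values taken from the SDXL example command.
stages = build_stages("[[1024,1024],[2048,2048]]", [200, 0], [40, 10])
total_steps = sum(stage.steps for stage in stages)  # 40 + 10
```

With the example values, most of the steps run at 1024x1024 and only 10 run at 2048x2048, which is where the efficiency tradeoff mentioned above comes from.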

CITATION

@inproceedings{zhang2024frecas,
  title={FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling},
  author={Zhang, Zhengqiang and Li, Ruihuang and Zhang, Lei},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}

CONTACT

Please open an issue or contact Zhengqiang at [email protected]

License

This project is released under the Apache 2.0 license.
