Sana 4k with use_resolution_binning not supported due to sample_size 128 #10514

@rockerBOO

Describe the bug

Running the new 4K model with default values fails, specifically with use_resolution_binning=True, which is the default.

Traceback (most recent call last):
  File "/home/rockerboo/code/others/sana-diffusers/main.py", line 28, in <module>
    image = pipe(
            ^^^^^
  File "/home/rockerboo/code/others/sana-diffusers/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/code/others/sana-diffusers/.venv/lib/python3.11/site-packages/diffusers/pipelines/pag/pipeline_pag_sana.py", line 736, in __call__
    raise ValueError("Invalid sample size")
ValueError: Invalid sample size

Specifically, the check at https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pag/pipeline_pag_sana.py#L728-L736 limits binning to a fixed set of sample sizes, and that set does not cover the 4K model.

In https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers/blob/main/transformer/config.json#L20, the sample size is 128.

The fix should just be a matter of adding the binning information for 4K.
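To make the failure mode concrete, here is a minimal sketch of a binning lookup keyed on sample size. It is not the diffusers code: the function name, the constants, and the assumed pre-fix set of supported sizes (16, 32, 64) are illustrative, and the 32x latent-to-pixel factor is an assumption inferred from sample size 128 pairing with 4096px output.

```python
# Hypothetical sketch; names are illustrative and not part of the diffusers API.
VAE_SCALE = 32  # assumed latent-to-pixel factor (128 * 32 == 4096)

# Assumed set of latent sample sizes that have a binning table before the fix.
SUPPORTED_SAMPLE_SIZES = (16, 32, 64)


def select_bin_resolution(sample_size, supported=SUPPORTED_SAMPLE_SIZES):
    """Return the pixel resolution whose aspect-ratio bins would be used."""
    if sample_size not in supported:
        # Same message as in the traceback above.
        raise ValueError("Invalid sample size")
    return sample_size * VAE_SCALE


# The 4K checkpoint reports sample_size == 128, so the lookup fails.
try:
    select_bin_resolution(128)
    failed = False
except ValueError:
    failed = True

# Adding an entry for 128 is the shape of the proposed fix.
fixed_resolution = select_bin_resolution(
    128, supported=SUPPORTED_SAMPLE_SIZES + (128,)
)
```

In the real pipeline the fix would also need the actual 4K aspect-ratio bin table, which this sketch does not attempt to reproduce.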

Reproduction

Follow either the PAG or the non-PAG instructions at https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers#1-how-to-use-sanapipeline-with-%F0%9F%A7%A8diffusers.

# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

# for 4096x4096 image generation OOM issue
if pipe.transformer.config.sample_size == 128:
    from patch_conv import convert_model
    pipe.vae = convert_model(pipe.vae, splits=32)

prompt = 'A cute 🐼 eating 🎋, ink drawing style'
image = pipe(
    prompt=prompt,
    height=4096,
    width=4096,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]

image[0].save("sana.png")
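
Until binning information for 4K exists, one possible workaround is to turn binning off for the call, since the error is raised on the binning path. The helper below is a hypothetical sketch, not part of diffusers: `call_kwargs` and `BINNED_SAMPLE_SIZES` are made-up names, the set (16, 32, 64) is an assumption about which sample sizes currently have bins, and only `use_resolution_binning` is the pipeline argument named in this issue. It has not been verified against this checkpoint.

```python
# Hypothetical helper; only `use_resolution_binning` is a real pipeline argument.
BINNED_SAMPLE_SIZES = (16, 32, 64)  # assumed sizes that currently have bins


def call_kwargs(sample_size, height, width):
    """Build pipeline call kwargs, disabling binning for unsupported sizes."""
    kwargs = {"height": height, "width": width}
    if sample_size not in BINNED_SAMPLE_SIZES:
        # Skip the binning lookup that raises "Invalid sample size".
        kwargs["use_resolution_binning"] = False
    return kwargs


kwargs_4k = call_kwargs(128, 4096, 4096)
kwargs_1k = call_kwargs(32, 1024, 1024)
```

With the script above, this would be used as `pipe(prompt=prompt, **call_kwargs(pipe.transformer.config.sample_size, 4096, 4096))`, leaving the default binning behavior untouched for the smaller checkpoints.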

Logs

A mixture of bf16 and non-bf16 filenames will be loaded.
Loaded bf16 filenames:
[vae/diffusion_pytorch_model.bf16.safetensors, transformer/diffusion_pytorch_model.bf16.safetensors, text_encoder/model.bf16-00002-of-00002.safetensors, text_encoder/model.bf16-00001-of-00002.safetensors]
Loaded non-bf16 filenames:
[transformer/diffusion_pytorch_model-00001-of-00002.safetensors, transformer/diffusion_pytorch_model-00002-of-00002.safetensors
If this behavior is not expected, please check your folder structure.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.97it/s]
Loading pipeline components...: 100%|█████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.65it/s]
Traceback (most recent call last):
  File "/home/rockerboo/code/others/sana-diffusers/main.py", line 28, in <module>
    image = pipe(
            ^^^^^
  File "/home/rockerboo/code/others/sana-diffusers/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/code/others/sana-diffusers/.venv/lib/python3.11/site-packages/diffusers/pipelines/pag/pipeline_pag_sana.py", line 736, in __call__
    raise ValueError("Invalid sample size")
ValueError: Invalid sample size

System Info

- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Linux-6.12.6-arch1-1-x86_64-with-glibc2.40
- Running on Google Colab?: No
- Python version: 3.11.10
- PyTorch version (GPU?): 2.4.0+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.27.1
- Transformers version: 4.47.1
- Accelerate version: 1.2.1
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.5.2
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 2080, 8192 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

@yiyixuxu @DN6

Labels: bug, stale