Open
Labels: bug (Something isn't working), stale (Issues that haven't received updates)
Description
Describe the bug
Using the new 4K model fails with the default values, specifically with use_resolution_binning=True, which is the default.
Traceback (most recent call last):
  File "/home/rockerboo/code/others/sana-diffusers/main.py", line 28, in <module>
    image = pipe(
            ^^^^^
  File "/home/rockerboo/code/others/sana-diffusers/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/code/others/sana-diffusers/.venv/lib/python3.11/site-packages/diffusers/pipelines/pag/pipeline_pag_sana.py", line 736, in __call__
    raise ValueError("Invalid sample size")
ValueError: Invalid sample size
Specifically, the check at https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pag/pipeline_pag_sana.py#L728-L736 limits the binning to sample sizes that do not include the 4K model's.
In https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers/blob/main/transformer/config.json#L20 the sample size is 128.
It should just be a matter of adding the binning information for 4K.
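To illustrate the failure mode and the proposed fix, here is a minimal, self-contained sketch. It does not import diffusers and is not the pipeline's actual code: `select_bin`, `SUPPORTED_BINS`, and `BINS_WITH_4K` are hypothetical names, the string labels stand in for the real aspect-ratio bin tables, and the existing sample sizes (16, 32, 64) are my assumption about what the linked check currently accepts.

```python
# Hypothetical stand-ins for the pipeline's aspect-ratio bin tables.
# Keys are transformer sample sizes; 16/32/64 are assumed, not taken from the issue.
SUPPORTED_BINS = {16: "512", 32: "1024", 64: "2048"}


def select_bin(sample_size, bins, use_resolution_binning=True):
    """Return the bin label for a sample size, or None when binning is disabled."""
    if not use_resolution_binning:
        # With binning turned off, the sample-size check is never reached.
        return None
    if sample_size not in bins:
        # Mirrors the error raised in the traceback above.
        raise ValueError("Invalid sample size")
    return bins[sample_size]


# Proposed change: register an entry for the 4K model (sample_size == 128).
BINS_WITH_4K = {**SUPPORTED_BINS, 128: "4096"}

if __name__ == "__main__":
    try:
        select_bin(128, SUPPORTED_BINS)
    except ValueError as err:
        print("without 4K entry:", err)
    print("with 4K entry:", select_bin(128, BINS_WITH_4K))
```

If the real check behaves like this sketch, passing use_resolution_binning=False to the pipeline call may also sidestep the error until a 4K bin is added, though I have not confirmed that on this model.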
Reproduction
Follow either the PAG or the non-PAG instructions at https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers#1-how-to-use-sanapipeline-with-%F0%9F%A7%A8diffusers
# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)
# for 4096x4096 image generation OOM issue
if pipe.transformer.config.sample_size == 128:
    from patch_conv import convert_model
    pipe.vae = convert_model(pipe.vae, splits=32)
prompt = 'A cute 🐼 eating 🎋, ink drawing style'
image = pipe(
    prompt=prompt,
    height=4096,
    width=4096,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]
image[0].save("sana.png")
Logs
A mixture of bf16 and non-bf16 filenames will be loaded.
Loaded bf16 filenames:
[vae/diffusion_pytorch_model.bf16.safetensors, transformer/diffusion_pytorch_model.bf16.safetensors, text_encoder/model.bf16-00002-of-00002.safetensors, text_encoder/model.bf16-00001-of-00002.safetensors]
Loaded non-bf16 filenames:
[transformer/diffusion_pytorch_model-00001-of-00002.safetensors, transformer/diffusion_pytorch_model-00002-of-00002.safetensors
If this behavior is not expected, please check your folder structure.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.97it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.65it/s]
Traceback (most recent call last):
  File "/home/rockerboo/code/others/sana-diffusers/main.py", line 28, in <module>
    image = pipe(
            ^^^^^
  File "/home/rockerboo/code/others/sana-diffusers/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rockerboo/code/others/sana-diffusers/.venv/lib/python3.11/site-packages/diffusers/pipelines/pag/pipeline_pag_sana.py", line 736, in __call__
    raise ValueError("Invalid sample size")
ValueError: Invalid sample size
### System Info
- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Linux-6.12.6-arch1-1-x86_64-with-glibc2.40
- Running on Google Colab?: No
- Python version: 3.11.10
- PyTorch version (GPU?): 2.4.0+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.27.1
- Transformers version: 4.47.1
- Accelerate version: 1.2.1
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.5.2
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 2080, 8192 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
### Who can help?
@yiyixuxu @DN6