
Prompt adherence for FluxPipeline is broken #11536

@dxqb

Description


Describe the bug

For prompts much shorter than max_sequence_length (which is most prompts, since the default is 512), the prompt is not followed well because the attention calculation spends most of its attention on the padding tokens of the encoder hidden states.

Prompt: Portrait photo of an angry man
First image: Flux pipeline with defaults.
Second image: no attention to padding, produced here with the workaround of setting a short max_sequence_length. A generic implementation would instead use attention masking (and would pad only up to the longest prompt in the batch rather than always to max_sequence_length); see the sketch after the images below.

Other model pipelines might be affected similarly.

This has been discussed in more detail in #10194 and other issues, but I decided to open a separate issue to bring to your attention that this also affects default use of the pipeline, not only finetuning.

[Image 1: output with default settings (attention to padding)]

[Image 2: output with max_sequence_length set to the prompt length (no attention to padding)]
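
For illustration, here is a minimal sketch of the masking approach, assuming a Flux-style joint attention over concatenated text and image tokens; all shapes and names are illustrative and not the actual diffusers internals. A boolean key-padding mask hides the padded text positions from every query:

import torch
import torch.nn.functional as F

# Illustrative shapes, not the real Flux dimensions.
batch, heads, head_dim = 1, 4, 64
text_len, image_len = 512, 1024   # padded text tokens + image tokens
prompt_len = 7                    # real (non-padding) prompt tokens

seq_len = text_len + image_len
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Key-padding mask: True where a position may be attended to.
# The padded text positions [prompt_len, text_len) are hidden from all queries.
key_mask = torch.ones(batch, seq_len, dtype=torch.bool)
key_mask[:, prompt_len:text_len] = False

# Broadcast to (batch, heads, query_len, key_len) and apply in SDPA.
attn_mask = key_mask[:, None, None, :]
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)

In the pipeline this would presumably mean threading the tokenizer's attention_mask through prompt encoding into the transformer's attention processors.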

Reproduction

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "Portrait photo of an angry man"
prompt_T5_token_length = 7  # T5 token count of this prompt

# Default behavior: the prompt is padded to max_sequence_length=512,
# and attention also goes to the padding tokens.
generator = torch.Generator(device="cuda").manual_seed(1)
images = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    height=1024,
    width=1024,
    num_inference_steps=20,
    generator=generator,
).images
images[0].save("attention-to-padding.jpg")

# Workaround: truncate the sequence to the actual prompt length,
# so there are no padding tokens to attend to.
generator = torch.Generator(device="cuda").manual_seed(1)
images = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    height=1024,
    width=1024,
    num_inference_steps=20,
    generator=generator,
    max_sequence_length=prompt_T5_token_length,
).images
images[0].save("no-attention-to-padding.jpg")

Logs

System Info

certifi==2025.4.26
charset-normalizer==3.4.2
diffusers==0.33.1
filelock==3.18.0
fsspec==2025.3.2
hf-xet==1.1.0
huggingface-hub==0.31.1
idna==3.10
importlib_metadata==8.7.0
Jinja2==3.1.6
MarkupSafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.5
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
packaging==25.0
pillow==11.2.1
protobuf==6.30.2
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.3
sentencepiece==0.2.0
sympy==1.14.0
tokenizers==0.21.1
torch==2.7.0
tqdm==4.67.1
transformers==4.51.3
triton==3.3.0
typing_extensions==4.13.2
urllib3==2.4.0
zipp==3.21.0

Who can help?

@DN6 @yiyixuxu @sayakpaul
