Description
Describe the bug
I'm using StableDiffusionPipeline with lpw_stable_diffusion to do txt2img generations from long prompts.
When loading the safetensors checkpoint file using .from_single_file, prompts longer than 77 tokens still get truncated.
However, after converting the checkpoint to diffusers format with your convert_from_ckpt script and loading its path with .from_pretrained, prompts seem to get handled properly with no truncation.
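For readers unfamiliar with the difference between truncation and long-prompt handling, here is a minimal pure-Python sketch of the idea. It is not the actual lpw_stable_diffusion code: the function names are hypothetical, and the defaults (a 77-token CLIP window, BOS id 49406, EOS/pad id 49407) are assumptions matching the usual Stable Diffusion 1.x tokenizer.

```python
from typing import List

# Assumed values for the SD 1.x CLIP tokenizer; illustrative only.
BOS_ID = 49406
EOS_ID = 49407
WINDOW = 77  # tokens per text-encoder pass, including BOS and EOS


def truncate_token_ids(token_ids: List[int]) -> List[int]:
    """Plain behaviour: keep only what fits in a single 77-token window."""
    body = WINDOW - 2
    kept = token_ids[:body]
    padding = [EOS_ID] * (body - len(kept))
    return [BOS_ID] + kept + padding + [EOS_ID]


def chunk_token_ids(token_ids: List[int], max_multiples: int = 6) -> List[List[int]]:
    """Long-prompt idea: split into up to `max_multiples` windows of 77 tokens.

    Each window carries 75 prompt tokens plus its own BOS/EOS, so every
    window can be encoded separately and the embeddings concatenated.
    """
    body = WINDOW - 2
    token_ids = token_ids[: body * max_multiples]  # hard cap on total length
    chunks = []
    for start in range(0, max(len(token_ids), 1), body):
        piece = token_ids[start : start + body]
        padding = [EOS_ID] * (body - len(piece))
        chunks.append([BOS_ID] + piece + padding + [EOS_ID])
    return chunks
```

With a 100-token prompt, `truncate_token_ids` keeps only the first 75 prompt tokens, while `chunk_token_ids` returns two 77-token windows that together cover all 100. The symptom reported here looks like the first behaviour when the second is expected.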
Reproduction
- Install requirements, do imports, download checkpoint from Civitai, set generation prompts and parameters
Model used for test: https://civitai.com/models/25694/epicrealism
!pip install -U transformers
!pip install -U accelerate
!pip install -U diffusers
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.pipelines.stable_diffusion.convert_from_ckpt import download_from_original_stable_diffusion_ckpt
!wget -O model.safetensors 'https://civitai.com/api/download/models/143906?type=Model&format=SafeTensor&size=pruned&fp=fp16'
#repeated tokens, just for example
prompt = 'polaroid photo, night photo, photo of 24 y.o beautiful woman, pale skin, bokeh, motion blur, polaroid photo, night photo, photo of 24 y.o beautiful woman, pale skin, bokeh, motion blur, polaroid photo, night photo, photo of 24 y.o beautiful woman, pale skin, bokeh, motion blur, polaroid photo, night photo, photo of 24 y.o beautiful woman, pale skin, bokeh, motion blur, polaroid photo, night photo, photo of 24 y.o beautiful woman, pale skin, bokeh, motion blur, polaroid photo, night photo, photo of 24 y.o beautiful woman, pale skin, bokeh, motion blur, polaroid photo, night photo, photo of 24 y.o beautiful woman, pale skin, bokeh, motion blur, polaroid photo, night photo, photo of 24 y.o beautiful woman, pale skin, bokeh, motion blur, polaroid photo, night photo, photo of 24 y.o beautiful woman, pale skin, bokeh, motion blur, polaroid photo, night photo, photo of 24 y.o beautiful woman, pale skin, bokeh, motion blur, '
negative_prompt = '(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers:1.4), (deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation, ugly, unrealistic anatomy, deformed face, deformed limbs, deformed head, bad proportions, incomplete, (deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers:1.4), (deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation, ugly, unrealistic anatomy, deformed face, deformed limbs, deformed head, bad proportions, incomplete'
width=960
height=1440
num_inference_steps=25
guidance_scale=7
- Load .safetensors checkpoint with StableDiffusionPipeline.from_single_file, generate image
pipeline = StableDiffusionPipeline.from_single_file(
    'model.safetensors',
    custom_pipeline='lpw_stable_diffusion',
    torch_dtype=torch.float16,
)
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)
pipeline.to('cuda')
image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=num_inference_steps,
    num_images_per_prompt=1,
    guidance_scale=guidance_scale,
    max_embeddings_multiples=6,
    generator=torch.Generator(device='cuda').manual_seed(0)
).images[0]
image
Output:

The prompt gets truncated.
- Convert the checkpoint file to diffusers format, load path with StableDiffusionPipeline.from_pretrained, generate image
p = download_from_original_stable_diffusion_ckpt(
    checkpoint_path_or_dict='model.safetensors',
    from_safetensors=True,
)
p.save_pretrained('converted_model', safe_serialization=False)
pipeline = StableDiffusionPipeline.from_pretrained(
    'converted_model',
    custom_pipeline='lpw_stable_diffusion',
    torch_dtype=torch.float16,
)
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)
pipeline.to('cuda')
image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=width,
    height=height,
    num_inference_steps=num_inference_steps,
    num_images_per_prompt=1,
    guidance_scale=guidance_scale,
    max_embeddings_multiples=6,
    generator=torch.Generator(device='cuda').manual_seed(0)
).images[0]
image
Note that the "Token indices sequence length" warning is a false alarm according to the community pipeline documentation.
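One way to narrow this down might be to check which pipeline class each loading path actually instantiates. The helper below is a hypothetical diagnostic sketch, not part of diffusers. It only relies on `type()` and on the `tokenizer.model_max_length` attribute, and it rests on an unconfirmed assumption: that the two paths may end up with different classes.

```python
def describe_pipeline(pipe):
    """Summarise which class a loaded pipeline object really is.

    Hypothetical helper for debugging; works on any object and returns
    None for attributes that are not present.
    """
    cls = type(pipe)
    tokenizer = getattr(pipe, "tokenizer", None)
    return {
        "class": cls.__name__,
        "module": cls.__module__,
        "tokenizer_max_length": getattr(tokenizer, "model_max_length", None),
    }
```

Calling `describe_pipeline(pipeline)` right after each of the two loading steps above and comparing the results would show whether `custom_pipeline='lpw_stable_diffusion'` took effect in both cases. If the `from_single_file` result reports a plain `StableDiffusionPipeline` rather than the community pipeline class, that would explain the truncation.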
System Info
Google Colab clean environment with T4 GPU
transformers 4.40.2
diffusers 0.27.2
Who can help?
@DN6, maybe @SkyTNT?
I think the handling of long prompts is a great feature for diffusers pipelines, and I hope it gets maintained over time :)
