Releases: huggingface/diffusers
[Patch release] Make sure we install correct PEFT version
Small patch release to make sure the correct PEFT version is installed.
All commits
- Improve setup.py and add dependency check by @patrickvonplaten in #5826
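A dependency check of this kind can be sketched with the standard library alone. The snippet below is a minimal illustration, not the check that Diffusers ships: the helper names and the minimum version `0.6.0` are assumptions chosen for the example.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.6.2' or '0.6.0.dev0' into a tuple of leading integers, e.g. (0, 6, 2)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that "0.6" and "0.6.0" compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name, minimum):
    """Return (ok, message) for a package; does not raise if it is missing."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return False, f"{name} is not installed (need >= {minimum})"
    if meets_minimum(installed, minimum):
        return True, f"{name} {installed} satisfies >= {minimum}"
    return False, f"{name} {installed} is older than required {minimum}"


# "0.6.0" is a hypothetical minimum used only for illustration.
ok, message = check_package("peft", "0.6.0")
print(message)
```

Pre-release suffixes such as `.dev0` are ignored in this sketch; a real check would more likely rely on a dedicated version-parsing library.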
v0.23.0: LCM LoRA, SDXL LCM, Consistency Decoder from DALL-E 3
LCM LoRA, LCM SDXL, Consistency Decoder
LCM LoRA
Latent Consistency Models (LCM) made quite the mark in the Stable Diffusion community by enabling ultra-fast inference. LCM author @luosiallen, alongside @patil-suraj and @dg845, managed to extend the LCM support for Stable Diffusion XL (SDXL) and pack everything into a LoRA.
The approach is called LCM LoRA.
Below is an example of using LCM LoRA, taking just 4 inference steps:
from diffusers import DiffusionPipeline, LCMScheduler
import torch
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
lcm_lora_id = "latent-consistency/lcm-lora-sdxl"
pipe = DiffusionPipeline.from_pretrained(model_id, variant="fp16", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt = "close-up photography of old man standing in the rain at night, in a street lit by lamps, leica 35mm summilux"
image = pipe(
prompt=prompt,
num_inference_steps=4,
guidance_scale=1,
).images[0]
You can combine the LoRA with Img2Img, Inpaint, ControlNet, ...
as well as with other LoRAs 🤯
👉 Checkpoints
📜 Docs
If you want to learn more about the approach, please have a look at the following:
LCM SDXL
Continuing the work of Latent Consistency Models (LCM), we've applied the approach to SDXL as well and give you SSD-1B and SDXL fine-tuned checkpoints.
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
import torch
unet = UNet2DConditionModel.from_pretrained(
"latent-consistency/lcm-sdxl",
torch_dtype=torch.float16,
variant="fp16",
)
pipe = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"
generator = torch.manual_seed(0)
image = pipe(
prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0
).images[0]
👉 Checkpoints
📜 Docs
Consistency Decoder
OpenAI open-sourced the consistency decoder used in DALL-E 3. It improves the decoding part in the Stable Diffusion v1 family of models.
import torch
from diffusers import StableDiffusionPipeline, ConsistencyDecoderVAE
# Load the consistency decoder in the same dtype as the pipeline below.
vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")
pipe("horse", generator=torch.manual_seed(0)).images
Find the documentation here to learn more.
All commits
- [Custom Pipelines] Make sure that community pipelines can use repo revision by @patrickvonplaten in #5659
- post release (v0.22.0) by @sayakpaul in #5658
- Add Pixart to AUTO_TEXT2IMAGE_PIPELINES_MAPPING by @Beinsezii in #5664
- Update custom diffusion attn processor by @DN6 in #5663
- Model tests xformers fixes by @DN6 in #5679
- Update free model hooks by @DN6 in #5680
- Fix Basic Transformer Block by @DN6 in #5683
- Explicit torch/flax dependency check by @DN6 in #5673
- [PixArt-Alpha] fix mask_feature so that precomputed embeddings work with a batch size > 1 by @sayakpaul in #5677
- Make sure DDPM and diffusers can be used without Transformers by @sayakpaul in #5668
- [PixArt-Alpha] Support non-square images by @sayakpaul in #5672
- Improve LCMScheduler by @dg845 in #5681
- [Docs] Fix typos, improve, update at Using Diffusers' Task page by @StandardAI in #5611
- Replacing the nn.Mish activation function with a get_activation function. by @hi-sushanta in #5651
- speed up Shap-E fast test by @yiyixuxu in #5686
- Fix the misaligned pipeline usage in dreamshaper docstrings by @kirill-fedyanin in #5700
- Fixed is_safetensors_compatible() handling of windows path separators by @PhilLab in #5650
- [LCM] Fix img2img by @patrickvonplaten in #5698
- [PixArt-Alpha] fix mask feature condition. by @sayakpaul in #5695
- Fix styling issues by @patrickvonplaten in #5699
- Add adapter fusing + PEFT to the docs by @apolinario in #5662
- Fix prompt bug in AnimateDiff by @DN6 in #5702
- [Bugfix] fix error of peft lora when xformers enabled by @okotaku in #5697
- Install accelerate from PyPI in PR test runner by @DN6 in #5721
- consistency decoder by @williamberman in #5694
- Correct consist dec by @patrickvonplaten in #5722
- LCM Add Tests by @patrickvonplaten in #5707
- [LCM] add: locm docs. by @sayakpaul in #5723
- Add LCM Scripts by @patil-suraj in #5727
v0.22.3: Fix PixArtAlpha and LCM Image-to-Image pipelines
🐛 There were some sneaky bugs in the PixArt-Alpha and LCM Image-to-Image pipelines which have been fixed in this release.
All commits
- [LCM] Fix img2img by @patrickvonplaten in #5698
- [PixArt-Alpha] fix mask feature condition. by @sayakpaul in #5695
Patch Release v0.22.2: Fix Animate Diff, fix DDPM import, Pixart various
- Fix Basic Transformer Block by @DN6 in #5683
- [PixArt-Alpha] fix mask_feature so that precomputed embeddings work with a batch size > 1 by @sayakpaul in #5677
- Make sure DDPM and diffusers can be used without Transformers by @sayakpaul in #5668
- [PixArt-Alpha] Support non-square images by @sayakpaul in #5672
Patch Release: Fix community vs. hub pipelines revision
- [Custom Pipelines] Make sure that community pipelines can use repo revision by @patrickvonplaten
v0.22.0: LCM, PixArt-Alpha, AnimateDiff, PEFT integration for LoRA, and more
Latent Consistency Models (LCM)
LCMs enable significantly faster inference for diffusion models. They require far fewer inference steps to produce high-resolution images without compromising image quality too much. Below is a usage example:
import torch
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32)
# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)
prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"
# Can be set to 1~50 steps. LCM supports fast inference even with <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4
images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0).images
Refer to the documentation to learn more.
LCM comes with both text-to-image and image-to-image pipelines and they were contributed by @luosiallen, @nagolinc, and @dg845.
PixArt-Alpha
PixArt-Alpha is a Transformer-based text-to-image diffusion model that rivals the quality of the existing state-of-the-art ones, such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient.
It was trained with T5 text embeddings and has a maximum sequence length of 120. Thus, it allows for more detailed prompt inputs, unlocking better quality generations.
Despite the large text encoder, with model offloading, it takes a little under 11 GB of VRAM to run the PixArtAlphaPipeline:
from diffusers import PixArtAlphaPipeline
import torch
pipeline_id = "PixArt-alpha/PixArt-XL-2-1024-MS"
pipeline = PixArtAlphaPipeline.from_pretrained(pipeline_id, torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()
prompt = "A small cactus with a happy face in the Sahara desert."
image = pipeline(prompt).images[0]
image.save("sahara.png")
Check out the docs to learn more.
AnimateDiff
AnimateDiff is a modelling framework that allows you to create videos using pre-existing Stable Diffusion text-to-image models. It achieves this by inserting motion module layers into a frozen text-to-image model and training it on video clips to extract a motion prior.
These motion modules are applied after the ResNet and Attention blocks in the Stable Diffusion UNet. Their purpose is to introduce coherent motion across image frames. To support these modules, we introduce the concepts of a MotionAdapter and a UNetMotionModel. These serve as a convenient way to use these motion modules with existing Stable Diffusion models.
The following example demonstrates how you can utilize the motion modules with an existing Stable Diffusion text-to-image model.
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
scheduler = DDIMScheduler.from_pretrained(
model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler
# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()
output = pipe(
prompt=(
"masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
"orange sky, warm lighting, fishing boats, ocean waves seagulls, "
"rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
"golden hour, coastal landscape, seaside scenery"
),
negative_prompt="bad quality, worse quality",
num_frames=16,
guidance_scale=7.5,
num_inference_steps=25,
generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
You can convert an existing 2D UNet into a UNetMotionModel:
from diffusers import MotionAdapter, UNetMotionModel, UNet2DConditionModel
unet = UNetMotionModel()
# Load from an existing 2D UNet and MotionAdapter
unet2D = UNet2DConditionModel.from_pretrained("SG161222/Realistic_Vision_V5.1_noVAE", subfolder="unet")
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
# load motion adapter here (motion_adapter is optional and defaults to None)
unet_motion = UNetMotionModel.from_unet2d(unet2D, motion_adapter=motion_adapter)
# Or load motion modules after init
unet_motion.load_motion_modules(motion_adapter)
# freeze all 2D UNet layers except for the motion modules for finetuning
unet_motion.freeze_unet2d_params()
# Save only motion modules
unet_motion.save_motion_modules("<path to save model>", push_to_hub=True)
AnimateDiff also comes with motion LoRA modules, letting you control subtleties:
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")
scheduler = DDIMScheduler.from_pretrained(
model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler
# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()
output = pipe(
prompt=(
"masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
"orange sky, warm lighting, fishing boats, ocean waves seagulls, "
"rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
"golden hour, coastal landscape, seaside scenery"
),
negative_prompt="bad quality, worse quality",
num_frames=16,
guidance_scale=7.5,
num_inference_steps=25,
generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
Check out the documentation to learn more.
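The fine-tuning setup in the UNetMotionModel example above, where the 2D UNet layers are frozen and only the motion modules stay trainable, can be pictured with a small stand-in. This is a sketch under stated assumptions: the `Param` class and the parameter names are hypothetical and only mimic the idea of a per-parameter `requires_grad` flag. It does not use or reproduce the Diffusers implementation.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a model parameter with a trainable flag."""
    name: str
    requires_grad: bool = True


def freeze_except(params, keyword):
    """Keep parameters whose name contains keyword trainable; freeze the rest."""
    for param in params:
        param.requires_grad = keyword in param.name
    return [p.name for p in params if p.requires_grad]


# Made-up parameter names for illustration only.
params = [
    Param("down_blocks.0.resnets.0.conv1.weight"),
    Param("down_blocks.0.attentions.0.to_q.weight"),
    Param("down_blocks.0.motion_modules.0.proj_in.weight"),
    Param("mid_block.motion_modules.0.proj_out.weight"),
]

trainable = freeze_except(params, "motion_modules")
print(trainable)
```

Only the two entries containing `motion_modules` remain trainable, which is the property an optimizer would rely on when fine-tuning motion modules alone.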
PEFT 🤝 Diffusers
There are many adapters (LoRA, for example) trained in different styles to achieve different effects. You can even combine multiple adapters to create new and unique images. With the 🤗 PEFT integration in 🤗 Diffusers, it is really easy to load and manage adapters for inference.
Here is an example of combining multiple LoRAs using this new integration:
from diffusers import DiffusionPipeline
import torch
pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")
# Load LoRA 1.
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
# Load LoRA 2.
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")
# Combine the adapters.
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])
# Perform inference.
prompt = "toy_face of a hacker with a hoodie, pixel art"
image = pipe(
prompt, num_inference_steps=30, cross_attention_kwargs={"scale": 1.0}, generator=torch.manual_seed(0)
).images[0]
image
Refer to the documentation to learn more.
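Conceptually, each LoRA contributes a low-rank update to a base weight matrix, and the per-adapter weights scale those updates before they are summed. The sketch below illustrates that idea with plain Python lists. It is an assumption-level picture of what weighted adapter combination means, not the code path used by PEFT or Diffusers; it ignores LoRA's alpha/rank scaling, and all names and numbers are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def combine_adapters(base, adapters, weights):
    """Return base + sum_i weights[i] * (B_i @ A_i) for low-rank pairs (B_i, A_i)."""
    merged = [row[:] for row in base]  # copy so the base weights stay untouched
    for (lora_b, lora_a), weight in zip(adapters, weights):
        delta = matmul(lora_b, lora_a)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += weight * value
    return merged


# Toy 2x2 base weight and two rank-1 adapters (hypothetical values).
base = [[1.0, 0.0], [0.0, 1.0]]
pixel = ([[1.0], [0.0]], [[0.0, 2.0]])  # B is 2x1, A is 1x2
toy = ([[0.0], [1.0]], [[4.0, 0.0]])

merged = combine_adapters(base, [pixel, toy], [0.5, 1.0])
print(merged)  # [[1.0, 1.0], [4.0, 1.0]]
```

Setting an adapter's weight to 0 removes its contribution entirely, which is one way to think about toggling adapters without reloading them.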
Community components with community pipelines
We have had support for community pipelines for a while now. This enables fast integration for pipelines we cannot directly integrate within the core codebase of the library. However, community pipelines always rely on the building blocks from Diffusers, which can be restrictive for advanced use cases.
To alleviate this, we’re elevating community pipelines with community components starting with this release 🤗. By specifying trust_remote_code=True and writing the pipeline repository in a specific way, users can customize their pipeline and component code as flexibly as possible:
from diffusers import DiffusionPipeline
import torch
pipeline = DiffusionPipeline.from_pretrained(
"<change-username>/<change-id>", trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")
prompt = "hello"
# Text embeds
prompt_embeds, negative_embeds = pipeline.encode_prompt(prompt)
# Keyframes generation (8x64x40, 2fps)
video_frames = pipeline(
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_embeds,
num_frames=8,
height=40,
width=64,
num_inference_steps=2,
guidance_scale=9.0,
output_type="pt"
).frames
Refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/using-diffusers/custom_pipeline_overview#commu...
Patch Release: Fix Lora fusing/unfusing
- [Lora] fix lora fuse unfuse in #5003 by @patrickvonplaten
Patch Release: Fix LoRA attention processor for xformers.
- [LoRA, Xformers] Fix xformers lora by @patrickvonplaten in #5201
Patch Release: CPU offloading + Lora load/Text inv load & Multi Adapter
- [Textual inversion] Refactor textual inversion to make it cleaner by @patrickvonplaten in #5076
- t2i Adapter community member fix by @williamberman in #5090
- remove unused adapter weights in constructor by @williamberman in #5088
- [LoRA] don't break offloading for incompatible lora ckpts. by @sayakpaul in #5085
Patch Release v0.21.1: Fix import and config loading for `from_single_file`
- Fix model offload bug when key isn't present by @DN6 in #5030
- [Import] Don't force transformers to be installed by @patrickvonplaten in #5035
- allow loading of sd models from safetensors without online lookups using local config files by @vladmandic in #5019
- [Import] Add missing settings / Correct some dummy imports by @patrickvonplaten in #5036




