
Commit aa7659b

Merge branch 'main' into gguf-support

2 parents: 4c0360a + 65ab105


42 files changed: +2150 / −1366 lines

README.md

Lines changed: 2 additions & 2 deletions
@@ -112,8 +112,8 @@ Check out the [Quickstart](https://huggingface.co/docs/diffusers/quicktour) to l
 | **Documentation** | **What can I learn?** |
 |---------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [Tutorial](https://huggingface.co/docs/diffusers/tutorials/tutorial_overview) | A basic crash course for learning how to use the library's most important features like using models and schedulers to build your own diffusion system, and training your own diffusion model. |
-| [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading_overview) | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
-| [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/pipeline_overview) | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
+| [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading) | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
+| [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/overview_techniques) | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
 | [Optimization](https://huggingface.co/docs/diffusers/optimization/fp16) | Guides for how to optimize your diffusion model to run faster and consume less memory. |
 | [Training](https://huggingface.co/docs/diffusers/training/overview) | Guides for how to train a diffusion model for different tasks with different training techniques. |
 ## Contribution

docs/source/en/using-diffusers/loading_adapters.md

Lines changed: 6 additions & 2 deletions
@@ -134,14 +134,16 @@ The [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`] method loads L
 - the LoRA weights don't have separate identifiers for the UNet and text encoder
 - the LoRA weights have separate identifiers for the UNet and text encoder
 
-But if you only need to load LoRA weights into the UNet, then you can use the [`~loaders.UNet2DConditionLoadersMixin.load_attn_procs`] method. Let's load the [jbilcke-hf/sdxl-cinematic-1](https://huggingface.co/jbilcke-hf/sdxl-cinematic-1) LoRA:
+To directly load (and save) a LoRA adapter at the *model-level*, use [`~PeftAdapterMixin.load_lora_adapter`], which builds and prepares the necessary model configuration for the adapter. Like [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`], [`PeftAdapterMixin.load_lora_adapter`] can load LoRAs for both the UNet and text encoder. For example, if you're loading a LoRA for the UNet, [`PeftAdapterMixin.load_lora_adapter`] ignores the keys for the text encoder.
+
+Use the `weight_name` parameter to specify the specific weight file and the `prefix` parameter to filter for the appropriate state dicts (`"unet"` in this case) to load.
 
 ```py
 from diffusers import AutoPipelineForText2Image
 import torch
 
 pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
-pipeline.unet.load_attn_procs("jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors")
+pipeline.unet.load_lora_adapter("jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", prefix="unet")
 
 # use cnmt in the prompt to trigger the LoRA
 prompt = "A cute cnmt eating a slice of pizza, stunning color scheme, masterpiece, illustration"

@@ -153,6 +155,8 @@ image
 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_attn_proc.png" />
 </div>
 
+Save an adapter with [`~PeftAdapterMixin.save_lora_adapter`].
+
 To unload the LoRA weights, use the [`~loaders.StableDiffusionLoraLoaderMixin.unload_lora_weights`] method to discard the LoRA weights and restore the model to its original weights:
 
 ```py
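For the `save_lora_adapter` line added above, here is a minimal sketch of the model-level save/load round trip. The output directory `"my-cinematic-adapter"` is a hypothetical example, and the exact signature should be checked against the `PeftAdapterMixin` docs.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# load the LoRA directly into the UNet at the model level
pipeline.unet.load_lora_adapter(
    "jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", prefix="unet"
)

# persist the adapter weights; "my-cinematic-adapter" is a hypothetical local directory
pipeline.unet.save_lora_adapter("my-cinematic-adapter")
```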

src/diffusers/loaders/ip_adapter.py

Lines changed: 1 addition & 1 deletion
@@ -187,7 +187,7 @@ def load_ip_adapter(
                 state_dict = pretrained_model_name_or_path_or_dict
 
             keys = list(state_dict.keys())
-            if keys != ["image_proj", "ip_adapter"]:
+            if "image_proj" not in keys and "ip_adapter" not in keys:
                 raise ValueError("Required keys are (`image_proj` and `ip_adapter`) missing from the state dict.")
 
             state_dicts.append(state_dict)
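To see why the one-line change matters, here is a standalone sketch of the before/after semantics on toy data (not the library code): the old equality test was order-sensitive and rejected any state dict with extra keys, while the new membership test only raises when both required keys are absent.

```py
# toy state dict with the required keys plus an extra one
state_dict = {"image_proj": {}, "ip_adapter": {}, "extra_metadata": {}}
keys = list(state_dict.keys())

# old check: True (would raise) because the key list has an extra entry
old_check_fails = keys != ["image_proj", "ip_adapter"]

# new check: False (accepted) because both required keys are present
new_check_fails = "image_proj" not in keys and "ip_adapter" not in keys

print(old_check_fails, new_check_fails)  # True False
```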

src/diffusers/models/attention_processor.py

Lines changed: 3 additions & 1 deletion
@@ -1908,7 +1908,9 @@ def __call__(
             query = apply_rotary_emb(query, image_rotary_emb)
             key = apply_rotary_emb(key, image_rotary_emb)
 
-        hidden_states = F.scaled_dot_product_attention(query, key, value, dropout_p=0.0, is_causal=False)
+        hidden_states = F.scaled_dot_product_attention(
+            query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
+        )
         hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
         hidden_states = hidden_states.to(query.dtype)
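The fix above threads the processor's `attention_mask` into PyTorch's SDPA instead of silently dropping it. A minimal standalone sketch of the call with illustrative shapes (requires PyTorch 2.0+):

```py
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 16, 64
query = torch.randn(batch, heads, seq_len, head_dim)
key = torch.randn(batch, heads, seq_len, head_dim)
value = torch.randn(batch, heads, seq_len, head_dim)

# boolean mask, broadcastable to (batch, heads, seq_len, seq_len);
# True = attend, False = ignore
attention_mask = torch.ones(batch, 1, seq_len, seq_len, dtype=torch.bool)
attention_mask[..., seq_len // 2 :] = False  # ignore the second half of the keys

hidden_states = F.scaled_dot_product_attention(
    query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
)
print(hidden_states.shape)  # torch.Size([2, 8, 16, 64])
```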

src/diffusers/models/autoencoders/autoencoder_kl_cogvideox.py

Lines changed: 10 additions & 10 deletions
@@ -433,7 +433,7 @@ def create_forward(*inputs):
                     hidden_states,
                     temb,
                     zq,
-                    conv_cache=conv_cache.get(conv_cache_key),
+                    conv_cache.get(conv_cache_key),
                 )
             else:
                 hidden_states, new_conv_cache[conv_cache_key] = resnet(

@@ -531,7 +531,7 @@ def create_forward(*inputs):
                 return create_forward
 
             hidden_states, new_conv_cache[conv_cache_key] = torch.utils.checkpoint.checkpoint(
-                create_custom_forward(resnet), hidden_states, temb, zq, conv_cache=conv_cache.get(conv_cache_key)
+                create_custom_forward(resnet), hidden_states, temb, zq, conv_cache.get(conv_cache_key)
             )
         else:
             hidden_states, new_conv_cache[conv_cache_key] = resnet(

@@ -649,7 +649,7 @@ def create_forward(*inputs):
                     hidden_states,
                     temb,
                     zq,
-                    conv_cache=conv_cache.get(conv_cache_key),
+                    conv_cache.get(conv_cache_key),
                 )
             else:
                 hidden_states, new_conv_cache[conv_cache_key] = resnet(

@@ -789,7 +789,7 @@ def custom_forward(*inputs):
                     hidden_states,
                     temb,
                     None,
-                    conv_cache=conv_cache.get(conv_cache_key),
+                    conv_cache.get(conv_cache_key),
                 )
 
         # 2. Mid

@@ -798,14 +798,14 @@ def custom_forward(*inputs):
                 hidden_states,
                 temb,
                 None,
-                conv_cache=conv_cache.get("mid_block"),
+                conv_cache.get("mid_block"),
             )
         else:
             # 1. Down
             for i, down_block in enumerate(self.down_blocks):
                 conv_cache_key = f"down_block_{i}"
                 hidden_states, new_conv_cache[conv_cache_key] = down_block(
-                    hidden_states, temb, None, conv_cache=conv_cache.get(conv_cache_key)
+                    hidden_states, temb, None, conv_cache.get(conv_cache_key)
                 )
 
             # 2. Mid

@@ -953,7 +953,7 @@ def custom_forward(*inputs):
                 hidden_states,
                 temb,
                 sample,
-                conv_cache=conv_cache.get("mid_block"),
+                conv_cache.get("mid_block"),
             )
 
         # 2. Up

@@ -964,7 +964,7 @@ def custom_forward(*inputs):
                     hidden_states,
                     temb,
                     sample,
-                    conv_cache=conv_cache.get(conv_cache_key),
+                    conv_cache.get(conv_cache_key),
                 )
         else:
             # 1. Mid

@@ -1476,7 +1476,7 @@ def forward(
             z = posterior.sample(generator=generator)
         else:
             z = posterior.mode()
-        dec = self.decode(z)
+        dec = self.decode(z).sample
         if not return_dict:
             return (dec,)
-        return dec
+        return DecoderOutput(sample=dec)
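The recurring edit in this file replaces `conv_cache=...` with a positional argument at every checkpointed call. The wrapper built by `create_custom_forward` only forwards positional arguments, so a keyword argument passed to `torch.utils.checkpoint.checkpoint` never reached the block (or raised, depending on the checkpoint mode). A minimal sketch of the fixed pattern, using a toy block in place of the real resnet:

```py
import torch
from torch.utils.checkpoint import checkpoint

def create_custom_forward(module):
    # forwards *positional* args only, mirroring the pattern in this file
    def custom_forward(*inputs):
        return module(*inputs)
    return custom_forward

def block(hidden_states, temb, zq, conv_cache=None):
    # toy stand-in for a resnet block that optionally consumes a cache
    scale = 2.0 if conv_cache is not None else 1.0
    return hidden_states * scale

hidden_states = torch.randn(2, 4, requires_grad=True)
conv_cache = {"block_0": torch.zeros(1)}

# fixed call: the cache is passed positionally so it actually reaches `block`
out = checkpoint(
    create_custom_forward(block), hidden_states, None, None, conv_cache.get("block_0"),
    use_reentrant=False,
)
out.sum().backward()
```

The last hunk is a separate correction: `decode` returns a `DecoderOutput`, so `forward` now unwraps `.sample` and re-wraps the final result in `DecoderOutput` for a consistent return type.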

src/diffusers/models/autoencoders/autoencoder_kl_temporal_decoder.py

Lines changed: 0 additions & 8 deletions
@@ -229,14 +229,6 @@ def __init__(
 
         self.quant_conv = nn.Conv2d(2 * latent_channels, 2 * latent_channels, 1)
 
-        sample_size = (
-            self.config.sample_size[0]
-            if isinstance(self.config.sample_size, (list, tuple))
-            else self.config.sample_size
-        )
-        self.tile_latent_min_size = int(sample_size / (2 ** (len(self.config.block_out_channels) - 1)))
-        self.tile_overlap_factor = 0.25
-
     def _set_gradient_checkpointing(self, module, value=False):
         if isinstance(module, (Encoder, TemporalDecoder)):
             module.gradient_checkpointing = value

src/diffusers/models/autoencoders/autoencoder_tiny.py

Lines changed: 4 additions & 2 deletions
@@ -310,7 +310,9 @@ def decode(
         self, x: torch.Tensor, generator: Optional[torch.Generator] = None, return_dict: bool = True
     ) -> Union[DecoderOutput, Tuple[torch.Tensor]]:
         if self.use_slicing and x.shape[0] > 1:
-            output = [self._tiled_decode(x_slice) if self.use_tiling else self.decoder(x) for x_slice in x.split(1)]
+            output = [
+                self._tiled_decode(x_slice) if self.use_tiling else self.decoder(x_slice) for x_slice in x.split(1)
+            ]
             output = torch.cat(output)
         else:
             output = self._tiled_decode(x) if self.use_tiling else self.decoder(x)

@@ -341,7 +343,7 @@ def forward(
         # as if we were loading the latents from an RGBA uint8 image.
         unscaled_enc = self.unscale_latents(scaled_enc / 255.0)
 
-        dec = self.decode(unscaled_enc)
+        dec = self.decode(unscaled_enc).sample
 
         if not return_dict:
             return (dec,)
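The first hunk fixes a classic slicing bug: the comprehension iterated over `x.split(1)` but the non-tiling path still decoded the full batch `x` on every iteration. A standalone sketch of the bug class with a stand-in decoder:

```py
import torch

def decoder(t):
    return t * 2.0  # stand-in for self.decoder

x = torch.randn(4, 3)  # batch of 4

# buggy: every iteration processes the *full* batch, so the cat result is wrong
bad = torch.cat([decoder(x) for _x_slice in x.split(1)])        # shape (16, 3)

# fixed: each slice is decoded independently, then re-concatenated
good = torch.cat([decoder(x_slice) for x_slice in x.split(1)])  # shape (4, 3)
assert torch.equal(good, decoder(x))
```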

src/diffusers/models/controlnets/controlnet_sd3.py

Lines changed: 13 additions & 7 deletions
@@ -393,13 +393,19 @@ def custom_forward(*inputs):
                 return custom_forward
 
             ckpt_kwargs: Dict[str, Any] = {"use_reentrant": False} if is_torch_version(">=", "1.11.0") else {}
-            encoder_hidden_states, hidden_states = torch.utils.checkpoint.checkpoint(
-                create_custom_forward(block),
-                hidden_states,
-                encoder_hidden_states,
-                temb,
-                **ckpt_kwargs,
-            )
+            if self.context_embedder is not None:
+                encoder_hidden_states, hidden_states = torch.utils.checkpoint.checkpoint(
+                    create_custom_forward(block),
+                    hidden_states,
+                    encoder_hidden_states,
+                    temb,
+                    **ckpt_kwargs,
+                )
+            else:
+                # SD3.5 8b controlnet use single transformer block, which does not use `encoder_hidden_states`
+                hidden_states = torch.utils.checkpoint.checkpoint(
+                    create_custom_forward(block), hidden_states, temb, **ckpt_kwargs
+                )
 
         else:
             if self.context_embedder is not None:

src/diffusers/models/model_loading_utils.py

Lines changed: 2 additions & 0 deletions
@@ -182,6 +182,8 @@ def load_model_dict_into_meta(
     hf_quantizer=None,
     keep_in_fp32_modules=None,
 ) -> List[str]:
+    if device is not None and not isinstance(device, (str, torch.device)):
+        raise ValueError(f"Expected device to have type `str` or `torch.device`, but got {type(device)=}.")
     device = device or torch.device("cpu")
     dtype = dtype or torch.float32
     is_quantized = hf_quantizer is not None
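A quick sketch of what the new guard accepts and rejects, extracted as a standalone helper (the name `_validate_device` is hypothetical; in the commit the check lives inline in `load_model_dict_into_meta`):

```py
import torch

def _validate_device(device):
    # mirrors the added guard: only None, str, or torch.device are accepted
    if device is not None and not isinstance(device, (str, torch.device)):
        raise ValueError(f"Expected device to have type `str` or `torch.device`, but got {type(device)=}.")
    return device or torch.device("cpu")

print(_validate_device(None))                  # cpu (the default)
print(_validate_device("cuda:0"))              # cuda:0
print(_validate_device(torch.device("cpu")))   # cpu
_validate_device(0)                            # raises: ... type(device)=<class 'int'>
```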

src/diffusers/models/modeling_utils.py

Lines changed: 1 addition & 1 deletion
@@ -836,7 +836,7 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
             param_device = "cpu"
         # TODO (sayakpaul, SunMarc): remove this after model loading refactor
         elif is_quant_method_bnb:
-            param_device = torch.cuda.current_device()
+            param_device = torch.device(torch.cuda.current_device())
         state_dict = load_state_dict(model_file, variant=variant)
         model._convert_deprecated_attention_blocks(state_dict)
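This pairs with the guard added in `model_loading_utils.py`: `torch.cuda.current_device()` returns a plain `int` index, which that guard now rejects, so the bnb path wraps it in `torch.device`. A tiny illustration (needs a CUDA-enabled build):

```py
import torch

idx = torch.cuda.current_device()   # plain int, e.g. 0
print(type(idx))                    # <class 'int'>

# wrapping the index yields a proper torch.device that passes the new check
param_device = torch.device(idx)
print(param_device)                            # device(type='cuda', index=0)
print(isinstance(param_device, torch.device))  # True
```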
