
Commit 58da71b

Merge branch 'main' into ltx-rename
2 parents: 59a3ba4 + d413881

File tree: 81 files changed (+6380, −272 lines)


docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions

@@ -238,6 +238,8 @@
       title: Textual Inversion
     - local: api/loaders/unet
       title: UNet
+    - local: api/loaders/transformer_sd3
+      title: SD3Transformer2D
     - local: api/loaders/peft
       title: PEFT
     title: Loaders
@@ -400,6 +402,8 @@
       title: DiT
     - local: api/pipelines/flux
       title: Flux
+    - local: api/pipelines/control_flux_inpaint
+      title: FluxControlInpaint
     - local: api/pipelines/hunyuandit
       title: Hunyuan-DiT
     - local: api/pipelines/hunyuan_video

docs/source/en/api/attnprocessor.md

Lines changed: 2 additions & 0 deletions

@@ -86,6 +86,8 @@ An attention processor is a class for applying different types of attention mech

 [[autodoc]] models.attention_processor.IPAdapterAttnProcessor2_0

+[[autodoc]] models.attention_processor.SD3IPAdapterJointAttnProcessor2_0
+
 ## JointAttnProcessor2_0

 [[autodoc]] models.attention_processor.JointAttnProcessor2_0

docs/source/en/api/loaders/ip_adapter.md

Lines changed: 6 additions & 0 deletions

@@ -24,6 +24,12 @@ Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter [loading]

 [[autodoc]] loaders.ip_adapter.IPAdapterMixin

+## SD3IPAdapterMixin
+
+[[autodoc]] loaders.ip_adapter.SD3IPAdapterMixin
+  - all
+  - is_ip_adapter_active
+
 ## IPAdapterMaskProcessor

 [[autodoc]] image_processor.IPAdapterMaskProcessor
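
The new `SD3IPAdapterMixin` mirrors the existing `IPAdapterMixin` API for Stable Diffusion 3 pipelines and additionally exposes `is_ip_adapter_active`. Below is a minimal usage sketch; the checkpoint repo and `weight_name` are placeholders, and the argument names are assumptions based on the SD/SDXL `load_ip_adapter` API rather than code from this commit:

```python
import torch
from diffusers import StableDiffusion3Pipeline
from diffusers.utils import load_image

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

# Hypothetical repo/weight name for illustration; substitute a real SD3 IP-Adapter release.
pipe.load_ip_adapter("<org>/<sd3-ip-adapter-repo>", weight_name="ip-adapter.safetensors")
pipe.set_ip_adapter_scale(0.6)
print(pipe.is_ip_adapter_active)  # reports whether IP-Adapter layers are loaded and active

ip_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
image = pipe(
    prompt="a blue robot in a sunlit meadow",
    ip_adapter_image=ip_image,
    num_inference_steps=28,
    guidance_scale=5.0,
).images[0]
image.save("sd3_ip_adapter.png")
```
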
docs/source/en/api/loaders/transformer_sd3.md (new file)

Lines changed: 29 additions & 0 deletions

<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# SD3Transformer2D

This class is useful when *only* loading weights into a [`SD3Transformer2DModel`]. If you need to load weights into the text encoder, or into both a text encoder and the [`SD3Transformer2DModel`], use the [`SD3LoraLoaderMixin`](lora#diffusers.loaders.SD3LoraLoaderMixin) class instead.

The [`SD3Transformer2DLoadersMixin`] class currently only loads IP-Adapter weights, but will be used in the future to save weights and load LoRAs.

<Tip>

To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.

</Tip>

## SD3Transformer2DLoadersMixin

[[autodoc]] loaders.transformer_sd3.SD3Transformer2DLoadersMixin
  - all
  - _load_ip_adapter_weights
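
To see what this mixin changes in practice, the sketch below (illustrative only, not code from this commit; it assumes the pipeline-level `load_ip_adapter` hands the transformer weights to `SD3Transformer2DLoadersMixin`) inspects the transformer's attention processors before and after loading a hypothetical SD3 IP-Adapter checkpoint:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
)

# Before loading: the default joint attention processors.
print({type(p).__name__ for p in pipe.transformer.attn_processors.values()})

# Hypothetical repo/weight name for illustration; substitute a real SD3 IP-Adapter release.
pipe.load_ip_adapter("<org>/<sd3-ip-adapter-repo>", weight_name="ip-adapter.safetensors")

# After loading: the IP-Adapter weights are attached to the transformer and the attention
# processors are swapped for the IP-Adapter variant (see SD3IPAdapterJointAttnProcessor2_0
# added to attnprocessor.md in this commit).
print({type(p).__name__ for p in pipe.transformer.attn_processors.values()})
```
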
docs/source/en/api/pipelines/control_flux_inpaint.md (new file)

Lines changed: 89 additions & 0 deletions

<!--Copyright 2024 The HuggingFace Team, The Black Forest Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# FluxControlInpaint

FluxControlInpaintPipeline is an inpainting pipeline for the FLUX.1 Depth/Canny models. It takes an image and a mask as input and returns the inpainted image.

FLUX.1 Depth and Canny [dev] are 12 billion parameter rectified flow transformers capable of generating an image based on a text description while following the structure of a given input image. **These are not ControlNet models**.

| Control type | Developer | Link |
| ------------ | --------- | ---- |
| Depth | [Black Forest Labs](https://huggingface.co/black-forest-labs) | [Link](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev) |
| Canny | [Black Forest Labs](https://huggingface.co/black-forest-labs) | [Link](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev) |

<Tip>

Flux can be quite expensive to run on consumer hardware devices. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details. Additionally, Flux can benefit from quantization for memory efficiency with a trade-off in inference latency. Refer to [this blog post](https://huggingface.co/blog/quanto-diffusers) to learn more. For an exhaustive list of resources, check out [this gist](https://gist.github.com/sayakpaul/b664605caf0aa3bf8585ab109dd5ac9c).

</Tip>

```python
import torch
from diffusers import FluxControlInpaintPipeline
from diffusers.models.transformers import FluxTransformer2DModel
from transformers import T5EncoderModel
from diffusers.utils import load_image, make_image_grid
from image_gen_aux import DepthPreprocessor  # https://github.com/huggingface/image_gen_aux
from PIL import Image
import numpy as np

pipe = FluxControlInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    torch_dtype=torch.bfloat16,
)
# Use the following lines if you have GPU memory constraints: they swap in NF4-quantized
# weights for the transformer and the T5 text encoder and enable CPU offloading.
# ---------------------------------------------------------------
transformer = FluxTransformer2DModel.from_pretrained(
    "sayakpaul/FLUX.1-Depth-dev-nf4", subfolder="transformer", torch_dtype=torch.bfloat16
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    "sayakpaul/FLUX.1-Depth-dev-nf4", subfolder="text_encoder_2", torch_dtype=torch.bfloat16
)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2
pipe.enable_model_cpu_offload()
# ---------------------------------------------------------------
pipe.to("cuda")  # not needed if you enabled model CPU offloading above

prompt = "a blue robot singing opera with human-like expressions"
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

# Mask out the robot's head so only that region gets repainted.
head_mask = np.zeros_like(image)
head_mask[65:580, 300:642] = 255
mask_image = Image.fromarray(head_mask)

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(image)[0].convert("RGB")

output = pipe(
    prompt=prompt,
    image=image,
    control_image=control_image,
    mask_image=mask_image,
    num_inference_steps=30,
    strength=0.9,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
make_image_grid([image, control_image, mask_image, output.resize(image.size)], rows=1, cols=4).save("output.png")
```
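
If a pre-quantized NF4 checkpoint like the one above is not available for your model, a similar memory saving can be obtained by quantizing the transformer on the fly. This is a minimal sketch (assuming `bitsandbytes` is installed; it is not part of this commit) using diffusers' `BitsAndBytesConfig`:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxControlInpaintPipeline, FluxTransformer2DModel

# Quantize only the transformer to 4-bit NF4; the rest of the pipeline stays in bf16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
```
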

## FluxControlInpaintPipeline

[[autodoc]] FluxControlInpaintPipeline
  - all
  - __call__

## FluxPipelineOutput

[[autodoc]] pipelines.flux.pipeline_output.FluxPipelineOutput

docs/source/en/api/pipelines/flux.md

Lines changed: 37 additions & 0 deletions

@@ -268,6 +268,43 @@ images = pipe(
 images[0].save("flux-redux.png")
 ```

+## Combining Flux Turbo LoRAs with Flux Control, Fill, and Redux
+
+We can combine Flux Turbo LoRAs with Flux Control and other pipelines like Fill and Redux to enable few-step inference. The example below combines the Flux Control LoRA for depth with a turbo LoRA from [`ByteDance/Hyper-SD`](https://hf.co/ByteDance/Hyper-SD).
+
+```py
+from diffusers import FluxControlPipeline
+from image_gen_aux import DepthPreprocessor
+from diffusers.utils import load_image
+from huggingface_hub import hf_hub_download
+import torch
+
+control_pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
+control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora", adapter_name="depth")
+control_pipe.load_lora_weights(
+    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"), adapter_name="hyper-sd"
+)
+control_pipe.set_adapters(["depth", "hyper-sd"], adapter_weights=[0.85, 0.125])
+control_pipe.enable_model_cpu_offload()
+
+prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
+control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
+
+processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
+control_image = processor(control_image)[0].convert("RGB")
+
+image = control_pipe(
+    prompt=prompt,
+    control_image=control_image,
+    height=1024,
+    width=1024,
+    num_inference_steps=8,
+    guidance_scale=10.0,
+    generator=torch.Generator().manual_seed(42),
+).images[0]
+image.save("output.png")
+```
+
 ## Running FP16 inference

 Flux can generate high-quality images with FP16 (i.e. to accelerate inference on Turing/Volta GPUs) but produces different outputs compared to FP32/BF16. The issue is that some activations in the text encoders have to be clipped when running in FP16, which affects the overall image. Forcing text encoders to run with FP32 inference thus removes this output difference. See [here](https://github.com/huggingface/diffusers/pull/9097#issuecomment-2272292516) for details.

docs/source/en/api/pipelines/ltx_video.md

Lines changed: 38 additions & 0 deletions

@@ -61,6 +61,44 @@ pipe = LTXImageToVideoPipeline.from_single_file(
 )
 ```

+Loading [LTX GGUF checkpoints](https://huggingface.co/city96/LTX-Video-gguf) is also supported:
+
+```py
+import torch
+from diffusers.utils import export_to_video
+from diffusers import LTXPipeline, LTXVideoTransformer3DModel, GGUFQuantizationConfig
+
+ckpt_path = (
+    "https://huggingface.co/city96/LTX-Video-gguf/blob/main/ltx-video-2b-v0.9-Q3_K_S.gguf"
+)
+transformer = LTXVideoTransformer3DModel.from_single_file(
+    ckpt_path,
+    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
+    torch_dtype=torch.bfloat16,
+)
+pipe = LTXPipeline.from_pretrained(
+    "Lightricks/LTX-Video",
+    transformer=transformer,
+    torch_dtype=torch.bfloat16,
+)
+pipe.enable_model_cpu_offload()
+
+prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
+negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
+
+video = pipe(
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    width=704,
+    height=480,
+    num_frames=161,
+    num_inference_steps=50,
+).frames[0]
+export_to_video(video, "output_gguf_ltx.mp4", fps=24)
+```
+
+Make sure to read the [documentation on GGUF](../../quantization/gguf) to learn more about our GGUF support.
+
 Refer to [this section](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox#memory-optimization) to learn more about optimizing memory consumption.

 ## LTXPipeline
