
Commit feb8064

Merge branch 'main' into main
2 parents: 8f18aae + 5897137

6 files changed (+235, -13 lines changed)

docs/source/en/api/pipelines/flux.md

Lines changed: 47 additions & 0 deletions
@@ -309,6 +309,53 @@ image.save("output.png")
When unloading the Control LoRA weights, call `pipe.unload_lora_weights(reset_to_overwritten_params=True)` to reset the `pipe.transformer` completely back to its original form. The resultant pipeline can then be used with methods like [`DiffusionPipeline.from_pipe`]. More details about this argument are available in [this PR](https://github.com/huggingface/diffusers/pull/10397).

+## IP-Adapter
+
+<Tip>
+
+Check out [IP-Adapter](../../../using-diffusers/ip_adapter) to learn more about how IP-Adapters work.
+
+</Tip>
+
+An IP-Adapter lets you prompt Flux with images, in addition to the text prompt. This is especially useful when describing complex concepts that are difficult to articulate through text alone and when you have reference images.
+
+```python
+import torch
+from diffusers import FluxPipeline
+from diffusers.utils import load_image
+
+pipe = FluxPipeline.from_pretrained(
+    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
+).to("cuda")
+
+image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flux_ip_adapter_input.jpg").resize((1024, 1024))
+
+pipe.load_ip_adapter(
+    "XLabs-AI/flux-ip-adapter",
+    weight_name="ip_adapter.safetensors",
+    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
+)
+pipe.set_ip_adapter_scale(1.0)
+
+image = pipe(
+    width=1024,
+    height=1024,
+    prompt="wearing sunglasses",
+    negative_prompt="",
+    true_cfg=4.0,
+    generator=torch.Generator().manual_seed(4444),
+    ip_adapter_image=image,
+).images[0]
+
+image.save('flux_ip_adapter_output.jpg')
+```
+
+<div class="justify-center">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flux_ip_adapter_output.jpg"/>
+    <figcaption class="mt-2 text-sm text-center text-gray-500">IP-Adapter examples with prompt "wearing sunglasses"</figcaption>
+</div>
+
## Running FP16 inference

Flux can generate high-quality images with FP16 (i.e. to accelerate inference on Turing/Volta GPUs) but produces different outputs compared to FP32/BF16. The issue is that some activations in the text encoders have to be clipped when running in FP16, which affects the overall image. Forcing text encoders to run with FP32 inference thus removes this output difference. See [here](https://github.com/huggingface/diffusers/pull/9097#issuecomment-2272292516) for details.
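As a rough illustration of that workaround (a minimal sketch, not part of this commit, assuming the standard `FluxPipeline` component names), the two text encoders can be upcast to FP32 while the rest of the pipeline stays in FP16:

```python
import torch
from diffusers import FluxPipeline

# Load everything in FP16, then upcast only the text encoders to FP32
# so their activations are not clipped.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.float16)
pipe.text_encoder.to(torch.float32)    # CLIP text encoder
pipe.text_encoder_2.to(torch.float32)  # T5 text encoder
pipe.to("cuda")

image = pipe(prompt="a photo of a cat", num_inference_steps=28).images[0]
```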

docs/source/en/installation.md

Lines changed: 34 additions & 6 deletions
@@ -23,32 +23,60 @@ You should install 🤗 Diffusers in a [virtual environment](https://docs.python
If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
A virtual environment makes it easier to manage different projects and avoid compatibility issues between dependencies.

-Start by creating a virtual environment in your project directory:
+Create a virtual environment with Python or [uv](https://docs.astral.sh/uv/), a fast Rust-based Python package and project manager (refer to [Installation](https://docs.astral.sh/uv/getting-started/installation/) for instructions on installing uv).
+
+<hfoptions id="install">
+<hfoption id="uv">

```bash
-python -m venv .env
+uv venv my-env
+source my-env/bin/activate
```

-Activate the virtual environment:
+</hfoption>
+<hfoption id="Python">

```bash
-source .env/bin/activate
+python -m venv my-env
+source my-env/bin/activate
```

-You should also install 🤗 Transformers because 🤗 Diffusers relies on its models:
+</hfoption>
+</hfoptions>
+
+You should also install 🤗 Transformers because 🤗 Diffusers relies on its models.


<frameworkcontent>
<pt>
-Note - PyTorch only supports Python 3.8 - 3.11 on Windows.
+
+PyTorch only supports Python 3.8 - 3.11 on Windows. Install Diffusers with uv.
+
+```bash
+uv pip install diffusers["torch"] transformers
+```
+
+You can also install Diffusers with pip.
+
```bash
pip install diffusers["torch"] transformers
```
+
</pt>
<jax>
+
+Install Diffusers with uv.
+
+```bash
+uv pip install diffusers["flax"] transformers
+```
+
+You can also install Diffusers with pip.
+
```bash
pip install diffusers["flax"] transformers
```
+
</jax>
</frameworkcontent>
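As an optional sanity check after either install route (not part of this diff), you can confirm the package imports and report its version:

```python
import diffusers

# Confirm the installed package can be imported and print its version
print(diffusers.__version__)
```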

docs/source/en/optimization/para_attn.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ However, it is hard to decide when to reuse the cache to ensure quality generate
This achieves a 2x speedup on FLUX.1-dev and HunyuanVideo inference with very good quality.

<figure>
-    <img src="https://huggingface.co/datasets/chengzeyi/documentation-images/resolve/main/diffusers/para-attn/ada-cache.png" alt="Cache in Diffusion Transformer" />
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/para-attn/ada-cache.png" alt="Cache in Diffusion Transformer" />
    <figcaption>How AdaCache works; First Block Cache is a variant of it</figcaption>
</figure>

examples/text_to_image/train_text_to_image_lora.py

Lines changed: 0 additions & 4 deletions
@@ -515,10 +515,6 @@ def main():
    elif accelerator.mixed_precision == "bf16":
        weight_dtype = torch.bfloat16

-    # Freeze the unet parameters before adding adapters
-    for param in unet.parameters():
-        param.requires_grad_(False)
-
    unet_lora_config = LoraConfig(
        r=args.rank,
        lora_alpha=args.rank,

scripts/extract_lora_from_model.py

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
"""
This script demonstrates how to extract a LoRA checkpoint from a fully finetuned model with the CogVideoX model.

To make it work for other models:

* Change the model class. Here we use `CogVideoXTransformer3DModel`. For Flux, it would be `FluxTransformer2DModel`,
  for example. (TODO: more reason to add `AutoModel`.)
* Supply the path to the base checkpoint via `--base_ckpt_path`.
* Supply the path to the fully fine-tuned checkpoint via `--finetune_ckpt_path`.
* Change the `--rank` as needed.

Example usage:

```bash
python extract_lora_from_model.py \
    --base_ckpt_path=THUDM/CogVideoX-5b \
    --finetune_ckpt_path=finetrainers/cakeify-v0 \
    --lora_out_path=cakeify_lora.safetensors
```

Script is adapted from
https://github.com/Stability-AI/stability-ComfyUI-nodes/blob/001154622564b17223ce0191803c5fff7b87146c/control_lora_create.py
"""

import argparse

import torch
from safetensors.torch import save_file
from tqdm.auto import tqdm

from diffusers import CogVideoXTransformer3DModel


RANK = 64
CLAMP_QUANTILE = 0.99


# Comes from
# https://github.com/Stability-AI/stability-ComfyUI-nodes/blob/001154622564b17223ce0191803c5fff7b87146c/control_lora_create.py#L9
def extract_lora(diff, rank):
    # Important to use CUDA, otherwise very slow!
    if torch.cuda.is_available():
        diff = diff.to("cuda")

    is_conv2d = len(diff.shape) == 4
    kernel_size = None if not is_conv2d else diff.size()[2:4]
    is_conv2d_3x3 = is_conv2d and kernel_size != (1, 1)
    out_dim, in_dim = diff.size()[0:2]
    rank = min(rank, in_dim, out_dim)

    if is_conv2d:
        if is_conv2d_3x3:
            diff = diff.flatten(start_dim=1)
        else:
            diff = diff.squeeze()

    U, S, Vh = torch.linalg.svd(diff.float())
    U = U[:, :rank]
    S = S[:rank]
    U = U @ torch.diag(S)
    Vh = Vh[:rank, :]

    dist = torch.cat([U.flatten(), Vh.flatten()])
    hi_val = torch.quantile(dist, CLAMP_QUANTILE)
    low_val = -hi_val

    U = U.clamp(low_val, hi_val)
    Vh = Vh.clamp(low_val, hi_val)
    if is_conv2d:
        U = U.reshape(out_dim, rank, 1, 1)
        Vh = Vh.reshape(rank, in_dim, kernel_size[0], kernel_size[1])
    return (U.cpu(), Vh.cpu())


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--base_ckpt_path",
        default=None,
        type=str,
        required=True,
        help="Base checkpoint path from which the model was finetuned. Can be a model ID on the Hub.",
    )
    parser.add_argument(
        "--base_subfolder",
        default="transformer",
        type=str,
        help="Subfolder to load the base checkpoint from, if any.",
    )
    parser.add_argument(
        "--finetune_ckpt_path",
        default=None,
        type=str,
        required=True,
        help="Fully fine-tuned checkpoint path. Can be a model ID on the Hub.",
    )
    parser.add_argument(
        "--finetune_subfolder",
        default=None,
        type=str,
        help="Subfolder to load the fully finetuned checkpoint from, if any.",
    )
    parser.add_argument("--rank", default=64, type=int)
    parser.add_argument("--lora_out_path", default=None, type=str, required=True)
    args = parser.parse_args()

    if not args.lora_out_path.endswith(".safetensors"):
        raise ValueError("`lora_out_path` must end with `.safetensors`.")

    return args


@torch.no_grad()
def main(args):
    model_finetuned = CogVideoXTransformer3DModel.from_pretrained(
        args.finetune_ckpt_path, subfolder=args.finetune_subfolder, torch_dtype=torch.bfloat16
    )
    state_dict_ft = model_finetuned.state_dict()

    # Change the `subfolder` as needed.
    base_model = CogVideoXTransformer3DModel.from_pretrained(
        args.base_ckpt_path, subfolder=args.base_subfolder, torch_dtype=torch.bfloat16
    )
    state_dict = base_model.state_dict()
    output_dict = {}

    for k in tqdm(state_dict, desc="Extracting LoRA..."):
        original_param = state_dict[k]
        finetuned_param = state_dict_ft[k]
        if len(original_param.shape) >= 2:
            diff = finetuned_param.float() - original_param.float()
            out = extract_lora(diff, args.rank)
            name = k

            if name.endswith(".weight"):
                name = name[: -len(".weight")]
            down_key = "{}.lora_A.weight".format(name)
            up_key = "{}.lora_B.weight".format(name)

            output_dict[up_key] = out[0].contiguous().to(finetuned_param.dtype)
            output_dict[down_key] = out[1].contiguous().to(finetuned_param.dtype)

    prefix = "transformer" if "transformer" in base_model.__class__.__name__.lower() else "unet"
    output_dict = {f"{prefix}.{k}": v for k, v in output_dict.items()}
    save_file(output_dict, args.lora_out_path)
    print(f"LoRA saved and it contains {len(output_dict)} keys.")


if __name__ == "__main__":
    args = parse_args()
    main(args)
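As a usage sketch (not part of the commit), the extracted `.safetensors` file can then be loaded like any other Diffusers LoRA; the checkpoint name below is the one from the example command above and the prompt is purely illustrative:

```python
import torch
from diffusers import CogVideoXPipeline

# Load the base pipeline and attach the LoRA produced by the extraction script.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("cakeify_lora.safetensors", adapter_name="cakeify")
pipe.set_adapters(["cakeify"], [1.0])

video = pipe(prompt="a knife slicing into a cake shaped like a shoe", num_frames=49).frames[0]
```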

src/diffusers/pipelines/flux/pipeline_flux_controlnet_inpainting.py

Lines changed: 2 additions & 2 deletions
@@ -930,8 +930,8 @@ def __call__(
        if isinstance(self.controlnet, FluxControlNetModel):
            control_image = self.prepare_image(
                image=control_image,
-                width=height,
-                height=width,
+                width=width,
+                height=height,
                batch_size=batch_size * num_images_per_prompt,
                num_images_per_prompt=num_images_per_prompt,
                device=device,
