Commit 017d4e9

Merge branch 'main' into get-2d-sincos-pos-embed-np
2 parents c5bd771 + ad40e26 commit 017d4e9

29 files changed: +4137 -60 lines changed

docs/source/en/api/models/autoencoder_dc.md

Lines changed: 20 additions & 0 deletions
@@ -37,6 +37,26 @@ from diffusers import AutoencoderDC
 ae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", torch_dtype=torch.float32).to("cuda")
 ```

+## Load a model in Diffusers via `from_single_file`
+
+```python
+from diffusers import AutoencoderDC
+
+ckpt_path = "https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0/blob/main/model.safetensors"
+model = AutoencoderDC.from_single_file(ckpt_path)
+
+```
+
+The `AutoencoderDC` model has `in` and `mix` single-file checkpoint variants whose checkpoint keys match but whose scaling factors differ. Diffusers cannot infer the correct config file from the checkpoint alone, so it defaults to the `mix` variant config. To override the automatically determined config, pass the `config` argument when loading `in` variant checkpoints from a single file.
+
+```python
+from diffusers import AutoencoderDC
+
+ckpt_path = "https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0/blob/main/model.safetensors"
+model = AutoencoderDC.from_single_file(ckpt_path, config="mit-han-lab/dc-ae-f128c512-in-1.0-diffusers")
+```
+
+
 ## AutoencoderDC

 [[autodoc]] AutoencoderDC
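
Reviewer note on the `in`/`mix` ambiguity documented in this hunk: the toy sketch below shows why identical checkpoint keys force a fallback to a default, and how an explicit `config` choice bypasses it. It does not use Diffusers. The key names, the scaling-factor values, and the helpers `candidate_variants` and `resolve_config` are all hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: none of these names or values come from Diffusers.
DEFAULT_VARIANT = "mix"  # mirrors the default described in the added docs

# Both variants expect the same checkpoint keys (made-up key names).
EXPECTED_KEYS = {
    "in": frozenset({"encoder.weight", "decoder.weight"}),
    "mix": frozenset({"encoder.weight", "decoder.weight"}),
}

# Placeholder scaling factors, chosen only so the two configs differ.
VARIANT_CONFIGS = {
    "in": {"variant": "in", "scaling_factor": 2.0},
    "mix": {"variant": "mix", "scaling_factor": 1.0},
}


def candidate_variants(checkpoint):
    """Return every variant whose expected keys match the checkpoint keys."""
    keys = frozenset(checkpoint)
    return sorted(name for name, expected in EXPECTED_KEYS.items() if expected == keys)


def resolve_config(checkpoint, config=None):
    """Pick a config: an explicit choice wins, otherwise fall back to the default."""
    if config is not None:
        return VARIANT_CONFIGS[config]
    candidates = candidate_variants(checkpoint)
    if not candidates:
        raise ValueError("checkpoint keys match no known variant")
    if len(candidates) == 1:
        return VARIANT_CONFIGS[candidates[0]]
    # Ambiguous: the keys alone cannot distinguish the variants.
    return VARIANT_CONFIGS[DEFAULT_VARIANT]


checkpoint = {"encoder.weight": [0.1], "decoder.weight": [0.2]}
print(candidate_variants(checkpoint))
print(resolve_config(checkpoint)["variant"])
print(resolve_config(checkpoint, config="in")["variant"])
```

Under these assumptions both variants are candidates for the same checkpoint, so only the explicit `config` argument can select the `in` settings.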

docs/source/en/api/pipelines/flux.md

Lines changed: 59 additions & 0 deletions
@@ -143,6 +143,35 @@ image = pipe(
 image.save("output.png")
 ```

+Canny Control is also possible with a LoRA variant of this condition. The usage is as follows:
+
+```python
+# !pip install -U controlnet-aux
+import torch
+from controlnet_aux import CannyDetector
+from diffusers import FluxControlPipeline
+from diffusers.utils import load_image
+
+pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
+pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")
+
+prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
+control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
+
+processor = CannyDetector()
+control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)
+
+image = pipe(
+    prompt=prompt,
+    control_image=control_image,
+    height=1024,
+    width=1024,
+    num_inference_steps=50,
+    guidance_scale=30.0,
+).images[0]
+image.save("output.png")
+```
+
 ### Depth Control

 **Note:** `black-forest-labs/Flux.1-Depth-dev` is _not_ a ControlNet model. [`ControlNetModel`] models are a separate component from the UNet/Transformer whose residuals are added to the actual underlying model. Depth Control is an alternate architecture that achieves effectively the same results as a ControlNet model would, by using channel-wise concatenation with input control condition and ensuring the transformer learns structure control by following the condition as closely as possible.
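
Reviewer note on the distinction drawn in the note above: the toy sketch below contrasts channel-wise concatenation of a control condition with ControlNet-style residual addition. It uses plain nested lists rather than tensors and is not how either architecture is implemented; the helpers `concat_channels` and `add_residual` are hypothetical names. In a real ControlNet the residuals come from a separate network and are added at intermediate blocks, which this sketch does not model.

```python
# Toy "tensors": a list of channels, each channel a flat list of values.
# All names here are hypothetical and only illustrate the two ideas.


def concat_channels(latents, control):
    """Control-style conditioning: append the control channels to the latents."""
    size = len(latents[0])
    if any(len(channel) != size for channel in latents + control):
        raise ValueError("all channels must have the same number of values")
    return latents + control


def add_residual(hidden, residual):
    """ControlNet-style conditioning: add a same-shaped residual elementwise."""
    if len(hidden) != len(residual):
        raise ValueError("hidden and residual must have the same channel count")
    return [[h + r for h, r in zip(hc, rc)] for hc, rc in zip(hidden, residual)]


latents = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels, 2 values each
control = [[0.5, 0.5], [0.0, 1.0]]  # control condition with the same shape

model_input = concat_channels(latents, control)  # channel count doubles
conditioned = add_residual(latents, control)     # channel count unchanged

print(len(model_input), len(conditioned))
```

With concatenation the model sees a wider input and must learn to follow the extra channels; with residual addition the input shape is unchanged and the conditioning arrives as an additive correction.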
@@ -174,6 +203,36 @@ image = pipe(
 image.save("output.png")
 ```

+Depth Control is also possible with a LoRA variant of this condition. The usage is as follows:
+
+```python
+# !pip install git+https://github.com/huggingface/image_gen_aux
+import torch
+from diffusers import FluxControlPipeline, FluxTransformer2DModel
+from diffusers.utils import load_image
+from image_gen_aux import DepthPreprocessor
+
+pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
+pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora")
+
+prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
+control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
+
+processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
+control_image = processor(control_image)[0].convert("RGB")
+
+image = pipe(
+    prompt=prompt,
+    control_image=control_image,
+    height=1024,
+    width=1024,
+    num_inference_steps=30,
+    guidance_scale=10.0,
+    generator=torch.Generator().manual_seed(42),
+).images[0]
+image.save("output.png")
+```
+
 ### Redux

 * Flux Redux pipeline is an adapter for FLUX.1 base models. It can be used with both flux-dev and flux-schnell, for image-to-image generation.

docs/source/en/api/pipelines/pag.md

Lines changed: 5 additions & 0 deletions
@@ -48,6 +48,11 @@ Since RegEx is supported as a way for matching layer identifiers, it is crucial
 - all
 - __call__

+## StableDiffusionPAGInpaintPipeline
+[[autodoc]] StableDiffusionPAGInpaintPipeline
+- all
+- __call__
+
 ## StableDiffusionPAGPipeline
 [[autodoc]] StableDiffusionPAGPipeline
 - all

examples/cogvideo/train_cogvideox_image_to_video_lora.py

Lines changed: 1 addition & 2 deletions
@@ -872,10 +872,9 @@ def prepare_rotary_positional_embeddings(
         crops_coords=grid_crops_coords,
         grid_size=(grid_height, grid_width),
         temporal_size=num_frames,
+        device=device,
     )

-    freqs_cos = freqs_cos.to(device=device)
-    freqs_sin = freqs_sin.to(device=device)
     return freqs_cos, freqs_sin

examples/cogvideo/train_cogvideox_lora.py

Lines changed: 1 addition & 2 deletions
@@ -894,10 +894,9 @@ def prepare_rotary_positional_embeddings(
         crops_coords=grid_crops_coords,
         grid_size=(grid_height, grid_width),
         temporal_size=num_frames,
+        device=device,
     )

-    freqs_cos = freqs_cos.to(device=device)
-    freqs_sin = freqs_sin.to(device=device)
     return freqs_cos, freqs_sin
