
Commit a771982

Merge branch 'main' into metadata-lora

2 parents: 201bd7b + e23705e

223 files changed (+13849, -3023 lines)


.github/workflows/nightly_tests.yml

Lines changed: 49 additions & 0 deletions

```diff
@@ -180,6 +180,55 @@ jobs:
           pip install slack_sdk tabulate
           python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
 
+  run_torch_compile_tests:
+    name: PyTorch Compile CUDA tests
+
+    runs-on:
+      group: aws-g4dn-2xlarge
+
+    container:
+      image: diffusers/diffusers-pytorch-compile-cuda
+      options: --gpus 0 --shm-size "16gb" --ipc host
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test,training]
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Run torch compile tests on GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          RUN_COMPILE: yes
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: cat reports/tests_torch_compile_cuda_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_compile_test_reports
+          path: reports
+
+      - name: Generate Report and Notify Channel
+        if: always()
+        run: |
+          pip install slack_sdk tabulate
+          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+
   run_big_gpu_torch_tests:
     name: Torch tests on big GPU
     strategy:
```
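
Note: the new job gates the compile tests behind `RUN_COMPILE=yes` and selects them with `-k "compile"`, so the same selection can be reproduced locally with `RUN_COMPILE=yes python -m pytest -k "compile" tests/`. A minimal sketch of a test gated the same way (hypothetical, not a test from the diffusers suite):

```python
# Hypothetical compile-gated test; mirrors the RUN_COMPILE=yes opt-in
# that the workflow above sets for the real suite.
import os

import pytest
import torch

RUN_COMPILE = os.getenv("RUN_COMPILE", "no").lower() == "yes"


@pytest.mark.skipif(not RUN_COMPILE, reason="set RUN_COMPILE=yes to opt in")
def test_compiled_module_matches_eager():
    model = torch.nn.Linear(8, 8)
    compiled = torch.compile(model)
    x = torch.randn(2, 8)
    # torch.compile should not change numerics for a simple linear layer.
    torch.testing.assert_close(compiled(x), model(x))
```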

.github/workflows/release_tests_fast.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -335,7 +335,7 @@ jobs:
       - name: Environment
         run: |
           python utils/print_env.py
-      - name: Run example tests on GPU
+      - name: Run torch compile tests on GPU
         env:
           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
           RUN_COMPILE: yes
```

docs/source/en/_toctree.yml

Lines changed: 16 additions & 17 deletions

```diff
@@ -17,12 +17,8 @@
     title: AutoPipeline
   - local: tutorials/basic_training
     title: Train a diffusion model
-  - local: tutorials/using_peft_for_inference
-    title: Load LoRAs for inference
   - local: tutorials/fast_diffusion
     title: Accelerate inference of text-to-image diffusion models
-  - local: tutorials/inference_with_big_models
-    title: Working with big models
   title: Tutorials
 - sections:
   - local: using-diffusers/loading
@@ -33,11 +29,24 @@
     title: Load schedulers and models
   - local: using-diffusers/other-formats
     title: Model files and layouts
-  - local: using-diffusers/loading_adapters
-    title: Load adapters
   - local: using-diffusers/push_to_hub
     title: Push files to the Hub
   title: Load pipelines and adapters
+- sections:
+  - local: tutorials/using_peft_for_inference
+    title: LoRA
+  - local: using-diffusers/ip_adapter
+    title: IP-Adapter
+  - local: using-diffusers/controlnet
+    title: ControlNet
+  - local: using-diffusers/t2i_adapter
+    title: T2I-Adapter
+  - local: using-diffusers/dreambooth
+    title: DreamBooth
+  - local: using-diffusers/textual_inversion_inference
+    title: Textual inversion
+  title: Adapters
+  isExpanded: false
 - sections:
   - local: using-diffusers/unconditional_image_generation
     title: Unconditional image generation
@@ -59,8 +68,6 @@
     title: Create a server
   - local: training/distributed_inference
     title: Distributed inference
-  - local: using-diffusers/merge_loras
-    title: Merge LoRAs
   - local: using-diffusers/scheduler_features
     title: Scheduler features
   - local: using-diffusers/callback
@@ -97,20 +104,12 @@
     title: SDXL Turbo
   - local: using-diffusers/kandinsky
     title: Kandinsky
-  - local: using-diffusers/ip_adapter
-    title: IP-Adapter
   - local: using-diffusers/omnigen
     title: OmniGen
   - local: using-diffusers/pag
     title: PAG
-  - local: using-diffusers/controlnet
-    title: ControlNet
-  - local: using-diffusers/t2i_adapter
-    title: T2I-Adapter
   - local: using-diffusers/inference_with_lcm
     title: Latent Consistency Model
-  - local: using-diffusers/textual_inversion_inference
-    title: Textual inversion
   - local: using-diffusers/shap-e
     title: Shap-E
   - local: using-diffusers/diffedit
@@ -180,7 +179,7 @@
     title: Quantization Methods
 - sections:
   - local: optimization/fp16
-    title: Speed up inference
+    title: Accelerate inference
   - local: optimization/memory
     title: Reduce memory usage
   - local: optimization/torch2.0
```

docs/source/en/api/loaders/lora.md

Lines changed: 5 additions & 0 deletions

```diff
@@ -28,6 +28,7 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 - [`WanLoraLoaderMixin`] provides similar functions for [Wan](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan).
 - [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
 - [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
+- [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
 - [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.
 
 <Tip>
@@ -91,6 +92,10 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
 
 [[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
 
+## HiDreamImageLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.HiDreamImageLoraLoaderMixin
+
 ## LoraBaseMixin
 
 [[autodoc]] loaders.lora_base.LoraBaseMixin
```
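
Note: the new mixin gives HiDream Image pipelines the usual LoRA entry points (`load_lora_weights`, `fuse_lora`, `unload_lora_weights`). A rough sketch of the intended usage; the repo ids below are placeholders rather than part of this commit, and depending on the checkpoint the Llama text encoder may need to be loaded separately:

```python
# Sketch only; repo ids are placeholders.
import torch
from diffusers import HiDreamImagePipeline

pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",  # assumed base checkpoint
    torch_dtype=torch.bfloat16,
).to("cuda")

# Provided by HiDreamImageLoraLoaderMixin, like the other per-model mixins.
pipe.load_lora_weights("some-user/some-hidream-lora")  # placeholder LoRA repo
image = pipe("a watercolor fox in a meadow").images[0]
```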

docs/source/en/api/pipelines/animatediff.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -966,7 +966,7 @@ pipe.to("cuda")
 prompt = {
     0: "A caterpillar on a leaf, high quality, photorealistic",
     40: "A caterpillar transforming into a cocoon, on a leaf, near flowers, photorealistic",
-    80: "A cocoon on a leaf, flowers in the backgrond, photorealistic",
+    80: "A cocoon on a leaf, flowers in the background, photorealistic",
     120: "A cocoon maturing and a butterfly being born, flowers and leaves visible in the background, photorealistic",
     160: "A beautiful butterfly, vibrant colors, sitting on a leaf, flowers in the background, photorealistic",
     200: "A beautiful butterfly, flying away in a forest, photorealistic",
```

docs/source/en/api/pipelines/flux.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -347,7 +347,7 @@ image = pipe(
     height=1024,
     prompt="wearing sunglasses",
     negative_prompt="",
-    true_cfg=4.0,
+    true_cfg_scale=4.0,
     generator=torch.Generator().manual_seed(4444),
     ip_adapter_image=image,
 ).images[0]
```
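
Note: `true_cfg_scale` is the parameter name [`FluxPipeline`] actually accepts; a value above 1 combined with a non-empty negative prompt switches Flux from its embedded guidance to true classifier-free guidance, at the cost of an extra forward pass per step. A minimal sketch without the IP-Adapter parts of the surrounding example:

```python
# Minimal true-CFG sketch for Flux (no IP-Adapter).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a portrait photo, wearing sunglasses",
    negative_prompt="blurry, low quality",
    true_cfg_scale=4.0,  # >1 enables true CFG; the old name true_cfg is not accepted
    generator=torch.Generator().manual_seed(4444),
).images[0]
image.save("flux_true_cfg.png")
```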

docs/source/en/api/pipelines/ledits_pp.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -29,7 +29,7 @@ You can find additional information about LEDITS++ on the [project page](https:/
 </Tip>
 
 <Tip warning={true}>
-Due to some backward compatability issues with the current diffusers implementation of [`~schedulers.DPMSolverMultistepScheduler`] this implementation of LEdits++ can no longer guarantee perfect inversion.
+Due to some backward compatibility issues with the current diffusers implementation of [`~schedulers.DPMSolverMultistepScheduler`] this implementation of LEdits++ can no longer guarantee perfect inversion.
 This issue is unlikely to have any noticeable effects on applied use-cases. However, we provide an alternative implementation that guarantees perfect inversion in a dedicated [GitHub repo](https://github.com/ml-research/ledits_pp).
 </Tip>
 
```

docs/source/en/api/pipelines/wan.md

Lines changed: 57 additions & 3 deletions

````diff
@@ -24,7 +24,7 @@
 
 ## Generating Videos with Wan 2.1
 
-We will first need to install some addtional dependencies.
+We will first need to install some additional dependencies.
 
 ```shell
 pip install -u ftfy imageio-ffmpeg imageio
@@ -133,6 +133,60 @@ output = pipe(
 export_to_video(output, "wan-i2v.mp4", fps=16)
 ```
 
+### First and Last Frame Interpolation
+
+```python
+import numpy as np
+import torch
+import torchvision.transforms.functional as TF
+from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
+from diffusers.utils import export_to_video, load_image
+from transformers import CLIPVisionModel
+
+
+model_id = "Wan-AI/Wan2.1-FLF2V-14B-720P-diffusers"
+image_encoder = CLIPVisionModel.from_pretrained(model_id, subfolder="image_encoder", torch_dtype=torch.float32)
+vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
+pipe = WanImageToVideoPipeline.from_pretrained(
+    model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16
+)
+pipe.to("cuda")
+
+first_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
+last_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")
+
+def aspect_ratio_resize(image, pipe, max_area=720 * 1280):
+    aspect_ratio = image.height / image.width
+    mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
+    height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
+    width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
+    image = image.resize((width, height))
+    return image, height, width
+
+def center_crop_resize(image, height, width):
+    # Calculate resize ratio to match first frame dimensions
+    resize_ratio = max(width / image.width, height / image.height)
+
+    # Resize the image
+    width = round(image.width * resize_ratio)
+    height = round(image.height * resize_ratio)
+    size = [width, height]
+    image = TF.center_crop(image, size)
+
+    return image, height, width
+
+first_frame, height, width = aspect_ratio_resize(first_frame, pipe)
+if last_frame.size != first_frame.size:
+    last_frame, _, _ = center_crop_resize(last_frame, height, width)
+
+prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings. The bird's feathers are delicate, with a unique pattern on its chest. The background shows a blue sky with white clouds under bright sunshine. The camera follows the bird upward, capturing its flight and the vastness of the sky from a close-up, low-angle perspective."
+
+output = pipe(
+    image=first_frame, last_image=last_frame, prompt=prompt, height=height, width=width, guidance_scale=5.5
+).frames[0]
+export_to_video(output, "output.mp4", fps=16)
+```
+
 ### Video to Video Generation
 
 ```python
@@ -231,7 +285,7 @@ pipe = WanImageToVideoPipeline.from_pretrained(
     image_encoder=image_encoder,
     torch_dtype=torch.bfloat16
 )
-# Since we've offloaded the larger models alrady, we can move the rest of the model components to GPU
+# Since we've offloaded the larger models already, we can move the rest of the model components to GPU
 pipe.to("cuda")
 
 image = load_image(
@@ -314,7 +368,7 @@ pipe = WanImageToVideoPipeline.from_pretrained(
     image_encoder=image_encoder,
     torch_dtype=torch.bfloat16
 )
-# Since we've offloaded the larger models alrady, we can move the rest of the model components to GPU
+# Since we've offloaded the larger models already, we can move the rest of the model components to GPU
 pipe.to("cuda")
 
 image = load_image(
````
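
Note: the `aspect_ratio_resize` helper in the new example snaps dimensions down to a multiple of `vae_scale_factor_spatial * patch_size[1]`. A worked mini-example of that arithmetic, assuming Wan's 8x spatial VAE downsampling and a transformer patch size of 2 (so `mod_value = 16`):

```python
# Standalone arithmetic check; mod_value = 16 is an assumption based on
# Wan's 8x spatial VAE scale factor and a patch size of 2.
import numpy as np

max_area, aspect_ratio, mod_value = 720 * 1280, 720 / 1280, 8 * 2
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
print(height, width)  # 720 1280 -- both already multiples of 16
```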
