Commit de9d3b7

Merge branch 'main' into flux-quantized-w-lora

2 parents: 12a837b + 723dbdd
59 files changed: +4616 −307 lines. Only a subset of the changed files is shown below.

.github/workflows/pr_style_bot.yml

Lines changed: 0 additions & 34 deletions

````diff
@@ -13,39 +13,5 @@ jobs:
     uses: huggingface/huggingface_hub/.github/workflows/style-bot-action.yml@main
     with:
       python_quality_dependencies: "[quality]"
-      pre_commit_script_name: "Download and Compare files from the main branch"
-      pre_commit_script: |
-        echo "Downloading the files from the main branch"
-
-        curl -o main_Makefile https://raw.githubusercontent.com/huggingface/diffusers/main/Makefile
-        curl -o main_setup.py https://raw.githubusercontent.com/huggingface/diffusers/refs/heads/main/setup.py
-        curl -o main_check_doc_toc.py https://raw.githubusercontent.com/huggingface/diffusers/refs/heads/main/utils/check_doc_toc.py
-
-        echo "Compare the files and raise error if needed"
-
-        diff_failed=0
-        if ! diff -q main_Makefile Makefile; then
-          echo "Error: The Makefile has changed. Please ensure it matches the main branch."
-          diff_failed=1
-        fi
-
-        if ! diff -q main_setup.py setup.py; then
-          echo "Error: The setup.py has changed. Please ensure it matches the main branch."
-          diff_failed=1
-        fi
-
-        if ! diff -q main_check_doc_toc.py utils/check_doc_toc.py; then
-          echo "Error: The utils/check_doc_toc.py has changed. Please ensure it matches the main branch."
-          diff_failed=1
-        fi
-
-        if [ $diff_failed -eq 1 ]; then
-          echo "❌ Error happened as we detected changes in the files that should not be changed ❌"
-          exit 1
-        fi
-
-        echo "No changes in the files. Proceeding..."
-        rm -rf main_Makefile main_setup.py main_check_doc_toc.py
-
       style_command: "make style && make quality"
     secrets:
       bot_token: ${{ secrets.GITHUB_TOKEN }}
````
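
For context, the removed `pre_commit_script` implemented a simple guard: fetch the protected files from `main` and fail the job if the local copies differ. A rough Python equivalent of that logic — a sketch only, the bot actually ran the shell version shown above — might look like:

```python
# Sketch of the guard the removed shell script implemented: download the
# protected files from the main branch and fail if the local copies differ.
import sys
import urllib.request

RAW = "https://raw.githubusercontent.com/huggingface/diffusers/main/"
PROTECTED = ["Makefile", "setup.py", "utils/check_doc_toc.py"]

diff_failed = False
for path in PROTECTED:
    # Fetch the canonical copy from the main branch.
    with urllib.request.urlopen(RAW + path) as response:
        main_copy = response.read()
    # Read the local copy from the checked-out PR branch.
    with open(path, "rb") as f:
        local_copy = f.read()
    if main_copy != local_copy:
        print(f"Error: {path} has changed. Please ensure it matches the main branch.")
        diff_failed = True

if diff_failed:
    sys.exit(1)
print("No changes in the files. Proceeding...")
```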

docs/source/en/api/pipelines/deepfloyd_if.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -14,6 +14,7 @@ specific language governing permissions and limitations under the License.
 
 <div class="flex flex-wrap space-x-1">
   <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+  <img alt="MPS" src="https://img.shields.io/badge/MPS-000000?style=flat&logo=apple&logoColor=white%22">
 </div>
 
 ## Overview
````

docs/source/en/api/pipelines/flux.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -14,6 +14,7 @@ specific language governing permissions and limitations under the License.
 
 <div class="flex flex-wrap space-x-1">
   <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+  <img alt="MPS" src="https://img.shields.io/badge/MPS-000000?style=flat&logo=apple&logoColor=white%22">
 </div>
 
 Flux is a series of text-to-image generation models based on diffusion transformers. To know more about Flux, check out the original [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/) by the creators of Flux, Black Forest Labs.
````
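
The same MPS badge is added across several pipeline docs in this commit (DeepFloyd IF, Kolors, SANA, SD3, SDXL, LTX Video below); it signals that the pipeline runs on Apple silicon through PyTorch's `mps` backend. A minimal sketch of what that means for Flux — the model id, dtype, and step count are illustrative assumptions, not part of this commit:

```python
# Minimal sketch of running Flux on Apple silicon (what the "MPS" badge
# advertises). Model id, dtype, and step count are assumptions.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.to("mps")  # Metal Performance Shaders device on Apple silicon

image = pipe("a corgi wearing a spacesuit", num_inference_steps=4).images[0]
image.save("flux-mps.png")
```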

docs/source/en/api/pipelines/kolors.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -14,6 +14,7 @@ specific language governing permissions and limitations under the License.
 
 <div class="flex flex-wrap space-x-1">
   <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+  <img alt="MPS" src="https://img.shields.io/badge/MPS-000000?style=flat&logo=apple&logoColor=white%22">
 </div>
 
 ![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/kolors/kolors_header_collage.png)
````

docs/source/en/api/pipelines/ltx_video.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -16,6 +16,7 @@
 
 <div class="flex flex-wrap space-x-1">
   <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+  <img alt="MPS" src="https://img.shields.io/badge/MPS-000000?style=flat&logo=apple&logoColor=white%22">
 </div>
 
 [LTX Video](https://huggingface.co/Lightricks/LTX-Video) is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content. We provide a model for both text-to-video as well as image + text-to-video usecases.
@@ -32,6 +33,7 @@ Available models:
 |:-------------:|:-----------------:|
 | [`LTX Video 0.9.0`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.safetensors) | `torch.bfloat16` |
 | [`LTX Video 0.9.1`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.1.safetensors) | `torch.bfloat16` |
+| [`LTX Video 0.9.5`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.5.safetensors) | `torch.bfloat16` |
 
 Note: The recommended dtype is for the transformer component. The VAE and text encoders can be either `torch.float32`, `torch.bfloat16` or `torch.float16` but the recommended dtype is `torch.bfloat16` as used in the original repository.
````
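
Following the dtype note in the hunk above, here is a hedged sketch of loading LTX Video with the transformer in the recommended `torch.bfloat16`; the repo id, resolution, and prompt are assumptions for illustration (the 0.9.5 single-file checkpoint added to the table would be loaded separately):

```python
# Hedged sketch: transformer in the recommended torch.bfloat16; the
# repository id and generation settings are assumptions.
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

video = pipe(
    prompt="A drone shot over a snowy mountain ridge at sunrise",
    width=704,
    height=480,
    num_frames=161,  # LTX targets 24 FPS output
).frames[0]
export_to_video(video, "ltx-video.mp4", fps=24)
```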

docs/source/en/api/pipelines/sana.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -16,6 +16,7 @@
 
 <div class="flex flex-wrap space-x-1">
   <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+  <img alt="MPS" src="https://img.shields.io/badge/MPS-000000?style=flat&logo=apple&logoColor=white%22">
 </div>
 
 [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://huggingface.co/papers/2410.10629) from NVIDIA and MIT HAN Lab, by Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, Song Han.
````

docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -14,6 +14,7 @@ specific language governing permissions and limitations under the License.
 
 <div class="flex flex-wrap space-x-1">
   <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+  <img alt="MPS" src="https://img.shields.io/badge/MPS-000000?style=flat&logo=apple&logoColor=white%22">
 </div>
 
 Stable Diffusion 3 (SD3) was proposed in [Scaling Rectified Flow Transformers for High-Resolution Image Synthesis](https://arxiv.org/pdf/2403.03206.pdf) by Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Muller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach.
````

docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -14,6 +14,7 @@ specific language governing permissions and limitations under the License.
 
 <div class="flex flex-wrap space-x-1">
   <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+  <img alt="MPS" src="https://img.shields.io/badge/MPS-000000?style=flat&logo=apple&logoColor=white%22">
 </div>
 
 Stable Diffusion XL (SDXL) was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://huggingface.co/papers/2307.01952) by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach.
````

docs/source/en/api/pipelines/wan.md

Lines changed: 44 additions & 4 deletions

````diff
@@ -133,6 +133,46 @@ output = pipe(
 export_to_video(output, "wan-i2v.mp4", fps=16)
 ```
 
+### Video to Video Generation
+
+```python
+import torch
+from diffusers.utils import load_video, export_to_video
+from diffusers import AutoencoderKLWan, WanVideoToVideoPipeline, UniPCMultistepScheduler
+
+# Available models: Wan-AI/Wan2.1-T2V-14B-Diffusers, Wan-AI/Wan2.1-T2V-1.3B-Diffusers
+model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
+vae = AutoencoderKLWan.from_pretrained(
+    model_id, subfolder="vae", torch_dtype=torch.float32
+)
+pipe = WanVideoToVideoPipeline.from_pretrained(
+    model_id, vae=vae, torch_dtype=torch.bfloat16
+)
+flow_shift = 3.0  # 5.0 for 720P, 3.0 for 480P
+pipe.scheduler = UniPCMultistepScheduler.from_config(
+    pipe.scheduler.config, flow_shift=flow_shift
+)
+# change to pipe.to("cuda") if you have sufficient VRAM
+pipe.enable_model_cpu_offload()
+
+prompt = "A robot standing on a mountain top. The sun is setting in the background"
+negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
+video = load_video(
+    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/hiker.mp4"
+)
+output = pipe(
+    video=video,
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    height=480,
+    width=512,
+    guidance_scale=7.0,
+    strength=0.7,
+).frames[0]
+
+export_to_video(output, "wan-v2v.mp4", fps=16)
+```
+
 ## Memory Optimizations for Wan 2.1
 
 Base inference with the large 14B Wan 2.1 models can take up to 35GB of VRAM when generating videos at 720p resolution. We'll outline a few memory optimizations we can apply to reduce the VRAM required to run the model.
````
````diff
@@ -323,7 +363,7 @@ import numpy as np
 from diffusers import AutoencoderKLWan, WanTransformer3DModel, WanImageToVideoPipeline
 from diffusers.hooks.group_offloading import apply_group_offloading
 from diffusers.utils import export_to_video, load_image
-from transformers import UMT5EncoderModel, CLIPVisionMode
+from transformers import UMT5EncoderModel, CLIPVisionModel
 
 model_id = "Wan-AI/Wan2.1-I2V-14B-720P-Diffusers"
 image_encoder = CLIPVisionModel.from_pretrained(
````
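
This hunk fixes a typo in the group-offloading example (`CLIPVisionMode` → `CLIPVisionModel`). The hunk shows the `apply_group_offloading` import but cuts off before its use; a hedged, self-contained sketch of applying it to one component — the model id and grouping parameters are assumptions, the full example lives in the surrounding wan.md section:

```python
# Hedged sketch of group offloading on a single loaded component.
import torch
from transformers import UMT5EncoderModel
from diffusers.hooks.group_offloading import apply_group_offloading

text_encoder = UMT5EncoderModel.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-720P-Diffusers",
    subfolder="text_encoder",
    torch_dtype=torch.bfloat16,
)
# Keep groups of blocks on CPU and move each group to GPU only when needed.
apply_group_offloading(
    text_encoder,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=4,
)
```
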
````diff
@@ -356,7 +396,7 @@ prompt = (
     "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in "
     "the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
 )
-negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards
+negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"
 num_frames = 33
 
 output = pipe(
````
````diff
@@ -372,7 +412,7 @@ output = pipe(
 export_to_video(output, "wan-i2v.mp4", fps=16)
 ```
 
-### Using a Custom Scheduler
+## Using a Custom Scheduler
 
 Wan can be used with many different schedulers, each with their own benefits regarding speed and generation quality. By default, Wan uses the `UniPCMultistepScheduler(prediction_type="flow_prediction", use_flow_sigmas=True, flow_shift=3.0)` scheduler. You can use a different scheduler as follows:
````
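
The hunk ends before the example that prose refers to. As a hedged sketch of the pattern (the scheduler parameters and model id here are assumptions; the `from_config` call mirrors the one in the Video to Video hunk above):

```python
# Hedged sketch of swapping in a differently configured scheduler.
import torch
from diffusers import WanPipeline, UniPCMultistepScheduler

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
)
# Rebuild the scheduler from the existing config, overriding flow_shift.
pipe.scheduler = UniPCMultistepScheduler.from_config(
    pipe.scheduler.config, flow_shift=5.0  # e.g. higher shift for 720P output
)
```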

````diff
@@ -403,7 +443,7 @@ transformer = WanTransformer3DModel.from_single_file(ckpt_path, torch_dtype=torc
 pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", transformer=transformer)
 ```
 
-## Recommendations for Inference:
+## Recommendations for Inference
 - Keep `AutencoderKLWan` in `torch.float32` for better decoding quality.
 - `num_frames` should satisfy the following constraint: `(num_frames - 1) % 4 == 0`
 - For smaller resolution videos, try lower values of `shift` (between `2.0` to `5.0`) in the [Scheduler](https://huggingface.co/docs/diffusers/main/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler.shift). For larger resolution videos, try higher values (between `7.0` and `12.0`). The default value is `3.0` for Wan.
````
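
A short sketch that applies the recommendations listed in the hunk above — the model id and frame count are assumed values for illustration:

```python
# Sketch applying the recommendations: VAE kept in float32, frame count
# satisfying (num_frames - 1) % 4 == 0.
import torch
from diffusers import AutoencoderKLWan, WanPipeline

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(
    model_id, subfolder="vae", torch_dtype=torch.float32  # better decoding quality
)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)

num_frames = 81
assert (num_frames - 1) % 4 == 0  # required frame-count constraint
```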

docs/source/en/installation.md

Lines changed: 7 additions & 5 deletions

````diff
@@ -161,10 +161,10 @@ Your Python environment will find the `main` version of 🤗 Diffusers on the ne
 
 Model weights and files are downloaded from the Hub to a cache which is usually your home directory. You can change the cache location by specifying the `HF_HOME` or `HUGGINFACE_HUB_CACHE` environment variables or configuring the `cache_dir` parameter in methods like [`~DiffusionPipeline.from_pretrained`].
 
-Cached files allow you to run 🤗 Diffusers offline. To prevent 🤗 Diffusers from connecting to the internet, set the `HF_HUB_OFFLINE` environment variable to `True` and 🤗 Diffusers will only load previously downloaded files in the cache.
+Cached files allow you to run 🤗 Diffusers offline. To prevent 🤗 Diffusers from connecting to the internet, set the `HF_HUB_OFFLINE` environment variable to `1` and 🤗 Diffusers will only load previously downloaded files in the cache.
 
 ```shell
-export HF_HUB_OFFLINE=True
+export HF_HUB_OFFLINE=1
 ```
 
 For more details about managing and cleaning the cache, take a look at the [caching](https://huggingface.co/docs/huggingface_hub/guides/manage-cache) guide.
````
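
An equivalent in-process sketch of the corrected setting (the pipeline id is illustrative, not part of this commit):

```python
# Sketch: set the variable before importing diffusers/huggingface_hub,
# since environment variables are read at import time.
import os
os.environ["HF_HUB_OFFLINE"] = "1"

from diffusers import DiffusionPipeline

# Succeeds only if the checkpoint is already in the local cache.
pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
```
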
````diff
@@ -179,14 +179,16 @@ Telemetry is only sent when loading models and pipelines from the Hub,
 and it is not collected if you're loading local files.
 
 We understand that not everyone wants to share additional information,and we respect your privacy.
-You can disable telemetry collection by setting the `DISABLE_TELEMETRY` environment variable from your terminal:
+You can disable telemetry collection by setting the `HF_HUB_DISABLE_TELEMETRY` environment variable from your terminal:
 
 On Linux/MacOS:
+
 ```bash
-export DISABLE_TELEMETRY=YES
+export HF_HUB_DISABLE_TELEMETRY=1
 ```
 
 On Windows:
+
 ```bash
-set DISABLE_TELEMETRY=YES
+set HF_HUB_DISABLE_TELEMETRY=1
 ```
````
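
And similarly for the telemetry variable renamed in this hunk, a hedged in-process sketch (assuming, as with the offline flag, that the variable is read when the library is imported):

```python
# Sketch: disable telemetry for this process before importing diffusers.
import os
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"

import diffusers  # subsequent pipeline loads skip the telemetry call
```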
