
Commit 0e64457

Merge branch 'main' into test-pipeline-timesteps
2 parents 67124eb + 96c376a commit 0e64457

112 files changed: +21,645 additions, -525 deletions

.github/workflows/pr_test_peft_backend.yml

Lines changed: 0 additions & 134 deletions
This file was deleted.

.github/workflows/pr_tests.yml

Lines changed: 64 additions & 0 deletions
```diff
@@ -234,3 +234,67 @@ jobs:
       with:
         name: pr_${{ matrix.config.report }}_test_reports
         path: reports
+
+  run_lora_tests:
+    needs: [check_code_quality, check_repository_consistency]
+    strategy:
+      fail-fast: false
+
+    name: LoRA tests with PEFT main
+
+    runs-on:
+      group: aws-general-8-plus
+
+    container:
+      image: diffusers/diffusers-pytorch-cpu
+      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+
+    defaults:
+      run:
+        shell: bash
+
+    steps:
+    - name: Checkout diffusers
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 2
+
+    - name: Install dependencies
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        # TODO (sayakpaul, DN6): revisit `--no-deps`
+        python -m pip install -U peft@git+https://github.com/huggingface/peft.git --no-deps
+        python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
+        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps
+
+    - name: Environment
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python utils/print_env.py
+
+    - name: Run fast PyTorch LoRA tests with PEFT
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
+          -s -v \
+          --make-reports=tests_peft_main \
+          tests/lora/
+        python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
+          -s -v \
+          --make-reports=tests_models_lora_peft_main \
+          tests/models/ -k "lora"
+
+    - name: Failure short reports
+      if: ${{ failure() }}
+      run: |
+        cat reports/tests_lora_failures_short.txt
+        cat reports/tests_models_lora_failures_short.txt
+
+    - name: Test suite reports artifacts
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v4
+      with:
+        name: pr_main_test_reports
+        path: reports
```

README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -112,8 +112,8 @@ Check out the [Quickstart](https://huggingface.co/docs/diffusers/quicktour) to l
 | **Documentation** | **What can I learn?** |
 |---------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [Tutorial](https://huggingface.co/docs/diffusers/tutorials/tutorial_overview) | A basic crash course for learning how to use the library's most important features like using models and schedulers to build your own diffusion system, and training your own diffusion model. |
-| [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading_overview) | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
-| [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/pipeline_overview) | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
+| [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading) | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
+| [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/overview_techniques) | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
 | [Optimization](https://huggingface.co/docs/diffusers/optimization/fp16) | Guides for how to optimize your diffusion model to run faster and consume less memory. |
 | [Training](https://huggingface.co/docs/diffusers/training/overview) | Guides for how to train a diffusion model for different tasks with different training techniques. |
 ## Contribution
```

docs/source/en/_toctree.yml

Lines changed: 12 additions & 0 deletions
```diff
@@ -252,6 +252,8 @@
       title: SD3ControlNetModel
     - local: api/models/controlnet_sparsectrl
       title: SparseControlNetModel
+    - local: api/models/controlnet_union
+      title: ControlNetUnionModel
     title: ControlNets
   - sections:
     - local: api/models/allegro_transformer3d
@@ -272,6 +274,8 @@
       title: LatteTransformer3DModel
     - local: api/models/lumina_nextdit2d
       title: LuminaNextDiT2DModel
+    - local: api/models/ltx_video_transformer3d
+      title: LTXVideoTransformer3DModel
     - local: api/models/mochi_transformer3d
       title: MochiTransformer3DModel
     - local: api/models/pixart_transformer2d
@@ -310,10 +314,14 @@
       title: AutoencoderKLAllegro
     - local: api/models/autoencoderkl_cogvideox
       title: AutoencoderKLCogVideoX
+    - local: api/models/autoencoderkl_ltx_video
+      title: AutoencoderKLLTXVideo
    - local: api/models/autoencoderkl_mochi
       title: AutoencoderKLMochi
     - local: api/models/asymmetricautoencoderkl
       title: AsymmetricAutoencoderKL
+    - local: api/models/autoencoder_dc
+      title: AutoencoderDC
     - local: api/models/consistency_decoder_vae
       title: ConsistencyDecoderVAE
     - local: api/models/autoencoder_oobleck
@@ -366,6 +374,8 @@
       title: ControlNet-XS
     - local: api/pipelines/controlnetxs_sdxl
       title: ControlNet-XS with Stable Diffusion XL
+    - local: api/pipelines/controlnet_union
+      title: ControlNetUnion
     - local: api/pipelines/dance_diffusion
       title: Dance Diffusion
     - local: api/pipelines/ddim
@@ -402,6 +412,8 @@
       title: Latte
     - local: api/pipelines/ledits_pp
       title: LEDITS++
+    - local: api/pipelines/ltx_video
+      title: LTX
     - local: api/pipelines/lumina
       title: Lumina-T2X
     - local: api/pipelines/marigold
```
docs/source/en/api/models/autoencoder_dc.md

Lines changed: 70 additions & 0 deletions

@@ -0,0 +1,70 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderDC

The 2D Autoencoder model used in [SANA](https://huggingface.co/papers/2410.10629) and introduced in [DCAE](https://huggingface.co/papers/2410.10733) by Junyu Chen\*, Han Cai\*, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, and Song Han from MIT HAN Lab.

The abstract from the paper is:

*We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy for high spatial compression ratios (e.g., 64x). We address this challenge by introducing two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on the space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phases training strategy for mitigating the generalization penalty of high spatial-compression autoencoders. With these designs, we improve the autoencoder's spatial compression ratio up to 128 while maintaining the reconstruction quality. Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512x512, our DC-AE provides 19.1x inference speedup and 17.9x training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder. Our code is available at [this https URL](https://github.com/mit-han-lab/efficientvit).*

The following DCAE models are released and supported in Diffusers.

| Diffusers format | Original format |
|:----------------:|:---------------:|
| [`mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-sana-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0) |
| [`mit-han-lab/dc-ae-f32c32-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-in-1.0) |
| [`mit-han-lab/dc-ae-f32c32-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-mix-1.0) |
| [`mit-han-lab/dc-ae-f64c128-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f64c128-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0) |
| [`mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f64c128-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-mix-1.0) |
| [`mit-han-lab/dc-ae-f128c512-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0) |
| [`mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0) |

Load a model in Diffusers format with [`~ModelMixin.from_pretrained`].

```python
import torch

from diffusers import AutoencoderDC

ae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", torch_dtype=torch.float32).to("cuda")
```
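
Once loaded, the autoencoder can be exercised with a quick encode/decode round trip. The snippet below is a minimal sketch rather than documented usage: the random input tensor is illustrative, and it assumes `encode` exposes the compressed representation via a `latent` attribute and `decode` returns the reconstruction via `sample`, as other Diffusers autoencoders do.

```python
import torch

from diffusers import AutoencoderDC

ae = AutoencoderDC.from_pretrained(
    "mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", torch_dtype=torch.float32
).to("cuda")

# Hypothetical input: one 512x512 RGB image scaled to [-1, 1].
image = torch.randn(1, 3, 512, 512, device="cuda")

with torch.no_grad():
    # For the f32c32 variant the latent is expected to be spatially 32x smaller
    # with 32 channels, i.e. roughly (1, 32, 16, 16).
    latent = ae.encode(image).latent
    reconstruction = ae.decode(latent).sample

print(latent.shape, reconstruction.shape)
```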

## Load a model in Diffusers via `from_single_file`

```python
from diffusers import AutoencoderDC

ckpt_path = "https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0/blob/main/model.safetensors"
model = AutoencoderDC.from_single_file(ckpt_path)
```

The `AutoencoderDC` model has `in` and `mix` single file checkpoint variants that share the same checkpoint keys but use different scaling factors. Because Diffusers cannot infer the correct config from the checkpoint alone, it defaults to the `mix` variant config. To load an `in` variant checkpoint, override this default by passing the matching config through the `config` argument.

```python
from diffusers import AutoencoderDC

ckpt_path = "https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0/blob/main/model.safetensors"
model = AutoencoderDC.from_single_file(ckpt_path, config="mit-han-lab/dc-ae-f128c512-in-1.0-diffusers")
```

## AutoencoderDC

[[autodoc]] AutoencoderDC
  - encode
  - decode
  - all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
docs/source/en/api/models/autoencoderkl_ltx_video.md

Lines changed: 37 additions & 0 deletions

@@ -0,0 +1,37 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLLTXVideo

The 3D variational autoencoder (VAE) model with KL loss used in [LTX](https://huggingface.co/Lightricks/LTX-Video) was introduced by Lightricks.

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import AutoencoderKLLTXVideo

vae = AutoencoderKLLTXVideo.from_pretrained("TODO/TODO", subfolder="vae", torch_dtype=torch.float32).to("cuda")
```
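
The same kind of encode/decode round trip can be used to sanity-check the video VAE once a real checkpoint id replaces the placeholder above. This is a hedged sketch, not documented usage: the input layout `(batch, channels, frames, height, width)` and the `latent_dist`/`sample` attributes are assumptions carried over from other KL autoencoders in Diffusers.

```python
import torch

# Continues from the snippet above; assumes `vae` was loaded from a real checkpoint
# (the "TODO/TODO" placeholder still needs to be replaced).
# Hypothetical input: one 9-frame clip at 256x256, channels-first.
video = torch.randn(1, 3, 9, 256, 256, device="cuda", dtype=torch.float32)

with torch.no_grad():
    # Assumes encode() returns a posterior distribution via `latent_dist`,
    # as with other KL autoencoders in Diffusers.
    latents = vae.encode(video).latent_dist.sample()
    # Assumes decode() returns the reconstructed video via `sample`.
    reconstruction = vae.decode(latents).sample

print(latents.shape, reconstruction.shape)
```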

## AutoencoderKLLTXVideo

[[autodoc]] AutoencoderKLLTXVideo
  - decode
  - encode
  - all

## AutoencoderKLOutput

[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
