Commit 2b9bfa3

Merge branch 'main' into flux-control-lora
2 parents 3204627 + 188bca3 commit 2b9bfa3

19 files changed: 3040 additions and 21 deletions
docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -314,6 +314,8 @@
     title: AutoencoderKLMochi
   - local: api/models/asymmetricautoencoderkl
     title: AsymmetricAutoencoderKL
+  - local: api/models/autoencoder_dc
+    title: AutoencoderDC
   - local: api/models/consistency_decoder_vae
     title: ConsistencyDecoderVAE
   - local: api/models/autoencoder_oobleck
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
1+
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License. -->
11+
12+
# AutoencoderDC
13+
14+
The 2D Autoencoder model used in [SANA](https://huggingface.co/papers/2410.10629) and introduced in [DCAE](https://huggingface.co/papers/2410.10733) by authors Junyu Chen\*, Han Cai\*, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han from MIT HAN Lab.
15+
16+
The abstract from the paper is:
17+
18+
*We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy for high spatial compression ratios (e.g., 64x). We address this challenge by introducing two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on the space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phases training strategy for mitigating the generalization penalty of high spatial-compression autoencoders. With these designs, we improve the autoencoder's spatial compression ratio up to 128 while maintaining the reconstruction quality. Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512x512, our DC-AE provides 19.1x inference speedup and 17.9x training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder. Our code is available at [this https URL](https://github.com/mit-han-lab/efficientvit).*
19+
20+
The following DCAE models are released and supported in Diffusers.
21+
| Diffusers format | Original format |
|:----------------:|:---------------:|
| [`mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-sana-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0) |
| [`mit-han-lab/dc-ae-f32c32-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-in-1.0) |
| [`mit-han-lab/dc-ae-f32c32-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-mix-1.0) |
| [`mit-han-lab/dc-ae-f64c128-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f64c128-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0) |
| [`mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f64c128-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-mix-1.0) |
| [`mit-han-lab/dc-ae-f128c512-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0) |
| [`mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0) |
31+
32+
Load a model in Diffusers format with [`~ModelMixin.from_pretrained`].
33+
```python
import torch
from diffusers import AutoencoderDC

ae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", torch_dtype=torch.float32).to("cuda")
```
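
When choosing a checkpoint, it can help to know the latent size to expect. The sketch below assumes that the `f<N>c<M>` tag in each model name encodes the spatial compression ratio (`f`) and the number of latent channels (`c`), which matches the compression ratios quoted in the abstract but is not stated on this page. The helper `dcae_latent_shape` is hypothetical and not part of Diffusers; check the model's config before relying on it.

```python
import re


def dcae_latent_shape(model_id, height, width):
    """Infer (channels, latent_height, latent_width) from a DC-AE model name.

    Assumes the name contains a tag such as `f32c32`, read here as
    spatial compression ratio 32 and 32 latent channels.
    """
    match = re.search(r"-f(\d+)c(\d+)-", model_id)
    if match is None:
        raise ValueError(f"no f<ratio>c<channels> tag found in {model_id!r}")
    ratio, channels = int(match.group(1)), int(match.group(2))
    if height % ratio or width % ratio:
        raise ValueError(f"image size must be a multiple of {ratio}")
    return channels, height // ratio, width // ratio


# A 1024x1024 image with an f32c32 model under the assumption above.
print(dcae_latent_shape("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", 1024, 1024))  # (32, 32, 32)
```

Under the same reading, an `f128c512` model would map a 1024x1024 image to a latent with 512 channels and an 8x8 spatial grid.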
39+
40+
## AutoencoderDC
41+
42+
[[autodoc]] AutoencoderDC
43+
- encode
44+
- decode
45+
- all
46+
47+
## DecoderOutput
48+
49+
[[autodoc]] models.autoencoders.vae.DecoderOutput
50+

examples/model_search/README.md

Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
1+
# Search models on Civitai and Hugging Face
2+
3+
The [auto_diffusers](https://github.com/suzukimain/auto_diffusers) library provides additional functionalities to Diffusers such as searching for models on Civitai and the Hugging Face Hub.
4+
Please refer to the original library [here](https://pypi.org/project/auto-diffusers/).
5+
6+
## Installation
7+
8+
Before running the scripts, make sure to install the library's training dependencies:
9+
10+
> [!IMPORTANT]
11+
> To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the installation up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment.
12+
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
```
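Set up the pipeline by downloading `pipeline_easy.py` into your working directory. You can also `cd` to this folder and run the command there. In a notebook cell, prefix the command with `!`.
```bash
wget https://raw.githubusercontent.com/suzukimain/auto_diffusers/refs/heads/master/src/auto_diffusers/pipeline_easy.py
```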
22+
23+
## Load from Civitai
```python
from pipeline_easy import (
    EasyPipelineForText2Image,
    EasyPipelineForImage2Image,
    EasyPipelineForInpainting,
)

# Text-to-Image
pipeline = EasyPipelineForText2Image.from_civitai(
    "search_word",
    base_model="SD 1.5",
).to("cuda")


# Image-to-Image
pipeline = EasyPipelineForImage2Image.from_civitai(
    "search_word",
    base_model="SD 1.5",
).to("cuda")


# Inpainting
pipeline = EasyPipelineForInpainting.from_civitai(
    "search_word",
    base_model="SD 1.5",
).to("cuda")
```
51+
52+
## Load from Hugging Face
```python
from pipeline_easy import (
    EasyPipelineForText2Image,
    EasyPipelineForImage2Image,
    EasyPipelineForInpainting,
)

# Text-to-Image
pipeline = EasyPipelineForText2Image.from_huggingface(
    "search_word",
    checkpoint_format="diffusers",
).to("cuda")


# Image-to-Image
pipeline = EasyPipelineForImage2Image.from_huggingface(
    "search_word",
    checkpoint_format="diffusers",
).to("cuda")


# Inpainting
pipeline = EasyPipelineForInpainting.from_huggingface(
    "search_word",
    checkpoint_format="diffusers",
).to("cuda")
```
80+
81+
82+
## Search Civitai and Hugging Face
83+
```python
from pipeline_easy import (
    search_huggingface,
    search_civitai,
)

# `pipeline` is a pipeline loaded as in the sections above.

# Search for a LoRA on Civitai and download it.
lora = search_civitai(
    "Keyword_to_search_Lora",
    model_type="LORA",
    base_model="SD 1.5",
    download=True,
)
# Load the LoRA into the pipeline.
pipeline.load_lora_weights(lora)


# Search for a textual inversion embedding and download it.
textual_inversion = search_civitai(
    "EasyNegative",
    model_type="TextualInversion",
    base_model="SD 1.5",
    download=True,
)
# Load the textual inversion into the pipeline.
pipeline.load_textual_inversion(textual_inversion, token="EasyNegative")
```
111+
112+
### Search Civitai
113+
114+
> [!TIP]
115+
> **If an error occurs, insert the `token` and run again.**
116+
117+
#### `EasyPipeline.from_civitai` parameters
118+
| Name | Type | Default | Description |
|:---------------:|:----------------------:|:-------------:|:-----------------------------------------------------------------------------------:|
| search_word | string, Path | | The search query string. Can be a keyword, Civitai URL, local directory or file path. |
| model_type | string | `Checkpoint` | The type of model to search for. <br>(for example `Checkpoint`, `TextualInversion`, `Controlnet`, `LORA`, `Hypernetwork`, `AestheticGradient`, `Poses`) |
| base_model | string | None | Trained model tag (for example `SD 1.5`, `SD 3.5`, `SDXL 1.0`) |
| torch_dtype | string, torch.dtype | None | Override the default `torch.dtype` and load the model with another dtype. |
| force_download | bool | False | Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. |
| cache_dir | string, Path | None | Path to the folder where cached files are stored. |
| resume | bool | False | Whether to resume an incomplete download. |
| token | string | None | API token for Civitai authentication. |
129+
130+
131+
#### `search_civitai` parameters
132+
| Name | Type | Default | Description |
|:---------------:|:--------------:|:-------------:|:-----------------------------------------------------------------------------------:|
| search_word | string, Path | | The search query string. Can be a keyword, Civitai URL, local directory or file path. |
| model_type | string | `Checkpoint` | The type of model to search for. <br>(for example `Checkpoint`, `TextualInversion`, `Controlnet`, `LORA`, `Hypernetwork`, `AestheticGradient`, `Poses`) |
| base_model | string | None | Trained model tag (for example `SD 1.5`, `SD 3.5`, `SDXL 1.0`) |
| download | bool | False | Whether to download the model. |
| force_download | bool | False | Whether to force the download if the model already exists. |
| cache_dir | string, Path | None | Path to the folder where cached files are stored. |
| resume | bool | False | Whether to resume an incomplete download. |
| token | string | None | API token for Civitai authentication. |
| include_params | bool | False | Whether to include parameters in the returned data. |
| skip_error | bool | False | Whether to skip errors and return None. |
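
One way to follow the tip above without hard-coding a token is to read it from the environment and pass it only when it is set. The sketch below builds keyword arguments from the parameters listed in the table. The helper `civitai_kwargs` and the environment variable name `CIVITAI_TOKEN` are assumptions made for illustration, not part of `pipeline_easy`.

```python
import os


def civitai_kwargs(model_type="Checkpoint", base_model=None, download=False,
                   token_env="CIVITAI_TOKEN"):
    """Build keyword arguments for `search_civitai` from the documented parameters.

    `token_env` names a hypothetical environment variable holding the API token.
    """
    kwargs = {"model_type": model_type, "download": download}
    if base_model is not None:
        kwargs["base_model"] = base_model
    token = os.environ.get(token_env)
    if token:
        # Only pass `token` when one is configured, so anonymous use still works.
        kwargs["token"] = token
    return kwargs


kwargs = civitai_kwargs(model_type="LORA", base_model="SD 1.5", download=True)
# Intended use (requires pipeline_easy.py):
# lora = search_civitai("Keyword_to_search_Lora", **kwargs)
```

Keeping the token out of the source file also avoids committing it by accident.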
145+
146+
### Search Hugging Face
147+
148+
> [!TIP]
149+
> **If an error occurs, insert the `token` and run again.**
150+
151+
#### `EasyPipeline.from_huggingface` parameters
152+
| Name | Type | Default | Description |
|:---------------------:|:-------------------:|:--------------:|:----------------------------------------------------------------:|
| search_word | string, Path | | The search query string. Can be a keyword, Hugging Face URL, local directory or file path, or a Hugging Face path (`<creator>/<repo>`). |
| checkpoint_format | string | `single_file` | The format of the model checkpoint.<br>● `single_file` to search for `single file checkpoint` <br>● `diffusers` to search for `multifolder diffusers format checkpoint` |
| torch_dtype | string, torch.dtype | None | Override the default `torch.dtype` and load the model with another dtype. |
| force_download | bool | False | Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. |
| cache_dir | string, Path | None | Path to a directory where a downloaded pretrained model configuration is cached if the standard cache is not used. |
| token | string, bool | None | The token to use as HTTP bearer authorization for remote files. |
161+
162+
163+
#### `search_huggingface` parameters
164+
| Name | Type | Default | Description |
|:---------------------:|:-------------------:|:--------------:|:----------------------------------------------------------------:|
| search_word | string, Path | | The search query string. Can be a keyword, Hugging Face URL, local directory or file path, or a Hugging Face path (`<creator>/<repo>`). |
| checkpoint_format | string | `single_file` | The format of the model checkpoint. <br>● `single_file` to search for `single file checkpoint` <br>● `diffusers` to search for `multifolder diffusers format checkpoint` |
| pipeline_tag | string | None | Tag to filter models by pipeline. |
| download | bool | False | Whether to download the model. |
| force_download | bool | False | Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. |
| cache_dir | string, Path | None | Path to a directory where a downloaded pretrained model configuration is cached if the standard cache is not used. |
| token | string, bool | None | The token to use as HTTP bearer authorization for remote files. |
| include_params | bool | False | Whether to include parameters in the returned data. |
| skip_error | bool | False | Whether to skip errors and return None. |
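
Because `skip_error=True` is documented as returning `None` instead of raising, callers need to handle the empty result themselves. The sketch below shows one possible pattern: a small wrapper that substitutes a default when the search finds nothing. `search_or_default` is a hypothetical helper, and `fake_search` is a stand-in for `search_huggingface` so that the example runs offline. The identifiers it returns are made up.

```python
def search_or_default(search_fn, search_word, default=None, **kwargs):
    """Call a search function with skip_error=True and fall back to `default` on None."""
    result = search_fn(search_word, skip_error=True, **kwargs)
    return default if result is None else result


def fake_search(search_word, skip_error=False, **kwargs):
    """Stand-in for `search_huggingface`; knows a single made-up entry."""
    known = {"known-model": "example/known-model"}
    if search_word in known:
        return known[search_word]
    if skip_error:
        return None
    raise ValueError(f"nothing found for {search_word!r}")


print(search_or_default(fake_search, "known-model", checkpoint_format="diffusers"))  # example/known-model
print(search_or_default(fake_search, "missing", default="example/fallback"))  # example/fallback
```

With the real function, `search_fn` would be `search_huggingface` or `search_civitai`, and the remaining keyword arguments would come from the tables above.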
