<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderDC

The 2D autoencoder model used in [SANA](https://huggingface.co/papers/2410.10629) and introduced in [DCAE](https://huggingface.co/papers/2410.10733) by Junyu Chen\*, Han Cai\*, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, and Song Han from MIT HAN Lab.

The abstract from the paper is:

*We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy for high spatial compression ratios (e.g., 64x). We address this challenge by introducing two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on the space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phases training strategy for mitigating the generalization penalty of high spatial-compression autoencoders. With these designs, we improve the autoencoder's spatial compression ratio up to 128 while maintaining the reconstruction quality. Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512x512, our DC-AE provides 19.1x inference speedup and 17.9x training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder. Our code is available at [this https URL](https://github.com/mit-han-lab/efficientvit).*
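The residual autoencoding idea from the abstract can be illustrated with PyTorch's `F.pixel_unshuffle` and `F.pixel_shuffle`. This is a minimal sketch modeled on our reading of the paper (space-to-channel followed by channel averaging for a downsampling shortcut; channel duplication followed by channel-to-space for an upsampling shortcut), not the Diffusers implementation. The names `space_to_channel_shortcut`, `channel_to_space_shortcut`, and `ResidualDownBlock` are hypothetical.

```python
import torch
import torch.nn.functional as F


def space_to_channel_shortcut(x: torch.Tensor, out_channels: int, factor: int = 2) -> torch.Tensor:
    """Parameter-free downsampling shortcut: (B, C, H, W) -> (B, out_channels, H/f, W/f)."""
    # Fold each f x f spatial patch into channels: (B, C*f*f, H/f, W/f).
    y = F.pixel_unshuffle(x, factor)
    b, c, h, w = y.shape
    assert c % out_channels == 0, "folded channels must split evenly into groups"
    # Average consecutive channel groups so the channel count matches the block output.
    return y.reshape(b, out_channels, c // out_channels, h, w).mean(dim=2)


def channel_to_space_shortcut(x: torch.Tensor, out_channels: int, factor: int = 2) -> torch.Tensor:
    """Parameter-free upsampling shortcut: (B, C, H, W) -> (B, out_channels, H*f, W*f)."""
    b, c, h, w = x.shape
    needed = out_channels * factor * factor
    assert needed % c == 0, "channels must duplicate evenly"
    # Duplicate each channel so there are enough channels to unfold into f x f patches.
    y = x.repeat_interleave(needed // c, dim=1)
    return F.pixel_shuffle(y, factor)


class ResidualDownBlock(torch.nn.Module):
    """Hypothetical block: a strided conv learns only the residual on top of the shortcut."""

    def __init__(self, in_channels: int, out_channels: int, factor: int = 2):
        super().__init__()
        self.factor = factor
        self.out_channels = out_channels
        self.conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=factor, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x) + space_to_channel_shortcut(x, self.out_channels, self.factor)


x = torch.randn(2, 64, 32, 32)
down = space_to_channel_shortcut(x, out_channels=128)  # (2, 128, 16, 16)
up = channel_to_space_shortcut(down, out_channels=64)  # (2, 64, 32, 32)
block_out = ResidualDownBlock(64, 128)(x)              # (2, 128, 16, 16)
```

Because the shortcut carries the space-to-channel features without any parameters, the learned convolution only has to model what the shortcut misses, which is the optimization benefit the abstract attributes to residual autoencoding.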

The following DCAE models are released and supported in Diffusers.

| Diffusers format | Original format |
|:----------------:|:---------------:|
| [`mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-sana-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0) |
| [`mit-han-lab/dc-ae-f32c32-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-in-1.0) |
| [`mit-han-lab/dc-ae-f32c32-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-mix-1.0) |
| [`mit-han-lab/dc-ae-f64c128-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f64c128-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0) |
| [`mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f64c128-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-mix-1.0) |
| [`mit-han-lab/dc-ae-f128c512-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0) |
| [`mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0) |
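The checkpoint names follow the paper's `f{F}c{C}` convention: `F` is the spatial compression factor and `C` is the number of latent channels. For example, `f32c32` maps a 512x512 image to a 32x16x16 latent. The helper below is a hypothetical sketch (not part of Diffusers) that derives the latent shape from a checkpoint name. As a design choice of this sketch, it rejects image sizes that are not multiples of the factor.

```python
import re


def latent_shape(checkpoint: str, height: int, width: int) -> tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for a dc-ae-f{F}c{C} checkpoint name."""
    match = re.search(r"dc-ae-f(\d+)c(\d+)", checkpoint)
    if match is None:
        raise ValueError(f"not a DC-AE checkpoint name: {checkpoint!r}")
    factor, channels = int(match.group(1)), int(match.group(2))
    # Each spatial dimension shrinks by the compression factor F.
    if height % factor or width % factor:
        raise ValueError(f"image size must be a multiple of the compression factor {factor}")
    return channels, height // factor, width // factor


print(latent_shape("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", 1024, 1024))  # (32, 32, 32)
print(latent_shape("mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers", 1024, 1024))  # (512, 8, 8)
```

Higher compression factors shrink the spatial grid but are paired with more latent channels, which is how the larger-`F` checkpoints keep enough capacity for reconstruction.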

Load a model in Diffusers format with [`~ModelMixin.from_pretrained`].

```python
import torch
from diffusers import AutoencoderDC

# Load the SANA DC-AE (32x spatial compression, 32 latent channels) in full precision on the GPU.
ae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", torch_dtype=torch.float32).to("cuda")
```

## AutoencoderDC

[[autodoc]] AutoencoderDC
  - encode
  - decode
  - all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput