@frutiemax92

This PR enables the use of the DC VAE encoder/decoder with the PixArt-Sigma pipeline.
https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers

This is the code I am using to test:

import gc

import torch
from transformers import T5EncoderModel

from diffusers import AutoencoderDC, PixArtSigmaPipeline, PixArtTransformer2DModel

# load the T5 text encoder in 8-bit to cut VRAM usage during prompt encoding
text_encoder = T5EncoderModel.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    subfolder="text_encoder",
    load_in_8bit=True,
    device_map="auto",
)
# load the pipeline without the transformer; it is only needed for prompt encoding here
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    text_encoder=text_encoder,
    transformer=None,
    device_map="balanced",
)

with torch.no_grad():
    prompt = "cat"
    neg_prompt = "photo"
    prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(
        prompt, neg_prompt
    )

def flush():
    gc.collect()
    torch.cuda.empty_cache()

# free the text encoder and the encoding-only pipeline before loading the transformer and VAE
del text_encoder
del pipe
flush()

# duplicate the embeddings along the batch dimension to produce a batch of two
prompt_embeds = prompt_embeds.repeat(2, 1, 1)
prompt_attention_mask = prompt_attention_mask.repeat(2, 1)

negative_embeds = negative_embeds.repeat(2, 1, 1)
negative_prompt_attention_mask = negative_prompt_attention_mask.repeat(2, 1)

# rebuild the transformer config so its channel counts match the DC-AE latent space:
# in_channels=512 matches the f128c512 latents, and out_channels is doubled to 1024
# because PixArt-Sigma predicts a learned variance alongside the noise
config = PixArtTransformer2DModel.load_config("PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", subfolder="transformer")
config["in_channels"] = 512
config["out_channels"] = 1024
transformer = PixArtTransformer2DModel.from_config(config)

vae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers").to(torch.bfloat16)
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    text_encoder=None,
    transformer=transformer.to(torch.bfloat16),
    torch_dtype=torch.bfloat16,
    vae=vae,
).to("cuda")
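
# NOTE: `latents` was never defined in the snippet as posted; the call below is an
# assumed reconstruction using the standard latent-output pattern so the decode
# step further down has an input. The exact arguments of the original test are unknown.
with torch.no_grad():
    latents = pipe(
        negative_prompt=None,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_embeds,
        prompt_attention_mask=prompt_attention_mask,
        negative_prompt_attention_mask=negative_prompt_attention_mask,
        output_type="latent",
    ).images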

# drop the transformer before decoding to free VRAM
del pipe.transformer
flush()

with torch.no_grad():
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]
image = pipe.image_processor.postprocess(image, output_type="pil")[0]
image.save("cat.png")

This code generates noise, since it is essentially inference with an untrained transformer. The DC encoder/decoder uses some different configuration naming conventions, so I had to adjust for them, and the shape of the generated latents did not match the original VAE's, so I chose to clip the excess "pixels".
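
For context, here is a minimal sketch of the shape mismatch described above (my illustration, not code from the PR; it assumes a 1024x1024 input, and the `crop_to` helper is hypothetical):

import torch

# PixArt-Sigma's stock f8c4 VAE: 1024 / 8 = 128 -> latents of shape (1, 4, 128, 128)
sd_latents = torch.randn(1, 4, 128, 128)

# DC-AE f128c512: 1024 / 128 = 8 -> latents of shape (1, 512, 8, 8)
dc_latents = torch.randn(1, 512, 8, 8)

# clipping the excess "pixels": if a decoded image overshoots the target
# resolution, drop the extra rows/columns (hypothetical helper, not from the PR)
def crop_to(image: torch.Tensor, height: int, width: int) -> torch.Tensor:
    return image[..., :height, :width]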

@frutiemax92 force-pushed the feature_pixartsigma_dcencoder branch from 47649f3 to 405225c on April 7, 2025 00:31
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky (Contributor) left a comment

Hi @frutiemax92. What's your intended use case here? As you mentioned, PixArt is not compatible with this VAE. Typically support for something in pipelines/modeling code comes after there's a model to use with it.

@frutiemax92 closed this on Apr 7, 2025