Conversation

@a-r-r-o-w
Contributor

@a-r-r-o-w a-r-r-o-w commented Nov 30, 2024

Based on the code from #9708, this is a modified implementation of DCAE that follows some of the Diffusers conventions.

I believe this should be good for an initial review @yiyixuxu. We can do two things:

  • merge this PR into @lawrence-cj's branch and continue review there (would be able to benefit from the additional reviews already provided there)
  • continue the integration in this PR if it is easier to address

I think either should be okay but if we are okay with continuing in this one, @lawrence-cj could you please let me know all the co-authors I need to add to this PR 🤗 So far, only you and @chenjy2003 are added (because I forked off your PR branch).

I believe this version matches the original Sana VAE checkpoint completely. I have yet to verify the correctness of all the other variants, so I'll share the unit tests after completing that testing.

To run the conversion, I use:

python3 scripts/convert_dcae_to_diffusers.py --vae_ckpt_path /raid/aryan/dc-ae-sana/model.safetensors --output_path /raid/aryan/sana-vae-diffusers

Here is some inference code for testing:

import numpy as np
import torch
from diffusers import AutoencoderDC
from diffusers.utils import load_image
from PIL import Image


@torch.no_grad()
def main():
    ae = AutoencoderDC.from_pretrained("/raid/aryan/sana-vae-diffusers/")
    ae = ae.to("cuda")

    # Load the test image and normalize pixel values from [0, 255] to [-1, 1]
    image = load_image("inputs/astronaut.jpg").resize((512, 512))
    image = torch.from_numpy(np.array(image))
    image = image / 127.5 - 1.0
    image = image.unsqueeze(0).permute(0, 3, 1, 2).to("cuda")  # NHWC -> NCHW

    encoded = ae.encode(image)
    print("encoded:", encoded.shape)

    decoded = ae.decode(encoded)
    print("decoded:", decoded.shape)

    # Denormalize back to [0, 255] and save the reconstruction
    output = decoded[0].permute(1, 2, 0)
    output = (output + 1) / 2.0 * 255.0
    output = output.clamp(0.0, 255.0)
    output = output.detach().cpu().numpy().astype(np.uint8)
    Image.fromarray(output).save("output.png")

    # Compare against tensors saved from the original DCAE implementation
    original_encoded = torch.load("original_dcae_encoded.pt", weights_only=True)
    original_decoded = torch.load("original_dcae_decoded.pt", weights_only=True)
    encoded_diff = encoded - original_encoded
    decoded_diff = decoded - original_decoded
    print(encoded_diff.abs().max(), encoded_diff.abs().sum())
    print(decoded_diff.abs().max(), decoded_diff.abs().sum())


main()
[Images: original input vs. reconstruction]
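As a side note, the [0, 255] → [-1, 1] normalization used in the snippet is exactly invertible, so the save path loses nothing beyond the model's own reconstruction error. A quick numpy check of the round trip, independent of the model:

```python
import numpy as np

# uint8 pixel values covering the edge cases
x = np.array([0, 1, 127, 128, 254, 255], dtype=np.uint8)

# Same normalization as the inference snippet: [0, 255] -> [-1, 1]
y = x / 127.5 - 1.0

# Same denormalization: [-1, 1] -> [0, 255], rounded back to uint8
back = np.rint(np.clip((y + 1.0) / 2.0 * 255.0, 0.0, 255.0)).astype(np.uint8)

print(bool((back == x).all()))  # True: the round trip is lossless
```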

I think it is okay to skip the diffusers-side VAE tests for now and pick them up in a follow-up PR after #9808 is merged. Will add the documentation after verifying that all checkpoints work as expected and finalizing the diffusers implementation following reviews.

cc: @lawrence-cj @chenjy2003

@a-r-r-o-w
Contributor Author

I've moved LiteMLA to the same file as the VAE for the time being, btw. If it is used in the transformer implementation as well, we could consider placing it in attention.py @lawrence-cj
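For context, the core of LiteMLA is ReLU-kernel linear attention, which aggregates keys and values into a single dim × dim summary instead of forming the full token-token matrix. A rough, framework-free sketch of that idea (illustrative only, not the actual Diffusers module; names are made up):

```python
import numpy as np


def relu_linear_attention(q, k, v, eps=1e-6):
    """ReLU-kernel linear attention over tokens.

    q, k, v: arrays of shape (num_tokens, dim).
    Computes phi(q) @ (phi(k)^T @ v) / (phi(q) @ sum_j phi(k_j)),
    where phi is ReLU, in O(tokens * dim^2) instead of O(tokens^2 * dim).
    """
    q = np.maximum(q, 0.0)  # phi(q)
    k = np.maximum(k, 0.0)  # phi(k)
    kv = k.T @ v                    # (dim, dim) summary of keys and values
    normalizer = q @ k.sum(axis=0)  # (num_tokens,) row normalization
    return (q @ kv) / (normalizer[:, None] + eps)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out = relu_linear_attention(q, k, v)
print(out.shape)  # (16, 8)
```

Because there is no softmax, this is mathematically identical to the quadratic form (phi(q) phi(k)^T) v with row-sum normalization, just computed in a different association order.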

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w a-r-r-o-w requested a review from yiyixuxu November 30, 2024 21:37
@chenjy2003
Contributor

@a-r-r-o-w Thanks for your work! I'm wondering whether @lawrence-cj and I can add commits to this PR. I see some minor issues, such as the vae -> ae renaming.

Also, according to @yiyixuxu 's previous comments, I think get_norm_layer and get_block_from_block_type might need to be removed.

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Dec 1, 2024

I think those functions should be okay for now since we've made them minimal, but we can wait for another review. I think it's fine because removing them would duplicate the same if-else logic at every call site and make the code a bit more bloated.
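For context, a helper like get_norm_layer is essentially a small string-to-constructor dispatch table; inlining it would mean repeating the same if-else chain wherever a norm layer is built. A generic sketch of the pattern (illustrative only — the real helper in the PR maps names to torch.nn modules, and these class names are stand-ins):

```python
# Stand-in layer classes; the actual helper would return torch.nn modules.
class RMSNorm:
    def __init__(self, dim: int):
        self.dim = dim


class BatchNorm2d:
    def __init__(self, dim: int):
        self.dim = dim


# One registry instead of a copied if-else chain at every call site.
_NORM_LAYERS = {
    "rms_norm": RMSNorm,
    "batch_norm": BatchNorm2d,
}


def get_norm_layer(norm_type: str, dim: int):
    """Resolve a norm layer from its config name, failing loudly on typos."""
    try:
        return _NORM_LAYERS[norm_type](dim)
    except KeyError:
        raise ValueError(f"Unknown norm_type: {norm_type!r}") from None


layer = get_norm_layer("rms_norm", 512)
print(type(layer).__name__, layer.dim)  # RMSNorm 512
```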

Regarding your team being able to push to this PR, I'm not sure GitHub would allow that without us adding some permissions in the HF org. Instead, I will try resolving conflicts with your original branch and push there in some time. Does that work?

@lawrence-cj
Contributor

Instead I will try resolving conflicts with your original branch and push it there in some time, does that work?

Sounds great. Let's do it! @a-r-r-o-w

@a-r-r-o-w
Copy link
Contributor Author

Closing in favor of the original PR, where these changes have now been merged.
