Conversation

@a-r-r-o-w
Contributor

@a-r-r-o-w a-r-r-o-w commented Nov 30, 2024

Based on the code from #9708, this is a modified implementation of DCAE that follows some of the Diffusers conventions.

I believe this should be good for an initial review @yiyixuxu. We can do two things:

  • merge this PR into @lawrence-cj's branch and continue review there (would be able to benefit from the additional reviews already provided there)
  • continue the integration in this PR if it is easier to address

I think either should be okay but if we are okay with continuing in this one, @lawrence-cj could you please let me know all the co-authors I need to add to this PR 🤗 So far, only you and @chenjy2003 are added (because I forked off your PR branch).

I believe this version matches the original Sana VAE checkpoint completely. I have yet to verify the correctness of all the other variants, so I'll share the unit tests after completing that testing.

To run the conversion, I use:

python3 scripts/convert_dcae_to_diffusers.py --vae_ckpt_path /raid/aryan/dc-ae-sana/model.safetensors --output_path /raid/aryan/sana-vae-diffusers

Here is some inference code for testing:

import numpy as np
import torch
from diffusers import AutoencoderDC
from diffusers.utils import load_image
from PIL import Image


@torch.no_grad()
def main():
    ae = AutoencoderDC.from_pretrained("/raid/aryan/sana-vae-diffusers/")
    ae = ae.to("cuda")

    # Load the test image and normalize pixel values from [0, 255] to [-1, 1]
    image = load_image("inputs/astronaut.jpg").resize((512, 512))
    image = torch.from_numpy(np.array(image))
    image = image / 127.5 - 1.0
    image = image.unsqueeze(0).permute(0, 3, 1, 2).to("cuda")  # NHWC -> NCHW

    encoded = ae.encode(image)
    print("encoded:", encoded.shape)

    decoded = ae.decode(encoded)
    print("decoded:", decoded.shape)

    # Denormalize back to [0, 255] and save the reconstruction
    output = decoded[0].permute(1, 2, 0)
    output = (output + 1) / 2.0 * 255.0
    output = output.clamp(0.0, 255.0)
    output = output.detach().cpu().numpy().astype(np.uint8)
    Image.fromarray(output).save("output.png")

    # Compare against tensors saved from the original DCAE implementation
    original_encoded = torch.load("original_dcae_encoded.pt", weights_only=True)
    original_decoded = torch.load("original_dcae_decoded.pt", weights_only=True)
    encoded_diff = encoded - original_encoded
    decoded_diff = decoded - original_decoded
    print(encoded_diff.abs().max(), encoded_diff.abs().sum())
    print(decoded_diff.abs().max(), decoded_diff.abs().sum())


main()
[Images: original input vs. reconstruction]
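As a side note, the [0, 255] → [-1, 1] normalization used in the snippet is exactly invertible, so the save path loses nothing beyond the model's own reconstruction error. A quick numpy check of the round trip, independent of the model:

```python
import numpy as np

# uint8 pixel values covering the edge cases
x = np.array([0, 1, 127, 128, 254, 255], dtype=np.uint8)

# Same normalization as the inference snippet: [0, 255] -> [-1, 1]
y = x / 127.5 - 1.0

# Same denormalization: [-1, 1] -> [0, 255], rounded back to uint8
back = np.rint(np.clip((y + 1.0) / 2.0 * 255.0, 0.0, 255.0)).astype(np.uint8)

print(bool((back == x).all()))  # True: the round trip is lossless
```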

I think it is okay to skip the diffusers-side VAE tests for now and pick them up in a follow-up PR after #9808 is merged. Will add the documentation after verifying that all checkpoints work as expected and finalizing the diffusers implementation following reviews.

cc: @lawrence-cj @chenjy2003

@a-r-r-o-w
Contributor Author

I've moved LiteMLA to the same file as the VAE for the time being, btw. If it is used in the transformer implementation as well, we could consider placing it in attention.py @lawrence-cj
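For context, the core of LiteMLA is ReLU-kernel linear attention, which aggregates keys and values into a single dim × dim summary instead of forming the full token-token matrix. A rough, framework-free sketch of that idea (illustrative only, not the actual Diffusers module; names are made up):

```python
import numpy as np


def relu_linear_attention(q, k, v, eps=1e-6):
    """ReLU-kernel linear attention over tokens.

    q, k, v: arrays of shape (num_tokens, dim).
    Computes phi(q) @ (phi(k)^T @ v) / (phi(q) @ sum_j phi(k_j)),
    where phi is ReLU, in O(tokens * dim^2) instead of O(tokens^2 * dim).
    """
    q = np.maximum(q, 0.0)  # phi(q)
    k = np.maximum(k, 0.0)  # phi(k)
    kv = k.T @ v                    # (dim, dim) summary of keys and values
    normalizer = q @ k.sum(axis=0)  # (num_tokens,) row normalization
    return (q @ kv) / (normalizer[:, None] + eps)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out = relu_linear_attention(q, k, v)
print(out.shape)  # (16, 8)
```

Because there is no softmax, this is mathematically identical to the quadratic form (phi(q) phi(k)^T) v with row-sum normalization, just computed in a different association order.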

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w a-r-r-o-w requested a review from yiyixuxu November 30, 2024 21:37
@chenjy2003
Contributor

@a-r-r-o-w Thanks for your work! I'm wondering whether @lawrence-cj and I can add commits to this PR. I see some minor issues, such as the vae -> ae renaming.

Also, according to @yiyixuxu 's previous comments, I think get_norm_layer and get_block_from_block_type might need to be removed.

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Dec 1, 2024

I think those functions should be okay for now since we've made them minimal, but we can wait for another review. I think it's fine because removing them would duplicate the same if-else logic at every call site and make the code a bit more bloated.
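For context, a helper like get_norm_layer is essentially a small string-to-constructor dispatch table; inlining it would mean repeating the same if-else chain wherever a norm layer is built. A generic sketch of the pattern (illustrative only — the real helper in the PR maps names to torch.nn modules, and these class names are stand-ins):

```python
# Stand-in layer classes; the actual helper would return torch.nn modules.
class RMSNorm:
    def __init__(self, dim: int):
        self.dim = dim


class BatchNorm2d:
    def __init__(self, dim: int):
        self.dim = dim


# One registry instead of a copied if-else chain at every call site.
_NORM_LAYERS = {
    "rms_norm": RMSNorm,
    "batch_norm": BatchNorm2d,
}


def get_norm_layer(norm_type: str, dim: int):
    """Resolve a norm layer from its config name, failing loudly on typos."""
    try:
        return _NORM_LAYERS[norm_type](dim)
    except KeyError:
        raise ValueError(f"Unknown norm_type: {norm_type!r}") from None


layer = get_norm_layer("rms_norm", 512)
print(type(layer).__name__, layer.dim)  # RMSNorm 512
```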

Regarding your team being able to push to this PR, I'm not sure GitHub would allow that without us adding some permissions in the HF org. Instead, I will try resolving conflicts with your original branch and push there in some time. Does that work?

@lawrence-cj
Contributor

Instead I will try resolving conflicts with your original branch and push it there in some time, does that work?

Sounds great. Let's do it! @a-r-r-o-w

@a-r-r-o-w
Copy link
Contributor Author

Closing in favor of the original PR, where these changes have now been merged.
