Conversation

@animemory commented Dec 2, 2024

What does this PR do?

This PR does the following:

  1. Created AniMemoryPipeline in src/diffusers/pipelines/animemory/
  2. Created EulerAncestralDiscreteXPredScheduler in src/diffusers/schedulers/
  3. Uploaded the safetensors model to the Hugging Face Hub: animEEEmpire/AniMemory-alpha
  4. Tested the pipeline and the outputs are as expected.

Usage:

import torch
from diffusers import AniMemoryPipeline
pipe = AniMemoryPipeline.from_pretrained(
    "animEEEmpire/AniMemory-alpha",
    torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

prompt = '一只凶恶的狼,猩红的眼神,在午夜咆哮,月光皎洁'  # "A ferocious wolf with scarlet eyes, howling at midnight under bright, clear moonlight"
negative_prompt = 'nsfw, worst quality, low quality, normal quality, low resolution, monochrome, blurry, wrong, Mutated hands and fingers, text, ugly faces, twisted, jpeg artifacts, watermark, low contrast, realistic'
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=40,
    height=1024, width=1024,
    guidance_scale=7.0
).images[0]
image.save("output.png")

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

We will update the docs soon.
Thank you so much! I'll be here to help with everything.

@yiyixuxu @asomoza

@hlky (Contributor) left a comment:
Just highlighting differences compared to scheduling_euler_ancestral_discrete for reference in the upcoming scheduling refactor.

@animemory (Author) replied:

> Just highlighting differences compared to scheduling_euler_ancestral_discrete for reference in the upcoming scheduling refactor.

Thanks for the comparison! I have finished the scheduler refactor and verified that the output is the same as before the modification.
Now this PR is ready for review!

@animemory marked this pull request as ready for review December 3, 2024 09:38
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky (Contributor) left a comment:

Hi @animemory 🤗

The comparison was just for our reference, as we're planning to refactor scheduling. Thanks for making those changes though!

It would be great to see some example outputs etc. in this PR, and you can add more information in the docs. See the files under docs in this PR as an example. cc @stevhliu for docs

from transformers.models.t5.modeling_t5 import T5Stack


class AniMemoryT5(torch.nn.Module):
Contributor comment:

Would AniMemoryT5 and AniMemoryAltCLip be better added to transformers directly?

@animemory (Author) replied:

These two models are mostly similar to the original T5 and AltCLIP, with just some small architectural modifications and a tokenizer replacement. This design is specific to the model's bilingual alignment.
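For readers skimming the thread without the full diff, here is a minimal sketch of the kind of thin wrapper being discussed. It is an illustration only, not the PR's actual AniMemoryT5: the class name AniMemoryT5Sketch and all of its details are assumptions based on nothing more than the T5Stack import shown above.

import copy
import torch
from transformers import T5Config
from transformers.models.t5.modeling_t5 import T5Stack

class AniMemoryT5Sketch(torch.nn.Module):
    # Hypothetical: an encoder-only wrapper around T5Stack, leaving room for
    # the "small architectural modifications" the author mentions.
    def __init__(self, config: T5Config):
        super().__init__()
        encoder_config = copy.deepcopy(config)
        encoder_config.is_decoder = False  # encoder-only usage
        encoder_config.use_cache = False
        encoder_config.is_encoder_decoder = False
        shared = torch.nn.Embedding(config.vocab_size, config.d_model)
        self.encoder = T5Stack(encoder_config, embed_tokens=shared)

    def forward(self, input_ids, attention_mask=None):
        # The final hidden states serve as text-conditioning features.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return out.last_hidden_state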

@bghira (Contributor) commented Dec 3, 2024

Why can't we roll the scheduler changes into the existing EulerAncestralDiscreteScheduler? It already has v-prediction and x-prediction; is there something wrong with that implementation? Maybe it can be fixed.
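For context, the configuration switch bghira refers to looks like this on the existing scheduler; whether its x-prediction ("sample") branch is complete is exactly the question being raised.

from diffusers import EulerAncestralDiscreteScheduler

# The existing scheduler selects its parameterization via `prediction_type`
# ("epsilon", "v_prediction", or "sample" for x-prediction) rather than
# through a separate class.
scheduler = EulerAncestralDiscreteScheduler(prediction_type="v_prediction")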

        self.model_hf.gradient_checkpointing_enable()

    def forward(self, text, attention_mask):
        hidden_states = self.model_hf.text_model.embeddings(input_ids=text, position_ids=None)
Contributor comment:

We don't want to pass them in? That will prevent, e.g., precaching the inputs.

@animemory (Author) replied:

Hi, which parameter do you mean? The caching function is just for training and should be easy to adapt, I suppose.
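To make the reviewer's point concrete, here is a hypothetical variant of the forward above with position_ids exposed as an optional argument, which is what would allow callers to precompute and cache inputs. The class name AniMemoryAltCLIPSketch and the model_hf attribute are assumptions based on the excerpt, not the PR's code.

import torch

class AniMemoryAltCLIPSketch(torch.nn.Module):
    # Hypothetical sketch: `model_hf` is assumed to be a transformers
    # AltCLIP-style model with a `text_model.embeddings` submodule,
    # as in the diff above.
    def __init__(self, model_hf):
        super().__init__()
        self.model_hf = model_hf

    def forward(self, text, attention_mask, position_ids=None):
        # Exposing position_ids (default None keeps the current behaviour)
        # lets callers pass precomputed values instead of hardcoding None.
        hidden_states = self.model_hf.text_model.embeddings(
            input_ids=text, position_ids=position_ids
        )
        return hidden_states, attention_mask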

@hlky (Contributor) commented Dec 3, 2024

@bghira Some of the changes are currently unique, so they would need a branch if merged into EulerAncestralDiscreteScheduler. They will be covered later in the planned scheduling refactor, so it's OK for now. The priority should be the rest of the implementation for this new model.

@stevhliu (Member) commented Dec 3, 2024

Hi, thanks for the contribution! Feel free to let me know if you need any help with the docs 🙂

@animemory (Author) commented:

Hi, I did the following things:

  1. Added two docs in docs/source/en/api:
     • docs/source/en/api/pipelines/animemory.md
     • docs/source/en/api/schedulers/euler_ancestral_x_pred.md
  2. Modified the docs/source/en/_toctree.yml file.

Some example outputs and more details can be found in the docs.

Please review and comment, thanks!

@hlky @stevhliu

@stevhliu (Member) left a comment:

Thanks for adding, docs look good to me 🤗

Comment on lines +24 to +25

Suggested change:

-### Usage
-```
+## Usage
+```py
images.save("output.png")
```

Use pipe.enable_sequential_cpu_offload() to offload the model into CPU for less GPU memory cost (about 14.25 G, compared to 25.67 G if CPU offload is not enabled), but the inference time will increase significantly(5.18s v.s. 17.74s on A100 40G).
Suggested change:

-Use pipe.enable_sequential_cpu_offload() to offload the model into CPU for less GPU memory cost (about 14.25 G, compared to 25.67 G if CPU offload is not enabled), but the inference time will increase significantly(5.18s v.s. 17.74s on A100 40G).
+Use [`~DiffusionPipeline.enable_sequential_cpu_offload`] to offload the model to the CPU and reduce GPU memory cost (about 14.25GB, compared to 25.67GB without CPU offload). However, inference time increases significantly, from 5.18s to 17.74s on an A100 40GB.
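For reference, a minimal sketch of the offloading setup that sentence describes, using the standard DiffusionPipeline API (assuming AniMemoryPipeline inherits it, as this PR intends):

import torch
from diffusers import AniMemoryPipeline

pipe = AniMemoryPipeline.from_pretrained(
    "animEEEmpire/AniMemory-alpha", torch_dtype=torch.bfloat16
)
# With sequential CPU offload enabled, skip pipe.to("cuda"); submodules are
# moved to the GPU one at a time during inference and back to CPU afterwards.
pipe.enable_sequential_cpu_offload()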


# EulerAncestralDiscreteXPredScheduler

An improved scheduler(SingDiffusion) that addresses the sampling challenge at the initial singular time step. To know more about SingDiffusion, check out the original [blog post](https://pangzecheung.github.io/SingDiffusion/). Our original paper can be found [here](https://arxiv.org/abs/2403.08381).
Suggested change:

-An improved scheduler(SingDiffusion) that addresses the sampling challenge at the initial singular time step. To know more about SingDiffusion, check out the original [blog post](https://pangzecheung.github.io/SingDiffusion/). Our original paper can be found [here](https://arxiv.org/abs/2403.08381).
+An improved scheduler (SingDiffusion) that addresses the sampling challenge at the initial singular time step. To learn more about SingDiffusion, check out the original [blog post](https://pangzecheung.github.io/SingDiffusion/). Our original paper can be found [here](https://arxiv.org/abs/2403.08381).
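As a usage illustration, swapping in the new scheduler would presumably follow the standard diffusers from_config pattern; this is a sketch under that assumption, including the assumption that the PR exports the class at the top level.

import torch
from diffusers import AniMemoryPipeline, EulerAncestralDiscreteXPredScheduler

pipe = AniMemoryPipeline.from_pretrained(
    "animEEEmpire/AniMemory-alpha", torch_dtype=torch.bfloat16
)
# Standard diffusers pattern: rebuild the scheduler from the existing config.
pipe.scheduler = EulerAncestralDiscreteXPredScheduler.from_config(pipe.scheduler.config)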

@hlky (Contributor) commented Dec 5, 2024

Hi @animemory! We think this model would be a great new example for custom components which we can use to rework the documentation. The benefit of custom components for model authors is friction-free day-1 support for Diffusers. While I'm testing this with custom components, would you mind taking a quick look at the current documentation and providing some feedback? For example, is the process clear, is anything missing, etc.

@hlky (Contributor) commented Dec 5, 2024

@animemory I've created a PR on the Hub to add remote code.

@animemory (Author) replied:

> Hi @animemory! We think this model would be a great new example for custom components which we can use to rework the documentation. The benefit of custom components for model authors is friction-free day-1 support for Diffusers. While I'm testing this with custom components, would you mind taking a quick look at the current documentation and providing some feedback? For example, is the process clear, is anything missing, etc.

Yes, I think the process is clear. It'd be helpful to add or link some tips on how to upload code and checkpoints to the pipeline repo, especially for beginners.

@hlky (Contributor) commented Dec 6, 2024

Thanks for the feedback @animemory and thank you for working with us to use custom components for this model. We can leave this PR open for now as we can revisit integration at some point.

As a note for anyone else viewing, we can now use this model/pipeline in Diffusers with remote code:

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "animEEEmpire/AniMemory-alpha",
    trust_remote_code=True,
    revision="fad02d2",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

prompt = "一只凶恶的狼,猩红的眼神,在午夜咆哮,月光皎洁"  # "A ferocious wolf with scarlet eyes, howling at midnight under bright, clear moonlight"
negative_prompt = "nsfw, worst quality, low quality, normal quality, low resolution, monochrome, blurry, wrong, Mutated hands and fingers, text, ugly faces, twisted, jpeg artifacts, watermark, low contrast, realistic"

images = pipe(prompt=prompt,
              negative_prompt=negative_prompt,
              num_inference_steps=40,
              height=1024, width=1024,
              guidance_scale=7,
              )[0]
images[0].save("output.png")  # pipe(...)[0] is the list of generated images

@bghira (Contributor) commented Dec 6, 2024

I think any remote code examples should provide a pinned revision.

@hlky (Contributor) commented Dec 6, 2024

Thanks @bghira, I've added a revision to the example here.

@a-r-r-o-w (Contributor) commented:

@hlky Is this good to merge?

@a-r-r-o-w requested a review from hlky December 11, 2024 23:05
@hlky (Contributor) commented Dec 11, 2024

@a-r-r-o-w Not yet. I think we will revisit when the model gains more traction; it's supported with remote code for now. See modeling_movq.py and modeling_text_encoder.py for why.

@github-actions (bot) commented Jan 5, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot added the stale label (Issues that haven't received updates) Jan 5, 2025