pixart sigma: callbacks(interrupt, latent, pos/neg embeds) and cfg_rescale #8661
I had to change some things relative to the original file in order to get the callbacks to work correctly. I tried to base much of the layout on newer diffusers pipelines such as SD3.
The cfg rescale is based on (https://arxiv.org/pdf/2305.08891.pdf). See Section 3.4.
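As a rough illustration of the rescale described in Section 3.4 of that paper: the guided prediction is scaled so its standard deviation matches that of the text-conditioned prediction, then blended with the unscaled guided prediction by a factor between 0 and 1. This is only a sketch under stated assumptions: `cfg_with_rescale` is a hypothetical name, plain Python lists stand in for tensors, and the standard deviation is taken over the whole list rather than per sample as a real pipeline would do.

```python
import statistics


def cfg_with_rescale(noise_uncond, noise_text, guidance_scale, guidance_rescale=0.0):
    """Classifier-free guidance followed by std-matching rescale (sketch).

    Flat lists of floats stand in for tensors (assumption for illustration).
    """
    # Standard classifier-free guidance.
    noise_cfg = [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]

    std_text = statistics.pstdev(noise_text)
    std_cfg = statistics.pstdev(noise_cfg)
    if std_cfg == 0.0:
        # Nothing to rescale against; return the guided prediction unchanged.
        return noise_cfg

    # Rescale so the guided prediction has the std of the text-conditioned one.
    rescaled = [x * (std_text / std_cfg) for x in noise_cfg]

    # Blend rescaled and original predictions by the rescale factor.
    return [
        guidance_rescale * r + (1.0 - guidance_rescale) * x
        for r, x in zip(rescaled, noise_cfg)
    ]
```

With `guidance_rescale=0.0` this reduces to plain classifier-free guidance, so existing behavior is unchanged unless the factor is raised.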
thanks!
I think this PR should focus on adding the new callback and cfg_rescale; we should not include the changes introduced for encode_prompt in this PR.
Additionally, can you test out the dynamic classifier-free guidance on pixart using the new callback API? https://huggingface.co/docs/diffusers/using-diffusers/callback#dynamic-classifier-free-guidance
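For reference, a dynamic classifier-free guidance callback in the style of the linked docs might look like the sketch below. It assumes the pipeline exposes `num_timesteps` and a private `_guidance_scale` attribute, and that the embeds are batched as [negative, positive]; `make_cfg_cutoff_callback` is a hypothetical helper name. For PixArt, any attention masks passed alongside the embeds would presumably need the same halving, which this sketch does not cover.

```python
def make_cfg_cutoff_callback(cutoff_fraction=0.4):
    """Build a callback that turns CFG off after a fraction of the steps (sketch)."""

    def callback(pipe, step_index, timestep, callback_kwargs):
        if step_index == int(pipe.num_timesteps * cutoff_fraction):
            embeds = callback_kwargs["prompt_embeds"]
            # Keep only the conditional half of the [negative, positive] batch.
            callback_kwargs["prompt_embeds"] = embeds[len(embeds) // 2:]
            # Assumed attribute: a guidance scale of 0.0 disables CFG from here on.
            pipe._guidance_scale = 0.0
        return callback_kwargs

    return callback
```

It would be passed as `callback_on_step_end=make_cfg_cutoff_callback(0.4)` together with `callback_on_step_end_tensor_inputs=["prompt_embeds"]`.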
I'll try to take a look at it as well, but after I figure out how to handle the old callback/callback_steps deprecation.
The legacy callback will require (self, step_idx, t, latents), but has 1:1 parity with the newer callback_on_step_end method. I also included a deprecation warning and an error if both are used at the same time.
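The deprecation warning and the both-at-once error mentioned above could look roughly like this. It is a sketch only: `check_callback_args` is a hypothetical helper, and it uses the standard-library `warnings` module rather than whatever deprecation utility the pipeline itself uses.

```python
import warnings


def check_callback_args(callback=None, callback_steps=None, callback_on_step_end=None):
    """Validate legacy vs. new callback arguments (illustrative sketch)."""
    legacy_used = callback is not None or callback_steps is not None

    # Mixing the two styles is ambiguous, so refuse it outright.
    if legacy_used and callback_on_step_end is not None:
        raise ValueError(
            "Pass either the legacy `callback`/`callback_steps` or "
            "`callback_on_step_end`, not both."
        )

    # The legacy style still works, but warn that it is on its way out.
    if legacy_used:
        warnings.warn(
            "`callback` and `callback_steps` are deprecated; "
            "use `callback_on_step_end` instead.",
            FutureWarning,
            stacklevel=2,
        )
```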
@yiyixuxu Alright, I reworked the legacy callback/callback_steps back in. I tested both in my app and was able to get latent callbacks for previews. For the struck-out parts, read my comment below. I reverted the old callback to callback(steps_idx, t, latents) again.
@yiyixuxu @sayakpaul I reverted the negative_prompt changes and fully updated my original post to be more clear about the changes.
Since the original implementation doesn't appear to have the ability to interrupt, I'm just going to roll this back. If people want to interrupt, they need to use the newer method anyways, since the older callback method is being deprecated. The legacy callback will still provide step_idx, t and latents, like before.
Since the legacy implementation of callback doesn't appear to have the ability to interrupt, I'm just going to roll this back. If people want to interrupt, they should be using the newer method anyways, since the older callback method is being deprecated. The legacy callback will still function the same as before.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@RandomGitUser321 apologies for the delay on our end. We would love to come back to this PR. What is currently blocking it? How can we help?
What does this PR do?
This PR refactors the pipeline to mirror other common pipelines by adding `callback_on_step_end` and `callback_on_step_end_tensor_inputs`, along with cfg rescaling. In the callbacks, you can interrupt, retrieve latents, and/or retrieve pos/neg embeds. The older callback method will continue to provide steps_idx, t and latents, but that's it.

I've added deprecation warnings for those still using the legacy `callback` and `callback_steps` method instead of the newer `callback_on_step_end` and `callback_on_step_end_tensor_inputs` method, as well as an error if you try to use both at the same time.

To sum it up, this adds:
- `callback_on_step_end` and `callback_on_step_end_tensor_inputs`, which allow you to obtain latents and pos/neg embeds
- `callback` and `callback_steps` methods (legacy, now with deprecation warnings)
- `self._interrupt=True` on `callback_on_step_end` (to interrupt generation)

Some snippets of the code in my app that I tested it with to verify that it works:
A test image showing the latent callbacks working at each step (can also be used to generate real-time previews in apps, as shown in my update 4 comment):

Example of the interrupt callback working while using my app:

Example of cfg rescale working (GIF compression degrades quality a lot):

Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.