[Community] MotionCtrl SVD #7005
Conversation
@sayakpaul I thought about moving to
At the moment, I use this in
https://github.com/a-r-r-o-w/diffusers/blob/45b8a980b8513c18d796ff3bde5d9cdbec0a5d18/examples/community/pipeline_stable_video_motionctrl_diffusion.py#L161.
If storing the attribute is not ideal, I think we could get rid of this change by using `module.norm1.normalized_shape` instead.
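As a minimal sketch of that idea (assuming `norm1` is an `nn.LayerNorm` built over the inner dimension, as in `TemporalBasicTransformerBlock`), the dimension can be read back from the norm layer instead of being stored as a separate attribute:

```python
import torch.nn as nn

# Sketch: recover the inner dimension from the norm layer rather than
# storing time_mix_inner_dim as an extra attribute. Assumes norm1 is a
# LayerNorm constructed with the inner dim, as in
# TemporalBasicTransformerBlock (320 here is just an example value).
norm1 = nn.LayerNorm(320)
inner_dim = norm1.normalized_shape[0]
print(inner_dim)  # 320
```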
On Sun, 18 Feb 2024 at 10:37, Dejia Xu wrote:
In src/diffusers/models/attention.py (#7005 (comment)):
> @@ -434,6 +434,7 @@ def __init__(
cross_attention_dim: Optional[int] = None,
):
super().__init__()
+ self.time_mix_inner_dim = time_mix_inner_dim
Hi, is this variable by chance being used elsewhere?
cross_attention_dim: Optional[int] = None,
):
super().__init__()
+ self.time_mix_inner_dim = time_mix_inner_dim
@DN6 I think it would really help if every module could store its init parameters as attributes. It helps with customizing models and I've experienced roadblocks in getting the correct dimensions when modding different layers for experimentation. Here's an example to demonstrate the use case:
```python
for _, module in self.named_modules():
    if isinstance(module, ResnetBlock2D):
        new_layer = nn.Linear(module.in_channels, module.out_channels)
        module.add_module("new_layer", new_layer)
        new_forward = custom_resnetblock2d_forward.__get__(module, module.__class__)
        setattr(module, "forward", new_forward)
```

Many modelling blocks already do this, as is the case with ResnetBlock2D, but many don't, such as TemporalBasicTransformerBlock. It would help if it was consistent across all modelling components. WDYT?
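For illustration, here is a self-contained toy version of the `__get__` patching trick in that snippet (`Block` and `custom_forward` are stand-ins for `ResnetBlock2D` and `custom_resnetblock2d_forward`, which are not defined in this thread):

```python
# Toy stand-in for a modelling block such as ResnetBlock2D.
class Block:
    def forward(self, x):
        return x + 1

# Replacement forward; __get__ binds it to the instance so that `self`
# inside the function refers to that instance.
def custom_forward(self, x):
    return Block.forward(self, x) * 2  # call the original, then modify

block = Block()
block.forward = custom_forward.__get__(block, Block)
print(block.forward(3))  # (3 + 1) * 2 = 8
```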
@DN6 @sayakpaul Ready for another review :)
Could you verify that you're using this branch for running? That is, that you installed diffusers using
Your error hints at the quant_conv layers being used, since those weights are being expected, despite the config saying not to use them. I haven't been able to reproduce this with my branch, but that error does come up if you're using, say, the main branch or the PyPI release.
@a-r-r-o-w
Awesome, glad to know that worked! Regarding training the SVD version: since a few projection layers are the only addition for the camera motion module, I went ahead and repurposed the stable diffusion training script last weekend. However, when actually trying to train, 24/32 GB GPUs were not enough (out-of-memory errors), and I lack access to better compute for testing at the moment, which has put this on hold for me. It would be awesome if you're able to create it :) The idea you mention in our email thread is very cool and has lots of potential applications; I hope it's a success!
@a-r-r-o-w Oh, cool
This is what I used too. Only minor changes were needed: copying the UNet modifications from here and freezing the remaining params. The problem is that I run into out-of-memory errors and can't verify the correctness of the script. I will put it in a PR in the near future once I'm able to test on an A100.
@a-r-r-o-w
You can use SVD without classifier-free guidance by setting both
Camera pose is a tensor of shape
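The elided setting presumably refers to the pipeline's guidance-scale arguments. As a hedged sketch (assuming the MotionCtrl pipeline mirrors `StableVideoDiffusionPipeline`, whose `min_guidance_scale` and `max_guidance_scale` call arguments define a per-frame guidance-scale schedule), setting both to 1.0 gives every frame a scale of 1.0, which is equivalent to running without classifier-free guidance:

```python
# Sketch of a per-frame guidance-scale schedule: the scale is interpolated
# linearly from min_scale to max_scale across the frames. With both set
# to 1.0, every frame gets scale 1.0, i.e. CFG is effectively disabled.
def frame_guidance_scales(min_scale, max_scale, num_frames):
    if num_frames == 1:
        return [min_scale]
    step = (max_scale - min_scale) / (num_frames - 1)
    return [min_scale + i * step for i in range(num_frames)]

scales = frame_guidance_scales(1.0, 1.0, 14)
print(all(s == 1.0 for s in scales))  # True
```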
@a-r-r-o-w @DN6
Can you share an example of the difference, comparing their output vs. what we have here? I'm on a bit of a vacation and am not carrying my personal laptop, but I can try debugging the implementation difference code-wise. Have you made sure that the same seed is used? It could also be that the order of operations that depend on the random generator differs between the two implementations.
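To illustrate the point about operation order, here is a minimal stdlib sketch; the same reasoning applies to `torch.Generator` draws in the two pipelines:

```python
import random

# Two runs with the same seed: if the second run consumes draws in a
# different order, values end up assigned differently even though the
# underlying random stream is identical.
rng = random.Random(42)
a = rng.random()
b = rng.random()

rng = random.Random(42)
b2 = rng.random()  # this draw happens first in the "reordered" run
a2 = rng.random()

print(a == b2, b == a2, a == a2)  # True True False
```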
I used the example images below with the same camera pose (Pan Down): results from the original repo (https://github.com/TencentARC/MotionCtrl/tree/svd) vs. results from your repo (https://github.com/a-r-r-o-w/diffusers/tree/re-motionctrl). Here is my environment:
Thank you in advance
@a-r-r-o-w Did you solve it?
Hi. I'm on a bit of a vacation and am not carrying my personal laptop to test things out. Apologies for the delay... If you're able to find the mistake, please feel free to fork my branch and add changes. I should be freer in 2-3 days to figure out the problems.
@a-r-r-o-w
@jhj7905
Did you install diffusers from my branch? I'm guessing that could be the issue. Try:
You can solve it by using `pip install git+https://github.com/a-r-r-o-w/diffusers@re-motionctrl`
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@a-r-r-o-w do we still want this?
Nah, I think it's okay to close it. There are better video generation models now, and this one has only about 230 all-time downloads and never caught up to the hype. I believe Dhruv wanted it initially (there's a community issue open), but I think it's okay to close now.
Hi, thanks for your work on integrating MotionCtrl into diffusers. But when I run `pip install git+https://github.com/a-r-r-o-w/diffusers@re-motionctrl`, I met errors:
Could you give me some advice? Thanks again!















What does this PR do?
This PR adds support for the Stable Video Diffusion version of MotionCtrl as a community pipeline. This is the continuation of #6844 to keep the changes clean. This version of MotionCtrl only supports camera control. For more details, you can check out the linked issue below.
Fixes #6688.
Colab: https://colab.research.google.com/drive/17xIdW-xWk4hCAIkGq0OfiJYUqwWSPSAz?usp=sharing
Paper: https://arxiv.org/abs/2312.03641
Project site: https://wzhouxiff.github.io/projects/MotionCtrl/
Authors: @wzhouxiff, @jiangyzy, @xinntao, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan
Update: MotionCtrl was just featured on Two Minute Papers. What a time to be alive!
Before submitting
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@DN6 @sayakpaul