Conversation

@a-r-r-o-w (Contributor) commented Feb 18, 2024

What does this PR do?

This PR adds support for the Stable Video Diffusion version of MotionCtrl as a community pipeline. This is the continuation of #6844 to keep the changes clean. This version of MotionCtrl only supports camera control. For more details, you can check out the linked issue below.

Fixes #6688.

Colab: https://colab.research.google.com/drive/17xIdW-xWk4hCAIkGq0OfiJYUqwWSPSAz?usp=sharing
Paper: https://arxiv.org/abs/2312.03641
Project site: https://wzhouxiff.github.io/projects/MotionCtrl/
Authors: @wzhouxiff, @jiangyzy, @xinntao, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan

Update: MotionCtrl was just featured on Two Minute Papers. What a time to be alive!


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@DN6 @sayakpaul

@a-r-r-o-w (Contributor, Author) commented Feb 18, 2024

Results (manually downscaled):

I've also taken the liberty of adding linear interpolation between the camera projection embeddings and the original hidden states in SVD. Apart from that, I believe the implementation is faithful to the original. From the results, MotionCtrl with SVD currently allows panning, zooming, and other complex camera motions. With the lerp, we can get some more object motion (see results below; motionctrl_scale=0 is essentially plain SVD because the camera projection embeddings are not used at all, the only remaining difference being the attn2 layer weights of TemporalBasicTransformerBlock, which were also trained and so differ from original SVD). If you want a stronger panning/zooming effect, set motionctrl_scale higher using pipe.unet.set_motionctrl_scale(0.8). Increasing camera_speed allows for faster panning/zooming. A usage sketch follows the result grid below.

[Result grid: motionctrl_scale = 0, 0.2, 0.4 (top row); 0.6, 0.8, 1.0 (bottom row)]
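For reference, here's a minimal usage sketch. It assumes diffusers is installed from this branch; the class and checkpoint names come from this PR, but the exact call signature is illustrative rather than verified:

import torch
from diffusers.utils import load_image
# These imports require a clone of the repo (see the discussion below).
from examples.community.pipeline_stable_video_motionctrl_diffusion import (
    StableVideoMotionCtrlDiffusionPipeline,
    UNetSpatioTemporalConditionMotionCtrlModel,
)

unet = UNetSpatioTemporalConditionMotionCtrlModel.from_pretrained(
    "a-r-r-o-w/motionctrl-svd", subfolder="unet", torch_dtype=torch.float16
)
pipe = StableVideoMotionCtrlDiffusionPipeline.from_pretrained(
    "a-r-r-o-w/motionctrl-svd", unet=unet, torch_dtype=torch.float16
).to("cuda")

# 0.0 is essentially plain SVD; higher values give stronger camera control.
pipe.unet.set_motionctrl_scale(0.8)

image = load_image("input.png")
camera_pose = torch.zeros(14, 3, 4)  # placeholder; supply a real (num_frames, 3, 4) trajectory
frames = pipe(image, camera_pose=camera_pose, camera_speed=1.0).frames[0]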

@a-r-r-o-w (Contributor, Author)

@sayakpaul I thought about moving this to research_projects, but the amount of code duplication was insane (2000+ lines), so I feel it is better here with the hacky UNet. I'd like to know your thoughts on the use_legacy flag (which enables/disables the quant_conv layer, because some of the newer checkpoints like dragnuwa/motionctrl do not use it), and whether the from examples.community.pipeline_stable_video_motionctrl_diffusion import UNetSpatioTemporalConditionMotionCtrlModel part is okay (you can only use this UNet if you clone the repo; it isn't available if you just pip install diffusers). A sketch of what the flag gates is below.
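For context, a sketch of what use_legacy would gate inside the autoencoder's __init__ (illustrative only; the real change lives in the autoencoder file on this branch, and the layer shapes follow AutoencoderKL):

# Only create the quant convs for checkpoints that actually ship those weights.
if use_legacy:
    self.quant_conv = nn.Conv2d(2 * latent_channels, 2 * latent_channels, 1)
    self.post_quant_conv = nn.Conv2d(latent_channels, latent_channels, 1)
else:
    self.quant_conv = None
    self.post_quant_conv = None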

@a-r-r-o-w (Contributor, Author) commented Feb 18, 2024 via email

cross_attention_dim: Optional[int] = None,
):
super().__init__()
self.time_mix_inner_dim = time_mix_inner_dim
@a-r-r-o-w (Contributor, Author) commented on the lines above:

@DN6 I think it would really help if every module stored its init parameters as attributes. It helps with customizing models, and I've hit roadblocks getting the correct dimensions when modifying different layers for experimentation. Here's an example to demonstrate the use case:

import torch.nn as nn
from diffusers.models.resnet import ResnetBlock2D

for _, module in self.named_modules():
    if isinstance(module, ResnetBlock2D):
        # The stored init parameters make it easy to size a new layer correctly.
        new_layer = nn.Linear(module.in_channels, module.out_channels)
        module.add_module("new_layer", new_layer)

        # Bind a custom forward (defined elsewhere) to this module instance.
        new_forward = custom_resnetblock2d_forward.__get__(module, module.__class__)
        setattr(module, "forward", new_forward)

Many modelling blocks already do this, as is the case with ResnetBlock2D, but many, such as TemporalBasicTransformerBlock, don't. It would help if this were consistent across all modelling components. WDYT?

@a-r-r-o-w (Contributor, Author)

@DN6 @sayakpaul Ready for another review :)

@a-r-r-o-w (Contributor, Author) commented Mar 26, 2024

Could you verify that you're running from this branch? That is, that you installed diffusers with pip install git+https://github.com/a-r-r-o-w/diffusers@re-motionctrl or similar. This is required because there are changes to the autoencoder file here.

Your error hints at the quant_conv layers still being instantiated (their weights are being expected) despite the config saying not to use them. I haven't been able to reproduce this with my branch, but that error does come up if you're using, say, the main/PyPI version.

@jhj7905 commented Mar 26, 2024

@a-r-r-o-w
Thank you for the reply. Installing diffusers with pip install git+https://github.com/a-r-r-o-w/diffusers@re-motionctrl solved the problem.
Now I'm implementing the training code on top of your code.

@a-r-r-o-w (Contributor, Author)

Awesome, glad to know that worked!

Regarding training the SVD version: since a few projection layers are the only addition for the camera motion module, I went ahead and repurposed the Stable Diffusion training script last weekend. However, when actually trying to train, 24/32 GB GPUs were not enough (out-of-memory errors), and I lack access to better compute for testing at the moment, which has put it on hold for me. It would be awesome if you're able to create it :) The idea you mentioned in our email thread is very cool, with lots of potential applications; I hope it's a success!

@jhj7905 commented Mar 26, 2024

@a-r-r-o-w Oh, cool!
How about sharing your training code?
I checked the SVD training code from this repo (https://github.com/pixeli99/SVD_Xtend/tree/main) and implemented it.

@a-r-r-o-w a-r-r-o-w requested a review from DN6 March 26, 2024 17:12
@a-r-r-o-w (Contributor, Author)

This is what I used too. Only minor changes are needed: copying the UNet modifications from here and freezing the remaining params (a rough sketch is below). The problem is that I run into out-of-memory errors and can't verify the correctness of the script. I will put it in a PR some time in the near future when I am able to test on an A100.
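Roughly, the freezing step I have in mind looks like this (a sketch under assumptions: "cc_projection" is a hypothetical name for the camera projection layers, while unfreezing attn2 follows the earlier note that those weights were trained too):

import torch

# unet: the modified MotionCtrl UNet from this PR.
unet.requires_grad_(False)
trainable_params = []
for name, param in unet.named_parameters():
    if "cc_projection" in name or "attn2" in name:
        param.requires_grad = True
        trainable_params.append(param)

optimizer = torch.optim.AdamW(trainable_params, lr=1e-5)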

@jhj7905 commented Mar 28, 2024

@a-r-r-o-w
Hello, I have implemented the training code based on this repo (https://github.com/pixeli99/SVD_Xtend/tree/main), and I have a couple of questions.
First, there is do_classifier_free_guidance in the StableVideoMotionCtrlDiffusionPipeline class. Is it necessary in the training code?
Second, is camera_pose = camera_pose.repeat(2, 1, 1) right?

@a-r-r-o-w (Contributor, Author) commented Mar 28, 2024

@jhj7905

First, there is do_classifier_free_guidance in the StableVideoMotionCtrlDiffusionPipeline class. Is it necessary in the training code?

You can use SVD without classifier-free guidance by setting max_guidance_scale to a value <= 1. You are correct, it should not be necessary in training (or inference).

Second, is camera_pose = camera_pose.repeat(2, 1, 1) right?

Camera pose is a tensor of shape (num_frames, 3, 4). We repeat it because it has to be applied to both the unconditional and conditional latents, which is always the case when max_guidance_scale > 1. But I do see a mistake on my side here: the repeat should only happen when do_classifier_free_guidance is True, because otherwise there is no unconditional latent. I will fix this soon; see the sketch below. Is this causing problems in training when classifier-free guidance is enabled?
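A sketch of the intended fix (assuming the SVD convention where classifier-free guidance is derived from max_guidance_scale):

do_classifier_free_guidance = max_guidance_scale > 1

# camera_pose has shape (num_frames, 3, 4); only duplicate it when an
# unconditional branch actually exists.
if do_classifier_free_guidance:
    camera_pose = camera_pose.repeat(2, 1, 1)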

@jhj7905 commented Apr 3, 2024

@a-r-r-o-w @DN6
Hello, I finished implementing the training code, but the results were quite weird, so I started debugging.
While debugging, I first found that the output video differs when running the original repo (https://github.com/TencentARC/MotionCtrl/tree/svd, https://huggingface.co/TencentARC/MotionCtrl/tree/main) versus this implementation (https://github.com/a-r-r-o-w/diffusers/tree/re-motionctrl, https://huggingface.co/a-r-r-o-w/motionctrl-svd/tree/main) with the same image.

@a-r-r-o-w (Contributor, Author) commented Apr 3, 2024

Can you share an example comparing their output vs. what we have here? I'm on a bit of a vacation and am not carrying my personal laptop, but I can try debugging the difference in the implementation code-wise.

Have you made sure the same seed is used? It's also possible that the order of operations that depend on the random generator is different; a quick check is sketched below.
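Something along these lines should make the comparison deterministic on the diffusers side (the generator argument is standard in diffusers pipelines; pipe, image, and camera_pose are from the earlier usage sketch):

import torch

# Use the same seed in both implementations so the initial latent noise
# (and any other sampled values) matches.
generator = torch.Generator(device="cuda").manual_seed(42)
frames = pipe(image, camera_pose=camera_pose, generator=generator).frames[0]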

@jhj7905 commented Apr 4, 2024

@jhj7905 commented Apr 8, 2024

@a-r-r-o-w Did you solve it?
I am still debugging it.

@a-r-r-o-w (Contributor, Author)

@a-r-r-o-w Did you solve it? I am still debugging it.

Hi. I'm on a bit of a vacation and am not carrying my personal laptop to test things out. Apologies for the delay... If you're able to find the mistake, please feel free to fork my branch and add changes. I should be freer in 2-3 days to figure out the problems.

@jhj7905 commented Apr 8, 2024

@a-r-r-o-w
Oh, I see. Thank you for the reply, and have a good vacation!!
Okay, if I find the mistake, I'll fork the branch!

@T0L0ve commented Apr 10, 2024

@jhj7905
When I run the inference code I get an AttributeError: 'TemporalBasicTransformerBlock' object has no attribute 'time_mix_inner_dim'. Do you know how to solve it?

@a-r-r-o-w (Contributor, Author)

@jhj7905 When I run the inference code I get an AttributeError: 'TemporalBasicTransformerBlock' object has no attribute 'time_mix_inner_dim'. Do you know how to solve it?

Did you install diffusers from my branch? I'm guessing that could be the issue. Try:

pip install git+https://github.com/a-r-r-o-w/diffusers@re-motionctrl

@jhj7905 commented Apr 11, 2024

@jhj7905 When I run the inference code I get an AttributeError: 'TemporalBasicTransformerBlock' object has no attribute 'time_mix_inner_dim'. Do you know how to solve it?

You can solve it by using pip install git+https://github.com/a-r-r-o-w/diffusers@re-motionctrl

@a-r-r-o-w (Contributor, Author) commented Apr 23, 2024

@DN6 @asomoza @jhj7905 Requesting a review. I think this is ready to merge now.

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Sep 14, 2024
@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label Sep 17, 2024
@yiyixuxu (Collaborator)

@a-r-r-o-w do we still want this?

@a-r-r-o-w (Contributor, Author)

Nah, I think it's okay to close it. There are better video generation models now, and this one only got 230 downloads all time and never caught on. I believe Dhruv wanted it initially (there's a community issue open), but I think it's okay to close now.

@a-r-r-o-w a-r-r-o-w closed this Sep 17, 2024
@a-r-r-o-w a-r-r-o-w mentioned this pull request Sep 17, 2024
@2hiTee commented Apr 21, 2025

Nah, I think it's okay to close it. There are better video generation models now, and this one only got 230 downloads all time and never caught on. I believe Dhruv wanted it initially (there's a community issue open), but I think it's okay to close now.

Hi, thanks for your work on integrating MotionCtrl into diffusers. But when I run pip install git+https://github.com/a-r-r-o-w/diffusers@re-motionctrl, I get errors:

[screenshot of the installation error]

Could you give me some advice? Thanks again!
