
Conversation

@DN6 (Collaborator) commented Nov 15, 2024

What does this PR do?

Update Mochi docs

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@DN6 DN6 requested a review from sayakpaul November 15, 2024 11:38
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Ednaordinary (Contributor) left a comment:


Noticed a few things in this PR that would be helpful to change:

prompt = "Close-up of a chameleon's eye, with its scaly skin changing color. Ultra high resolution 4k."

with torch.autocast("cuda", torch.bfloat16, cache_enabled=False):
frames = pipe(prompt, num_frames=84).frames[0]
Contributor:

num_frames=84 should be num_frames=85 (14 * 6 + 1), as mentioned here
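
A minimal sketch of the corrected call, assuming `pipe` is the Mochi pipeline already loaded as in the surrounding docs example:

```python
import torch

# Mochi's VAE compresses time by a factor of 6, so valid frame counts have the
# form 6 * k + 1; 85 = 14 * 6 + 1 is the count intended here.
with torch.autocast("cuda", torch.bfloat16, cache_enabled=False):
    frames = pipe(prompt, num_frames=85).frames[0]
```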

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

@Ednaordinary (Contributor) commented Nov 15, 2024:

(One line above this) Only FlowMatchEulerDiscreteScheduler has invert_sigmas, so as I understand it, nothing else would work as of now.
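
A hedged sketch of guarding against this (assumes `pipe` is the loaded Mochi pipeline; `invert_sigmas` is the config flag named in the comment above):

```python
from diffusers import FlowMatchEulerDiscreteScheduler

# Guard any scheduler swap: per the comment above, only
# FlowMatchEulerDiscreteScheduler currently registers invert_sigmas.
if not isinstance(pipe.scheduler, FlowMatchEulerDiscreteScheduler):
    raise ValueError("Mochi currently needs a scheduler with invert_sigmas support.")
print("invert_sigmas" in pipe.scheduler.config)  # True for this scheduler
```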

Member:

Cc: @hlky here too. @DN6 do we wanna remove this bit? I think we need to remove it from all flow pipelines.

```python
pipe.enable_vae_tiling()

prompt = "Close-up of a chameleon's eye, with its scaly skin changing color. Ultra high resolution 4k."
frames = pipe(prompt, num_frames=84).frames[0]
```
Contributor:

Same thing here: num_frames=85.

@sayakpaul (Member) left a comment:

Comments from @Ednaordinary are already great, so let's resolve them. Maybe we could add a section on how to reproduce some of their videos generated with the original inference code and params? I think most people would be interested in that.

Additionally, it seems like we should suggest using a maximum sequence length of 256?
#9769 (comment)

Already the case:

```python
max_sequence_length: int = 256,
```
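
Since the `__call__` signature above already defaults to 256, the docs would only need to note it; passing it explicitly would look like this (a sketch, reusing `pipe` and `prompt` from the snippets above):

```python
# 256 is already the default, so this is explicit only for documentation purposes.
frames = pipe(prompt, num_frames=85, max_sequence_length=256).frames[0]
```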

@DN6 DN6 mentioned this pull request Nov 27, 2024
@sayakpaul (Member) commented:

@DN6 is this ready to be merged? Cc: @a-r-r-o-w as well

@DN6 DN6 requested a review from sayakpaul December 20, 2024 10:18
@sayakpaul (Member) left a comment:

LGTM! Thanks, Dhruv. I know getting to this point hasn't been the easiest experience. Salute 🫡


Comment on lines 82 to 85
<Tip>
Decoding the latents in full precision is very memory intensive. You will need at least 70GB VRAM to generate the 163 frames
in this example. To reduce memory, either reduce the number of frames or run the decoding step in `torch.bfloat16`
</Tip>
Member:

Even if we use enable_model_cpu_offload(), would we still need 70GB?

Suggested change (the only difference is the trailing period):

```diff
 <Tip>
 Decoding the latents in full precision is very memory intensive. You will need at least 70GB VRAM to generate the 163 frames
-in this example. To reduce memory, either reduce the number of frames or run the decoding step in `torch.bfloat16`
+in this example. To reduce memory, either reduce the number of frames or run the decoding step in `torch.bfloat16`.
 </Tip>
```
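
To make the tip concrete, here is a hedged sketch (not the PR's actual snippet) of running only the decode step under bfloat16 autocast. Depending on the pipeline version, the raw latents may first need de-normalizing with the VAE's `latents_mean`/`latents_std`, so treat this as the shape of the approach rather than a drop-in:

```python
import torch

# Ask the pipeline for raw latents, then run only the memory-heavy VAE decode
# in bfloat16 autocast instead of full precision.
latents = pipe(prompt, num_frames=85, output_type="latent").frames
with torch.autocast("cuda", torch.bfloat16, cache_enabled=False):
    frames = pipe.vae.decode(latents, return_dict=False)[0]
video = pipe.video_processor.postprocess_video(frames, output_type="pil")[0]
```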

@DN6 DN6 merged commit e12d610 into main Dec 20, 2024
4 checks passed
@a-r-r-o-w a-r-r-o-w deleted the mochi-docs branch December 23, 2024 00:23
Foundsheep pushed a commit to Foundsheep/diffusers that referenced this pull request Dec 23, 2024
* update

* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <[email protected]>
sayakpaul added a commit that referenced this pull request Dec 23, 2024
* update

* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <[email protected]>