
Conversation

@a-r-r-o-w
Contributor

@a-r-r-o-w a-r-r-o-w commented Nov 26, 2024

T2V:

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("a-r-r-o-w/LTX-Video-diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=24)
```

I2V:

```python
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained("a-r-r-o-w/LTX-Video-diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = load_image(
    "https://huggingface.co/datasets/a-r-r-o-w/tiny-meme-dataset-captioned/resolve/main/images/8.png"
)
prompt = "A young girl stands calmly in the foreground, looking directly at the camera, as a house fire rages in the background. Flames engulf the structure, with smoke billowing into the air. Firefighters in protective gear rush to the scene, a fire truck labeled '38' visible behind them. The girl's neutral expression contrasts sharply with the chaos of the fire, creating a poignant and emotionally charged scene."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=24)
```

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w a-r-r-o-w marked this pull request as ready for review November 27, 2024 14:22
@a-r-r-o-w a-r-r-o-w requested review from DN6, stevhliu and yiyixuxu and removed request for yiyixuxu November 27, 2024 14:22
Comment on lines +199 to +202
```python
elif qk_norm == "rms_norm_across_heads":
    # LTX applies qk norm across all heads
    self.norm_q = RMSNorm(dim_head * heads, eps=eps)
    self.norm_k = RMSNorm(dim_head * kv_heads, eps=eps)
```
Contributor Author

@DN6 Should I follow your approach with Mochi and create a separate attention class for LTX?
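
For context, here is a minimal sketch (pure PyTorch, shapes hypothetical) of the difference between per-head qk norm and the across-heads variant LTX uses, where the RMSNorm is applied over the full `heads * dim_head` vector rather than each head's `dim_head` slice:

```python
import torch

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square over the last dimension
    var = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(var + eps) * weight

heads, dim_head, seq = 2, 4, 3
q = torch.randn(1, seq, heads * dim_head)  # (batch, seq, heads * dim_head)

# per-head: normalize each head's dim_head-sized vector independently
q_per_head = rms_norm(q.view(1, seq, heads, dim_head), torch.ones(dim_head)).view(1, seq, -1)

# across heads: one norm over the concatenated heads * dim_head vector
q_across = rms_norm(q, torch.ones(heads * dim_head))
```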

Collaborator

ok, but we want to be more careful; ideally we do that as part of a carefully planned-out refactor.
Maybe it would be safe to just inherit from Attention for now? e.g. we wrote code like this with the assumption in mind that we only have one attention class:
https://github.com/huggingface/diffusers/blob/e47cc1fc1a89a5375c322d296cd122fe71ab859f/src/diffusers/pipelines/pag/pag_utils.py#L57C39-L57C48

cc @DN6 here too

Collaborator

I guess Attention stays here for now

Member

@stevhliu stevhliu left a comment

Thanks for adding!

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Dec 4, 2024

Thanks for the reviews! Waiting for confirmation of where we should host the weights so I can update the documentation accordingly. Currently they are under my account, but once we move them, we should be good to merge.

@Skquark

Skquark commented Dec 6, 2024

I'm wondering if it might be easy to incorporate STG (Spatiotemporal Skip Guidance) into the LTX-Video pipeline. The improvements in video quality look significant, and the examples are impressive. Here are the links: STG Project, STGuidance GitHub, ComfyUI-LTXTricks. It looks like it can also be applied to Mochi, SVD, and Open-Sora. Could be that missing ingredient... It adds the params stg_mode, stg_scale, stg_block_idx, do_rescaling & rescaling_scale.

@a-r-r-o-w
Contributor Author

Hey, thanks for the suggestion!

We do plan to incorporate STG and other guidance methods by isolating them into a separate component. I don't like the idea of adding more parameters to the pipeline __call__ because it starts to become very confusing and bloated, so the design for the integration is still a WIP on my end, but I plan to open a PR in the coming week, or whenever I can get it working with all our pipelines.
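
As a rough, hypothetical sketch of how such a skip-guidance term typically composes with classifier-free guidance (this is not the diffusers API, and the exact STG formula may differ): a third "perturbed" prediction is made with some transformer blocks skipped, and its difference from the conditional prediction is added as an extra guidance term.

```python
import torch

def combine_guidance(noise_uncond, noise_cond, noise_perturbed, cfg_scale, stg_scale):
    # standard classifier-free guidance between unconditional and conditional predictions
    cfg = noise_uncond + cfg_scale * (noise_cond - noise_uncond)
    # extra term pushing away from the block-skipped ("perturbed") prediction
    return cfg + stg_scale * (noise_cond - noise_perturbed)

# toy values: with stg_scale=0 this reduces to plain CFG
eps_u, eps_c, eps_p = torch.zeros(4), torch.ones(4), 0.5 * torch.ones(4)
out = combine_guidance(eps_u, eps_c, eps_p, cfg_scale=3.0, stg_scale=1.0)
```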

@yoavhacohen yoavhacohen left a comment

One request about naming: LTX -> LTXV

@a-r-r-o-w a-r-r-o-w merged commit 96c376a into main Dec 12, 2024
15 checks passed
@a-r-r-o-w a-r-r-o-w deleted the ltx-integration branch December 12, 2024 10:51
@tin2tin

tin2tin commented Dec 15, 2024

@a-r-r-o-w

Testing, I get this error:
AttributeError: module diffusers has no attribute LTXTransformer3DModel. Did you mean: 'LatteTransformer3DModel'

@a-r-r-o-w
Contributor Author

@tin2tin I think it should be LTXVideoTransformer3DModel

@tin2tin

tin2tin commented Dec 15, 2024

@a-r-r-o-w I just used the test code in the first post, which doesn't specify that name. So I guess something internally is referencing a wrongly named class?

@a-r-r-o-w
Contributor Author

Just to confirm, you have installed diffusers from the main branch, yes?

@a-r-r-o-w
Contributor Author

Also, the config files on the LTX repo were updated recently. Could you ensure you're using the latest commit of those configs so that the right model names are being pointed to?

https://huggingface.co/Lightricks/LTX-Video/commits/main
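
For reference, the failing lookup is essentially a `getattr` on the class name stored in the repo's config (`model_index.json`); a minimal stdlib-only stand-in (names illustrative) shows why an outdated config triggers exactly this AttributeError:

```python
import types

# stand-in for the diffusers module namespace after the class rename
library = types.SimpleNamespace(LTXVideoTransformer3DModel=object)

def get_class_obj(library, class_name):
    # pipeline loading looks the class up by the name stored in model_index.json,
    # so a config still pointing at "LTXTransformer3DModel" raises AttributeError here
    return getattr(library, class_name)

cls = get_class_obj(library, "LTXVideoTransformer3DModel")  # the updated name resolves
```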

@tin2tin

tin2tin commented Dec 16, 2024

Yes, I'm on the latest main branch, and I downloaded the LTX repo yesterday, and it seems like the name change was 4 days ago.

```
Error: Python: Traceback (most recent call last):
  File "C:\Users\xxx\Documents\Blender Projekter\LTX_video.blend\Text", line 5, in <module>
  File "C:\Users\xxx\AppData\Roaming\Python\Python311\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\AppData\Roaming\Python\Python311\site-packages\diffusers\pipelines\pipeline_utils.py", line 902, in from_pretrained
    loaded_sub_model = load_sub_model(
                       ^^^^^^^^^^^^^^^
  File "C:\Users\xxx\AppData\Roaming\Python\Python311\site-packages\diffusers\pipelines\pipeline_loading_utils.py", line 635, in load_sub_model
    class_obj, class_candidates = get_class_obj_and_candidates(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\AppData\Roaming\Python\Python311\site-packages\diffusers\pipelines\pipeline_loading_utils.py", line 319, in get_class_obj_and_candidates
    class_obj = getattr(library, class_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\AppData\Roaming\Python\Python311\site-packages\diffusers\utils\import_utils.py", line 861, in __getattr__
    raise AttributeError(f"module {self.__name__} has no attribute {name}")
AttributeError: module diffusers has no attribute LTXTransformer3DModel. Did you mean: 'LatteTransformer3DModel'?
```

Checking dependencies...

@Abhinay1997
Contributor

@a-r-r-o-w minor typo here: cond_mask should use mask_shape rather than shape. https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ltx/pipeline_ltx_image2video.py#L493
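
To illustrate the distinction (shapes and values hypothetical, not the pipeline's actual code): the latents carry a channel dimension, while the conditioning mask needs one value per latent position, hence the separate mask_shape.

```python
import torch

shape = (1, 128, 8, 16, 16)     # latents: (batch, channels, frames, height, width)
mask_shape = (1, 1, 8, 16, 16)  # one mask value per latent position, single channel

cond_mask = torch.zeros(mask_shape)
cond_mask[:, :, 0] = 1.0        # e.g. condition only on the first latent frame
```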

sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* transformer

* make style & make fix-copies

* transformer

* add transformer tests

* 80% vae

* make style

* make fix-copies

* fix

* undo cogvideox changes

* update

* update

* match vae

* add docs

* t2v pipeline working; scheduler needs to be checked

* docs

* add pipeline test

* update

* update

* make fix-copies

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* update

* copy t2v to i2v pipeline

* update

* apply review suggestions

* update

* make style

* remove framewise encoding/decoding

* pack/unpack latents

* image2video

* update

* make fix-copies

* update

* update

* rope scale fix

* debug layerwise code

* remove debug

* Apply suggestions from code review

Co-authored-by: YiYi Xu <[email protected]>

* propagate precision changes to i2v pipeline

* remove downcast

* address review comments

* fix comment

* address review comments

* [Single File] LTX support for loading original weights (#10135)

* from original file mixin for ltx

* undo config mapping fn changes

* update

* add single file to pipelines

* update docs

* Update src/diffusers/models/autoencoders/autoencoder_kl_ltx.py

* Update src/diffusers/models/autoencoders/autoencoder_kl_ltx.py

* rename classes based on ltx review

* point to original repository for inference

* make style

* resolve conflicts correctly

---------

Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>