
Conversation

@neph1
Contributor

@neph1 neph1 commented Apr 12, 2025

Still have some cleaning up to do, but this PR is back (I might migrate this and force-push, or it may eventually be squashed).
It does things and produced something. I will do a longer run tomorrow, but I have no idea how to run inference. Is there an example script somewhere I can modify? Found it.

video = video.permute(0, 2, 1, 3, 4).contiguous() # [B, F, C, H, W] -> [B, C, F, H, W]
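
For context, a minimal illustration of why this permute is needed: the dataloader yields frames-first batches, while the 3D VAE expects channels-first input (the shapes below are made up for the example):

import torch

video = torch.randn(1, 49, 3, 480, 720)             # [B, F, C, H, W] as loaded
video = video.permute(0, 2, 1, 3, 4).contiguous()   # [B, C, F, H, W] for the VAE
assert video.shape == (1, 3, 49, 480, 720)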


compute_posterior = False
Contributor Author


So far I've only made it work with compute_posterior=False.
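
For reference, a rough sketch of how the two compute_posterior paths typically differ, assuming a diffusers AutoencoderKL-style VAE (the function name and dict keys are illustrative, not the exact trainer code):

import torch
# Import path may differ across diffusers versions.
from diffusers.models.autoencoders.vae import DiagonalGaussianDistribution

def encode_video(vae, video, compute_posterior: bool):
    if compute_posterior:
        # Sample from the posterior right away.
        latents = vae.encode(video).latent_dist.sample()
        return {"latents": latents}
    # Otherwise keep the raw moments (mean and logvar concatenated along the
    # channel dim) and defer sampling to the training step.
    moments = vae.encode(video).latent_dist.parameters
    return {"latents": moments}

# Later, when compute_posterior is False, the training step would do roughly:
#   posterior = DiagonalGaussianDistribution(moments)
#   latents = posterior.sample()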

Contributor

@a-r-r-o-w a-r-r-o-w left a comment


Hey @neph1, thanks for the PR and testing with some runs! I know there's some duplication of code at the moment, but I plan to address that in the future with something else. For now, let's keep the duplication.

I'll also try to help with launching some training runs to verify the PR once the changes look similar to the Wan/CogView control implementations (#338).

@neph1
Contributor Author

neph1 commented Apr 13, 2025

Oh, I did follow the new implementation, I just kept it on the same branch. At least I believe it follows what is in main now.
Not sure it works, though. I ran 1000 steps of canny training on a Simpsons dataset I found. I think I managed to make the conversion script handle x_embedder, but the output is only noise. Sadly I can't run inference with plain diffusers due to VRAM.

Got a bit disheartened, but I'll get back to it.
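
For anyone following along: handling x_embedder in the conversion script usually comes down to the standard control-variant trick of widening the input projection so it can take the control latents concatenated on the channel dim, with the new weight slices zero-initialized. A minimal sketch, assuming the patch embedding is a Conv3d (module and attribute names are illustrative, not the exact HunyuanVideo code):

import torch
import torch.nn as nn

def expand_x_embedder(proj: nn.Conv3d, extra_in_channels: int) -> nn.Conv3d:
    # Build a wider projection and copy the pretrained weights into the
    # original channel slots; the added channels start at zero so the model
    # initially ignores the control input.
    new_proj = nn.Conv3d(
        proj.in_channels + extra_in_channels,
        proj.out_channels,
        kernel_size=proj.kernel_size,
        stride=proj.stride,
        bias=proj.bias is not None,
    )
    with torch.no_grad():
        new_proj.weight.zero_()
        new_proj.weight[:, : proj.in_channels] = proj.weight
        if proj.bias is not None:
            new_proj.bias.copy_(proj.bias)
    return new_proj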

@neph1 neph1 force-pushed the control-lora-trainer-hunyuan branch from 2a06b9c to 9be0e0c on April 13, 2025 at 17:36
latents = moments.to(dtype=dtype)

return {self.output_names[0]: latents}
latents_mean = torch.tensor(vae.latent_channels)
Contributor


@neph1 These changes seem incorrect to me and will cause worse generations. The previous implementation, which did not perform this normalization, was correct, I think.

Was this modified from Wan? If so, it's incorrect because they are different models and preprocess latents differently.
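
To spell out the distinction (config attribute names follow diffusers; this is an illustration of the point above, not code from the PR):

# Wan-style VAEs store per-channel statistics and normalize the encoded
# latents with them (the exact sign/scale convention should be checked
# against the Wan pipeline):
#   latents_mean = torch.tensor(vae.config.latents_mean).view(1, -1, 1, 1, 1)
#   latents_std = torch.tensor(vae.config.latents_std).view(1, -1, 1, 1, 1)
#   latents = (latents - latents_mean) / latents_std
#
# HunyuanVideo's VAE does not expose such statistics; its latents are only
# scaled by a single scalar:
#   latents = latents * vae.config.scaling_factor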

@a-r-r-o-w
Contributor

Also, I'm a bit more free now. I was working on a major upcoming feature for speeding up training and inference, and it's nearing completion. If you'd like me to take over the PR, make the relevant changes, and do a long run to validate correctness, please do let me know.

@neph1
Contributor Author

neph1 commented Apr 18, 2025

By all means, if you have the time. 👍

