Hunyuan Video 1.5 manual FLF implementation #11151
base: master
Conversation
@kabachuha could you fix the linting/ruff errors? Thanks!
@Kosinkadink Addressed the linter's comments now.
This would be a nice feature to have!
Another idea would be to do the generation again in reverse, now with the (previously) last frame being the first and the (previously) first frame being the last. And then somehow blend the results of both generations. Of course, this would take double time to process and I'm not sure the result would match what results currently with Wan 2.2 FLF. Edit: I downloaded the video and I looked at it frame by frame. It seems that the whole last latent block (from frame 117 to the last) is "from another story", is simply a bad version of the intended last image repeated. I can think of "workarounds" about that but they are all in "real pixels space", latent space is beyond my level of knowledge. I would simply try a 4+1 (or 8+1) frames generation in... Wan 2.2 FLF 🤣 With the last "good" Hunyuan generated frame (frame 116) as the first image and the intended last frame as the last. But this is of course somewhat more of a joke than a solution. |
@jovan2009 maybe something can be done with the last latent, in the way kijai's special node ReplaceVideoLatentFrames replaces the first (corrupted) latent of Kandinsky outputs; it can work with the last frames as well. If this FLF node adds the last image as a latent, it could probably replace the last weird part.
So, to translate what my limited knowledge allows me to understand: transform the last image into a one-block latent, cut out the bad last latent block, replace it with the new block, and let the VAE decoding make a smooth transition out of it. It might fix the burnt look, but it will still be a very sudden transition from winter to summer in 4 frames. Better than what it's like right now, I guess.
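The replacement step described above might look something like the following. This is a hypothetical sketch, not the ReplaceVideoLatentFrames implementation: the `[B, C, T, H, W]` latent layout, the function name, and the optional blend factor are assumptions.

```python
import torch

def replace_last_latent_block(video_latent: torch.Tensor,
                              last_frame_latent: torch.Tensor,
                              blend: float = 1.0) -> torch.Tensor:
    """Swap the corrupted trailing latent block for a clean one.

    video_latent: generated latents, [B, C, T, H, W].
    last_frame_latent: VAE-encoded intended last frame, [B, C, 1, H, W].
    blend < 1.0 mixes in some of the original latent to soften the cut
    before VAE decoding; blend == 1.0 replaces it outright.
    """
    out = video_latent.clone()  # leave the caller's tensor untouched
    out[:, :, -1:] = blend * last_frame_latent + (1.0 - blend) * out[:, :, -1:]
    return out
```

Decoding `out` through the VAE would then rely on the decoder's temporal receptive field to smooth the boundary, which is exactly where the "sudden transition in 4 frames" concern remains.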
Test Evidence Check: If this PR modifies behavior that requires testing, a test explanation is required. PRs lacking applicable test explanations may not be reviewed until they are added. Please add test explanations to ensure code quality and prevent regressions.
This pull request adds "manual" first-last-frame (FLF) support for Hunyuan Video 1.5 generation via latent concatenation.
The code is very simple and based on the first-last-frame implementation for Wan.
Because Hunyuan takes CLIP vision embeddings as input and there is no dedicated FLF model, only one of the two embeddings is used, as provided.
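For readers unfamiliar with the technique, here is a minimal sketch of what latent-concatenation FLF conditioning generally looks like. The function name, tensor shapes, and mask convention are assumptions for illustration, not the PR's actual code: the first and last frames are VAE-encoded, pinned at the two ends of the concat latent, and everything in between is masked so the model generates the intermediate frames.

```python
import torch

def build_flf_concat_latent(first_latent: torch.Tensor,
                            last_latent: torch.Tensor,
                            num_latent_frames: int):
    """Build the concat latent and mask for first/last-frame conditioning.

    first_latent, last_latent: VAE-encoded frames, [B, C, 1, H, W].
    Returns (concat, mask) where mask is 0 at pinned positions and 1 where
    the model should generate content (convention assumed here).
    """
    b, c, _, h, w = first_latent.shape
    concat = torch.zeros(b, c, num_latent_frames, h, w,
                         dtype=first_latent.dtype)
    concat[:, :, :1] = first_latent    # pin the first latent frame
    concat[:, :, -1:] = last_latent    # pin the last latent frame
    mask = torch.ones(b, 1, num_latent_frames, h, w,
                      dtype=first_latent.dtype)
    mask[:, :, :1] = 0.0
    mask[:, :, -1:] = 0.0
    return concat, mask
```

The concat latent and mask are then passed to the model as extra conditioning channels alongside the noise, as in the Wan FLF path this PR is modeled on.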
One thing I would like help with: the last frame often "flickers" in without a smooth transition. I also observed this problem with Wan 2.2/VACE, but there it was less noticeable.
Otherwise it looks good, and it does take the last frame's subject into consideration.
hunyuan_video_1.5_00010.mp4
Closes #11020.