@kabachuha
This pull request adds "manual" first-last frame support for Hunyuan1.5 video generation via latent concatenation.

The code is very simple and based on the first-last-frame implementation for Wan.
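As a rough illustration of what conditioning by latent concatenation can look like, here is a minimal sketch. It is not the code from this PR: `build_flf_conditioning` is a hypothetical helper, each latent frame is represented by an opaque placeholder value, and the mask convention (0.0 for a provided slot, 1.0 for a slot to generate) is an assumption.

```python
def build_flf_conditioning(num_latent_frames, first_latent, last_latent, blank_latent):
    """Return (latents, mask) for a first-last-frame conditioning track.

    Hypothetical helper. Each *_latent stands in for one encoded latent frame.
    Assumed convention: mask[i] == 0.0 marks a provided slot,
    mask[i] == 1.0 marks a slot the model should generate.
    """
    if num_latent_frames < 2:
        raise ValueError("need at least two latent frames for first and last")
    # Start with every slot blank and marked as "to generate".
    latents = [blank_latent] * num_latent_frames
    mask = [1.0] * num_latent_frames
    # Pin the encoded first and last images into the outermost slots.
    latents[0], mask[0] = first_latent, 0.0
    latents[-1], mask[-1] = last_latent, 0.0
    return latents, mask


latents, mask = build_flf_conditioning(5, "FIRST", "LAST", "BLANK")
```

In a real node the placeholders would be tensors and the result would be attached to the conditioning, but the slot layout is the part this sketch is meant to show.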

Hunyuan takes CLIP vision embeddings as input, and since there is no dedicated first-last-frame model, only one of the two embeddings is used, as provided.

One thing I would like help with: the last frame often "flickers" in abruptly, without a smooth transition. I also saw this problem with Wan2.2/VACE, but it was less noticeable there.

Otherwise it looks good, and it does take the last frame's subject into consideration.

Attachments: hunyuan_video_1.5_00010.mp4, ComfyUI_temp_rqlky_00011_, ComfyUI_temp_rqlky_00002_

Closes #11020.

@Kosinkadink
Member

@kabachuha could you fix the linting/ruff errors? thanks!

@kabachuha
Author

@Kosinkadink I've addressed its comments now.

@jovan2009
jovan2009 commented Dec 11, 2025

This would be a nice feature to have!

The last frame looks extremely burned, and the motion simply jumps to it without any transition. One idea to alleviate that, off the top of my head, is something like the workaround used for Wan 2.2 S2V first frames. There it helps to concatenate the first latent block and then cut out the first frames after VAE decoding. Maybe the same could work here: concatenate the last latent block and then cut out the last frames after VAE decoding. I'm not sure; it might be a completely different situation.
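One way to read the trimming half of that suggestion is sketched below. It assumes one extra latent block is appended at the end before decoding and that each latent block decodes to 4 pixel frames; both are assumptions, and `trim_padded_tail` is a hypothetical helper, not an existing node.

```python
def trim_padded_tail(frames, extra_latent_blocks=1, temporal_stride=4):
    """Drop the decoded frames that came from extra latent blocks at the end.

    `frames` is any sliceable sequence of decoded frames.
    `temporal_stride` is the assumed number of pixel frames per latent block.
    """
    drop = extra_latent_blocks * temporal_stride
    if drop < 0 or drop >= len(frames):
        raise ValueError("cannot drop %d of %d frames" % (drop, len(frames)))
    return frames[: len(frames) - drop]


# 121 wanted frames plus one padded latent block (4 frames), assumed sizes.
decoded = list(range(125))
kept = trim_padded_tail(decoded)
```

Whether the frames that remain actually look better is exactly the open question in the comment above; the sketch only shows the bookkeeping.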

Another idea would be to run the generation again in reverse, with the previous last frame as the first and the previous first frame as the last, and then somehow blend the results of both generations. Of course, this would take twice as long, and I'm not sure the result would match what Wan 2.2 FLF currently produces.
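The simplest possible version of "somehow blend" is a linear crossfade in pixel space, sketched below. This is only an illustration of the idea, not something tested on real generations: `blend_forward_backward` is a hypothetical helper, and frames are represented as flat lists of floats for brevity.

```python
def blend_forward_backward(forward, backward):
    """Crossfade a forward clip with a clip generated in reverse.

    Each clip is a list of frames; each frame is a flat list of floats.
    The backward clip is time-reversed first so both clips share one timeline.
    """
    if len(forward) != len(backward):
        raise ValueError("clips must have the same number of frames")
    n = len(forward)
    aligned = backward[::-1]
    blended = []
    for t, (f_frame, b_frame) in enumerate(zip(forward, aligned)):
        # Weight of the backward pass grows from 0 at the start to 1 at the end,
        # so each pass dominates near the image it was started from.
        w = t / (n - 1) if n > 1 else 0.0
        blended.append([(1 - w) * fv + w * bv for fv, bv in zip(f_frame, b_frame)])
    return blended


forward_clip = [[0.0], [0.0], [0.0]]
backward_clip = [[1.0], [1.0], [1.0]]
result = blend_forward_backward(forward_clip, backward_clip)
```

The weighting favours the forward pass near the first image and the reversed pass near the last image, on the assumption that each pass is most faithful near the frame it started from.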

Edit: I downloaded the video and stepped through it frame by frame. It seems that the whole last latent block (from frame 117 to the last) is "from another story": it is simply a bad version of the intended last image, repeated. I can think of workarounds for that, but they are all in "real pixel space"; latent space is beyond my level of knowledge. I would simply try a 4+1 (or 8+1) frame generation in... Wan 2.2 FLF 🤣, with the last "good" Hunyuan-generated frame (frame 116) as the first image and the intended last frame as the last. But this is of course more of a joke than a solution.
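The frame numbers above are consistent with a simple frame-to-latent mapping, sketched here under assumptions that are not confirmed in this thread: a temporal compression of 4, a first frame that gets its own latent block, 0-indexed frame numbers, and a 121-frame clip.

```python
def latent_index(frame, temporal_stride=4):
    """Latent block holding a 0-indexed pixel frame (frame 0 has its own block)."""
    if frame < 0:
        raise ValueError("frame index must be non-negative")
    return 0 if frame == 0 else 1 + (frame - 1) // temporal_stride


def frames_in_latent(index, temporal_stride=4):
    """Range of 0-indexed pixel frames decoded from one latent block."""
    if index < 0:
        raise ValueError("latent index must be non-negative")
    if index == 0:
        return range(0, 1)
    start = 1 + (index - 1) * temporal_stride
    return range(start, start + temporal_stride)


num_frames = 121  # assumed clip length
last_block = latent_index(num_frames - 1)
last_block_frames = list(frames_in_latent(last_block))
```

Under those assumptions the last latent block covers frames 117 to 120, and frame 116 is the final frame of the block before it.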

@kabachuha
Author

@jovan2009 Maybe something can be done with the last latent, similar to how the special node kijai added, ReplaceVideoLatentFrames, replaces the first (corrupted) latent of Kandinsky outputs.

It can work with the last frames as well. If this FLF node adds the last image as a latent, it can probably replace the weird last part.

@jovan2009

So, to translate as far as my limited knowledge allows: encode the last image into a single latent block, cut out the bad last latent block and replace it with this new one, and let VAE decoding make a smooth transition out of it. That might fix the burnt look, but it will still be a very sudden transition from winter to summer in 4 frames. Better than what it's like right now, I guess.
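The steps described above can be sketched as follows. This does not use the ReplaceVideoLatentFrames node or its actual interface: `replace_last_latent` is a hypothetical helper, and latent blocks are represented by placeholder values.

```python
def replace_last_latent(latents, last_image_latent, count=1):
    """Return a copy of `latents` with the final `count` blocks replaced.

    `latents` is a sequence of per-block latents along the time axis.
    `last_image_latent` is the intended last image, already encoded.
    """
    if not 0 < count <= len(latents):
        raise ValueError("count must be between 1 and len(latents)")
    return list(latents[:-count]) + [last_image_latent] * count


generated = ["block_0", "block_1", "bad_last_block"]
patched = replace_last_latent(generated, "encoded_last_image")
```

The patched sequence would then go through VAE decoding as usual. Whether the decoder smooths the seam between the last generated block and the swapped-in one is the untested part.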

@comfy-pr-bot
Member

Test Evidence Check

⚠️ Warning: Test Explanation Missing

If this PR modifies behavior that requires testing, a test explanation is required. PRs lacking applicable test explanations may not be reviewed until added. Please add test explanations to ensure code quality and prevent regressions.


Development

Successfully merging this pull request may close these issues.

Last frame input for Hunyuan1.5 Image2Video