
@ravenouse (Contributor) commented Feb 4, 2025

Summary

This PR adds support for the vision encoder and projector of the Janus Pro 7B model.
It is still a work in progress (WIP).

Progress

  • Add conversion script for the vision encoder
  • Map the Janus vision encoder implementation to the current clip.cpp implementation
  • In progress: debugging the NaN results produced by the computation graph (updated Mar 06 2025)
  • Verify the implementation results
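One piece of the clip.cpp mapping step, sketched under assumptions: if the Janus checkpoint stores each attention block's query/key/value projections as a single fused `qkv` weight (as timm-style ViT blocks do; this layout is an assumption, not confirmed here), the conversion script has to split it into the separate Q, K, and V tensors that clip.cpp's attention layers load. The helper names `split_fused_qkv` and `split_fused_qkv_bias` are hypothetical, and a real conversion would operate on PyTorch tensors rather than nested lists.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_fused_qkv(qkv_weight: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """Split a fused QKV projection weight into separate Q, K and V weights.

    Assumption (hypothetical layout): the fused weight has shape
    (3 * hidden, hidden) in PyTorch's nn.Linear convention, with the Q rows
    first, then K, then V, as in timm-style ViT attention blocks.
    """
    rows = len(qkv_weight)
    if rows % 3 != 0:
        raise ValueError(f"fused qkv has {rows} rows, not divisible by 3")
    hidden = rows // 3
    q = qkv_weight[:hidden]
    k = qkv_weight[hidden:2 * hidden]
    v = qkv_weight[2 * hidden:]
    return q, k, v


def split_fused_qkv_bias(
    qkv_bias: List[float],
) -> Tuple[List[float], List[float], List[float]]:
    """Split the matching fused bias vector the same way (Q, K, V order assumed)."""
    if len(qkv_bias) % 3 != 0:
        raise ValueError(
            f"fused qkv bias has {len(qkv_bias)} entries, not divisible by 3"
        )
    hidden = len(qkv_bias) // 3
    return qkv_bias[:hidden], qkv_bias[hidden:2 * hidden], qkv_bias[2 * hidden:]
```

Splitting by contiguous row blocks only matches the checkpoint if the fused layout really is [Q; K; V]; an interleaved layout would need a different split, so the assumed order is worth double-checking against the original model code.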

@github-actions bot added the examples and python (python script changes) labels Feb 4, 2025
@cmp-nct (Contributor) commented Feb 13, 2025

Looks very interesting!
Unless I missed something, Janus Pro just uses a single CLIP-generated embedding block, correct? Similar to how LLaVA 1.5 worked?
I always disliked the way LLaVA-NeXT and co. stitch multiple CLIP embeddings together.

@ravenouse (Contributor, Author)

Hi @cmp-nct,

Thank you so much for your comments!
I apologize for the delayed response; I've been focusing on understanding the architecture of the LLaVA models and the image preprocessing functions defined in clip.cpp.

Regarding your point about stitching multiple CLIP embeddings together, are you referring to the anyres processing used in LLaVA 1.6 (LLaVA-NeXT)?
My current understanding is that Janus Pro, like LLaVA 1.5, uses a single CLIP-style vision encoder for image understanding (specifically, siglip_large_patch16_384), followed by a two-layer MLP aligner.
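A minimal pure-Python sketch of the pipeline described above, assuming a 384×384 input, 16×16 patches, and a Linear → GELU → Linear aligner (the GELU activation and the layer sizes are assumptions, not taken from the Janus code). The patch count also bears on the earlier question: a single fixed-resolution encoder pass yields one block of (384 / 16) × (384 / 16) = 24 × 24 = 576 embeddings per image, rather than several tile embeddings stitched together. All function names here are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (out_features, in_features)


def num_patch_tokens(image_size: int = 384, patch_size: int = 16) -> int:
    """Embeddings from one ViT pass (assumes no CLS or other extra tokens)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """y = W x + b, using PyTorch's nn.Linear weight layout."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def gelu(x: float) -> float:
    """Exact (erf-based) GELU."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def mlp_aligner(patch: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Two-layer MLP aligner: Linear -> GELU -> Linear (activation assumed)."""
    hidden = [gelu(h) for h in linear(patch, w1, b1)]
    return linear(hidden, w2, b2)


def project_image(
    patches: List[Vector], w1: Matrix, b1: Vector, w2: Matrix, b2: Vector
) -> List[Vector]:
    """Apply the same aligner to every patch embedding independently."""
    return [mlp_aligner(p, w1, b1, w2, b2) for p in patches]
```

In a real converter these would be tensor operations on the converted weights; the loops only spell out the data flow: each of the 576 patch embeddings goes through the same two linear layers, producing one block of image embeddings for the language model.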

While testing and debugging the Janus Pro computation graph implemented in this PR, I encountered an issue where all embedding values were NaN, starting from the output of the very first computation node. The computation graph itself appears to be structured correctly, but its numerical output is invalid.

I would greatly appreciate any insights you might have on potential causes for this issue. Could it be related to the PyTorch version, the data type of the original model weights, or perhaps some other factor? Any suggestions for debugging directions would be extremely helpful.
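One cheap check along the data-type line of inquiry, sketched under assumptions: if the original checkpoint stores weights as bfloat16 (not verified here), each raw 16-bit value is simply the upper half of a float32, so decoding it and scanning for non-finite values can show whether NaNs are already present in the weights or the input before the graph runs. The helper names are hypothetical; in practice the same scan would be applied to each converted tensor and to the preprocessed image buffer.

```python
import math
import struct
from typing import Iterable, List, Optional


def bf16_bits_to_float(bits: int) -> float:
    """Decode one raw bfloat16 value: it is the top 16 bits of a float32."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def decode_bf16_buffer(raw: bytes) -> List[float]:
    """Decode a little-endian bfloat16 byte buffer into Python floats."""
    if len(raw) % 2 != 0:
        raise ValueError("bfloat16 buffer must have an even number of bytes")
    count = len(raw) // 2
    return [bf16_bits_to_float(b) for b in struct.unpack(f"<{count}H", raw)]


def first_non_finite(values: Iterable[float]) -> Optional[int]:
    """Return the index of the first NaN/inf value, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None
```

Because every bfloat16 value is exactly representable as float32, a correct bf16 → f32 conversion cannot introduce NaNs by itself; a conversion to float16, by contrast, can overflow to inf for values outside float16's range (about ±65504). If the decoded weights and the input buffer are all finite, the operands of the first graph node may be the next place to look.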

Thank you again for your time and help on this!

@ravenouse closed this Oct 31, 2025
