Add supports for Janus vision encoder and projector [WIP] #11646

ravenouse · 2025-02-04T06:52:16Z

Summary

This PR adds support for the vision encoder and projector of the Janus Pro 7B model.
It is still a work in progress (WIP).

Progress

Add conversion script for the vision encoder
Map the Janus vision encoder implementation to the current clip.cpp implementation
Currently working on: debugging the NaN results from the computation graph (updated Mar 06, 2025)
Verify the implementation results
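The mapping step often has to reconcile attention-weight layouts. Here is a minimal sketch that assumes, without confirmation in this PR, two things: the Janus SigLIP checkpoint stores a fused, timm-style `qkv` projection whose first dimension is `3 * hidden`, and clip.cpp's ViT graph loads separate Q, K and V projections. Under those assumptions, the converter would split the fused tensor along its first axis into three equal blocks. `split_fused_qkv` is a hypothetical helper, and nested Python lists stand in for real tensors.

```python
def split_fused_qkv(fused, hidden):
    """Split a fused QKV tensor along its first dimension into (Q, K, V).

    Works for both the weight (a list of 3*hidden rows) and the bias
    (a list of 3*hidden floats). Assumes the timm-style layout in which
    the first `hidden` entries belong to Q, the next `hidden` to K and
    the last `hidden` to V.
    """
    if len(fused) != 3 * hidden:
        raise ValueError(f"expected {3 * hidden} entries, got {len(fused)}")
    q = fused[:hidden]
    k = fused[hidden:2 * hidden]
    v = fused[2 * hidden:]
    return q, k, v


# Toy example with hidden = 2: the weight has 6 rows of length 2.
fused_weight = [[1, 0], [0, 1], [2, 0], [0, 2], [3, 0], [0, 3]]
fused_bias = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

q_w, k_w, v_w = split_fused_qkv(fused_weight, hidden=2)
q_b, k_b, v_b = split_fused_qkv(fused_bias, hidden=2)
print(k_w)  # [[2, 0], [0, 2]]
print(v_b)  # [0.5, 0.6]
```

In a real converter the same slicing would be applied to the first axis of the loaded tensor before writing three separate GGUF tensors.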


cmp-nct · 2025-02-13T21:10:14Z

Looks very interesting!
Unless I'm missing something, Janus Pro just uses a single CLIP-generated embedding block, correct? Similar to how LLaVA 1.5 worked?
I always disliked the way LLaVA-NeXT and co. stitch multiple CLIP embeddings together.

ravenouse · 2025-03-17T18:45:41Z

Hi @cmp-nct,

Thank you so much for your comments!
I apologize for the delayed response; I've been focusing on understanding the architecture of the LLaVA models and the image preprocessing functions defined in clip.cpp.

Regarding your question about "co-stitching multiple CLIP embeddings," are you referring to the anyres processing used in LLaVA 1.6?
My current understanding is that Janus Pro, similar to LLaVA 1.5, uses a single CLIP model for image understanding (specifically, siglip_large_patch16_384) followed by a two-layer MLP aligner.
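To make the single-embedding-block pipeline concrete: a 384×384 input split into 16×16 patches gives (384/16)² = 24 × 24 = 576 patch embeddings. A two-layer MLP aligner maps each of them independently into the language model's embedding space. If the patch tokens are passed through without pooling, which is an assumption here, the LLM sees one contiguous block of 576 image tokens. The sketch below is a pure-Python toy version of such an aligner (Linear → GELU → Linear). The GELU activation and the toy dimensions are assumptions for illustration, not values read from the Janus Pro weights.

```python
import math


def gelu(x):
    # Exact (erf-based) GELU, as in torch.nn.GELU() with default settings.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias):
    # weight is laid out [out_features][in_features], like torch.nn.Linear.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def mlp_aligner(patch_embeddings, w1, b1, w2, b2):
    """Project each patch embedding independently: Linear -> GELU -> Linear."""
    projected = []
    for x in patch_embeddings:
        hidden = [gelu(v) for v in linear(x, w1, b1)]
        projected.append(linear(hidden, w2, b2))
    return projected


def num_patch_tokens(image_size, patch_size):
    # One token per non-overlapping square patch.
    side = image_size // patch_size
    return side * side


print(num_patch_tokens(384, 16))  # 576

# Toy dimensions: 2 patches, vision width 2, LLM width 3.
patches = [[1.0, -1.0], [0.0, 2.0]]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 1.0], [1.0, -1.0], [0.0, 1.0]], [0.0, 0.0, 0.0]
tokens = mlp_aligner(patches, w1, b1, w2, b2)
print(len(tokens), len(tokens[0]))  # 2 3
```

The number of image tokens is fixed by the encoder, and the aligner only changes the width of each token. This is why the result stays a single block rather than several stitched embeddings.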

While testing and debugging the Janus Pro computation graph implemented in this PR, I found that all embedding values were NaN, starting from the very first computation node. The graph itself appears to be built and executed correctly, but the numerical output is invalid.
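One way to narrow this down is to dump the output of each graph node in execution order and report the first node that contains a non-finite value. If even the first node is affected, the cause is likely upstream of the graph, in the preprocessed image or the loaded weights, rather than in a later operation. Below is a minimal, framework-agnostic sketch. It assumes the intermediate outputs have already been dumped as (name, flat list of floats) pairs, and the node names in the example are hypothetical.

```python
import math


def first_non_finite(named_outputs):
    """Return (node_name, index, value) for the first NaN/Inf, or None.

    named_outputs must be ordered the same way the graph was executed.
    """
    for name, values in named_outputs:
        for i, v in enumerate(values):
            if not math.isfinite(v):
                return name, i, v
    return None


# Hypothetical node dumps, in execution order.
nodes = [
    ("inp_patches", [0.1, 0.2]),
    ("pos_embd_add", [0.3, float("nan")]),
    ("blk0_attn", [float("nan")]),
]
print(first_non_finite(nodes))  # ('pos_embd_add', 1, nan)
```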

I would greatly appreciate any insights you might have on potential causes for this issue. Could it be related to the PyTorch version, the data type of the original model weights, or perhaps some other factor? Any suggestions for debugging directions would be extremely helpful.
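On the data-type question, consider the case where the original weights are stored in bfloat16, an assumption worth checking against the checkpoint's config. bfloat16 has the same exponent range as float32, whereas float16 can only represent finite magnitudes up to 65504. If the converter casts a tensor to F16, sufficiently large values become Inf, and operations such as Inf − Inf or 0 × Inf then yield NaN. A quick sanity check under that hypothesis is to scan each tensor for values F16 cannot hold before writing the GGUF file. `tensors_unsafe_for_fp16` and the tensor names are hypothetical.

```python
import math

# Largest finite IEEE 754 half-precision (F16) value.
FP16_MAX = 65504.0


def tensors_unsafe_for_fp16(named_tensors):
    """Map tensor name -> number of values that are NaN/Inf or exceed FP16_MAX.

    The magnitude check is slightly conservative: with round-to-nearest,
    values below 65520 still round down to 65504 rather than to Inf.
    """
    report = {}
    for name, values in named_tensors.items():
        bad = sum(1 for v in values if not math.isfinite(v) or abs(v) > FP16_MAX)
        if bad:
            report[name] = bad
    return report


# Hypothetical tensor names and values, for illustration only.
weights = {
    "blk0.attn_q.weight": [0.5, -1.25],
    "blk0.ffn_up.weight": [1.0e5, 2.0],
    "post_ln.weight": [float("nan"), 1.0],
}
print(tensors_unsafe_for_fp16(weights))
# {'blk0.ffn_up.weight': 1, 'post_ln.weight': 1}
```

An empty report would point away from the F16 cast as the source of the NaNs.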

Thank you again for your time and help on this!

Add script to convert Janus encoder to GGUF format and update requirements

b7fafb7

github-actions bot added the examples and python (python script changes) labels on Feb 4, 2025

Add example clip cli and enhance tensor name processing in Janus converter

3667a0a

davrot mentioned this pull request Feb 6, 2025

Support Janus-Pro-7b for vision models ollama/ollama#8618

Open

ravenouse added 2 commits February 7, 2025 06:04

Add Janus Attention Pool with Latent Query support in CLIP model

7850716

Add Janus Attention Pool support in CLIP model

448c62e

ravenouse added 2 commits February 14, 2025 07:42

Align convert script with clip.cpp

5bb91a9

Refactor Janus Attention Pool implementation in CLIP model

f3f6787

ravenouse closed this Oct 31, 2025
