Conversation

@anivar anivar commented Jul 20, 2025

Adds qwen2vl model architecture support to llamafile.

Changes

  • Add LLM_ARCH_QWEN2VL architecture enum
  • Add tensor mappings with bias support for Q, K, V, output layers
  • Add model loading for 2B, 7B, 72B variants
  • Add build_qwen2vl() graph building function
  • Add MODEL_72B model type
  • Add optional rope dimension sections loading

Implementation

  • Ported from upstream llama.cpp
  • Uses standard rope positioning
  • Maintains compatibility with existing architectures
  • Successfully builds with cosmocc

Addresses issue #752.

@cjpais
Collaborator

cjpais commented Jul 20, 2025

Thanks for this contribution! I no longer have the ability to merge this in, however maybe @mofosyne can take a look and merge?

Collaborator


What's the purpose of this binary file?

Collaborator


FYI, I don't have the ability to merge. But I had a look, and aside from this binary file, the PR looks reasonable.

Anivar A Aravind added 2 commits July 28, 2025 06:44
- Add LLM_ARCH_QWEN2VL architecture enum
- Add tensor mappings with bias support for Q, K, V, output layers
- Add model loading for 2B, 7B, 72B variants
- Add build_qwen2vl() graph building function
- Add MODEL_72B model type
- Add optional rope dimension sections loading

Implementation ported from upstream llama.cpp. Uses standard rope
positioning; rope_multi with sections is pending upstream support.

Addresses issue mozilla-ai#752.
Fixes an integer underflow when n_discard >= cache_tokens.size(), which
causes std::length_error crashes. This commonly occurs during KV cache
context shifting, particularly with Chinese text translation workloads.

The fix adds proper bounds checking before resizing the cache_tokens vector.

Fixes mozilla-ai#771
@anivar anivar force-pushed the add-qwen2vl-support branch from 1f9e927 to 5b73d10 on July 28, 2025 01:15

anivar commented Jul 28, 2025

Hey @cjpais and @mofosyne, thanks for the review!

I've cleaned up the PR based on your feedback:

  • Removed that binary file you spotted (good catch!)
  • Added the test binaries to .gitignore so this won't happen again
  • Improved the test file to actually do something useful
  • Made the rope_multi TODO more concise

The PR is now split into 2 clean commits:

  1. The qwen2vl architecture support
  2. The KV cache crash fix (this one's pretty important - fixes those std::length_error crashes)

Everything should be good to merge now. Let me know if you need anything else!

@mofosyne
Collaborator

Mostly I just checked that there is nothing obviously wrong. I don't have merge capability, but I can at least confirm to the core maintainers that more attention can be placed on checking the actual logic. Thanks!
