Conversation

@anivar anivar commented Jul 20, 2025

Adds qwen2vl model architecture support to llamafile.

Changes

  • Add LLM_ARCH_QWEN2VL architecture enum
  • Add tensor mappings with bias support for Q, K, V, output layers
  • Add model loading for 2B, 7B, 72B variants
  • Add build_qwen2vl() graph building function
  • Add MODEL_72B model type
  • Add optional rope dimension sections loading

Implementation

  • Ported from upstream llama.cpp
  • Uses standard rope positioning
  • Maintains compatibility with existing architectures
  • Successfully builds with cosmocc

Addresses issue #752.

@cjpais
Collaborator

cjpais commented Jul 20, 2025

Thanks for this contribution! I no longer have the ability to merge this in, however maybe @mofosyne can take a look and merge?

Collaborator


What's the purpose of this binary file?

Collaborator


FYI, I don't have the ability to merge. But I had a look, and aside from this binary file, the PR looks reasonable.

Anivar A Aravind added 2 commits July 28, 2025 06:44
- Add LLM_ARCH_QWEN2VL architecture enum
- Add tensor mappings with bias support for Q, K, V, output layers
- Add model loading for 2B, 7B, 72B variants
- Add build_qwen2vl() graph building function
- Add MODEL_72B model type
- Add optional rope dimension sections loading

Implementation ported from upstream llama.cpp. Uses standard rope
positioning; rope_multi with sections is pending upstream support.

Addresses issue mozilla-ai#752.
Fixes an integer underflow when n_discard >= cache_tokens.size(), which
causes std::length_error crashes. This commonly occurs during KV cache
context shifting, particularly with Chinese text translation workloads.

The fix adds proper bounds checking before resizing the cache_tokens vector.

Fixes mozilla-ai#771
@anivar anivar force-pushed the add-qwen2vl-support branch from 1f9e927 to 5b73d10 on July 28, 2025 01:15

anivar commented Jul 28, 2025

Hey @cjpais and @mofosyne, thanks for the review!

I've cleaned up the PR based on your feedback:

  • Removed that binary file you spotted (good catch!)
  • Added the test binaries to .gitignore so this won't happen again
  • Improved the test file to actually do something useful
  • Made the rope_multi TODO more concise

The PR is now split into 2 clean commits:

  1. The qwen2vl architecture support
  2. The KV cache crash fix (this one's pretty important - fixes those std::length_error crashes)

Everything should be good to merge now. Let me know if you need anything else!

@mofosyne
Collaborator

Mostly I just checked that there is nothing obviously wrong. I don't have merge capability, but I can at least confirm to the core maintainers that more attention can be placed on checking the actual logic. Thanks!
