Add qwen2vl model architecture support #776
Conversation
Thanks for this contribution! I no longer have the ability to merge this in, but maybe @mofosyne can take a look and merge?

Review comment on test_kv_cache_fix (outdated):
What's the purpose of this binary file?
FYI, I don't have the ability to merge. But I had a look and, aside from this binary file, the PR looks reasonable.
Commit 1 (qwen2vl architecture support):

- Add LLM_ARCH_QWEN2VL architecture enum
- Add tensor mappings with bias support for Q, K, V, output layers
- Add model loading for 2B, 7B, 72B variants
- Add build_qwen2vl() graph building function
- Add MODEL_72B model type
- Add optional rope dimension sections loading

Implementation ported from upstream llama.cpp. Uses standard rope positioning; rope_multi with sections is pending upstream support. Addresses issue mozilla-ai#752.
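For orientation, here is a minimal sketch of what the architecture registration tends to look like, modeled on upstream llama.cpp conventions. Only LLM_ARCH_QWEN2VL and the "qwen2vl" name come from this PR; the surrounding entries are illustrative:

```cpp
#include <map>

// Sketch modeled on upstream llama.cpp's architecture registry; entries
// other than LLM_ARCH_QWEN2VL are elided or illustrative.
enum llm_arch {
    // ... existing architectures ...
    LLM_ARCH_QWEN2VL,   // added by this PR
    LLM_ARCH_UNKNOWN,
};

// Maps the enum to the GGUF "general.architecture" string so qwen2vl
// GGUF files resolve to the new architecture at load time.
static const std::map<llm_arch, const char *> LLM_ARCH_NAMES = {
    // ...
    { LLM_ARCH_QWEN2VL, "qwen2vl" },
};
```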
Commit 2 (KV cache fix):

Fixes an integer underflow when n_discard >= cache_tokens.size() that causes std::length_error crashes. This commonly occurs during KV cache context shifting, particularly with Chinese text translation workloads. The fix adds proper bounds checking before resizing the cache_tokens vector. Fixes mozilla-ai#771.
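A minimal sketch of the guard this commit describes, assuming the context shift ends by resizing cache_tokens (the wrapper function is hypothetical; only n_discard, cache_tokens, and the resize come from the commit message):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper around the resize described in the commit message.
void shift_kv_cache(std::vector<int> & cache_tokens, size_t n_discard) {
    if (n_discard >= cache_tokens.size()) {
        // Without this guard, size() - n_discard wraps around (unsigned
        // arithmetic) and resize() on the huge result throws
        // std::length_error.
        cache_tokens.clear();
        return;
    }
    cache_tokens.resize(cache_tokens.size() - n_discard);
}
```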
Force-pushed 1f9e927 to 5b73d10.
Hey @cjpais and @mofosyne, thanks for the review! I've cleaned up the PR based on your feedback. It is now split into two clean commits: the qwen2vl architecture support and the KV cache fix. Everything should be good to merge now. Let me know if you need anything else!
Mostly I just checked that there is nothing obviously wrong. I don't have merge capability, but I can at least confirm to the core maintainers that more attention can be placed on checking the actual logic. Thanks!
Adds qwen2vl model architecture support to llamafile.

Changes

- LLM_ARCH_QWEN2VL architecture enum
- Tensor mappings with bias support for the Q, K, V, and output layers
- Model loading for the 2B, 7B, and 72B variants
- build_qwen2vl() graph building function
- MODEL_72B model type
- Optional rope dimension sections loading (see the sketch below)

Implementation

Ported from upstream llama.cpp. Uses standard rope positioning for now; rope_multi with sections is pending upstream support.

Addresses issue #752.
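As a loose illustration of the optional rope dimension sections loading, here is a sketch modeled on upstream llama.cpp's Qwen2-VL support. The key name follows upstream's "<arch>.rope.dimension_sections" convention, and the accessor is a stub invented for illustration:

```cpp
#include <array>
#include <cstdint>

// Stub standing in for the model loader's optional-array metadata
// accessor; returns false when the key is absent. Illustrative only.
static bool get_optional_arr(const char * /*key*/, std::array<int32_t, 4> & /*out*/) {
    return false;
}

// Per-axis rope section sizes used by Qwen2-VL's multimodal rope. When
// the key is absent (or, as in this PR, while rope_multi is pending
// upstream), the zeros mean standard rope positioning is used instead.
static std::array<int32_t, 4> load_rope_sections() {
    std::array<int32_t, 4> rope_sections = {0, 0, 0, 0};
    get_optional_arr("qwen2vl.rope.dimension_sections", rope_sections);
    return rope_sections;
}
```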