Replies: 1 comment
Not Expo (we use the RN CLI), but we are able to run Qwen3-VL-2B on a phone. I'm not entirely sure what issue you're running into, but in our case the problem was related to context window limits.

A couple of things to keep in mind: Qwen-VL models use dynamic image sizing rather than a fixed size, so the number of output tokens scales with the image dimensions. For Qwen3-VL, the effective patch size is … For example, my phone's camera produces 3024 × 4032 images, which means: …

To manage this, llama.cpp caps output tokens at 4096 by default for Qwen-VL models (source). It calculates the maximum allowed pixels from this token limit, then scales the image down while preserving its aspect ratio (calc_size_preserved_ratio). So …
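To make the arithmetic concrete, here is a rough Python sketch of the idea described above: estimate the token count from the image dimensions, then shrink the image until it fits a token budget. This is not llama.cpp's actual `calc_size_preserved_ratio`; the function names are hypothetical, and the effective patch size of 32 pixels used in the example is an assumption on my part, so check it against your model's config before relying on the numbers.

```python
import math


def image_tokens(width: int, height: int, patch: int) -> int:
    """Estimate image tokens as the number of patch x patch tiles covering the image."""
    return math.ceil(width / patch) * math.ceil(height / patch)


def fit_to_token_budget(width: int, height: int, patch: int, max_tokens: int) -> tuple[int, int]:
    """Scale (width, height) down, keeping the aspect ratio, so the token estimate
    stays within max_tokens. Illustrative only, not the llama.cpp implementation."""
    # Pixel budget implied by the token cap: one token per patch x patch tile.
    max_pixels = max_tokens * patch * patch
    pixels = width * height
    scale = 1.0 if pixels <= max_pixels else math.sqrt(max_pixels / pixels)
    # Round each side down to a multiple of the patch size so the budget is not exceeded.
    new_w = max(patch, math.floor(width * scale / patch) * patch)
    new_h = max(patch, math.floor(height * scale / patch) * patch)
    return new_w, new_h


if __name__ == "__main__":
    PATCH = 32         # ASSUMED effective patch size, not taken from the thread
    MAX_TOKENS = 4096  # default cap mentioned above
    w, h = 3024, 4032  # phone camera resolution from the example
    print("tokens before scaling:", image_tokens(w, h, PATCH))
    new_w, new_h = fit_to_token_budget(w, h, PATCH, MAX_TOKENS)
    print("scaled size:", new_w, "x", new_h)
    print("tokens after scaling:", image_tokens(new_w, new_h, PATCH))
```

If you resize on the device before handing the image to the runtime, the same calculation tells you how small the image needs to be to stay inside whatever context window you have configured.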
Has anyone implemented a Qwen-VL multimodal demo using Expo?