Qwen 3 Vision Model: `Qwen3 VL Collection`

The Qwen 3 VL models are the successors to the Qwen 2.5 VL models, offering improved performance, a range of model sizes, and reasoning-capable variants.

Note: Support for the MoE variants is not yet implemented. This is coming very soon!

Mistral.rs supports the Qwen 3 VL vision model family, with examples for the Rust, Python, and HTTP APIs. In-situ quantization (ISQ) is supported, which allows the model to run with lower memory requirements.

UQFF quantizations are also available.
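To give a feel for what quantization saves, here is a rough back-of-the-envelope sketch. It assumes a nominal 4 billion parameters (taken at face value from the "4B" in the model name) and about 4.5 bits per weight for Q4K. It counts weights only and ignores activations, the KV cache, and any tensors that are left unquantized, so treat the numbers as an illustration rather than a measurement.

```python
# Rough weight-memory estimate. The parameter count and bit widths below are
# illustrative assumptions, not values reported by mistral.rs.


def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


NOMINAL_PARAMS = 4e9  # assumed: "4B" taken at face value from the model name

bf16 = weight_memory_gib(NOMINAL_PARAMS, 16)   # unquantized 16-bit weights
q4k = weight_memory_gib(NOMINAL_PARAMS, 4.5)   # Q4K: roughly 4.5 bits per weight

print(f"16-bit: {bf16:.2f} GiB, Q4K: {q4k:.2f} GiB")
```

Under these assumptions the weights shrink from about 7.5 GiB to about 2.1 GiB.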

The Python and HTTP APIs support sending images as:

- URL
- Path to a local image
- Base64 encoded string

The Rust API takes an image from the image crate.
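For the Python and HTTP APIs, the three forms listed above all travel in the same `url` field of an `image_url` content part. The sketch below shows one way to build that part. The helper names `encode_image_base64` and `image_part` and the local path are hypothetical, introduced only for illustration.

```python
import base64
from pathlib import Path


def encode_image_base64(path: str) -> str:
    """Read a local image file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")


def image_part(source: str) -> dict:
    """Wrap a URL, local path, or base64 string in an OpenAI-style content part."""
    return {"type": "image_url", "image_url": {"url": source}}


# The three forms differ only in the string placed in `url`.
from_url = image_part("https://www.garden-treasures.com/cdn/shop/products/IMG_6245.jpg")
from_path = image_part("path/to/flower.jpg")  # hypothetical local path
# With a real file on disk, the base64 form would be built like this:
# from_b64 = image_part(encode_image_base64("path/to/flower.jpg"))
```

Any of these parts can then be placed in the `content` list of a user message, next to a `text` part.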

Note: When using device mapping or model topology, only the text model and its layers are managed, because the text model contains most of the model parameters.

Interactive mode

Mistral.rs supports interactive mode for vision models! It is an easy way to interact with the model.

Start up interactive mode with the Qwen3 VL model:

Note: You should replace `--features ...` with one of the features specified here, or remove it for pure CPU inference.

cargo run --features ... --release -- -i vision-plain -m Qwen/Qwen3-VL-4B-Instruct

HTTP server

You can find this example here.

We support an OpenAI compatible HTTP API for vision models. This example demonstrates sending a chat completion request with an image.

Note: The `image_url` may be a path, a URL, or a base64 encoded string.

Start the server

Note: You should replace `--features ...` with one of the features specified here, or remove it for pure CPU inference.

cargo run --release --features ... -- --port 1234 -m Qwen/Qwen3-VL-4B-Instruct

Send a request

from openai import OpenAI

client = OpenAI(api_key="foobar", base_url="http://localhost:1234/v1/")

completion = client.chat.completions.create(
    model="default",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://www.garden-treasures.com/cdn/shop/products/IMG_6245.jpg"
                    },
                },
                {
                    "type": "text",
                    "text": "What type of flower is this? Give some fun facts.",
                },
            ],
        },
    ],
    max_tokens=256,
    frequency_penalty=1.0,
    top_p=0.1,
    temperature=0,
)
resp = completion.choices[0].message.content
print(resp)
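Because the API is OpenAI compatible, the same request can also be assembled without the `openai` package. The following is a minimal sketch using only the standard library. It assumes the server from the previous step is listening on port 1234 and that chat completions are served at `/v1/chat/completions`, which is the path the OpenAI client uses with the `base_url` above. `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(base_url: str, image: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request containing one image."""
    payload = {
        "model": "default",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "http://localhost:1234/v1/",
    "https://www.garden-treasures.com/cdn/shop/products/IMG_6245.jpg",
    "What type of flower is this? Give some fun facts.",
)

# Sending the request requires the server to be running:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```

Seeing the raw JSON body can help when calling the server from a language that has no OpenAI client library.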

You can find an example of encoding the image via base64 here.
You can find an example of loading an image locally here.

Rust

You can find this example here.

use anyhow::Result;
use mistralrs::{IsqType, TextMessageRole, VisionMessages, VisionModelBuilder};

#[tokio::main]
async fn main() -> Result<()> {
    let model = VisionModelBuilder::new("Qwen/Qwen3-VL-4B-Instruct")
        .with_isq(IsqType::Q4K)
        .with_logging()
        .build()
        .await?;

    // Download the image with the async client: reqwest's blocking API
    // must not be called from inside an async runtime such as Tokio.
    let bytes = reqwest::get(
        "https://www.nhmagazine.com/content/uploads/2019/05/mtwashingtonFranconia-2-19-18-108-Edit-Edit.jpg",
    )
    .await?
    .bytes()
    .await?;
    // Decode the downloaded bytes into an `image::DynamicImage`.
    let image = image::load_from_memory(&bytes)?;

    let messages = VisionMessages::new().add_image_message(
        TextMessageRole::User,
        "What is this?",
        vec![image],
        &model,
    )?;

    let response = model.send_chat_request(messages).await?;

    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    dbg!(
        response.usage.avg_prompt_tok_per_sec,
        response.usage.avg_compl_tok_per_sec
    );

    Ok(())
}

Python

You can find this example here.

This example demonstrates loading the model and sending a chat completion request with an image.

Note: The `image_url` may be a path, a URL, or a base64 encoded string.

from mistralrs import Runner, Which, ChatCompletionRequest, VisionArchitecture

MODEL_ID = "Qwen/Qwen3-VL-4B-Thinking"

runner = Runner(
    which=Which.VisionPlain(
        model_id=MODEL_ID,
        arch=VisionArchitecture.Qwen3VL,
    ),
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="default",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://www.garden-treasures.com/cdn/shop/products/IMG_6245.jpg"
                        },
                    },
                    {
                        "type": "text",
                        "text": "What type of flower is this? Give some fun facts.",
                    },
                ],
            }
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)

You can find an example of encoding the image via base64 here.
You can find an example of loading an image locally here.
