31 changes: 31 additions & 0 deletions notebooks/ministral-3/README.md
@@ -0,0 +1,31 @@
# Visual-language assistant with Ministral-3 and OpenVINO

Ministral-3 (Ministral-3-3B-Instruct-2512) is a lightweight, state-of-the-art multimodal model from Mistral AI, combining a 3.4B parameter language model with a 0.4B parameter vision encoder based on the Pixtral architecture. It is designed for efficient visual-language understanding tasks.

**Key Features of Ministral-3:**
* **Multimodal Understanding**: Combines text and vision capabilities in a compact 3B parameter model, enabling image understanding and visual question answering.
* **Long Context Support**: Supports up to 262,144 tokens with YaRN RoPE scaling for extended context processing.
* **Efficient Architecture**: Uses Grouped Query Attention (32 attention heads with 8 KV heads) for memory-efficient inference.
* **Pixtral Vision Encoder**: Employs a PixtralVisionModel with patch-based image processing and multi-modal projection for seamless vision-language integration.
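The memory benefit of Grouped Query Attention comes from caching K/V tensors for only 8 KV heads instead of all 32 attention heads. A minimal back-of-the-envelope sketch (the head dimension, layer count, and fp16 precision below are illustrative assumptions, not values from the model config; the head counts and 262,144-token context are from the feature list above):

```python
# Rough KV-cache sizing sketch for GQA vs. full multi-head attention.
# head_dim, num_layers, and bytes_per_elem are assumed illustrative values.
def kv_cache_bytes(num_kv_heads, head_dim=128, num_layers=32,
                   seq_len=262_144, bytes_per_elem=2):
    # 2x accounts for separate K and V tensors per layer
    return 2 * num_kv_heads * head_dim * num_layers * seq_len * bytes_per_elem

mha = kv_cache_bytes(num_kv_heads=32)  # hypothetical full MHA cache
gqa = kv_cache_bytes(num_kv_heads=8)   # GQA with 8 KV heads
print(f"GQA shrinks the KV cache by {mha / gqa:.0f}x at full context")
```

With only the KV-head count changed, the cache shrinks by exactly the ratio of attention heads to KV heads (32 / 8 = 4x), which is what makes long-context inference tractable on memory-constrained hardware.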

More details about the model can be found in the [model card](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) and the [Mistral AI documentation](https://docs.mistral.ai/).

In this tutorial, we consider how to convert and optimize the Ministral-3 model for creating a multimodal chatbot using [Optimum Intel](https://github.com/huggingface/optimum-intel). Additionally, we demonstrate how to apply model optimization techniques such as weight compression using [NNCF](https://github.com/openvinotoolkit/nncf).
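The conversion-plus-compression step can be sketched with the Optimum Intel CLI. This is an assumed invocation, not the notebook's exact command: the output directory name is hypothetical, and `--weight-format int4` is one typical NNCF weight-compression choice among several (`int8`, `fp16`):

```shell
# Export the model to OpenVINO IR with int4 weight compression (assumed flags;
# the output directory "ministral-3-ov" is a placeholder)
optimum-cli export openvino \
  --model mistralai/Ministral-3-3B-Instruct-2512 \
  --weight-format int4 \
  ministral-3-ov
```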

## Notebook contents
The tutorial consists of the following steps:

- Install requirements
- Convert and Optimize model
- Prepare OpenVINO GenAI Inference Pipeline
- Run OpenVINO GenAI model inference
- Launch Interactive demo

In this demonstration, you'll create an interactive chatbot that can answer questions about provided image content.

## Installation instructions
This is a self-contained example that relies solely on its own code.<br>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to [Installation Guide](../../README.md).

<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/ministral-3/README.md" />
77 changes: 77 additions & 0 deletions notebooks/ministral-3/gradio_helper.py
@@ -0,0 +1,77 @@
from pathlib import Path
import gradio as gr

from PIL import Image
import requests
from threading import Thread
import inspect
from transformers import TextIteratorStreamer

example_image_urls = [
    (
        "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/1d6a0188-5613-418d-a1fd-4560aae1d907",
        "bee.jpg",
    ),
    (
        "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/6cc7feeb-0721-4b5d-8791-2576ed9d2863",
        "baklava.png",
    ),
]

# Download the example images once so the demo can reference them locally
for url, file_name in example_image_urls:
    if not Path(file_name).exists():
        Image.open(requests.get(url, stream=True, timeout=30).raw).save(file_name)


def make_demo(model, processor):
    # Newer gradio releases expose extra ChatInterface buttons; detect support at runtime
    has_additional_buttons = "undo_button" in inspect.signature(gr.ChatInterface.__init__).parameters

    def bot_streaming(message, history):
        print(f"message is - {message}")
        print(f"history is - {history}")

        # message is a dict in recent gradio versions and an object in older ones
        files = message["files"] if isinstance(message, dict) else message.files
        message_text = message["text"] if isinstance(message, dict) else message.text

        image = None
        if files:
            last_file = files[-1]
            if isinstance(last_file, dict):
                image = last_file["path"]
            elif isinstance(last_file, (str, Path)):
                image = last_file
            else:
                # gradio FileData-like object
                image = last_file.path
        if image is not None:
            image = Image.open(image).convert("RGB")
            # Resize large images to keep patch count manageable
            if max(image.size) > 512:
                image.thumbnail((512, 512))

        inputs = model.preprocess_inputs(text=message_text, image=image, processor=processor)

        streamer = TextIteratorStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
        generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=128, do_sample=False)

        # Run generation in a background thread so tokens can be streamed as they arrive
        thread = Thread(target=model.generate, kwargs=generation_kwargs)
        thread.start()

        buffer = ""
        for new_text in streamer:
            buffer += new_text
            yield buffer

    additional_buttons = {}
    if has_additional_buttons:
        additional_buttons = {"undo_button": None, "retry_button": None}
    demo = gr.ChatInterface(
        fn=bot_streaming,
        title="Ministral-3 OpenVINO Demo",
        examples=[
            {"text": "What is on the flower?", "files": ["./bee.jpg"]},
            {"text": "How to make this pastry?", "files": ["./baklava.png"]},
        ],
        stop_btn=None,
        multimodal=True,
        **additional_buttons,
    )
    return demo