feat(ifr): add characteristics for multimodal models (#4896)

fpagny · RoRoJ · web-flow · commit c83a4c501e02 · 2025-04-29T09:46:56.000+02:00
* feat(ifr): add characteristics for multimodal models

* fix(infr): language

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>

---------

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
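
Note on the image limits documented in this change: the diff states a maximum image resolution and a token dimension per model (1540x1540 with 28x28 tokens for Mistral Small 3.1, 1024x1024 with 16x16 tokens for Pixtral), and that oversized images are downscaled with their aspect ratio preserved. The sketch below shows how those figures combine into a per-image token estimate. The function name `estimate_image_tokens` is hypothetical, and the rounding behavior (partial patches counted as whole tokens, no extra separator tokens) is an assumption rather than the documented behavior of either model.

```python
import math


def estimate_image_tokens(width: int, height: int, max_side: int, patch: int) -> int:
    """Roughly estimate how many tokens one image occupies.

    Assumptions (not confirmed by the docs): partial patches count as a
    full token, and no separator or special tokens are added.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    longest = max(width, height)
    if longest > max_side:
        # Downscale so the longer side equals max_side, keeping the aspect ratio.
        width = max(1, width * max_side // longest)
        height = max(1, height * max_side // longest)

    # Each token covers a patch x patch square of pixels.
    return math.ceil(width / patch) * math.ceil(height / patch)


# Figures taken from the attribute tables in the diff.
print(estimate_image_tokens(1540, 1540, max_side=1540, patch=28))  # 3025
print(estimate_image_tokens(1024, 1024, max_side=1024, patch=16))  # 4096
```

The first result matches the `3025` tokens quoted in the diff for a full-resolution image; any larger image is capped at the same cost once downscaled.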
diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -46,15 +46,15 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
 | `llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
 | `llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
-| `llama-3-70b-instruct` | Yes | Yes | English |
+| `llama-3-70b-instruct` | Yes | No | English |
 | `llama-3.1-nemotron-70b-instruct` | Yes | Yes | English |
 | `deepseek-r1-distill-llama-70B` | Yes | Yes | English, Chinese |
 | `deepseek-r1-distill-llama-8B` | Yes | Yes | English, Chinese |
 | `mistral-7b-instruct-v0.3` | Yes | Yes | English |
 | `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
-| `mistral-small-24b-instruct-2501` | Yes | Yes | Text | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean |
+| `mistral-small-24b-instruct-2501` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean |
 | `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese |
-| `mixtral-8x7b-instruct-v0.1` | Yes | Yes | English, French, German, Italian, Spanish |
+| `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish |
 | `moshiko-0.1-8b` | No | No | English |
 | `moshika-0.1-8b` | No | No | English |
 | `pixtral-12b-2409` | Yes | Yes | English |
@@ -83,21 +83,40 @@ The model was not trained specifically to output function / tool call tokens. He
 ```
 google/gemma-3-27b-it:bf16
 ```
+| Attribute | Value |
+|-----------|-------|
+| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
 
 ### Mistral-small-3.1-24b-instruct-2503
 Mistral-small-3.1-24b-instruct-2503 is a model developed by Mistral to perform text processing and image analysis in many languages.
 This model was optimized for dense knowledge and faster token throughput relative to its size.
 
+| Attribute | Value |
+|-----------|-------|
+| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
+| Maximum image resolution (pixels) | 1540x1540 |
+| Token dimension (pixels) | 28x28 |
+
 #### Model names
 ```
 mistral/mistral-small-3.1-24b-instruct-2503:bf16
 ```
 
+- Bitmap (raster) image formats, which store images as grids of individual pixels, are supported. Vector image formats (such as SVG), PSD files, PDFs, and videos are not supported.
+- Image size is limited in the following ways:
+  - Directly, by the maximum context window. For example, since each token covers a 28x28 pixel square, a single image takes up at most `3025` tokens of the context window (i.e. `(1540*1540)/(28*28)`).
+  - Indirectly, by model accuracy: resolutions above 1540x1540 do not improve output accuracy, because images wider or taller than 1540 pixels are automatically downscaled to fit within 1540x1540. The aspect ratio is preserved (images are not cropped, only further compressed).
+
 ### Pixtral-12b-2409
 Pixtral is a vision language model introducing a novel architecture: a 12B-parameter multimodal decoder plus a 400M-parameter vision encoder.
 It can analyze images and offer insights from visual content alongside text.
-This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
-Pixtral is open-weight and distributed under the Apache 2.0 license.
+
+| Attribute | Value |
+|-----------|-------|
+| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
+| Maximum image resolution (pixels) | 1024x1024 |
+| Token dimension (pixels) | 16x16 |
+| Maximum images per request | 12 |
 
 #### Model name
 ```