Commit c83a4c5

fpagny and RoRoJ authored
feat(ifr): add characteristics for multimodal models (#4896)
* feat(ifr): add characteristics for multimodal models
* fix(infr): language

Co-authored-by: Rowena Jones <[email protected]>

---------

Co-authored-by: Rowena Jones <[email protected]>
1 parent 3bf8e76 commit c83a4c5

1 file changed: +24 -5 lines changed

pages/managed-inference/reference-content/model-catalog.mdx

Lines changed: 24 additions & 5 deletions
@@ -46,15 +46,15 @@ A quick overview of available models in Scaleway's catalog and their core attrib
4646
| `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
4747
| `llama-3.1-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
4848
| `llama-3.1-8b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
49-
| `llama-3-70b-instruct` | Yes | Yes | English |
49+
| `llama-3-70b-instruct` | Yes | No | English |
5050
| `llama-3.1-nemotron-70b-instruct` | Yes | Yes | English |
5151
| `deepseek-r1-distill-llama-70B` | Yes | Yes | English, Chinese |
5252
| `deepseek-r1-distill-llama-8B` | Yes | Yes | English, Chinese |
5353
| `mistral-7b-instruct-v0.3` | Yes | Yes | English |
5454
| `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
55-
| `mistral-small-24b-instruct-2501` | Yes | Yes | Text | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean |
55+
| `mistral-small-24b-instruct-2501` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean |
5656
| `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese |
57-
| `mixtral-8x7b-instruct-v0.1` | Yes | Yes | English, French, German, Italian, Spanish |
57+
| `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish |
5858
| `moshiko-0.1-8b` | No | No | English |
5959
| `moshika-0.1-8b` | No | No | English |
6060
| `pixtral-12b-2409` | Yes | Yes | English |
@@ -83,21 +83,40 @@ The model was not trained specifically to output function / tool call tokens. He
8383
```
8484
google/gemma-3-27b-it:bf16
8585
```
86+
| Attribute | Value |
87+
|-----------|-------|
88+
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
8689

8790
### Mistral-small-3.1-24b-instruct-2503
8891
Mistral-small-3.1-24b-instruct-2503 is a model developed by Mistral to perform text processing and image analysis in many languages.
8992
This model was optimized for dense knowledge and high token throughput relative to its size.
9093

94+
| Attribute | Value |
95+
|-----------|-------|
96+
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
97+
| Maximum image resolution (pixels) | 1540x1540 |
98+
| Token dimension (pixels) | 28x28 |
99+
91100
#### Model names
92101
```
93102
mistral/mistral-small-3.1-24b-instruct-2503:bf16
94103
```
95104

105+
- Bitmap (or raster) image formats, which store images as grids of individual pixels, are supported. Vector image formats (SVG, PSD) are not supported, nor are PDFs or videos.
106+
- Image size is limited in the following ways:
107+
  - Directly by the maximum context window. For example, since each token covers a square of 28x28 pixels, a single image consumes at most `3025` tokens of the context window (i.e. `(1540*1540)/(28*28)`).
108+
  - Indirectly by model accuracy: resolutions above 1540x1540 do not improve output accuracy, because images wider or taller than 1540 pixels are automatically downscaled to fit within 1540x1540. The aspect ratio is preserved (images are not cropped, only additionally compressed).
109+
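As a rough illustration of the arithmetic above, the following Python sketch estimates how many context tokens a single image might consume. The function names `fit_within` and `estimate_image_tokens` are hypothetical, and rounding partial 28x28 squares up is an assumption; the model's actual preprocessing may round differently or add special tokens.

```python
import math


def fit_within(width, height, max_side=1540):
    """Downscale (width, height) to fit a max_side x max_side box.

    The aspect ratio is preserved and images are never upscaled.
    """
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def estimate_image_tokens(width, height, max_side=1540, patch=28):
    """Estimate tokens for one image, assuming one token per patch x patch square.

    Assumption: partially covered squares count as a full token (ceil).
    """
    w, h = fit_within(width, height, max_side)
    return math.ceil(w / patch) * math.ceil(h / patch)


if __name__ == "__main__":
    # A full-resolution image: 55 squares per side.
    print(estimate_image_tokens(1540, 1540))
    # A larger image is first downscaled to 1540x770.
    print(fit_within(3080, 1540), estimate_image_tokens(3080, 1540))
```

Under these assumptions, a 1540x1540 image yields 55 x 55 = 3025 tokens, which matches the figure quoted above, and any larger image costs no more than that once it has been downscaled.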
96110
### Pixtral-12b-2409
97111
Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder.
98112
It can analyze images and offer insights from visual content alongside text.
99-
This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
100-
Pixtral is open-weight and distributed under the Apache 2.0 license.
113+
114+
| Attribute | Value |
115+
|-----------|-------|
116+
| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
117+
| Maximum image resolution (pixels) | 1024x1024 |
118+
| Token dimension (pixels) | 16x16 |
119+
| Maximum images per request | 12 |
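
The limits in this table can be checked on the client side before a request is sent. The sketch below is a minimal, hypothetical pre-flight check: `check_image_batch` is an illustrative name and not part of any Scaleway API. It only inspects file extensions, so it cannot detect animated GIFs or files whose extension does not match their content.

```python
from pathlib import PurePath

# Values taken from the Pixtral-12b-2409 table above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
MAX_IMAGES_PER_REQUEST = 12


def check_image_batch(filenames):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if len(filenames) > MAX_IMAGES_PER_REQUEST:
        problems.append(
            f"{len(filenames)} images exceed the limit of {MAX_IMAGES_PER_REQUEST} per request"
        )
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(no extension)'}")
    return problems


if __name__ == "__main__":
    for problem in check_image_batch(["chart.png", "logo.svg", "scan.pdf"]):
        print(problem)
```

Returning a list of problems instead of raising on the first one lets a caller report every issue with a batch at once.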
101120

102121
#### Model name
103122
```
