- Bitmap (or raster) image formats, which store images as grids of individual pixels, are supported. Vector and layered image formats (SVG, PSD) are not supported, nor are PDFs or videos.
106
+
- Image size is limited:
107
+
  - Directly by the maximum context window. For example, since each token covers a square of 28x28 pixels, a single image takes up at most `3025` tokens of the context window (i.e. `(1540*1540)/(28*28)`).
108
+
  - Indirectly by model accuracy: resolutions above 1540x1540 do not improve output accuracy, because images wider or taller than 1540 pixels are automatically downscaled to fit within 1540x1540. The aspect ratio is preserved (images are not cropped, only downscaled).
109
+
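The token arithmetic above can be sketched in a few lines of Python. This is only an illustrative estimate, not the model's actual preprocessing: the function names `fit_within` and `estimate_image_tokens` are hypothetical, rounding partial 28x28 patches up is an assumption, and any special tokens the tokenizer may add around an image are ignored.

```python
import math

PATCH_SIZE = 28   # each token covers a 28x28 pixel square (from the text above)
MAX_SIDE = 1540   # images are downscaled to fit within 1540x1540 (from the text above)


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Downscale (never upscale) so both sides fit in max_side, keeping the aspect ratio."""
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def estimate_image_tokens(width: int, height: int) -> int:
    """Rough token estimate for one image.

    Assumption: partial patches are rounded up; the real tokenizer may differ.
    """
    w, h = fit_within(width, height)
    return math.ceil(w / PATCH_SIZE) * math.ceil(h / PATCH_SIZE)


if __name__ == "__main__":
    for size in [(280, 280), (1540, 1540), (3080, 1540), (4000, 4000)]:
        print(size, "->", fit_within(*size), estimate_image_tokens(*size), "tokens")
```

Under these assumptions, any image at or above 1540x1540 in both dimensions costs the same 3025 tokens, which is why sending larger images brings no accuracy benefit.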
96
110
### Pixtral-12b-2409
97
111
Pixtral is a vision language model introducing a novel architecture: a 12B-parameter multimodal decoder plus a 400M-parameter vision encoder.
98
112
It can analyze images and offer insights from visual content alongside text.
99
-
This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension.
100
-
Pixtral is open-weight and distributed under the Apache 2.0 license.