|
| 1 | +--- |
| 2 | +meta: |
| 3 | + title: Understanding the Pixtral-12b-2409 model |
| 4 | + description: Deploy your own secure Pixtral-12b-2409 model with Scaleway Managed Inference. Privacy-focused, fully managed. |
| 5 | +content: |
| 6 | + h1: Understanding the Pixtral-12b-2409 model |
| 7 | + paragraph: This page provides information on the Pixtral-12b-2409 model |
| 8 | +tags: |
| 9 | +dates: |
| 10 | + validation: 2024-09-23 |
| 11 | +categories: |
| 12 | + - ai-data |
| 13 | +--- |
| 14 | + |
| 15 | +## Model overview |
| 16 | + |
| 17 | +| Attribute | Details | |
| 18 | +|-----------------|------------------------------------| |
| 19 | +| Provider | [Mistral](https://mistral.ai/technology/#models) | |
| 20 | +| Model Name | `pixtral-12b-2409` | |
| 21 | +| Compatible Instances | H100, H100-2 (bf16) | |
| 22 | +| Context size | 128k tokens | |
| 23 | + |
| 24 | +## Model name |
| 25 | + |
| 26 | +```bash |
| 27 | +mistral/pixtral-12b-2409:bf16 |
| 28 | +``` |
| 29 | + |
| 30 | +## Compatible Instances |
| 31 | + |
| 32 | +| Instance type | Max context length | |
| 33 | +| ------------- |-------------| |
| 34 | +| H100 | 128k (BF16) |
| 35 | +| H100-2 | 128k (BF16) |
| 36 | + |
| 37 | +## Model introduction |
| 38 | + |
| 39 | +Pixtral is a vision language model introducing a novel architecture: a 12B-parameter multimodal decoder plus a 400M-parameter vision encoder.
| 40 | +It can analyze images and offer insights from visual content alongside text. |
| 41 | +This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. |
| 42 | + |
| 43 | +Pixtral is open-weight and distributed under the Apache 2.0 license. |
| 44 | + |
| 45 | +## Why is it useful? |
| 46 | + |
| 47 | +- Pixtral allows you to process real-world and high-resolution images, unlocking capabilities such as transcribing handwritten files or payment receipts, extracting information from graphs, and captioning images.
| 48 | +- It offers a large context window of up to 128k tokens, which is particularly useful for RAG applications.
| 49 | +- Pixtral supports variable image sizes and types: PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp), as well as non-animated GIF with only one frame (.gif).
| 50 | + |
| 51 | +<Message type="note"> |
| 52 | + Pixtral 12B can understand and analyze images, not generate them. You will use it through the `/v1/chat/completions` endpoint.
| 53 | +</Message> |
| 54 | + |
| 55 | +## How to use it |
| 56 | + |
| 57 | +### Sending Inference requests |
| 58 | + |
| 59 | +<Message type="tip"> |
| 60 | + Unlike previous Mistral models, Pixtral can take an `image_url` in the content array. |
| 61 | +</Message> |
| 62 | + |
| 63 | +To perform inference tasks with your Pixtral model deployed at Scaleway, use the following command: |
| 64 | + |
| 65 | +```bash |
| 66 | +curl -s \ |
| 67 | +-H "Authorization: Bearer <IAM API key>" \ |
| 68 | +-H "Content-Type: application/json" \ |
| 69 | +--request POST \ |
| 70 | +--url "https://<Deployment UUID>.ifr.fr-par.scw.cloud/v1/chat/completions" \ |
| 71 | +--data '{ |
| 72 | + "model": "mistral/pixtral-12b-2409:bf16", |
| 73 | + "messages": [ |
| 74 | + { |
| 75 | + "role": "user", |
| 76 | + "content": [ |
| 77 | + {"type" : "text", "text": "Describe this image in detail please."}, |
| 78 | + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, |
| 79 | + {"type" : "text", "text": "and this one as well."}, |
| 80 | + {"type": "image_url", "image_url": {"url": "https://www.wolframcloud.com/obj/resourcesystem/images/a0e/a0ee3983-46c6-4c92-b85d-059044639928/6af8cfb971db031b.png"}} |
| 81 | + ] |
| 82 | + } |
| 83 | + ], |
| 84 | + "top_p": 1, |
| 85 | + "temperature": 0.7, |
| 86 | + "stream": false |
| 87 | +}' |
| 88 | +``` |
| 89 | + |
| 90 | +Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. |
| 91 | + |
| 92 | +<Message type="tip"> |
| 93 | + The model name allows Scaleway to put your prompts in the expected format. |
| 94 | +</Message> |
| 95 | + |
| 96 | +<Message type="note"> |
| 97 | + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. |
| 98 | +</Message> |
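The curl request above can also be sketched in Python. This is a minimal illustration using only the standard library, not an official client: the helper names `build_payload` and `build_request` are hypothetical, and the placeholders are the same `<IAM API key>` and `<Deployment UUID>` values you must replace.

```python
import json
import urllib.request

# Placeholders taken from the curl example; replace them with your own values.
API_KEY = "<IAM API key>"
DEPLOYMENT_UUID = "<Deployment UUID>"
MODEL = "mistral/pixtral-12b-2409:bf16"


def build_payload(prompt, image_urls, temperature=0.7, top_p=1):
    """Build a chat completions payload with one text part and one part per image URL."""
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": content}],
        "top_p": top_p,
        "temperature": temperature,
        "stream": False,
    }


def build_request(payload):
    """Prepare (but do not send) the POST request for the deployment endpoint."""
    url = f"https://{DEPLOYMENT_UUID}.ifr.fr-par.scw.cloud/v1/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = build_payload(
    "Describe this image in detail please.",
    ["https://picsum.photos/id/32/512/512"],
)
request = build_request(payload)
# To send the request once the placeholders are replaced:
# with urllib.request.urlopen(request) as resp:
#     body = json.load(resp)
```

Building the payload separately from sending it makes it easy to inspect or log the request body before it leaves your application.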
| 99 | + |
| 100 | +### Passing images to Pixtral |
| 101 | + |
| 102 | +1. Image URLs |
| 103 | +If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding. |
| 104 | + |
| 105 | +2. Base64 encoded image |
| 106 | +Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet. |
| 107 | + |
| 108 | +The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. |
| 109 | + |
| 110 | + |
| 111 | +```python |
| 112 | +import base64 |
| 113 | +from io import BytesIO |
| 114 | +from PIL import Image |
| 115 | + |
| 116 | +def encode_image(img):
| 117 | +    buffered = BytesIO()
| 118 | +    img.convert("RGB").save(buffered, format="JPEG")  # JPEG cannot store an alpha channel
| 119 | +    encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8")
| 120 | +    return encoded_string
| 121 | +
| 122 | +img = Image.open("path_to_your_image.jpg")
| 123 | +base64_img = encode_image(img)
| 124 | +
| 125 | +payload = {
| 126 | +    "messages": [
| 127 | +        {
| 128 | +            "role": "user",
| 129 | +            "content": [
| 130 | +                {"type": "text", "text": "What is this image?"},
| 131 | +                {
| 132 | +                    "type": "image_url",
| 133 | +                    "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"},
| 134 | +                },
| 135 | +            ],
| 136 | +        }
| 137 | +    ],
| 138 | +    # Add the other request parameters here, such as "model" and "temperature".
| 139 | +}
| 140 | + |
| 141 | +``` |
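If you would rather not re-encode the image as JPEG, a possible alternative is to base64-encode the file bytes as they are and derive the MIME type from the file extension. The sketch below assumes the file extension is accurate; `file_to_data_url` is a hypothetical helper name, and it relies only on the Python standard library.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_data_url(path):
    """Return a data URL for an image file, keeping its original encoding."""
    # guess_type relies on the file extension; older Python versions may not
    # know every extension (for example .webp), in which case this raises.
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Usage (assuming the file exists):
# image_part = {"type": "image_url", "image_url": {"url": file_to_data_url("receipt.png")}}
```

The resulting string can be used as the `url` value of an `image_url` content part, in the same way as the JPEG data URL above.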
| 142 | + |
| 143 | +### Receiving Managed Inference responses |
| 144 | + |
| 145 | +After you send the HTTP request to the public or private endpoint exposed by your deployment, you will receive inference responses from the Managed Inference server.
| 146 | +Process the output data according to your application's needs. The response contains the output generated by the vision language model based on the input provided in the request.
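As an illustration of processing the output, the sketch below extracts the generated text from a parsed response body. It assumes the response follows the usual chat completions layout (`choices[0].message.content`), which this page does not spell out; the `extract_text` helper and the sample body are hypothetical.

```python
def extract_text(response_body):
    """Return the assistant text from a parsed chat completions response."""
    choices = response_body.get("choices") or []
    if not choices:
        raise ValueError("Response contains no choices")
    return choices[0]["message"]["content"]


# Hypothetical, shortened response body used only for illustration.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Example description."},
            "finish_reason": "stop",
        }
    ]
}

print(extract_text(sample_response))
```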
| 147 | + |
| 148 | +<Message type="note"> |
| 149 | + Despite efforts to ensure accuracy, generated text may contain inaccuracies or [hallucinations](/ai-data/managed-inference/concepts/#hallucinations). Always verify generated content independently.
| 150 | +</Message> |
| 151 | + |
| 152 | +## Frequently Asked Questions |
| 153 | + |
| 154 | +#### What types of images are supported by Pixtral? |
| 155 | +- Bitmap (or raster) image formats, which store images as grids of individual pixels, are supported: PNG, JPEG, WEBP, and non-animated GIFs in particular.
| 156 | +- Vector image formats (such as SVG) and layered formats (such as PSD) are not supported.
| 157 | + |
| 158 | +#### Are other files supported? |
| 159 | +Only bitmap images can be analyzed by Pixtral. PDFs and videos are not supported.
| 160 | + |
| 161 | +#### Is there a limit to the size of each image? |
| 162 | +The only limit is the context window: each 16x16 pixel patch of an image uses 1 token.
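To get a feel for what this means in practice, the rule of one token per 16x16 pixel patch can be turned into a rough estimate. This is only a sketch: it ignores any extra special tokens or image resizing the model may apply, and `estimate_image_tokens` is a hypothetical helper.

```python
import math

PATCH_SIZE = 16  # pixels per patch side, from the rule stated above


def estimate_image_tokens(width, height, patch_size=PATCH_SIZE):
    """Estimate tokens for an image as the number of patches needed to cover it."""
    return math.ceil(width / patch_size) * math.ceil(height / patch_size)


print(estimate_image_tokens(512, 512))   # 32 * 32 patches = 1024
print(estimate_image_tokens(1024, 768))  # 64 * 48 patches = 3072
```

Under this estimate, a 512x512 image uses about 1,024 tokens of the 128k context window.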
| 163 | + |
| 164 | +#### What is the maximum number of images per conversation?
| 165 | +One conversation can handle up to 12 images per request. Adding a 13th image returns a 413 error.
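To avoid a 413 error, you could count the images in a request before sending it. The following sketch assumes the limit applies to all `image_url` parts across the `messages` array; `count_images` and `check_image_limit` are hypothetical helper names.

```python
MAX_IMAGES_PER_REQUEST = 12  # limit stated above


def count_images(messages):
    """Count image_url content parts across all messages of a request."""
    total = 0
    for message in messages:
        content = message.get("content")
        # Plain-string content carries no images; only list content has parts.
        if isinstance(content, list):
            total += sum(1 for part in content if part.get("type") == "image_url")
    return total


def check_image_limit(messages, limit=MAX_IMAGES_PER_REQUEST):
    """Raise ValueError if the request holds more images than the limit allows."""
    count = count_images(messages)
    if count > limit:
        raise ValueError(f"Request contains {count} images, but the limit is {limit}")
    return count
```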