Commit 616a568

Merge pull request #321 from mistralai/fix-image-vision
Fix Vision Images
2 parents b9670be + 1b772fd commit 616a568

2 files changed: 5 additions, 110 deletions


docs/capabilities/vision.md

Lines changed: 5 additions & 110 deletions
@@ -48,7 +48,7 @@ messages = [
 },
 {
 "type": "image_url",
-"image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
+"image_url": "https://docs.mistral.ai/img/eiffel-tower-paris.jpg"
 }
 ]
 }
@@ -83,7 +83,7 @@ const chatResponse = await client.chat.complete({
 { type: "text", text: "What's in this image?" },
 {
 type: "image_url",
-imageUrl: "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg",
+imageUrl: "https://docs.mistral.ai/img/eiffel-tower-paris.jpg",
 },
 ],
 },
@@ -112,7 +112,7 @@ curl https://api.mistral.ai/v1/chat/completions \
 },
 {
 "type": "image_url",
-"image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
+"image_url": "https://docs.mistral.ai/img/eiffel-tower-paris.jpg"
 }
 ]
 }
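The three hunks above change the same `image_url` content chunk in the Python, TypeScript, and curl examples. As a rough illustration of the payload shape they share, here is a minimal sketch that builds the `messages` list in plain Python. The helper name `build_vision_messages` is hypothetical and not part of any SDK. The prompt text is taken from the TypeScript hunk, and the code only constructs the data structure, without calling the API.

```python
import json


def build_vision_messages(prompt: str, image_urls: list[str]) -> list[dict]:
    """Build a single user message with one text chunk and N image_url chunks.

    Hypothetical helper for illustration; it mirrors the chunk layout
    visible in the diff context and performs no network calls.
    """
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        # Each image is passed as its own "image_url" chunk.
        content.append({"type": "image_url", "image_url": url})
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    messages = build_vision_messages(
        "What's in this image?",
        ["https://docs.mistral.ai/img/eiffel-tower-paris.jpg"],
    )
    print(json.dumps(messages, indent=2))
```

The resulting list is what would be passed as the `messages` argument of a chat completion request. Pointing the URL at an image served from the docs site itself avoids depending on a third-party host staying online.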
@@ -308,51 +308,6 @@ The chart is a bar chart titled 'France's Social Divide,' comparing socio-econom
 
 </details>
 
-<details>
-<summary><b>Compare images</b></summary>
-
-![](https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg)
-
-![](https://assets.visitorscoverage.com/production/wp-content/uploads/2024/04/AdobeStock_626542468-min-1024x683.jpeg)
-
-```bash
-curl https://api.mistral.ai/v1/chat/completions \
--H "Content-Type: application/json" \
--H "Authorization: Bearer $MISTRAL_API_KEY" \
--d '{
-"model": "pixtral-12b-2409",
-"messages": [
-{
-"role": "user",
-"content": [
-{
-"type": "text",
-"text": "what are the differences between two images?"
-},
-{
-"type": "image_url",
-"image_url": "https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg"
-},
-{
-"type": "image_url",
-"image_url": {
-"url": "https://assets.visitorscoverage.com/production/wp-content/uploads/2024/04/AdobeStock_626542468-min-1024x683.jpeg"
-}
-}
-]
-}
-],
-"max_tokens": 300
-}'
-```
-
-Model output:
-```
-The first image features the Eiffel Tower surrounded by snow-covered trees and pathways, with a clear view of the tower's intricate iron lattice structure. The second image shows the Eiffel Tower in the background of a large, outdoor stadium filled with spectators, with a red tennis court in the center. The most notable differences are the setting - one is a winter scene with snow, while the other is a summer scene with a crowd at a sporting event. The mood of the first image is serene and quiet, whereas the second image conveys a lively and energetic atmosphere. These differences highlight the versatility of the Eiffel Tower as a landmark that can be enjoyed in various contexts and seasons.
-```
-
-</details>
-
 <details>
 <summary><b>Transcribe receipts</b></summary>
 
@@ -428,68 +383,6 @@ Model output:
 
 </details>
 
-<details>
-<summary><b>OCR with structured output</b></summary>
-
-![](https://i.imghippo.com/files/kgXi81726851246.jpg)
-
-```bash
-curl https://api.mistral.ai/v1/chat/completions \
--H "Content-Type: application/json" \
--H "Authorization: Bearer $MISTRAL_API_KEY" \
--d '{
-"model": "pixtral-12b-2409",
-"messages": [
-{
-"role": "system",
-"content": [
-{"type": "text",
-"text" : "Extract the text elements described by the user from the picture, and return the result formatted as a json in the following format : {name_of_element : [value]}"
-}
-]
-},
-{
-"role": "user",
-"content": [
-{
-"type": "text",
-"text": "From this restaurant bill, extract the bill number, item names and associated prices, and total price and return it as a string in a Json object"
-},
-{
-"type": "image_url",
-"image_url": "https://i.imghippo.com/files/kgXi81726851246.jpg"
-}
-]
-}
-],
-"response_format":
-{
-"type": "json_object"
-}
-}'
-
-```
-
-Model output:
-```json
-{'bill_number': '566548',
-'items': [{'item_name': 'BURGER - MED RARE', 'price': 10},
-{'item_name': 'WH/SUB POUTINE', 'price': 2},
-{'item_name': 'BURGER - MED RARE', 'price': 10},
-{'item_name': 'WH/SUB BSL - MUSH', 'price': 4},
-{'item_name': 'BURGER - MED WELL', 'price': 10},
-{'item_name': 'WH BREAD/NO ONION', 'price': 2},
-{'item_name': 'SUB POUTINE - MUSH', 'price': 2},
-{'item_name': 'CHK PESTO/BR', 'price': 9},
-{'item_name': 'SUB POUTINE', 'price': 2},
-{'item_name': 'SPEC OMELET/BR', 'price': 9},
-{'item_name': 'SUB POUTINE', 'price': 2},
-{'item_name': 'BSL', 'price': 8}],
-'total_price': 68}
-```
-
-</details>
-
 ## FAQ
 
 - **What is the price per image?**
@@ -502,6 +395,8 @@ Model output:
 
 | Model | Max Resolution | ≈ Formula | ≈ N Max Tokens |
 | - | - | - | - |
+| Magistral Medium 1.2 | 1540x1540 | `≈ (ResolutionX * ResolutionY) / 784` | ≈ 3025 |
+| Magistral Small 1.2 | 1540x1540 | `≈ (ResolutionX * ResolutionY) / 784` | ≈ 3025 |
 | Mistral Small 3.2 | 1540x1540 | `≈ (ResolutionX * ResolutionY) / 784` | ≈ 3025 |
 | Mistral Medium 3 | 1540x1540 | `≈ (ResolutionX * ResolutionY) / 784` | ≈ 3025 |
 | Mistral Small 3.1 | 1540x1540 | `≈ (ResolutionX * ResolutionY) / 784` | ≈ 3025 |
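
The formula column in the table rows above can be checked with a short calculation: 1540 × 1540 / 784 = 3025, which matches the "≈ N Max Tokens" column. The sketch below applies the same formula to arbitrary image sizes. The function name `estimate_image_tokens` is hypothetical, and the proportional downscaling of images larger than the maximum resolution is an assumption that this diff does not confirm. Treat the result as an approximation only.

```python
def estimate_image_tokens(
    width: int,
    height: int,
    max_side: int = 1540,         # "Max Resolution" column in the table
    pixels_per_token: int = 784,  # divisor from the "≈ Formula" column
) -> int:
    """Approximate token count for one image, per the table's formula."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # ASSUMPTION: oversized images are scaled down proportionally so that
    # neither side exceeds max_side. The diff itself does not state this.
    scale = min(1.0, max_side / width, max_side / height)
    scaled_pixels = (width * scale) * (height * scale)
    return round(scaled_pixels / pixels_per_token)


if __name__ == "__main__":
    for size in [(1540, 1540), (1024, 683), (3080, 3080)]:
        print(size, estimate_image_tokens(*size))
```

Because the formula is marked as approximate in the table, an estimate like this is best used for rough budgeting rather than exact billing.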

static/img/eiffel-tower-paris.jpg

11.5 MB
