From c0fba953b55ae9af835d78fdf6c74b6e3c9cc224 Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Mon, 18 Nov 2024 11:55:49 +0100 Subject: [PATCH 01/14] feat(ai): draft molmo support --- .../reference-content/molmo-72b-0924.mdx | 168 ++++++++++++++++++ 1 file changed, 168 insertions(+) create mode 100644 ai-data/managed-inference/reference-content/molmo-72b-0924.mdx diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx new file mode 100644 index 0000000000..139f0dafdd --- /dev/null +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -0,0 +1,168 @@ +--- +meta: + title: Understanding the Molmo-72b-0924 model + description: Deploy your own secure Molmo-72b-0924 model with Scaleway Managed Inference. Privacy-focused, fully managed. +content: + h1: Understanding the Molmo-72b-0924 model + paragraph: This page provides information on the Molmo-72b-0924 model +tags: +dates: + validation: 2024-11-18 + posted: 2024-11-18 +categories: + - ai-data +--- + +## Model overview + +| Attribute | Details | +|-----------------|------------------------------------| +| Provider | [Allen Institute for AI](https://molmo.allenai.org/blog) | +| License | Apache 2.0 | | +| Compatible Instances | H100-2 (FP8) | +| Context size | 50k tokens | + +## Model name + +```bash +allenai/molmo-72b-0924:fp8 +``` + +## Compatible Instances + +| Instance type | Max context length | +| ------------- |-------------| +| H100-2 | 50k (FP8) + +## Model introduction + +Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. +Vision-language model like Molmo can analyze images and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. + +Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be full open-source. +Its base model is Qwen2-72B ((Twonyi Qianwen license)[https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE]). + +## Why is it useful? + +- Molmo-72b allows you to process real world and high resolution images, unlocking capacities such as transcribing handwritten files or payment receipts, extracting information from graphs, captioning images, etc. +- This model achieves the [highest academic benchmark scores and second rank on human evaluations](https://huggingface.co/allenai/Molmo-72B-0924#evaluations) as of writing (September 2024) + + + Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. + + + + Molmo-72b was reported to struggle with transparent images. The official recommandation is to add white or dark background to images for the time being. + + +## How to use it + +### Sending Inference requests + + + Unlike regular chat models, Molmo-72b can take an `image_url` in the content array. + + +To perform inference tasks with your Molmo-72b model deployed at Scaleway, use the following command: + +```bash +curl -s \ +-H "Authorization: Bearer " \ +-H "Content-Type: application/json" \ +--request POST \ +--url "https://.ifr.fr-par.scw.cloud/v1/chat/completions" \ +--data '{ + "model": "allenai/molmo-72b-0924:fp8", + "messages": [ + { + "role": "user", + "content": [ + {"type" : "text", "text": "Describe this image in detail please."}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, + {"type" : "text", "text": "and this one as well."}, + {"type": "image_url", "image_url": {"url": "https://www.wolframcloud.com/obj/resourcesystem/images/a0e/a0ee3983-46c6-4c92-b85d-059044639928/6af8cfb971db031b.png"}} + ] + } + ], + "top_p": 1, + "temperature": 0.7, + "stream": false +}' +``` + +Make sure to replace `` and `` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting. + + + The model name allows Scaleway to put your prompts in the expected format. + + + + Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. + + +### Passing images to Molmo-72b + +1. Image URLs +If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding. + +2. Base64 encoded image +Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet. + +The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. + + +```python +import base64 +from io import BytesIO +from PIL import Image + +def encode_image(img): + buffered = BytesIO() + img.save(buffered, format="JPEG") + encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8") + return encoded_string + +img = Image.open("path_to_your_image.jpg") +base64_img = encode_image(img) + +payload = { + "messages": [ + { + "role": "user", + "content": [ + {"type": "text", "text": "What is this image?"}, + { + "type": "image_url", + "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"}, + }, + ], + } + ], + ... # other parameters +} + +``` + +### Receiving Managed Inference responses + +Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the managed Managed Inference server. +Process the output data according to your application's needs. The response will contain the output generated by the visual language model based on the input provided in the request. + + + Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/ai-data/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently. + + +## Frequently Asked Questions + +#### What types of images are supported by Molmo-72b? +- Bitmap (or raster) image formats, meaning storing images as grids of individual pixels, are supported: PNG, JPEG, WEBP, and non-animated GIFs in particular. +- Vector image formats (SVG, PSD) are not supported. + +#### Are other files supported? +Only bitmaps can be analyzed by Molmo, PDFs and videos are not supported. + +#### Is there a limit to the size of each image? +The only limitation is in context window (1 token for each 16x16 pixel). + +#### What is the maximum amount of images per conversation? +One conversation can handle up to 12 images (per request). The 13rd will return a 413 error. \ No newline at end of file From b7392b0b805946009c523059707ef7df07d799ae Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Mon, 18 Nov 2024 11:56:44 +0100 Subject: [PATCH 02/14] feat(ai): edited nav --- menu/navigation.json | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/menu/navigation.json b/menu/navigation.json index 82904cc59b..cfe1f2281e 100644 --- a/menu/navigation.json +++ b/menu/navigation.json @@ -623,6 +623,10 @@ "label": "Mixtral-8x7b-instruct-v0.1 model", "slug": "mixtral-8x7b-instruct-v0.1" }, + { + "label": "Molmo-72b-0924 model", + "slug": "molmo-72b-0924" + }, { "label": "Sentence-t5-xxl model", "slug": "sentence-t5-xxl" From c9f683a898093024a359d41bc9538423373b3979 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Mon, 18 Nov 2024 15:36:29 +0100 Subject: [PATCH 03/14] Apply suggestions from code review Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com> --- .../reference-content/molmo-72b-0924.mdx | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index 139f0dafdd..5fb884dd09 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -37,15 +37,15 @@ allenai/molmo-72b-0924:fp8 ## Model introduction Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. -Vision-language model like Molmo can analyze images and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. +Vision-language models like Molmo can analyze images and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. -Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be full open-source. +Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. Its base model is Qwen2-72B ((Twonyi Qianwen license)[https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE]). ## Why is it useful? - Molmo-72b allows you to process real world and high resolution images, unlocking capacities such as transcribing handwritten files or payment receipts, extracting information from graphs, captioning images, etc. -- This model achieves the [highest academic benchmark scores and second rank on human evaluations](https://huggingface.co/allenai/Molmo-72B-0924#evaluations) as of writing (September 2024) +- This model achieves the [highest academic benchmark scores and ranks second on human evaluation](https://huggingface.co/allenai/Molmo-72B-0924#evaluations) at the time of writing (September 2024) Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. @@ -102,10 +102,10 @@ Make sure to replace `` and `` with your actual [I ### Passing images to Molmo-72b -1. Image URLs +#### Image URLs If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding. -2. Base64 encoded image +#### Base64 encoded image Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet. The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. @@ -158,11 +158,11 @@ Process the output data according to your application's needs. The response will - Bitmap (or raster) image formats, meaning storing images as grids of individual pixels, are supported: PNG, JPEG, WEBP, and non-animated GIFs in particular. - Vector image formats (SVG, PSD) are not supported. -#### Are other files supported? +#### Are other file types supported? Only bitmaps can be analyzed by Molmo, PDFs and videos are not supported. #### Is there a limit to the size of each image? The only limitation is in context window (1 token for each 16x16 pixel). #### What is the maximum amount of images per conversation? -One conversation can handle up to 12 images (per request). The 13rd will return a 413 error. \ No newline at end of file +One conversation can handle up to 12 images (per request). The 13th will return a 413 error. \ No newline at end of file From 96e9f1d71de735ae83ea56d669a21477fefaace4 Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Wed, 27 Nov 2024 11:30:52 +0100 Subject: [PATCH 04/14] fix(ai): edited according to tests --- .../reference-content/molmo-72b-0924.mdx | 25 ++++++++----------- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index 5fb884dd09..9d3eaed77e 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -7,8 +7,8 @@ content: paragraph: This page provides information on the Molmo-72b-0924 model tags: dates: - validation: 2024-11-18 - posted: 2024-11-18 + validation: 2024-11-27 + posted: 2024-11-27 categories: - ai-data --- @@ -37,7 +37,7 @@ allenai/molmo-72b-0924:fp8 ## Model introduction Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. -Vision-language models like Molmo can analyze images and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. +Vision-language model like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. Its base model is Qwen2-72B ((Twonyi Qianwen license)[https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE]). @@ -51,10 +51,6 @@ Its base model is Qwen2-72B ((Twonyi Qianwen license)[https://huggingface.co/Qwe Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint. - - Molmo-72b was reported to struggle with transparent images. The official recommandation is to add white or dark background to images for the time being. - - ## How to use it ### Sending Inference requests @@ -78,9 +74,7 @@ curl -s \ "role": "user", "content": [ {"type" : "text", "text": "Describe this image in detail please."}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, - {"type" : "text", "text": "and this one as well."}, - {"type": "image_url", "image_url": {"url": "https://www.wolframcloud.com/obj/resourcesystem/images/a0e/a0ee3983-46c6-4c92-b85d-059044639928/6af8cfb971db031b.png"}} + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}} ] } ], @@ -96,9 +90,12 @@ Make sure to replace `` and `` with your actual [I The model name allows Scaleway to put your prompts in the expected format. - - Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content. - +### Known limitations + +- Molmo-72b was reported to struggle with transparent images. The official recommandation is to add white or dark background to images for the time being. +- Molmo-72b chat template doesn't support the system role. Ensure that the `messages` array is properly formatted with user role and assistant role only. +- Molmo-72b isn't able to generate structured outputs (`response_format` parameter not supported) +- Molmo-72b can't do function calling (`tools` parameter not supported) ### Passing images to Molmo-72b @@ -165,4 +162,4 @@ Only bitmaps can be analyzed by Molmo, PDFs and videos are not supported. The only limitation is in context window (1 token for each 16x16 pixel). #### What is the maximum amount of images per conversation? -One conversation can handle up to 12 images (per request). The 13th will return a 413 error. \ No newline at end of file +One conversation can handle maximum 1 image (per request). Sending more than one image will return a 400 error. \ No newline at end of file From 1195012aa6bd92f51cf5d80b60e3dc2a83656e0c Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Wed, 27 Nov 2024 17:22:15 +0100 Subject: [PATCH 05/14] Update ai-data/managed-inference/reference-content/molmo-72b-0924.mdx Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- ai-data/managed-inference/reference-content/molmo-72b-0924.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index 9d3eaed77e..fa80128bab 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -3,7 +3,7 @@ meta: title: Understanding the Molmo-72b-0924 model description: Deploy your own secure Molmo-72b-0924 model with Scaleway Managed Inference. Privacy-focused, fully managed. content: - h1: Understanding the Molmo-72b-0924 model + h1: Understanding the Molmo-72b-0924 model paragraph: This page provides information on the Molmo-72b-0924 model tags: dates: From 8d50a38b47ed0553f38953c9069fac7ac5b60a2a Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Wed, 27 Nov 2024 17:22:35 +0100 Subject: [PATCH 06/14] Update ai-data/managed-inference/reference-content/molmo-72b-0924.mdx Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- ai-data/managed-inference/reference-content/molmo-72b-0924.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index fa80128bab..fd22b2500f 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -5,7 +5,7 @@ meta: content: h1: Understanding the Molmo-72b-0924 model paragraph: This page provides information on the Molmo-72b-0924 model -tags: +tags: ai molmo inference dates: validation: 2024-11-27 posted: 2024-11-27 From e09a74a1c822db8ba6e6e864645996ed336a54c7 Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Wed, 27 Nov 2024 17:22:43 +0100 Subject: [PATCH 07/14] Update ai-data/managed-inference/reference-content/molmo-72b-0924.mdx Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- ai-data/managed-inference/reference-content/molmo-72b-0924.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index fd22b2500f..b18ac97d40 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -40,7 +40,7 @@ Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by Vision-language model like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. -Its base model is Qwen2-72B ((Twonyi Qianwen license)[https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE]). +Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)). ## Why is it useful? From ca0e4ef5e9c006c44a25c4c29572db819475cd8d Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Wed, 27 Nov 2024 17:22:52 +0100 Subject: [PATCH 08/14] Update ai-data/managed-inference/reference-content/molmo-72b-0924.mdx Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- ai-data/managed-inference/reference-content/molmo-72b-0924.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index b18ac97d40..9d50462499 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -53,7 +53,7 @@ Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwe ## How to use it -### Sending Inference requests +### Sending inference requests Unlike regular chat models, Molmo-72b can take an `image_url` in the content array. From b8b90f2b616f8d735d687f69160a0098af08fc65 Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Wed, 27 Nov 2024 17:23:05 +0100 Subject: [PATCH 09/14] Update ai-data/managed-inference/reference-content/molmo-72b-0924.mdx Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- .../managed-inference/reference-content/molmo-72b-0924.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index 9d50462499..f8d8338802 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -93,9 +93,9 @@ Make sure to replace `` and `` with your actual [I ### Known limitations - Molmo-72b was reported to struggle with transparent images. The official recommandation is to add white or dark background to images for the time being. -- Molmo-72b chat template doesn't support the system role. Ensure that the `messages` array is properly formatted with user role and assistant role only. -- Molmo-72b isn't able to generate structured outputs (`response_format` parameter not supported) -- Molmo-72b can't do function calling (`tools` parameter not supported) +- Molmo-72b chat template does not support the system role. Ensure that the `messages` array is properly formatted with user role and assistant role only. +- Molmo-72b is not able to generate structured outputs (`response_format` parameter not supported). +- Molmo-72b cannot do function calling (`tools` parameter not supported). ### Passing images to Molmo-72b From 33f13b290d595d5396bf657e6e12f35bf36409b6 Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Wed, 27 Nov 2024 17:23:16 +0100 Subject: [PATCH 10/14] Update ai-data/managed-inference/reference-content/molmo-72b-0924.mdx Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- ai-data/managed-inference/reference-content/molmo-72b-0924.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index f8d8338802..da164a4b36 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -156,7 +156,7 @@ Process the output data according to your application's needs. The response will - Vector image formats (SVG, PSD) are not supported. #### Are other file types supported? -Only bitmaps can be analyzed by Molmo, PDFs and videos are not supported. +Only bitmaps can be analyzed by Molmo. PDFs and videos are not supported. #### Is there a limit to the size of each image? The only limitation is in context window (1 token for each 16x16 pixel). From 0ea80db906d3228749e45a7a881a20412268a9c9 Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Wed, 27 Nov 2024 17:23:27 +0100 Subject: [PATCH 11/14] Update ai-data/managed-inference/reference-content/molmo-72b-0924.mdx Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- ai-data/managed-inference/reference-content/molmo-72b-0924.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index da164a4b36..5a10e3e7d5 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -162,4 +162,4 @@ Only bitmaps can be analyzed by Molmo. PDFs and videos are not supported. The only limitation is in context window (1 token for each 16x16 pixel). #### What is the maximum amount of images per conversation? -One conversation can handle maximum 1 image (per request). Sending more than one image will return a 400 error. \ No newline at end of file +One conversation can handle a maximum of 1 image (per request). Sending more than one image will return a 400 error. \ No newline at end of file From 6e656c24669bd77a2ad19e032c4aa3dc73e9dd4f Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Wed, 27 Nov 2024 17:23:36 +0100 Subject: [PATCH 12/14] Update ai-data/managed-inference/reference-content/molmo-72b-0924.mdx Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- ai-data/managed-inference/reference-content/molmo-72b-0924.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index 5a10e3e7d5..94392a2e3d 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -159,7 +159,7 @@ Process the output data according to your application's needs. The response will Only bitmaps can be analyzed by Molmo. PDFs and videos are not supported. #### Is there a limit to the size of each image? -The only limitation is in context window (1 token for each 16x16 pixel). +The only limitation is the context window (1 token for each 16x16 pixel). #### What is the maximum amount of images per conversation? One conversation can handle a maximum of 1 image (per request). Sending more than one image will return a 400 error. \ No newline at end of file From 11ed10ea32b65f87e9a4310002a8166969aebdde Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Wed, 27 Nov 2024 17:23:44 +0100 Subject: [PATCH 13/14] Update ai-data/managed-inference/reference-content/molmo-72b-0924.mdx Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- ai-data/managed-inference/reference-content/molmo-72b-0924.mdx | 1 - 1 file changed, 1 deletion(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index 94392a2e3d..2c58dcc889 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -107,7 +107,6 @@ Base64 encoding is a standard way to transform binary data, like images, into a The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload. - ```python import base64 from io import BytesIO From 265869c74b92a77641bc2d5241bee47b99e4548b Mon Sep 17 00:00:00 2001 From: Thibault Genaitay Date: Wed, 27 Nov 2024 17:23:51 +0100 Subject: [PATCH 14/14] Update ai-data/managed-inference/reference-content/molmo-72b-0924.mdx Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- ai-data/managed-inference/reference-content/molmo-72b-0924.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx index 2c58dcc889..fc11e5f1af 100644 --- a/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx +++ b/ai-data/managed-inference/reference-content/molmo-72b-0924.mdx @@ -37,7 +37,7 @@ allenai/molmo-72b-0924:fp8 ## Model introduction Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI. -Vision-language model like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. +Vision-language models like Molmo can analyze an image and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be fully open-source. Its base model is Qwen2-72B ([Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)).