Description
Existing documentation URL(s)
https://developers.cloudflare.com/workers-ai/models/llama-3.2-11b-vision-instruct/
What changes are you suggesting?
None of the code samples for the llama-3.2-11b-vision-instruct Workers AI model work. At its most basic, the model performs image recognition, yet none of the sample code ever handles an image.
The samples for this image-to-text model may be a better template: https://developers.cloudflare.com/workers-ai/models/uform-gen2-qwen-500m/.
I was able to get the following JavaScript sample to execute in a Worker; however, I'd want someone else to confirm that it follows best practices for this model:
```js
// Fetch a sample image and convert it to the array of unsigned
// 8-bit integers that the model's image parameter accepts
const res = await fetch("https://cataas.com/cat");
const buffer = await res.arrayBuffer();
const encodedImage = [...new Uint8Array(buffer)];

const response = await env.AI.run(
  "@cf/meta/llama-3.2-11b-vision-instruct",
  {
    image: encodedImage,
    prompt: "Tell me what is in the image.",
  },
);
```
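For completeness, here is a minimal sketch of the Worker wrapper I ran the snippet in (the `AI` binding name and the `Response.json` wrapper are my own choices for testing, not something the docs prescribe):

```js
export default {
  async fetch(request, env) {
    // Fetch a sample image and encode it as a byte array
    const res = await fetch("https://cataas.com/cat");
    const buffer = await res.arrayBuffer();
    const encodedImage = [...new Uint8Array(buffer)];

    const response = await env.AI.run("@cf/meta/llama-3.2-11b-vision-instruct", {
      image: encodedImage,
      prompt: "Tell me what is in the image.",
    });

    // Return the model output so it can be inspected in the browser
    return Response.json(response);
  },
};
```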
I also have concerns about the Parameters section (https://developers.cloudflare.com/workers-ai/models/llama-3.2-11b-vision-instruct/#Parameters). It's not clear whether prompt and messages are both meant to be input parameters for this model. Based on my testing, you can supply one or the other, but not both, and nothing in the documentation states that. And if only one is allowed, the docs should explain when you would choose one over the other; see the sketch below.
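For example, I'd expect the messages variant to look something like this, mirroring the chat-style scheme used by other Workers AI text-generation models. I have not verified that this shape is valid for this model alongside image, which is exactly the gap in the docs:

```js
const response = await env.AI.run("@cf/meta/llama-3.2-11b-vision-instruct", {
  image: encodedImage,
  // messages replaces prompt; whether the two are mutually
  // exclusive for this model is what the docs should clarify
  messages: [
    { role: "system", content: "Describe the supplied image concisely." },
    { role: "user", content: "Tell me what is in the image." },
  ],
});
```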
Additional information
No response