Description
Expected Behavior
The Ollama generate API accepts an additional field, "images", which must be an array of base64-encoded images; this field is not present in the request model.
With that field we can ask models like "llava" about those images.
E.g. take this request to the generate endpoint, with the base64 contents of a capture of some text:
Request:

```shell
curl --location 'http://localhost:11434/api/generate' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "llava",
    "prompt": "What does this image say?",
    "stream": false,
    "images": ["iVBORw..."]
  }'
```
Gives us the following response:

```json
{
  "model": "llava",
  "created_at": "2024-03-10T08:15:19.437032Z",
  "response": " The image shows a text that says:\n\n\"This is an example text\" ",
  ...
}
```
So the expected behaviour is that, from the chat client, I should be able to send an image resource as an attachment with any prompt.
Or I should be able to extend the current implementation to add the functionality that is not yet supported.
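For illustration, a minimal sketch of what an extended request record could look like, with the missing "images" field added. The record below is my own simplified stand-in, not the actual `OllamaApi.GenerateRequest` source; only the JSON field names follow the Ollama API.

```java
import java.util.Base64;
import java.util.List;

public class GenerateRequestSketch {

    // Hypothetical, simplified mirror of the generate request payload,
    // extended with the "images" field (a list of base64-encoded images).
    record GenerateRequest(String model, String prompt, Boolean stream, List<String> images) {}

    // Base64-encode raw image bytes for the "images" array.
    static String encodeImage(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        // Stand-in for real PNG bytes (the PNG magic number); real usage
        // would read a file, e.g. Files.readAllBytes(Path.of("capture.png")).
        byte[] fakeImage = {(byte) 0x89, 'P', 'N', 'G'};
        GenerateRequest request = new GenerateRequest(
                "llava",
                "What does this image say?",
                false,
                List.of(encodeImage(fakeImage)));
        System.out.println(request);
    }
}
```

Note that base64-encoding PNG bytes is why the payloads above start with "iVBORw": that is the encoding of the PNG file header.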
Current Behavior
I cannot ask Ollama about the contents of any image, because the "images" field is not defined in the payload record org.springframework.ai.ollama.api.OllamaApi.GenerateRequest.
Context
Local setup of Ollama with the "llava" model, trying to get explanations, descriptions, or insights about an image. I need to send both the text and the image, and found that the Ollama generate endpoint's options are not fully supported, in particular the "images" field.
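In the meantime, the endpoint can be called directly from Java with the built-in `java.net.http.HttpClient`, bypassing the incomplete request record. This is only a workaround sketch under my assumptions (default port 11434, naive string-based JSON assembly); real code should build the body with a JSON library such as Jackson to get proper escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

public class OllamaImageWorkaround {

    // Naively assemble the generate request body, including the "images"
    // field; assumes the prompt and images need no JSON escaping.
    static String buildPayload(String model, String prompt, List<String> base64Images) {
        String images = base64Images.stream()
                .map(img -> "\"" + img + "\"")
                .collect(Collectors.joining(","));
        return "{\"model\":\"" + model + "\",\"prompt\":\"" + prompt
                + "\",\"stream\":false,\"images\":[" + images + "]}";
    }

    public static void main(String[] args) {
        String payload = buildPayload("llava", "What does this image say?", List.of("iVBORw..."));

        // Build (but do not send) the POST request; sending requires a
        // running local Ollama instance.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        System.out.println(request.uri() + " -> " + payload);
        // To send:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```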