
Add images field support for Ollama generate endpoint (llava model) #421

@AlbertoPolo

Description


Expected Behavior

The Ollama generate API accepts an additional field which is not present on the request model: the "images" field, which must be an array of base64-encoded images.
With that field we can ask multimodal models like "llava" about those images.
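Producing such a base64 string from an image file is straightforward in plain Java; a minimal sketch (the class and method names here are mine, not part of any API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class EncodeImage {
    // Reads an image file and returns its base64 representation,
    // suitable for an entry in the "images" array of the generate request.
    static String encode(Path imagePath) throws IOException {
        byte[] bytes = Files.readAllBytes(imagePath);
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encode(Path.of(args[0])));
    }
}
```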

E.g. take this request to the generate endpoint, where the base64 content is just a capture of a given text:
Request:
curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
  "model": "llava",
  "prompt": "What does this image say?",
  "stream": false,
  "images": ["iVBORw..."]
}'

Gives us the following response:

{ "model": "llava", "created_at": "2024-03-10T08:15:19.437032Z", "response": " The image shows a text that says:\n\n\"This is an example text\" ", ... }
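For reference, the same request can be issued from plain Java (bypassing Spring AI) with java.net.http; the endpoint and payload mirror the curl above, while the class and helper names are mine:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaGenerate {
    // Builds the JSON body for the generate endpoint; the base64 image
    // string is supplied by the caller.
    static String body(String model, String prompt, String imageB64) {
        return """
                {"model": "%s", "prompt": "%s", "stream": false, "images": ["%s"]}"""
                .formatted(model, prompt, imageB64);
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        body("llava", "What does this image say?", args[0])))
                .build();
        // Requires a local Ollama instance with the llava model pulled.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```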

So the expected behaviour is that, through the chat client, I should be able to send an image resource as an attachment with any prompt.

Alternatively, I should be able to extend the current implementation myself to add the functionality that is not yet supported.

Current Behavior

I cannot ask Ollama about the contents of any image, because the "images" field is not defined in the payload record org.springframework.ai.ollama.api.OllamaApi.GenerateRequest.
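A minimal sketch of what the extended request record could look like. This is a hypothetical, simplified stand-in, not the actual Spring AI record (the real GenerateRequest has more fields); it only illustrates adding an images list:

```java
import java.util.List;

// Hypothetical, simplified stand-in for OllamaApi.GenerateRequest
// extended with an "images" field (a list of base64-encoded strings).
public record GenerateRequest(
        String model,
        String prompt,
        Boolean stream,
        List<String> images) {

    // Convenience factory matching the llava use case in this issue.
    public static GenerateRequest withImages(String model, String prompt, List<String> images) {
        return new GenerateRequest(model, prompt, false, images);
    }
}
```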

Context

Local setup of Ollama with the "llava" model, trying to get explanations, descriptions, or insights about an image. I need to send both the text and the image, and found that the Ollama generate endpoint options are not fully supported; in particular, the "images" field is missing.
