[Feature] Image functionality #120

@Merri04

Description

The backend must support receiving images from the frontend, processing them, and passing the extracted content to the LLM (Large Language Model). This enables users to upload images (e.g. screenshots) and receive relevant answers based on the content of those images.

Proposed Solution

I am sure HTTP and Python/FastAPI have a way of doing this (e.g. a multipart file upload).

Task Checklist

  • Add a POST /image-endpoint route in FastAPI
  • Send the image to Azure OpenAI Vision or another image-to-text model
  • Embed the generated image caption using the existing embedding services
  • Store the embedding and metadata in ChromaDB so the content is traceable
  • Have the LLM respond based on both the image content and the retrieved contextual documents
  • Test with local temporary file storage before integrating with the frontend
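
The caption, embed, and store steps above could be wired together roughly as follows. This is only a sketch that uses the standard library: `caption_fn` and `embed_fn` are hypothetical callables standing in for the vision model and the existing embedding service, and `InMemoryStore` is a stand-in for a ChromaDB collection, not its real API. The metadata keys are also assumptions.

```python
import base64
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL for an image-to-text model."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a vector store; keeps records in a dict."""
    records: Dict[str, dict] = field(default_factory=dict)

    def add(self, doc_id: str, embedding: List[float], document: str, metadata: dict) -> None:
        self.records[doc_id] = {
            "embedding": embedding,
            "document": document,
            "metadata": metadata,
        }


def ingest_image(
    image_bytes: bytes,
    filename: str,
    caption_fn: Callable[[str], str],
    embed_fn: Callable[[str], List[float]],
    store: InMemoryStore,
) -> str:
    """Caption the image, embed the caption, and store both with traceable metadata."""
    # A content hash gives a stable ID, so a stored caption can be traced to its image.
    doc_id = hashlib.sha256(image_bytes).hexdigest()[:16]
    caption = caption_fn(to_data_url(image_bytes))
    embedding = embed_fn(caption)
    store.add(doc_id, embedding, caption, {"source": filename, "type": "image_caption"})
    return doc_id


# Usage with fake model functions, e.g. for a local test before frontend integration.
store = InMemoryStore()
doc_id = ingest_image(
    b"abc",
    "shot.png",
    caption_fn=lambda data_url: "a caption",
    embed_fn=lambda text: [float(len(text))],
    store=store,
)
print(store.records[doc_id]["metadata"])
```

Passing the model calls in as functions keeps this step testable without network access; the real Azure OpenAI and ChromaDB clients can be plugged in later behind the same signatures.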
