Description
The backend must support receiving images from the frontend, processing them, and passing the extracted content to the LLM (Large Language Model). This enables users to upload images (e.g. screenshots) and receive relevant answers based on the content of those images.
Proposed Solution
HTTP multipart uploads combined with FastAPI's file-handling support should cover the receiving side; a rough endpoint sketch is included below.
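A minimal sketch of such an upload endpoint, assuming the /image-endpoint route from the checklist and local temporary file storage for now; the allowed content types and temp-file handling are assumptions, not an agreed design:

```python
import shutil
import tempfile
from pathlib import Path

from fastapi import FastAPI, File, HTTPException, UploadFile

app = FastAPI()

# Assumed whitelist of accepted image types.
ALLOWED_TYPES = {"image/png", "image/jpeg", "image/webp"}


@app.post("/image-endpoint")
async def upload_image(file: UploadFile = File(...)):
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(status_code=415, detail="Unsupported image type")

    # Store the upload in a local temporary file for now (last checklist item);
    # swap this for real storage once the frontend integration is in place.
    suffix = Path(file.filename or "upload").suffix
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        shutil.copyfileobj(file.file, tmp)
        tmp_path = tmp.name

    return {"filename": file.filename, "stored_at": tmp_path}
```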
Task Checklist
- POST /image-endpoint in FastAPI
- Send the image to Azure OpenAI Vision or another image-to-text model (see the pipeline sketch after this list)
- Embed the generated image caption using the existing embedding services
- Store the embedding and metadata in ChromaDB with traceable content
- LLM responds based on both the image content and retrieved contextual documents
- Test with local temporary file storage before integrating with the frontend
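A hedged sketch of the caption-to-embedding-to-ChromaDB flow covered by the middle checklist items. The deployment names ("gpt-4o", "text-embedding-3-small"), the collection name, and the base64 data-URL approach are assumptions and should be adapted to the project's actual Azure OpenAI setup and embedding services:

```python
import base64
import uuid

import chromadb
from openai import AzureOpenAI

# Endpoint and API key are expected via AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_API_KEY.
client = AzureOpenAI(api_version="2024-06-01")
chroma = chromadb.PersistentClient(path="./chroma")
collection = chroma.get_or_create_collection("image_captions")


def ingest_image(image_path: str, source: str) -> str:
    # 1. Ask a vision-capable model to describe the image.
    with open(image_path, "rb") as f:
        data_url = "data:image/png;base64," + base64.b64encode(f.read()).decode()
    caption = client.chat.completions.create(
        model="gpt-4o",  # assumed vision deployment name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the content of this image."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    ).choices[0].message.content

    # 2. Embed the generated caption (assumed embedding deployment name).
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=caption,
    ).data[0].embedding

    # 3. Store embedding, caption, and metadata in ChromaDB so answers stay traceable.
    doc_id = str(uuid.uuid4())
    collection.add(
        ids=[doc_id],
        embeddings=[embedding],
        documents=[caption],
        metadatas=[{"source": source, "type": "image_caption"}],
    )
    return doc_id
```

At query time, the retriever would then surface these stored captions alongside the other contextual documents so the LLM can answer based on both.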