A basic, end-to-end Retrieval-Augmented Generation (RAG) pipeline that combines image and text retrieval with generative AI. It uses Google Gemini for multi-modal generation, OpenAI CLIP for embeddings, and ChromaDB for vector search.

## Features
- Multi-modal (image + text) semantic search and extraction
- Google Gemini LLM integration for document analysis
- OpenAI CLIP for image and text embeddings
- ChromaDB for fast vector search
- Docker support for easy deployment
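At its core, the multi-modal semantic search above is a nearest-neighbour lookup over embedding vectors. A minimal pure-Python sketch of cosine-similarity retrieval (the toy 3-dimensional vectors stand in for real CLIP embeddings, which are 512-dimensional; all names here are illustrative, not this repo's API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, index, top_k=1):
    """Return the top_k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in index.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]

# Toy index: in the real pipeline these vectors come from CLIP and live in ChromaDB.
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "invoice.png": [0.0, 0.1, 0.9],
}

print(retrieve([0.8, 0.2, 0.0], index, top_k=1))  # "cat.jpg" ranks first
```

ChromaDB performs the same ranking, but over an on-disk index and typically with approximate search for speed.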
## Installation

- Clone the repo:

  ```sh
  git clone <your-repo-url>
  cd hostingLLM
  ```

- Install dependencies:

  ```sh
  pip install -r requirements.txt
  ```

  Or use Docker:

  ```sh
  docker build -t hostingllm .
  docker run --env-file .env hostingllm
  ```

- Set up your `.env` file:

  ```
  GOOGLE_API_KEY=your_google_api_key_here
  ```
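A `.env` file is just `KEY=VALUE` lines; most Python projects load it with the `python-dotenv` package, but the format is simple enough to parse by hand. A stdlib-only sketch (the `parse_env_file` helper is hypothetical, not part of this repo):

```python
import os

def parse_env_file(path):
    """Minimal .env parser: KEY=VALUE lines; '#' comments and blanks are skipped."""
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# Demo: write a throwaway file, parse it, and export the result.
with open("demo.env", "w") as fh:
    fh.write("# comment\nGOOGLE_API_KEY=your_google_api_key_here\n")
os.environ.update(parse_env_file("demo.env"))
```

In practice, `docker run --env-file .env` (as shown above) does this export for you inside the container.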
## Usage

### Index your images

```sh
python index_image.py
```

This will embed all images in the `docs/` folder and store them in ChromaDB.
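Conceptually, `index_image.py` pairs each image with an embedding and writes it to a ChromaDB collection. Running the real thing needs the `chromadb` and CLIP model dependencies, so the sketch below mimics the collection's `add`/`query` flow with an in-memory class and a fake embedding function (every name here is a stand-in, not this repo's or ChromaDB's actual code):

```python
class InMemoryCollection:
    """Stand-in mimicking a ChromaDB collection's add/query flow."""
    def __init__(self):
        self.ids, self.embeddings = [], []

    def add(self, ids, embeddings):
        self.ids.extend(ids)
        self.embeddings.extend(embeddings)

    def query(self, query_embedding, n_results=1):
        def dist(vec):  # squared Euclidean distance; smaller is closer
            return sum((a - b) ** 2 for a, b in zip(query_embedding, vec))
        ranked = sorted(zip(self.ids, self.embeddings), key=lambda p: dist(p[1]))
        return [doc_id for doc_id, _ in ranked[:n_results]]

def fake_clip_embed(name):
    """Placeholder: the real script runs each image through the CLIP model."""
    return [float(ord(c) % 7) for c in name[:3]]

collection = InMemoryCollection()
for image in ["cat.jpg", "dog.jpg"]:
    collection.add(ids=[image], embeddings=[fake_clip_embed(image)])

print(collection.query(fake_clip_embed("cat.jpg"), n_results=1))
```

The real script does the same loop over `docs/`, with CLIP producing the embeddings and ChromaDB persisting them.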
### Retrieve and generate

Edit `retrieve_and_generate.py` to set your query and prompt, then run:

```sh
python retrieve_and_generate.py
```

### Gemini single-image demo

```sh
python geminivllm.py
```

This runs a simple Gemini demo on a single image and prompt.
### Text-only RAG demo (optional)

```sh
python main.py
```

This runs a text-only RAG pipeline using SentenceTransformers, FAISS, and vLLM.
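Text-only RAG pipelines typically split source documents into overlapping chunks before embedding them, so that retrieved passages fit the model's context and neighbouring chunks share boundary text. A minimal chunker sketch (the function and its parameters are illustrative; `main.py`'s actual chunking may differ):

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping character windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

doc = "Retrieval-Augmented Generation grounds an LLM's answers in retrieved context."
chunks = chunk_text(doc, chunk_size=40, overlap=10)
print(len(chunks), chunks[0])
```

Each chunk would then be embedded (here with SentenceTransformers) and stored in the vector index (here FAISS) for retrieval at query time.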
## Environment variables

- `GOOGLE_API_KEY`: Your Google Gemini API key (required)
## Project structure

- `index_image.py` — Indexes images into ChromaDB
- `retrieve_and_generate.py` — Retrieves relevant images and runs Gemini
- `geminivllm.py` — Gemini single-image demo
- `main.py` — (Optional) Text-only RAG demo
- `docs/` — Sample images and test files
## License

MIT