MULTIMODAL_RAG is a robust Retrieval-Augmented Generation (RAG) system that supports both text and image modalities. It leverages advanced context management, a finite state machine (FSM) for query routing, and async/await for high-performance rendering of inline images and graphs. The project is designed for research, portfolio, and real-world applications where rich, context-aware responses are required.
```
MULTIMODAL_RAG/
├── rag_pipeline.py         # Main RAG pipeline
├── streamlit_app.py        # Streamlit UI for demo/visualization
├── inspect_db.py           # Inspect ChromaDB contents
├── requirements.txt        # Python dependencies
├── Dockerfile              # Containerization
├── test.py                 # Test scripts
├── track.txt               # Tracking file
├── data/
│   ├── text/               # Markdown files
│   ├── pdfs/               # PDF files
│   └── images/             # Image files
├── chroma_db/              # ChromaDB vector store
├── src/
│   ├── utils.py            # Utility functions
│   ├── logger.py           # Logging setup
│   ├── exception.py        # Custom exceptions
│   ├── streamlit_utils.py  # Streamlit helpers
│   └── __init__.py
└── .env                    # Environment variables
```
- Clone the repository:

  ```bash
  git clone https://github.com/Shiv-Expert2503/MULTIMODAL_RAG.git
  cd MULTIMODAL_RAG
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  # For markdown/pdf/image support:
  pip install unstructured markdown pypdf2 pillow sentence-transformers chromadb langchain langchain-community langchain-google-genai python-dotenv
  ```
- Set up environment variables:
  - Create a `.env` file with your Google API key (see the loading sketch after this list):

    ```
    GOOGLE_API_KEY=your_google_api_key_here
    ```
- Prepare your data:
  - Place markdown files in `data/text/`, PDFs in `data/pdfs/`, and images in `data/images/`.
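The pipeline reads the API key at startup. A minimal sketch of that loading step, assuming `python-dotenv` (already in the dependency list); the error message is illustrative:

```python
import os

from dotenv import load_dotenv

# Pull GOOGLE_API_KEY from .env into the process environment.
load_dotenv()

api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise RuntimeError("GOOGLE_API_KEY is missing; check your .env file.")
```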
- Text Data:
  - Markdown files are loaded and chunked for semantic retrieval.
  - PDF files are parsed and chunked using `langchain` loaders.
- Image Data:
  - Images are embedded using CLIP (SentenceTransformer) and stored in ChromaDB.
- High-Quality RAG Enrichment:
  - Each chunk is enriched with metadata (source, page, context).
  - Embeddings are generated and stored in batches for efficiency (see the sketch after this list).
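A condensed sketch of the loading, chunking, and enrichment steps. The specific loaders, chunk sizes, and the `context` preview field are assumptions; `rag_pipeline.py` may differ:

```python
from pathlib import Path

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader, UnstructuredMarkdownLoader

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

chunks = []
for md_file in Path("data/text").glob("*.md"):
    chunks.extend(splitter.split_documents(UnstructuredMarkdownLoader(str(md_file)).load()))

for pdf_file in Path("data/pdfs").glob("*.pdf"):
    # PyPDFLoader yields one Document per page, with source/page metadata attached.
    chunks.extend(splitter.split_documents(PyPDFLoader(str(pdf_file)).load()))

# Enrich each chunk: the loaders already set source (and page for PDFs);
# add a short context preview for retrieval-time display.
for chunk in chunks:
    chunk.metadata["context"] = chunk.page_content[:100]
```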
- Text Embedding:
  - Uses Google Generative AI Embeddings for text chunks.
- Image Embedding:
  - Uses the CLIP model for image embeddings.
- Storage:
  - Embeddings and metadata are stored in ChromaDB collections (`portfolio_text`, `portfolio_images`).
- Retrieval:
  - Queries are matched against both text and image embeddings for multimodal responses (sketched below).
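A hedged sketch of populating and querying the two collections. The collection names come from the project; the model names, file path, and ID scheme are illustrative assumptions, and `chunks` is reused from the ingestion sketch above:

```python
import chromadb
from PIL import Image
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from sentence_transformers import SentenceTransformer

client = chromadb.PersistentClient(path="chroma_db")
text_col = client.get_or_create_collection("portfolio_text")
image_col = client.get_or_create_collection("portfolio_images")

# Text: Google Generative AI embeddings, added in one batch.
text_embedder = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
texts = [c.page_content for c in chunks]
text_col.add(
    ids=[f"chunk-{i}" for i in range(len(texts))],
    embeddings=text_embedder.embed_documents(texts),
    documents=texts,
    metadatas=[c.metadata for c in chunks],
)

# Images: CLIP embeddings via sentence-transformers.
clip = SentenceTransformer("clip-ViT-B-32")
img_path = "data/images/example.png"  # hypothetical file
image_col.add(
    ids=[img_path],
    embeddings=[clip.encode(Image.open(img_path)).tolist()],
    metadatas=[{"source": img_path}],
)

# Retrieval: embed the query per modality and search both collections.
query = "show the architecture diagram"
text_hits = text_col.query(query_embeddings=[text_embedder.embed_query(query)], n_results=3)
image_hits = image_col.query(query_embeddings=[clip.encode(query).tolist()], n_results=3)
```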
- Purpose:
  - Routes user queries to the correct handler (text/image/general).
  - Maintains conversation state and topic transitions.
- Implementation:
  - Each query is classified (topic, intent, similarity).
  - The FSM decides whether to answer, rewrite, or reject based on similarity and gap thresholds.
- Example:

  ```python
  if similarity > threshold and gap > min_gap:
      state = 'accepted'
  else:
      state = 'rejected'
  ```
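The same decision wrapped in a small state machine. The state names, threshold values, and the exact condition for the middle "rewrite" branch are illustrative assumptions, not the project's exact values:

```python
from dataclasses import dataclass

@dataclass
class RouterFSM:
    threshold: float = 0.75  # minimum similarity to accept a query
    min_gap: float = 0.10    # required margin over the runner-up topic
    state: str = "idle"

    def route(self, similarity: float, gap: float) -> str:
        if similarity > self.threshold and gap > self.min_gap:
            self.state = "accepted"  # confident match: answer directly
        elif similarity > self.threshold:
            self.state = "rewrite"   # close topics: rewrite the query first
        else:
            self.state = "rejected"  # off-topic: route to the general handler
        return self.state

router = RouterFSM()
print(router.route(similarity=0.82, gap=0.20))  # -> "accepted"
```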
- Context Memory:
  - Stores previous queries, responses, and metadata for continuity.
  - Enables context-aware answers and follow-ups.
- Cache:
  - Frequently accessed queries and embeddings are cached for fast retrieval.
  - Implemented as a local JSON or in-memory cache.
- Example:

  ```python
  cache = {}

  def get_from_cache(query):
      return cache.get(query)
  ```
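Since the cache can also be persisted as local JSON, here is a sketch of a file-backed variant; the filename is an assumption:

```python
import json
from pathlib import Path

CACHE_PATH = Path("query_cache.json")  # hypothetical cache file

def load_cache() -> dict:
    # Return the persisted cache, or an empty one on first run.
    return json.loads(CACHE_PATH.read_text()) if CACHE_PATH.exists() else {}

def save_to_cache(query: str, response: str) -> None:
    cache = load_cache()
    cache[query] = response
    CACHE_PATH.write_text(json.dumps(cache, indent=2))
```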
- Async Processing:
  - Embedding generation and retrieval are performed asynchronously for speed.
  - The Streamlit app uses async to render images and graphs inline without blocking the UI.
- Example:

  ```python
  import asyncio

  async def embed_and_store(...):
      await embedding_model.encode_async(...)
  ```
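SentenceTransformer's `encode` is a blocking call, so one concrete way to get the effect sketched above on Python 3.8+ is to push it onto an executor. This is a sketch, not the project's exact code; the model name is an assumption:

```python
import asyncio

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")

async def embed_batch(texts):
    loop = asyncio.get_running_loop()
    # Run the blocking encode() in a worker thread so the event loop stays responsive.
    return await loop.run_in_executor(None, model.encode, texts)

async def main():
    vectors = await embed_batch(["first chunk", "second chunk"])
    print(vectors.shape)  # (2, embedding_dim)

asyncio.run(main())
```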
- Inline Rendering:
  - Images and graphs are displayed in real time using Streamlit's `st.image` and `st.pyplot` (a sketch follows below).
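A small sketch of that rendering path; the image path and plotted data are placeholders:

```python
import matplotlib.pyplot as plt
import streamlit as st
from PIL import Image

st.title("MULTIMODAL_RAG demo")

# Render a retrieved image inline.
st.image(Image.open("data/images/example.png"), caption="Retrieved image")  # hypothetical path

# Render a graph inline.
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [0.2, 0.8, 0.5])
ax.set_title("Similarity scores")
st.pyplot(fig)
```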
- Run the RAG pipeline:

  ```bash
  python rag_pipeline.py
  ```

- Start the Streamlit app for visualization:

  ```bash
  streamlit run streamlit_app.py
  ```
- Interact with the system:
  - Ask questions about text or image data.
  - View inline images and graphs in the UI.
- Missing dependencies:
  - Ensure all packages in `requirements.txt` are installed.
- ChromaDB errors:
  - Delete and recreate the `chroma_db/` directory if corrupted.
- API key issues:
  - Check `.env` for the correct Google API key.
- Async errors:
  - Ensure Python 3.8+ for async/await support (quick check below).
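A quick way to verify the interpreter meets that requirement:

```python
import sys

# async/await usage in this project targets Python 3.8+.
assert sys.version_info >= (3, 8), f"Python 3.8+ required, found {sys.version}"
```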
- Fork the repo and submit pull requests.
- Open issues for bugs or feature requests.
- Follow best practices for code quality and documentation.
Author: Shivansh (Shiv-Expert2503)
License: MIT
Contact: GitHub Issues
This README provides a comprehensive guide to the MULTIMODAL_RAG project, covering everything from setup to advanced features like FSM, context memory, caching, and async rendering. For further details, refer to the code and comments in each module.