A Retrieval-Augmented Generation (RAG) system that uses a Cohere LLM to answer questions about Vedic scriptures with source citations.
- 📚 Loads all text/markdown files from the `data` folder
- 🔍 Uses FAISS for efficient vector search (cached on disk)
- 🧠 Supports Cohere or local SentenceTransformer embeddings
- 🤖 Powered by Cohere's Command-R-Plus model for answers
- 💬 Interactive chat interface with Gradio
- 📎 Automatic source citation for all answers
- 🕉️ Optimized for Vedic scripture queries
```
pip install -r requirements.txt
```
- Visit the Cohere Dashboard
- Sign up or log in
- Navigate to API Keys section
- Copy your API key
Edit the `.env` file and add your Cohere API key (needed for chat responses):
```
COHERE_API_KEY=your-actual-api-key-here
```
Optional environment variables:
```
# Use Cohere for embeddings (requires higher rate limits) or keep the default local model
EMBEDDING_PROVIDER=local            # options: local, cohere
LOCAL_EMBED_MODEL=sentence-transformers/all-MiniLM-L6-v2
COHERE_CHAT_MODEL=command-r-plus-08-2024
COHERE_EMBED_MODEL=embed-english-v3.0
```
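As a rough sketch of how these settings can be consumed (the exact loading code in `app.py` may differ), python-dotenv reads the `.env` file and `os.getenv` supplies the defaults:

```python
import os

from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # pull values from the local .env file into the environment

COHERE_API_KEY = os.getenv("COHERE_API_KEY")
EMBEDDING_PROVIDER = os.getenv("EMBEDDING_PROVIDER", "local")
LOCAL_EMBED_MODEL = os.getenv("LOCAL_EMBED_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
COHERE_CHAT_MODEL = os.getenv("COHERE_CHAT_MODEL", "command-r-plus-08-2024")
COHERE_EMBED_MODEL = os.getenv("COHERE_EMBED_MODEL", "embed-english-v3.0")
```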
Place your Vedic text files (`.txt` or `.md`) in the `data` folder. The system will automatically:
- Load all documents
- Split them into chunks
- Create a vector database
- Enable semantic search (see the indexing sketch below)
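A minimal sketch of that indexing step, assuming the LangChain community loaders and a local SentenceTransformer model (chunk sizes are illustrative; `app.py` may differ in the details):

```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every .txt and .md file from the data folder
docs = []
for pattern in ("**/*.txt", "**/*.md"):
    docs += DirectoryLoader("data", glob=pattern, loader_cls=TextLoader).load()

# Split documents into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(docs)

# Embed the chunks locally, index them with FAISS, and cache the index on disk
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("storage/vectorstore")
```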
```
python app.py
```
The application will:
- Load all documents from the `data` folder
- Create a vector store for semantic search
- Launch a web interface at http://localhost:7860
- Open your browser to http://localhost:7860
- Type your question in the text box
- Click "Send" or press Enter
- Receive answers with source citations (a sketch of the chat wrapper follows this list)
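The chat UI itself can be a thin Gradio wrapper; a minimal sketch (the placeholder reply stands in for the RAG pipeline described in the next section):

```python
import gradio as gr

def answer(message, history):
    # In the real app this calls the RAG pipeline: embed the question,
    # retrieve chunks from FAISS, ask Cohere, and append source citations.
    return f"(placeholder) You asked: {message}"

# Serves the chat interface at http://localhost:7860
gr.ChatInterface(fn=answer, title="VedicGPT").launch(server_port=7860)
```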
- "What is the meaning of dharma in the Bhagavad Gita?"
- "Explain the concept of karma"
- "What does the Gita say about meditation?"
- Document Loading: All `.txt` and `.md` files from the `data` folder are loaded
- Text Chunking: Documents are split into manageable chunks with overlap
- Embedding: Each chunk is converted to vector embeddings (Cohere or local models)
- Vector Store: FAISS indexes the embeddings for fast retrieval and is cached on disk
- Query Processing: User questions are embedded and matched against the database
- Context Building: Relevant chunks are retrieved and formatted
- LLM Response: Cohere generates an answer based on the context
- Source Citation: The response includes references to the source documents (a condensed sketch of this query path follows)
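A condensed sketch of that query path, assuming the FAISS index cached by the indexing step above (prompt wording, `k`, and response handling are illustrative, not the exact code in `app.py`):

```python
import os

import cohere
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Reload the cached FAISS index with the same embedding model used to build it
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.load_local("storage/vectorstore", embeddings,
                               allow_dangerous_deserialization=True)

question = "What does the Gita say about meditation?"

# Retrieve the most similar chunks and format them as numbered context
hits = vectorstore.similarity_search(question, k=4)
context = "\n\n".join(f"[{i + 1}] {d.page_content}" for i, d in enumerate(hits))
sources = sorted({d.metadata.get("source", "unknown") for d in hits})

# Ask Cohere to answer from the retrieved context only
co = cohere.Client(os.getenv("COHERE_API_KEY"))
reply = co.chat(
    model=os.getenv("COHERE_CHAT_MODEL", "command-r-plus-08-2024"),
    message=f"Answer using only this context:\n\n{context}\n\nQuestion: {question}",
)
print(reply.text)
print("Sources:", ", ".join(sources))
```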
```
VedicGPT/
├── app.py              # Main RAG application
├── ocr.py              # PDF to text conversion
├── data/               # Your text documents
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (API keys)
└── README.md           # This file
```
- Cohere: LLM for question answering
- LangChain: Framework for RAG pipeline
- FAISS: Vector database for semantic search
- Gradio: Web interface for chat
- PyMuPDF: PDF processing
- Tesseract: OCR for scanned documents (see the OCR sketch below)
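For scanned PDFs, here is a sketch of the PyMuPDF-plus-Tesseract approach that `ocr.py` presumably follows (the file names are hypothetical and the actual script may differ):

```python
import io

import fitz  # PyMuPDF
import pytesseract
from PIL import Image

def pdf_to_text(path: str) -> str:
    """Extract text from a PDF, falling back to OCR for pages without a text layer."""
    pages = []
    with fitz.open(path) as doc:
        for page in doc:
            text = page.get_text().strip()
            if not text:  # likely a scanned page: render it and OCR the image
                pix = page.get_pixmap(dpi=300)
                text = pytesseract.image_to_string(Image.open(io.BytesIO(pix.tobytes("png"))))
            pages.append(text)
    return "\n\n".join(pages)

# Example: convert a scanned PDF into a UTF-8 text file the loader can pick up
with open("data/gita.txt", "w", encoding="utf-8") as f:
    f.write(pdf_to_text("scans/gita.pdf"))
```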
- For better results, ensure your documents are well-formatted
- Use the OCR script (`ocr.py`) to convert PDF files to text
- The system works best with clear, structured text
- Adjust `chunk_size` and `k` (the number of retrieved documents) to tune retrieval quality, as shown below
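For example (values are illustrative; `vectorstore` and `question` are as in the sketches above):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Larger chunks carry more context per match; smaller chunks match more precisely
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)

# Raise k to pass more retrieved chunks to the model, lower it for tighter answers
hits = vectorstore.similarity_search(question, k=6)
```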
Error: "No module named 'cohere'"
pip install -r requirements.txtError: "Invalid API key"
- Check that your `.env` file has the correct Cohere API key
- Ensure there are no extra spaces or quotes
Error: "TooManyRequestsError" when building embeddings
- Leave `EMBEDDING_PROVIDER=local` to avoid Cohere embedding rate limits
- Upgrade to a production Cohere key if you must use Cohere embeddings
- Delete the `storage/vectorstore` folder if you need to rebuild the cache
No documents loaded
- Verify files exist in the `data` folder
- Check that file extensions are `.txt` or `.md`
- Ensure files use UTF-8 encoding
MIT License