This project implements a Retrieval-Augmented Generation (RAG) question-answering system over a single document. It ingests a PDF, splits it into overlapping chunks, embeds the chunks, stores the embeddings in a FAISS vector index, and answers user queries with an open-source LLM.
- Document Ingestion & Chunking
  - Loader: PyPDFLoader from langchain_community
  - Splitter: RecursiveCharacterTextSplitter
  - chunk_size: 400 characters
  - chunk_overlap: 50 characters
  - Separators: ["\n\n", "\n", " ", ""] to preserve paragraph context
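A minimal sketch of the ingestion step (exact import paths vary across LangChain versions; the file name book.pdf is taken from the sample query below):

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the PDF; PyPDFLoader returns one Document per page with page metadata.
loader = PyPDFLoader("book.pdf")
pages = loader.load()

# Split into 400-character chunks with 50-character overlap,
# falling back from paragraphs to lines to words to characters.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50,
    separators=["\n\n", "\n", " ", ""],
)
chunks = splitter.split_documents(pages)
```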
- Embedding & Vector Store
  - Model: sentence-transformers/all-MiniLM-L6-v2
  - Embedding wrapper: HuggingFaceEmbeddings (LangChain)
  - Storage: FAISS
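A sketch of the indexing step, continuing from the chunks above (the faiss_index directory name is illustrative, and import paths again depend on the LangChain version):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Embed every chunk with MiniLM and build an in-memory FAISS index.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)

# Optionally persist the index to disk for later runs.
vector_store.save_local("faiss_index")
```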
- Query Interface
  - Command-line interface using input()
  - Accepts a user query and retrieves the top-k most relevant chunks
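A sketch of the retrieval loop, using the vector_store built above; k=3 is an assumed value for top-k, not necessarily what main.py uses:

```python
while True:
    query = input("Enter a question (or 'exit' to quit): ").strip()
    if query.lower() == "exit":
        break

    # Retrieve the top-k chunks most similar to the query.
    top_chunks = vector_store.similarity_search(query, k=3)
    context = "\n\n".join(doc.page_content for doc in top_chunks)
```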
- LLM Integration
  - Model: vblagoje/bart_lfqa (Hugging Face)
  - Pipeline: text2text-generation
  - The prompt instructs the model to answer from the retrieved context or to state that no answer was found
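A sketch of the generation setup, continuing from the retrieved context and query above; the prompt wording here is illustrative, not the exact prompt used in main.py:

```python
from transformers import pipeline

# CPU-only inference (device=-1), matching the limitations listed below.
qa_pipeline = pipeline("text2text-generation", model="vblagoje/bart_lfqa", device=-1)

prompt = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say that no answer was found.\n\n"
    f"Context: {context}\n\nQuestion: {query}"
)
```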
- Output
  - Printed answer
  - Source document and page numbers
  - Time taken to generate the answer
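A sketch of how the answer, sources, and timing could be reported, continuing from the pipeline and prompt above (max_length=256 is an assumed setting):

```python
import time

start = time.time()
answer = qa_pipeline(prompt, max_length=256)[0]["generated_text"]
elapsed = time.time() - start

# PyPDFLoader stores the source file and page number in each chunk's metadata.
source = top_chunks[0].metadata.get("source")
page_numbers = sorted({doc.metadata.get("page") for doc in top_chunks})

print(f"Answer: {answer}")
print(f"Source: {source}, pages: {page_numbers}")
print(f"Time taken: {elapsed:.2f}s")
```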
- Libraries & Tools
  - langchain: document loading and chunking
  - transformers: LLM pipeline
  - sentence-transformers: embedding generation
  - FAISS: vector similarity search
  - ChatGPT: used for architectural decisions and class structure
Query: What are the diagnostic criteria for Obsessive-Compulsive Disorder (OCD)?
Output:
Answer: [LLM-generated answer here]
Source: book.pdf


- Features Implemented
  - PDF ingestion
  - Chunking with overlap
  - FAISS vector store
  - HuggingFace LLM-based QA
  - CLI interface
  - Prompt engineering
- Limitations
  - No GUI (Gradio/Streamlit)
  - No caching for repeated queries
  - No reranking logic
  - CPU-only inference
  - No dynamic document uploads
- Assumptions
  - The input is a clean PDF with extractable text
  - All models can be loaded on CPU
  - Responses rely only on the provided document context
- How to Run
  - pip install -r req.txt
  - python main.py
- Areas Where ChatGPT Assisted
  - Code structure and class design
  - Prompt design
  - Model selection
  - README formatting
This MVP provides an end-to-end RAG QA pipeline that is modular, extensible, and runs fully locally.