# Retrieval-Augmented Generation (RAG) System for Legal Document Q&A

This project demonstrates how to build a Retrieval-Augmented Generation (RAG) pipeline that answers questions about legal documents and retrieves relevant clauses, using open-source tools such as LangChain, FAISS, and Hugging Face Transformers.
**Use case:** Automating legal document understanding, including contract clause retrieval, regulatory Q&A, and compliance insights.
## Features

- Load and process legal contract datasets.
- Intelligent document chunking using `RecursiveCharacterTextSplitter`.
- Semantic search via dense embeddings and a FAISS vector store.
- Query answering using LLMs such as Mistral-7B or other Hugging Face-hosted models.
- End-to-end RAG pipeline built with LangChain.
- Modular and extendable: plug in your own datasets, models, or prompts.
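The chunking step above can be sketched in plain Python. The function below illustrates the recursive-splitting idea behind LangChain's `RecursiveCharacterTextSplitter`: try the coarsest separator first, and only fall back to finer ones when a piece is still too long. The function name and parameters here are illustrative, not LangChain's actual API.

```python
def split_text(text, chunk_size=200, separators=("\n\n", "\n", ". ", " ")):
    """Recursively split `text` on the coarsest separator that still
    yields pieces no longer than `chunk_size` (illustrative sketch)."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    for sep in separators:
        if sep in text:
            parts = text.split(sep)
            chunks, current = [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    if len(part) > chunk_size:
                        # Piece is still too long: recurse with finer separators.
                        chunks.extend(split_text(part, chunk_size, separators))
                        current = ""
                    else:
                        current = part
            if current:
                chunks.append(current)
            return chunks
    # No separator found at all: hard-split by character count.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


clause = ("Section 4.2 Termination. Either party may terminate this Agreement "
          "upon thirty (30) days written notice. Upon termination, all "
          "outstanding fees become immediately due and payable.")
chunks = split_text(clause, chunk_size=80)
print(len(chunks), [len(c) for c in chunks])
```

Keeping chunks aligned with sentence and paragraph boundaries matters for legal text: a clause split mid-sentence retrieves poorly and reads worse in the LLM's context window.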
## Technologies Used

- **LangChain**: chains retrieval and LLM generation.
- **SentenceTransformers**: generates text embeddings (MiniLM-L6-v2).
- **FAISS**: fast similarity search over document chunks.
- **Transformers (Hugging Face)**: loads and runs LLMs.
- **BitsAndBytes**: 4-bit quantized LLM loading.
- **Google Colab**: development and GPU experimentation.
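The retrieval half of this stack boils down to: embed every chunk, embed the query, return the nearest chunks. The self-contained sketch below shows that flow with toy bag-of-words vectors and a brute-force cosine scan; in the actual pipeline, SentenceTransformers supplies the dense vectors and a FAISS index replaces the linear scan, but the logic is the same.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": lowercase word counts. Stand-in for a
    # SentenceTransformers dense vector in the real pipeline.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, chunks, k=2):
    # Brute-force nearest-neighbour scan; FAISS does this at scale.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Either party may terminate this agreement with thirty days notice.",
    "The licensee shall pay royalties quarterly.",
    "Confidential information must not be disclosed to third parties.",
]
hits = top_k("When can a party terminate the agreement?", chunks, k=1)
print(hits[0])  # the termination clause ranks first
```

In the full RAG pipeline, the retrieved chunks are then stuffed into the LLM prompt as context, so the model answers from the contract text rather than from its parametric memory.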
## Folder Structure

```
.
├── RAG_project3.ipynb   # Jupyter Notebook with full pipeline
├── data/                # (optional) directory for PDF or text contracts
└── README.md            # You are here!
```
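To run the notebook outside Colab, the stack listed above can be installed with pip. These are the standard PyPI package names; versions are left unpinned here, and `faiss-cpu` assumes a CPU-only machine (use `faiss-gpu` on a CUDA box).

```shell
pip install langchain sentence-transformers faiss-cpu transformers bitsandbytes
```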