This is an Advanced Retrieval-Augmented Generation (RAG) system for chatting with your private documents. It is optimized to run 100% locally on your machine using Ollama, which avoids API costs and keeps your data private.
- Framework: LangChain
- LLM: Llama 3.2:1b (local via Ollama)
- Embeddings: mxbai-embed-large (1024-dim)
- Vector Store: FAISS (Facebook AI Similarity Search)
- Database: Pickle (metadata storage)
- Environment: Python 3.10+
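To make the stack concrete, here is a minimal, self-contained sketch of what the FAISS + pickle pairing does conceptually: nearest-neighbour search over embedding vectors, with pickled metadata mapping each vector index back to its source chunk. This is an illustration in plain Python, not the project's actual code (the real system uses 1024-dim mxbai-embed-large vectors and FAISS for the search).

```python
# Illustrative sketch (NOT the project's actual code): brute-force
# cosine-similarity search standing in for FAISS, plus pickle for the
# metadata that maps vector indices back to document chunks.
import math
import pickle

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=1):
    """Return indices of the k most similar vectors (FAISS does this at scale)."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(index)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# Toy 3-dim "embeddings"; the real embedding model produces 1024 dimensions.
index = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
metadata = {0: "chunk about cats", 1: "chunk about dogs", 2: "chunk about pets"}

# Persist and reload metadata the way the project does: a pickle file.
with open("metadata.pkl", "wb") as f:
    pickle.dump(metadata, f)
with open("metadata.pkl", "rb") as f:
    loaded = pickle.load(f)

query = [0.9, 0.1, 0.0]
best = top_k(query, index, k=1)[0]
print(loaded[best])  # -> chunk about cats
```

The pickle file plays the role of the "database" row in the stack above: FAISS only stores vectors, so the text and source info for each chunk must live alongside it.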
1. Clone the repository:

```bash
git clone https://github.com/Shahryar-Sohail/local-rag/
cd local-rag
```

2. Create and activate a virtual environment:

```bash
python -m venv .venv
.venv\Scripts\activate
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

4. Set up the local models (Ollama):

```bash
ollama pull llama3.2:1b
ollama pull mxbai-embed-large
```

5. Run the project to test the backend pipeline and see the AI in action:

```bash
python app.py
```
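Conceptually, the backend pipeline retrieves the most relevant chunks for a question, builds a grounded prompt from them, and sends that prompt to the local Llama model. The sketch below shows that flow in plain Python; the function names (`retrieve`, `build_prompt`) and the keyword-overlap scoring are hypothetical stand-ins, not code from `app.py` (the real retrieval uses FAISS over embeddings).

```python
# Hypothetical sketch of a minimal RAG loop; `retrieve` and `build_prompt`
# are illustrative names, not functions from app.py.
def retrieve(question, chunks, k=2):
    """Naive keyword-overlap retrieval standing in for FAISS vector search."""
    scored = sorted(
        chunks,
        key=lambda c: sum(word in c.lower() for word in question.lower().split()),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, context_chunks):
    """Assemble the grounded prompt that would be sent to the local model."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using ONLY the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = [
    "Ollama serves models locally over HTTP.",
    "FAISS stores the document embeddings on disk.",
    "The README explains the setup steps.",
]
question = "Where does Ollama serve models?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)  # this prompt would then go to llama3.2:1b via Ollama
```

In the real pipeline the question is first embedded with mxbai-embed-large and matched against the FAISS index, and the assembled prompt is answered by llama3.2:1b running under Ollama.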