A Retrieval-Augmented Generation (RAG) system for querying your local documents with Ollama and vector embeddings.
- Privacy-focused: all processing happens locally
- Fast semantic search using a FAISS vector database
- Multiple LLM support via Ollama (llama3.2, mistral, gemma)
- Document retrieval with context-aware answers
- Ollama - Local LLM inference
- LangChain - RAG orchestration
- FAISS - Vector database
- HuggingFace Embeddings - Text embeddings (all-MiniLM-L6-v2)
- Python 3.13
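At the core of the stack above, semantic search means comparing embedding vectors by similarity. A minimal stdlib-only sketch of the idea, with toy 3-dimensional vectors standing in for the 384-dimensional all-MiniLM-L6-v2 embeddings (FAISS performs the same comparison, just indexed and much faster; the filenames and vector values here are hypothetical):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of vector magnitudes;
    # 1.0 means the vectors point in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for stored document embeddings.
docs = {
    "python_intro.txt": [0.9, 0.1, 0.2],
    "cooking_tips.txt": [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.25]  # pretend embedding of "what is python used for"

# Rank documents by similarity to the query -- the retrieval step.
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)
```

The document whose embedding points in the most similar direction wins, regardless of exact keyword overlap; that is what makes the search "semantic".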
- Install Ollama:

  ```bash
  # Download from https://ollama.ai
  ollama pull llama3.2
  ```

- Install Python dependencies:

  ```bash
  pip install langchain-community langchain-core sentence-transformers faiss-cpu
  ```

- Place your text documents in the `docs/` folder
- Run the RAG system:

  ```bash
  python rag_demo.py
  ```

- Ask questions about your documents!
```
Ask a question about your documents: what is python used for
==================================================
Answer: According to the context, Python is used for:
1. AI
2. Web development
3. Automation
```
- Documents are loaded and converted to vector embeddings
- Embeddings stored in FAISS vector database
- User question is embedded and used for similarity search
- Retrieved context + question sent to Ollama LLM
- LLM generates answer based on retrieved context
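The steps above can be sketched end to end in plain Python. This is a toy stand-in, not the actual `rag_demo.py`: a hashed bag-of-words vector replaces the real all-MiniLM-L6-v2 embeddings, a linear scan replaces the FAISS index, and the final Ollama call is left as prompt assembly:

```python
import hashlib
import math
from collections import Counter

def embed(text, dim=64):
    # Toy embedding: hashed bag-of-words counts. The real system uses a
    # trained model; this only makes the pipeline sketch runnable.
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += count
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# Steps 1-2: load documents and store their embeddings (stand-in for FAISS).
documents = [
    "Python is used for AI, web development, and automation.",
    "FAISS is a library for fast vector similarity search.",
]
index = [(doc, embed(doc)) for doc in documents]

# Step 3: embed the question and retrieve the most similar document.
question = "what is python used for"
q_vec = embed(question)
context, _ = max(index, key=lambda pair: cosine(q_vec, pair[1]))

# Steps 4-5: assemble the prompt that would be sent to the Ollama LLM,
# which then answers using only the retrieved context.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

Because the LLM only sees the retrieved context plus the question, its answer stays grounded in the local documents rather than its training data.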