An intelligent document search and question-answering system powered by Retrieval-Augmented Generation (RAG). Get instant, verified answers from official documents with source attribution and confidence scoring. Designed for government, enterprises, and organizations handling large document collections.
No hallucinations. No guessing. Just answers backed by your documents.
- Easy Document Upload - Load PDF and text files via an intuitive web interface
- Instant Answers - Get responses in seconds, not minutes
- Source Verification - Every answer shows which document it came from, with page numbers
- Confidence Indicators - Know how confident the system is in each answer (High/Medium/Low/Insufficient)
- Zero Hallucinations - Only answers from your documents; the system clearly states when it doesn't know
- Built-in Testing - Quality assurance tools to verify system accuracy and performance
- Secure & Local - All document processing happens on your infrastructure; no external API calls with sensitive data
- Clean Web UI - Professional, accessible interface designed for non-technical users
- Production-Ready - Evaluation metrics, guardrails, relevance filtering, and confidence scoring
Upload your PDFs or text files with a single click. The system indexes hundreds of pages in 2-5 minutes.
Type natural questions: "What is our vacation policy?"
> Employees receive 20 days of annual leave per year,
> plus 8 statutory bank holidays.
>
> Confidence: High
> Based on your documents
> Found 2 relevant sections
Click on reference documents to verify answers against original text.
| Field | Use Case |
|---|---|
| HR & Pensions | Staff Q&A on policies, benefits, leave entitlements |
| Health & Safety | Instant access to procedures, regulations, incident reporting |
| Compliance & Audit | Answer audit questions, demonstrate policy adherence |
| Legal & Contracts | Search contracts for terms, conditions, obligations |
| Government | Public policy, citizen inquiries, internal documentation |
| Enterprise | Internal knowledge base, SOPs, training materials |
| Customer Support | Answer customer questions from knowledge base |
Upload PDFs → Text Extraction → Break into Sections →
Create AI Embeddings → Store in Vector Index
(One-time setup: 2-5 minutes for hundreds of documents)

User Question → Convert to Embedding → Search Index →
Find Matching Sections → Generate Answer → Apply Guardrails →
Return Answer + Sources + Confidence
(Typical response: <5 seconds)
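The two pipelines above can be sketched end to end in a few lines. This is a toy illustration only: a bag-of-words vector stands in for the real sentence-transformer embeddings, and a brute-force scan stands in for the FAISS index; the `ingest`/`query` function names are illustrative, not the repository's API.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: a word-count vector (real system: dense vectors)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def ingest(chunks: list[str]) -> list[tuple[str, Counter]]:
    # Indexing pipeline: chunk -> embedding -> store
    return [(c, embed(c)) for c in chunks]

def query(index, question: str, threshold: float = 0.3):
    # Query pipeline: embed the question, rank chunks, drop low-relevance matches
    q = embed(question)
    scored = [(cosine(q, vec), text) for text, vec in index]
    return [(s, t) for s, t in sorted(scored, reverse=True) if s >= threshold]

index = ingest([
    "Employees receive 20 days of annual leave per year.",
    "Fire exits are located at the rear of the building.",
])
results = query(index, "how many days of annual leave per year")
```

With the 0.3 threshold from `config.py`, the unrelated fire-exit chunk is filtered out and only the leave-policy chunk is returned.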
Multiple techniques ensure answers come from your documents:
- Retrieval Filtering: Discard low-relevance matches
- Grounded Prompting: Explicit instructions to use only the provided context
- Temperature Control: Low temperature (0.1) for factual responses
- Post-Generation Guardrails: Validate answers against sources
- Confidence Scoring: Transparent scoring based on match quality
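The grounded-prompting technique above amounts to inlining the retrieved chunks as explicit context and forbidding answers from outside it. A minimal sketch, where the function name and chunk fields (`source`, `page`, `text`) are illustrative rather than the repository's actual `prompt_builder.py` API:

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    # Inline each retrieved chunk with its citation so the model can attribute
    context = "\n\n".join(
        f"[{c['source']}, p.{c['page']}] {c['text']}" for c in chunks
    )
    return (
        "Answer ONLY from the context below. If the answer is not in the "
        "context, reply exactly: I don't know. Cite the source document.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "What is the vacation policy?",
    [{"source": "handbook.pdf", "page": 5, "text": "20 days annual leave."}],
)
```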
- Python 3.10+
- Ollama installed and running (`pip install ollama` or download)
- A model downloaded (`ollama pull llama2` or `ollama pull neural-chat`)
- 4GB+ RAM for optimal performance
- Docker (for containerized deployment)
- PostgreSQL (for production vector store)
```bash
git clone https://github.com/yourusername/rag_knowledge_assistant.git
cd rag_knowledge_assistant
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Place your PDF or TXT files in `data/sample_docs/`.

```bash
ollama serve
# In another terminal:
ollama pull llama2
```

```bash
uvicorn main:app --reload
```

Visit http://localhost:8000 in your browser.
GET /health
Returns system status, Ollama readiness, and document count.
Response:
```json
{
  "status": "healthy",
  "ollama": {
    "configured_model": "llama2",
    "model_ready": true
  },
  "vector_store_chunks": 250
}
```

POST /ingest
Processes all documents in `data/sample_docs/` and creates a searchable index.
Response:
```json
{
  "documents_processed": 5,
  "chunks_created": 250,
  "status": "success"
}
```

POST /query
Request:
```json
{
  "question": "How many days of vacation do employees get?"
}
```

Response:
```json
{
  "answer": "Employees receive 20 days of PTO per year plus 8 statutory bank holidays.",
  "confidence": "high",
  "is_grounded": true,
  "retrieval_count": 3,
  "sources": [
    {
      "chunk": {
        "source": "employee_handbook.pdf",
        "page_number": 5,
        "content": "All full-time employees receive 20 days PTO..."
      },
      "similarity_score": 0.92
    }
  ]
}
```

POST /evaluate
Runs the test suite to measure system accuracy and performance.
Response:
```json
{
  "summary": {
    "total_tests": 10,
    "correct": 9,
    "accuracy": 0.90
  },
  "results": [...]
}
```

Edit `config.py` to customize system behavior:
```python
# Document Processing
CHUNK_SIZE = 512            # Characters per section
CHUNK_OVERLAP = 50          # Overlap for context

# Retrieval
SIMILARITY_THRESHOLD = 0.3  # Minimum match score (0.0-1.0)
TOP_K = 5                   # Number of chunks to retrieve

# LLM
LLM_TEMPERATURE = 0.1       # Lower = more factual
LLM_MAX_TOKENS = 500        # Response length

# Model Selection
OLLAMA_MODEL = "llama2"     # Model to use
OLLAMA_BASE_URL = "http://localhost:11434"
```

```
rag_knowledge_assistant/
├── main.py                 # FastAPI application & routes
├── config.py               # Configuration settings
├── requirements.txt        # Python dependencies
│
├── models/
│   ├── __init__.py
│   └── schemas.py          # Pydantic data models
│
├── services/
│   ├── __init__.py
│   ├── document_loader.py  # PDF/TXT file processing
│   ├── chunker.py          # Text segmentation
│   ├── embeddings.py       # Vector embeddings
│   ├── vector_store.py     # FAISS indexing
│   ├── retriever.py        # Similarity search + filtering
│   ├── prompt_builder.py   # Grounded prompt construction
│   ├── llm_services.py     # Ollama integration
│   ├── guardrails.py       # Output validation
│   └── evaluator.py        # Quality metrics & testing
│
├── data/
│   ├── sample_docs/        # Your documents (PDF/TXT)
│   └── faiss_index/        # Vector store (auto-generated)
│
├── index.html              # Web UI
│
├── README.md                       # This file
├── GOVERNMENT_PRESENTATION.md      # Stakeholder presentation guide
├── DEMO_GUIDE.md                   # Live demo walkthrough
├── UI_UX_REDESIGN.md               # Design documentation
└── PRESENTATION_RESOURCE_GUIDE.md  # Complete resource guide
```
Traditional LLMs "hallucinate" - they make up information not in their training data or provided context.
```python
# Only use chunks with high similarity (>0.3)
if similarity_score < SIMILARITY_THRESHOLD:
    return "I don't know"
```

```python
system_prompt = """
You are a helpful assistant. IMPORTANT:
- Only answer using the provided context
- If the answer is not in the context, say "I don't know"
- Do not use your training knowledge
- Always cite which document you're using
"""
```

```python
# Temperature 0.1 = deterministic, factual
# Temperature 0.7 = creative, unreliable
temperature = 0.1
```

```python
# Check if answer contains hedging language
if "probably" in answer or "I think" in answer:
    confidence = "medium"

# Verify numbers appear in source docs
if any_number_in_answer not in retrieved_context:
    confidence = "low"
```

```python
# Be transparent about confidence
response = {
    "answer": "...",
    "confidence": "high",  # Based on match quality
    "is_grounded": True,   # Answer from documents
    "sources": [...]       # Show proof
}
```

- Accuracy: Are answers correct? (Run evaluation suite)
- Latency: How fast are responses? (Target: <5s)
- Coverage: What % of questions can be answered? (Target: >85%)
- Confidence: How certain is the system? (High/Medium/Low)
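One plausible way to map retrieval quality onto the four confidence labels the UI displays is to bucket the best post-filter similarity score. The thresholds here are illustrative assumptions, not the values used in the repository's `guardrails.py`:

```python
def confidence_label(scores: list[float], threshold: float = 0.3) -> str:
    # Keep only matches that pass the relevance filter
    relevant = [s for s in scores if s >= threshold]
    if not relevant:
        return "Insufficient"  # nothing relevant: refuse rather than guess
    best = max(relevant)
    if best >= 0.75:
        return "High"
    if best >= 0.5:
        return "Medium"
    return "Low"

print(confidence_label([0.92, 0.41]))  # -> High
print(confidence_label([0.2, 0.1]))   # -> Insufficient
```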
```bash
curl -X POST http://localhost:8000/evaluate
```

Example output: 90% accuracy across the test suite.
This project includes comprehensive resources for stakeholder presentations:
- GOVERNMENT_PRESENTATION.md - Complete guide for executive stakeholders
- DEMO_GUIDE.md - Step-by-step demo walkthrough with scripts
- PRESENTATION_RESOURCE_GUIDE.md - Master coordination guide
- UI_UX_REDESIGN.md - Design documentation
Quick Stats for Decision Makers:
- Saves 2-3 hours per staff member per week
- ~£9,000 annual savings per 200-person organization
- 85-95% typical accuracy
- 100% local, no cloud dependencies
```bash
# Development
uvicorn main:app --reload

# Production
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```

```bash
docker build -t rag-assistant .
docker run -p 8000:8000 -v $(pwd)/data:/app/data rag-assistant
```

Instructions for AWS, Azure, and Google Cloud in DEPLOYMENT.md (coming soon).
- Local Processing: All data stays on your servers
- No External APIs: No sensitive data sent to external services
- HTTPS Ready: Deploy with SSL/TLS certificates
- Authentication: Add API key or OAuth as needed
- Audit Logging: Log all queries for compliance
- Enable HTTPS/SSL
- Add authentication (API keys, OAuth)
- Set up logging and monitoring
- Regular security audits
- Backup vector database
- Access control on document uploads
Problem: Indexing 10,000+ documents is slow
Solutions:
- Use PostgreSQL with pgvector instead of FAISS
- Enable GPU acceleration if available
- Batch document processing in background jobs
- Add caching layer (Redis) for repeated queries
See PERFORMANCE.md for a detailed optimization guide.
Contributions welcome!
- Support for additional document formats (DOCX, Excel, HTML)
- Multi-language support
- UI enhancements and accessibility improvements
- Performance optimizations
- Additional LLM integrations (OpenAI, Anthropic, local models)
- Production deployment guides
- Test coverage expansion
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
This project is licensed under the MIT License - see LICENSE file for details.
Q: "System shows 'API Disconnected'"
- Ensure `uvicorn main:app --reload` is running
- Check if port 8000 is available
Q: "Ollama Not Ready"
- Run `ollama serve` in a separate terminal
- Download the model: `ollama pull llama2`
Q: "Answers seem inaccurate"
- Run the evaluation: `curl -X POST http://localhost:8000/evaluate`
- Check document quality in `data/sample_docs/`
- Adjust `SIMILARITY_THRESHOLD` in `config.py`
Q: "Slow response times"
- Check available RAM (need 4GB+)
- Reduce TOP_K in config.py
- Consider GPU acceleration
- Read GOVERNMENT_PRESENTATION.md for conceptual questions
- Check DEMO_GUIDE.md for usage examples
- File an issue on GitHub for bugs
- Start a discussion for feature requests
- Web UI file upload modal
- Multi-language support
- DOCX and Excel file support
- Advanced analytics dashboard
- Feedback loop for continuous improvement
- Mobile app
- Voice-based queries
- Integration with Slack/Teams
- Issues & Bugs: GitHub Issues
- Feature Requests: GitHub Discussions
- General Questions: Start a Discussion or open an Issue
Built with:
- FastAPI - Modern web framework
- FAISS - Vector similarity search
- Sentence Transformers - Text embeddings
- Ollama - Local LLM inference
- Pydantic - Data validation
If you use this project in your research or publication, please cite:
```bibtex
@software{rag_knowledge_assistant,
  title = {Government Knowledge Assistant},
  author = {Richard Ogundele},
  year = {2026},
  url = {https://github.com/richardogundele/rag_knowledge_assistant}
}
```

Ready to get started?
- Install: `pip install -r requirements.txt`
- Add documents: place PDFs in `data/sample_docs/`
- Start: `uvicorn main:app --reload`
- Visit: http://localhost:8000
Questions? See GOVERNMENT_PRESENTATION.md or open an issue on GitHub.
Made with ❤️ for government, enterprises, and organizations

Star us on GitHub | Documentation | Report an Issue
Edit config.py to tune:
| Parameter | Default | Description |
|---|---|---|
CHUNK_SIZE |
512 | Characters per chunk |
CHUNK_OVERLAP |
50 | Overlap between chunks |
SIMILARITY_THRESHOLD |
0.3 | Minimum retrieval score |
TOP_K |
5 | Number of chunks to retrieve |
LLM_TEMPERATURE |
0.1 | LLM creativity (lower = more factual) |
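A sketch of how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact during chunking: consecutive chunks share their last/first `overlap` characters so that context spanning a boundary is not lost. The real `chunker.py` may split on sentence or paragraph boundaries instead; this fixed-width version is illustrative only:

```python
def chunk_text(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    # Advance by (size - overlap) so each chunk repeats the tail of the last
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 1000, size=512, overlap=50)  # -> 3 chunks
```

With the defaults, a 1,000-character document yields three chunks, each sharing 50 characters with its neighbour.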
Run the built-in test suite:
```bash
curl -X POST http://localhost:8000/evaluate
```

Tests include:
- Questions that SHOULD be answerable (checks for correct retrieval)
- Questions that should NOT be answerable (checks "I don't know" behavior)
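Both test families fit one simple evaluation loop: each case pairs a question with whether it should be answerable, and refusing to answer an out-of-scope question counts as correct. The `fake_rag` function below is a stand-in for the real `/query` endpoint, not the project's evaluator:

```python
def fake_rag(question: str) -> str:
    # Toy stand-in for the RAG system: answers only what its "index" covers
    kb = {"vacation": "20 days of annual leave"}
    for key, ans in kb.items():
        if key in question.lower():
            return ans
    return "I don't know"

def evaluate(cases: list[tuple[str, bool]]) -> float:
    # A case is correct if the system answered exactly when it should have
    correct = 0
    for question, should_answer in cases:
        answered = fake_rag(question) != "I don't know"
        correct += answered == should_answer
    return correct / len(cases)

accuracy = evaluate([
    ("What is the vacation policy?", True),   # should be answerable
    ("Who won the 1966 World Cup?", False),   # should refuse
])
```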
For production deployment:
- Vector Store: Replace FAISS with managed solution (Pinecone, Weaviate, Qdrant)
- Caching: Add Redis for repeated query caching
- Observability: Track retrieval metrics, latency, confidence distributions
- Feedback Loop: Collect user ratings for continuous improvement
- A/B Testing: Test prompt variations systematically
- Authentication: Add API key or OAuth protection
MIT