A full-stack web application for document-based question answering using local LLMs with Retrieval-Augmented Generation (RAG) capabilities.
Apollyon provides a web interface for local LLMs, allowing you to upload large documents, converse with the LLM, and get accurate answers grounded in the uploaded content using hybrid search (vector + keyword) and iterative RAG.
- Document Upload & Processing 📄: Upload text/markdown files (up to 10MB) which are automatically chunked and indexed
- Hybrid Search 🔗: Combines vector embeddings with TF-IDF keyword search for better retrieval
- Iterative RAG 🔄: Multiple retrieval iterations to gather comprehensive context before answering
- Session Management 💬: Multiple chat sessions with persistent conversation history; supports multiple concurrent users
- Modern UI 🎨: SvelteKit-based responsive frontend
- FastAPI Backend ⚡: Python backend with async streaming support
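The automatic chunking mentioned above can be pictured as a fixed-size splitter with overlap (the project actually credits LangChain's text-splitting utilities for this; `chunk_text` below is a minimal hypothetical stand-in, not the real splitter):

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap, so that
    sentences straddling a chunk boundary still appear whole in some chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Each chunk is then embedded and indexed; the overlap trades a little index size for better recall at chunk boundaries.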
```
Frontend (SvelteKit) → Backend (FastAPI) → RAG System → Ollama LLM
                                               ↓
                                      Document Database
                                (HybridDB: Vector + Keyword)
```
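Hybrid search typically fuses the two retrievers' rankings into one score list. The actual fusion lives in `hdb.py`; the sketch below (function and parameter names are assumptions) shows one common approach — min-max normalize each retriever's scores, then take a weighted sum:

```python
def hybrid_rank(vector_scores, keyword_scores, alpha=0.7):
    """Fuse vector-similarity and keyword (TF-IDF) scores into one ranking.

    Each score dict maps doc id -> raw score. Scores are min-max normalized
    per retriever so neither scale dominates, then combined as
    alpha * vector + (1 - alpha) * keyword.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v, k = normalize(vector_scores), normalize(keyword_scores)
    docs = set(v) | set(k)
    combined = {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0)
                for d in docs}
    return sorted(combined, key=combined.get, reverse=True)
```

A higher `alpha` favors semantic similarity; lowering it lets exact keyword matches win, which helps for rare terms the embedding model handles poorly.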
- Ollama 🦙: Install from ollama.ai

  ```bash
  # Install and start Ollama
  ollama serve
  ```

- Python 3.8+ 🐍 with pip
- Node.js 18+ 🟢 with npm
```bash
git clone <repository-url>
cd <repository-name>
pip install -r requirements.txt
npm install
```
Ensure Ollama is running:

```bash
ollama serve
```
The default configuration uses:

- Model: `ministral-3:14b` (can be changed in `config.py`)
- Embedding model: `all-MiniLM-L6-v2`
- API endpoints:
  - Backend: `http://localhost:8000`
  - Frontend: `http://localhost:5173`
  - Ollama: `http://localhost:11434`

Configuration files:

- `main.py`: FastAPI backend configuration
- `vite.config.js`: Frontend proxy configuration
- `config.py`: Model configuration
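As a rough sketch of what `config.py` might contain (every name below is an assumption — check the actual file before relying on it):

```python
# config.py -- hypothetical sketch; the real variable names may differ
OLLAMA_URL = "http://localhost:11434"
MODEL_NAME = "ministral-3:14b"        # swap for a smaller model if responses are slow
EMBEDDING_MODEL = "all-MiniLM-L6-v2"  # sentence-transformers embedding model
MAX_UPLOAD_MB = 10                    # reject files larger than this
```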
Start the backend:

```bash
uvicorn main:app --reload --port 8000
```

In a second terminal, start the frontend:

```bash
npm run dev
```

Navigate to http://localhost:5173 in your browser.
- Click the upload 🔗 button to add `.txt` or `.md` files
- Files are processed and indexed automatically
- Uploads may take a while depending on your hardware and file size ⏳
- Type questions in the chat interface
- The system will retrieve relevant context from uploaded documents
- Answers are generated using the local Ollama model 🤖
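The retrieve-then-answer flow above runs iteratively (the "Iterative RAG" feature): the system retrieves, asks the model whether the gathered context suffices, and re-queries with a refined search if not. A minimal sketch of that loop, with `retrieve` and `generate` as stand-ins for the project's actual RAG and LLM interfaces:

```python
def answer_with_iterative_rag(question, retrieve, generate, max_iters=3):
    """Iterative RAG loop (hypothetical sketch).

    retrieve(query) -> list of context chunks
    generate(prompt) -> model's text response
    """
    context = []
    query = question
    for _ in range(max_iters):
        context.extend(retrieve(query))
        ctx = "\n".join(context)
        probe = generate(
            f"Context:\n{ctx}\n\nQuestion: {question}\n"
            "If the context is sufficient, answer. Otherwise reply exactly "
            "NEED MORE: <follow-up search query>."
        )
        if not probe.startswith("NEED MORE:"):
            return probe
        # Model asked for more context: retrieve again with its refined query.
        query = probe[len("NEED MORE:"):].strip()
    # Out of iterations: answer with whatever context was gathered.
    return generate(f"Context:\n{ctx}\n\nQuestion: {question}\nAnswer as best you can.")
```

Capping the loop with `max_iters` keeps latency bounded when the documents simply don't contain the answer.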
```
├── frontend/           # SvelteKit application
│   ├── src/            # Chat interface
│   ├── static/         # Frontend assets
│   └── package.json
├── backend/            # FastAPI application
│   ├── main.py         # Main API server
│   ├── llm.py          # LLM wrapper classes
│   ├── rag.py          # RAG pipeline
│   ├── hdb.py          # Hybrid database
│   ├── files.py        # File handling
│   ├── stateful_llm.py # Stateful LLM sessions
│   └── requirements.txt
├── example_data/       # Sample documents
├── uploads/            # User uploaded files
└── README.md
```
- `POST /api/chat`: Stream chat completions
- `POST /api/upload/`: Upload and process documents
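A small Python client for the streaming chat endpoint might look like the following. The request body's field names (`message`, `session_id`) are assumptions — check the schema in `main.py`:

```python
import json
from urllib import request

API_BASE = "http://localhost:8000"

def build_chat_payload(message, session_id=None):
    """Request body for POST /api/chat (field names are an assumption)."""
    payload = {"message": message}
    if session_id is not None:
        payload["session_id"] = session_id
    return payload

def stream_chat(message, session_id=None):
    """Send a chat message and yield the streamed response incrementally."""
    req = request.Request(
        f"{API_BASE}/api/chat",
        data=json.dumps(build_chat_payload(message, session_id)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        for line in resp:  # the backend streams the completion line by line
            yield line.decode()
```

Usage (with the backend running): `for chunk in stream_chat("What does the document say about X?"): print(chunk, end="")`.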
You can run some tests using the example data:

```bash
# Test RAG system
python test_rag.py

# Test database
python test_db.py

# Test simplified RAG
python test_rag2.py
```
- Ollama not running:

  ```
  Error: Could not connect to Ollama. Is `ollama serve` running?
  ```

  Solution: start Ollama with `ollama serve`
- File upload fails:
  - Check the file size (<10MB)
  - Ensure the file extension is a supported format (code or text)
  - Verify write permissions on the `uploads/` directory
- Slow response time:
  - Ensure Ollama is warmed up
  - Use a smaller model
- Ollama for local LLM serving
- Sentence Transformers for embeddings
- LangChain for text splitting utilities
- SvelteKit for frontend framework
- FastAPI for backend API