doc-bot is a fully offline Retrieval-Augmented Generation (RAG) app for iOS and iPadOS. It lets you chat with your own PDF documents using AI, with all processing done locally on downloaded models—no internet connection or cloud APIs required. The interface is built with SwiftUI for a modern, native experience. Import a PDF, ask questions, and get answers powered by local language models and embeddings.
- Fully Offline RAG Chat: Chat with your imported PDF documents using AI; all retrieval, embedding, and LLM inference run on-device using only downloaded models. No cloud or online API calls.
- Multiple Conversations: Create and manage multiple conversation threads for each document, with automatic subject generation.
- Conversation Management: Switch between conversations with a side drawer interface, view conversation history, and manage chat sessions.
- PDF Import: Uses Apple PDFKit to extract text from PDF files with progress tracking and error handling.
- Chunking, Embedding & Similarity Search: Utilizes Apple's NaturalLanguage framework to split text into chunks, generate embeddings, and perform similarity search—all on-device, without Faiss or external libraries.
- Local Embedding Storage: Embeddings are saved as JSON files in the app support directory using FileManager for fast, private retrieval.
- CoreData Persistence: Documents, conversations, and messages are stored using CoreData for reliability and offline access with cascade deletion support.
- Local LLM Inference: Answers are generated using Qwen2.5-0.5B Instruct (default) or other edge-optimized GGUF models via llama.cpp integration.
- Modern SwiftUI UI: Clean, native interface with improved animations, typing indicators, and message bubbles.
- Internationalization: Multi-language support with English, Spanish, and Portuguese (Brazil) localizations.
- Customizable Themes: Light, dark, and system theme options for personalized user experience.
- Settings & Configuration: Dedicated settings view for language selection, theme customization, and app preferences.
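The similarity-search feature above can be sketched with Apple's built-in sentence embeddings. This is a simplified illustration, not the app's actual retrieval code; the function name and the choice of cosine similarity are assumptions:

```swift
import NaturalLanguage

// Sketch: rank stored chunks by cosine similarity to a query, using
// Apple's built-in sentence embeddings (a hypothetical simplification
// of the app's retrieval step).
func topChunks(for query: String, chunks: [String], k: Int = 3) -> [String] {
    guard let embedding = NLEmbedding.sentenceEmbedding(for: .english),
          let queryVector = embedding.vector(for: query) else { return [] }

    func cosine(_ a: [Double], _ b: [Double]) -> Double {
        let dot = zip(a, b).map(*).reduce(0, +)
        let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
        let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
        return dot / (normA * normB)
    }

    return chunks
        .compactMap { chunk in
            embedding.vector(for: chunk).map { (chunk, cosine(queryVector, $0)) }
        }
        .sorted { $0.1 > $1.1 }
        .prefix(k)
        .map { $0.0 }
}
```

`NLEmbedding` vectors are already available offline, which is what makes the no-Faiss approach practical on-device.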
- Import PDF: Select a PDF to import. The app extracts its text using PDFKit with real-time progress tracking.
- Chunking: The text is split into manageable chunks using Apple's NaturalLanguage framework, targeting optimal size for embeddings.
- Embedding Generation: Each chunk is embedded using a local embedding model (e.g., nomic-embed-text-v1.5 or bge-small-en-v1.5, in GGUF format).
- Vector Storage & Search: Embeddings are stored as JSON files in the app support directory using FileManager, and similarity search is performed using Apple's NaturalLanguage framework to find relevant chunks—no Faiss required.
- Persistence: All documents, conversations, and messages are saved using CoreData for offline access and reliability.
- Multiple Conversations: Create multiple conversation threads for each document, with automatic subject generation based on the first user message.
- Chat Interface: When you ask a question, the app finds the most relevant chunks using Apple's NaturalLanguage similarity search and uses a local LLM (Qwen2.5-0.5B Instruct or other edge AI models) via llama.cpp to generate an answer.
- Conversation Management: Switch between conversations using the side drawer, view conversation history, and manage your chat sessions.
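The import and chunking steps above can be sketched with PDFKit and NLTokenizer. The chunk-size target and function name here are hypothetical; the app's actual chunking parameters may differ:

```swift
import PDFKit
import NaturalLanguage

// Sketch: extract text from a PDF with PDFKit, then pack sentences
// into chunks with NLTokenizer.
func extractChunks(from url: URL, maxChunkLength: Int = 800) -> [String] {
    guard let document = PDFDocument(url: url) else { return [] }

    // 1. Extract text page by page.
    var fullText = ""
    for index in 0..<document.pageCount {
        fullText += document.page(at: index)?.string ?? ""
        fullText += "\n"
    }

    // 2. Split into sentences, then group sentences into chunks.
    let tokenizer = NLTokenizer(unit: .sentence)
    tokenizer.string = fullText

    var chunks: [String] = []
    var current = ""
    tokenizer.enumerateTokens(in: fullText.startIndex..<fullText.endIndex) { range, _ in
        let sentence = String(fullText[range])
        if current.count + sentence.count > maxChunkLength, !current.isEmpty {
            chunks.append(current)
            current = ""
        }
        current += sentence
        return true
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}
```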
The app includes a curated selection of state-of-the-art edge AI models optimized for on-device inference:
Ultra-Lightweight Models (< 0.5 GiB) - Best for resource-constrained devices:
- Qwen2.5-0.5B Instruct (Q4_K_M, 0.32 GiB) ⭐ - Default model, latest Qwen version with improved instruction following
- Qwen3-0.6B Instruct (Q8_0, 0.48 GiB) - Next-gen Qwen model with enhanced performance
- SmolLM-360M Instruct (Q4_K_M, 0.23 GiB) - HuggingFace's ultra-efficient model for maximum battery life
Lightweight Models (0.5-1 GiB) - Optimal balance of performance and efficiency:
- Qwen2.5-1.5B Instruct (Q4_K_M, 0.94 GiB) 🔥 - Recommended for best quality-to-size ratio
- Gemma-2-2B Instruct (Q4_K_M, 1.38 GiB) - Google's efficient instruction-tuned model
- StableLM-2-1.6B (Q4_K_M, 0.98 GiB) - Stability AI's mobile-optimized model
Medium Models (> 1 GiB) - Higher quality, still mobile-friendly:
- Phi-3.5-Mini Instruct (Q4_K_M, 2.2 GiB) - Microsoft's latest Phi model with improved capabilities
All models are in GGUF format and run via llama.cpp integration. While not as powerful as cloud models like Claude Sonnet 4 or GPT-4, these models provide excellent results for on-device document Q&A and work completely offline, ensuring privacy and zero latency.
- nomic-embed-text-v1.5 - High-quality text embeddings optimized for semantic search
- bge-small-en-v1.5 - Lightweight embedding model for efficient document chunking
All embedding models support GGUF format for on-device inference.
- Tabbed Interface: Easy navigation between Documents, Models, and Settings
- Document Management: Import, view, and organize your PDF documents with enhanced UI
- Chat Interface: Modern message bubbles with user/assistant differentiation and typing indicators
- Conversation Drawer: Side panel for switching between multiple conversations per document
- Progress Tracking: Real-time progress indicators for document import and model downloads
- Responsive Design: Optimized for both iPhone and iPad with adaptive layouts
- Multi-language Support: Full localization for English, Spanish, and Portuguese (Brazil)
- Theme Options: Light, dark, and system-adaptive themes
- Accessibility: Proper accessibility labels and VoiceOver support
- Localized Strings: All user-facing text is properly localized for international users
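The theme options above can be sketched with SwiftUI's `preferredColorScheme` and `@AppStorage` persistence. `AppTheme` and `RootView` are hypothetical names, not the app's actual types:

```swift
import SwiftUI

// Sketch: light/dark/system theme selection, persisted with @AppStorage.
enum AppTheme: String, CaseIterable {
    case system, light, dark

    var colorScheme: ColorScheme? {
        switch self {
        case .system: return nil          // follow the device setting
        case .light:  return .light
        case .dark:   return .dark
        }
    }
}

struct RootView: View {
    @AppStorage("appTheme") private var theme: AppTheme = .system

    var body: some View {
        Picker("Theme", selection: $theme) {
            ForEach(AppTheme.allCases, id: \.self) {
                Text($0.rawValue.capitalized)
            }
        }
        // nil means "use the system appearance"
        .preferredColorScheme(theme.colorScheme)
    }
}
```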
- All processing (PDF parsing, chunking, embedding, LLM inference) is done on-device.
- No data is sent to external servers.
- High Battery Consumption: Local processing for chunking, embedding, and LLM inference can significantly increase battery usage, especially on mobile devices.
- Device Heating: Intensive computations may cause some devices to heat up during prolonged use.
- Large Model Sizes: Even "small" models can be 1 GB or more, requiring substantial storage space on your device.
- iOS or iPadOS device with Apple Silicon recommended for best performance.
- Xcode for building and running the app.
- Clone the repository
- Open `doc-bot.xcodeproj` in Xcode
- Build and run on your device or simulator
- Download a model from the Models tab (Qwen2.5-0.5B Instruct or Qwen2.5-1.5B Instruct recommended for best results)
- Import a PDF from the Documents tab
- Start chatting with your document!
- Create multiple conversations using the conversation drawer for different topics
- Customize your experience in the Settings tab with themes and language preferences
- SwiftUI for UI with component-based architecture and reusable views
- PDFKit for PDF text extraction with progress tracking
- NaturalLanguage for chunking, embedding, and similarity search
- CoreData for persistence of documents, conversations, and messages with proper relationship management
- llama.cpp (via Swift bindings) for LLM and embedding inference with edge-optimized GGUF models (Qwen2.5-0.5B Instruct by default for optimal mobile performance)
- JSON (in App Support via FileManager) for vector storage
- Combine/Factory for dependency injection and state management
- Modular Design: Separated view components, repositories, and infrastructure layers for maintainability
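The JSON vector-storage layer above can be sketched with `FileManager` and `Codable`. `StoredEmbedding` and the file-naming scheme are hypothetical; the app's actual on-disk schema may differ:

```swift
import Foundation

// Sketch: persist chunk embeddings as JSON in Application Support.
struct StoredEmbedding: Codable {
    let chunk: String
    let vector: [Double]
}

func embeddingsURL(for documentID: String) throws -> URL {
    let support = try FileManager.default.url(
        for: .applicationSupportDirectory,
        in: .userDomainMask,
        appropriateFor: nil,
        create: true
    )
    return support.appendingPathComponent("\(documentID).json")
}

func save(_ embeddings: [StoredEmbedding], documentID: String) throws {
    let data = try JSONEncoder().encode(embeddings)
    try data.write(to: embeddingsURL(for: documentID), options: .atomic)
}

func load(documentID: String) throws -> [StoredEmbedding] {
    let data = try Data(contentsOf: embeddingsURL(for: documentID))
    return try JSONDecoder().decode([StoredEmbedding].self, from: data)
}
```

Keeping vectors in Application Support (rather than Documents) keeps them out of user-visible file listings while remaining fully local.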
- ✨ Multiple Conversations: Create and manage multiple conversation threads per document
- 🌍 Internationalization: Support for English, Spanish, and Portuguese (Brazil)
- 🎨 Theme Support: Light, dark, and system-adaptive themes
- 🗂️ Better Organization: Restructured UI components for better maintainability
- 📱 Enhanced UX: Improved animations, loading states, and user feedback
- 🔄 Conversation Switching: Side drawer for easy conversation navigation
- 📊 Progress Tracking: Real-time progress for imports and downloads
- 🏗️ Repository Pattern: Better data management with repository abstraction
- 🧪 Expanded Testing: Comprehensive test coverage including integration and performance tests
- Add New Models: Update the `Models` list to include additional GGUF models for LLM and embedding
- Custom Themes: Extend the theme system with additional color schemes
- New Languages: Add more localizations by creating new `.lproj` folders
- UI Components: Leverage the modular component architecture to add new features
- Repository Extensions: Implement additional repositories for new data types
- Embedding Models: Swap out embedding or LLM models as needed for different use cases
- Chunking Strategies: Extend chunking or retrieval logic for specific document types
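Adding a new model might look like the sketch below. `ModelDescriptor`, the catalog, and the download URLs are all hypothetical placeholders, not the app's actual types or endpoints:

```swift
import Foundation

// Sketch: one way to describe a downloadable GGUF model entry.
struct ModelDescriptor {
    let name: String
    let quantization: String
    let sizeGiB: Double
    let downloadURL: URL
}

let catalog: [ModelDescriptor] = [
    ModelDescriptor(
        name: "Qwen2.5-0.5B Instruct",
        quantization: "Q4_K_M",
        sizeGiB: 0.32,
        downloadURL: URL(string: "https://example.com/qwen2.5-0.5b-instruct-q4_k_m.gguf")!
    ),
    // Add new GGUF models here:
    ModelDescriptor(
        name: "My-New-Model",
        quantization: "Q4_K_M",
        sizeGiB: 0.5,
        downloadURL: URL(string: "https://example.com/my-new-model-q4_k_m.gguf")!
    )
]
```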
MIT License. See LICENSE file for details.
- llama.cpp
- Apple PDFKit
- Apple NaturalLanguage
- HuggingFace for model hosting
doc-bot: Your offline, private PDF AI chat companion.




