felnanuke2/doc-bot

🚧 Project Status: Under Development 🚧

doc-bot


Overview

doc-bot is a fully offline Retrieval-Augmented Generation (RAG) app for iOS and iPadOS. It lets you chat with your own PDF documents using AI, with all processing done locally using only downloaded models; no internet connection or cloud APIs required. The interface is built with SwiftUI for a modern, native experience. Import a PDF, ask questions, and get answers powered by local language models and embeddings.

Features

  • Fully Offline RAG Chat: Chat with your imported PDF documents using AI. All retrieval, embedding, and LLM inference runs on-device using only downloaded models, with no cloud or online API calls.
  • Multiple Conversations: Create and manage multiple conversation threads for each document, with automatic subject generation.
  • Conversation Management: Switch between conversations with a side drawer interface, view conversation history, and manage chat sessions.
  • PDF Import: Uses Apple PDFKit to extract text from PDF files with progress tracking and error handling.
  • Chunking, Embedding & Similarity Search: Utilizes Apple's NaturalLanguage framework to split text into chunks, generate embeddings, and perform similarity search—all on-device, without Faiss or external libraries.
  • Local Embedding Storage: Embeddings are saved as JSON files in the app support directory using FileManager for fast, private retrieval.
  • CoreData Persistence: Documents, conversations, and messages are stored using CoreData for reliability and offline access with cascade deletion support.
  • Local LLM Inference: Answers are generated using Qwen2.5-0.5B Instruct (default) or other edge-optimized GGUF models via llama.cpp integration.
  • Modern SwiftUI UI: Clean, native interface with improved animations, typing indicators, and message bubbles.
  • Internationalization: Multi-language support with English, Spanish, and Portuguese (Brazil) localizations.
  • Customizable Themes: Light, dark, and system theme options for personalized user experience.
  • Settings & Configuration: Dedicated settings view for language selection, theme customization, and app preferences.
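The PDFKit-based import with progress tracking can be sketched in a few lines. This is an illustrative sketch only, not the app's actual code; the function name and the progress-callback shape are assumptions.

```swift
import Foundation
import PDFKit

// Extract text from a PDF page by page, reporting progress after each page.
// Returns nil if the file cannot be opened as a PDF.
func extractText(from url: URL, progress: (Double) -> Void) -> String? {
    guard let document = PDFDocument(url: url), document.pageCount > 0 else {
        return nil
    }
    var text = ""
    for index in 0..<document.pageCount {
        if let page = document.page(at: index), let pageText = page.string {
            text += pageText + "\n"
        }
        progress(Double(index + 1) / Double(document.pageCount))
    }
    return text
}
```

Extracting page by page (rather than `PDFDocument.string` in one call) is what makes per-page progress reporting possible.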

How It Works

  1. Import PDF: Select a PDF to import. The app extracts its text using PDFKit with real-time progress tracking.
  2. Chunking: The text is split into manageable chunks using Apple's NaturalLanguage framework, targeting optimal size for embeddings.
  3. Embedding Generation: Each chunk is embedded using a local embedding model (e.g., nomic-embed-text-v1.5 or bge-small-en-v1.5, in GGUF format).
  4. Vector Storage & Search: Embeddings are stored as JSON files in the app support directory using FileManager, and similarity search is performed using Apple's NaturalLanguage framework to find relevant chunks—no Faiss required.
  5. Persistence: All documents, conversations, and messages are saved using CoreData for offline access and reliability.
  6. Multiple Conversations: Create multiple conversation threads for each document, with automatic subject generation based on the first user message.
  7. Chat Interface: When you ask a question, the app finds the most relevant chunks using Apple's NaturalLanguage similarity search and uses a local LLM (Qwen2.5-0.5B Instruct or other edge AI models) via llama.cpp to generate an answer.
  8. Conversation Management: Switch between conversations using the side drawer, view conversation history, and manage your chat sessions.
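Steps 2–4 above can be sketched using only Apple's NaturalLanguage framework. This is a minimal illustration assuming sentence-level chunking and `NLEmbedding`'s built-in sentence embeddings; the shipped app can also use GGUF embedding models via llama.cpp, so function names and chunk sizing here are assumptions, not the app's actual implementation.

```swift
import Foundation
import NaturalLanguage

// Step 2: split extracted text into chunks of roughly `targetSize` characters,
// breaking only on sentence boundaries.
func chunk(_ text: String, targetSize: Int = 512) -> [String] {
    let tokenizer = NLTokenizer(unit: .sentence)
    tokenizer.string = text
    var chunks: [String] = []
    var current = ""
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        let sentence = String(text[range])
        if !current.isEmpty, current.count + sentence.count > targetSize {
            chunks.append(current)
            current = ""
        }
        current += sentence
        return true
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}

// Steps 3-4: embed the query and each chunk, then rank chunks by cosine similarity.
func mostRelevantChunks(query: String, chunks: [String], topK: Int = 3) -> [String] {
    guard let embedding = NLEmbedding.sentenceEmbedding(for: .english),
          let queryVector = embedding.vector(for: query) else { return [] }

    func cosine(_ a: [Double], _ b: [Double]) -> Double {
        let dot = zip(a, b).map(*).reduce(0, +)
        let magA = (a.map { $0 * $0 }.reduce(0, +)).squareRoot()
        let magB = (b.map { $0 * $0 }.reduce(0, +)).squareRoot()
        return dot / (magA * magB + 1e-9)
    }

    return chunks
        .compactMap { c in embedding.vector(for: c).map { (c, cosine(queryVector, $0)) } }
        .sorted { $0.1 > $1.1 }
        .prefix(topK)
        .map { $0.0 }
}
```

Because the similarity search is a brute-force scan over stored vectors, no external index such as Faiss is needed at this document scale.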

Supported Models

LLM Models (2026 Edge AI Optimized)

The app includes a curated selection of state-of-the-art edge AI models optimized for on-device inference:

Ultra-Lightweight Models (< 500 MB) - Best for resource-constrained devices:

  • Qwen2.5-0.5B Instruct (Q4_K_M, 0.32 GiB) ⭐ - Default model, latest Qwen version with improved instruction following
  • Qwen3-0.6B Instruct (Q8_0, 0.48 GiB) - Next-gen Qwen model with enhanced performance
  • SmolLM-360M Instruct (Q4_K_M, 0.23 GiB) - HuggingFace's ultra-efficient model for maximum battery life

Lightweight Models (0.5-1 GiB) - Optimal balance of performance and efficiency:

  • Qwen2.5-1.5B Instruct (Q4_K_M, 0.94 GiB) 🔥 - Recommended for best quality-to-size ratio
  • Gemma-2-2B Instruct (Q4_K_M, 1.38 GiB) - Google's efficient instruction-tuned model
  • StableLM-2-1.6B (Q4_K_M, 0.98 GiB) - Stability AI's mobile-optimized model

Medium Models (1-2.5 GiB) - Higher quality, still mobile-friendly:

  • Phi-3.5-Mini Instruct (Q4_K_M, 2.2 GiB) - Microsoft's latest Phi model with improved capabilities

All models are in GGUF format and run via llama.cpp integration. While not as powerful as cloud models like Claude Sonnet 4 or GPT-4, these models provide excellent results for on-device document Q&A and work completely offline, ensuring privacy and zero latency.

Embedding Models

  • nomic-embed-text-v1.5 - High-quality text embeddings optimized for semantic search
  • bge-small-en-v1.5 - Lightweight embedding model for efficient document chunking

All embedding models support GGUF format for on-device inference.

User Interface

  • Tabbed Interface: Easy navigation between Documents, Models, and Settings
  • Document Management: Import, view, and organize your PDF documents with enhanced UI
  • Chat Interface: Modern message bubbles with user/assistant differentiation and typing indicators
  • Conversation Drawer: Side panel for switching between multiple conversations per document
  • Progress Tracking: Real-time progress indicators for document import and model downloads
  • Responsive Design: Optimized for both iPhone and iPad with adaptive layouts

Internationalization & Accessibility

  • Multi-language Support: Full localization for English, Spanish, and Portuguese (Brazil)
  • Theme Options: Light, dark, and system-adaptive themes
  • Accessibility: Proper accessibility labels and VoiceOver support
  • Localized Strings: All user-facing text is properly localized for international users

Privacy & Offline

  • All processing (PDF parsing, chunking, embedding, LLM inference) is done on-device.
  • No data is sent to external servers.

Cons & Considerations

  • High Battery Consumption: Local processing for chunking, embedding, and LLM inference can significantly increase battery usage, especially on mobile devices.
  • Device Heating: Intensive computations may cause some devices to heat up during prolonged use.
  • Large Model Sizes: Even "small" models can be 1 GB or more, requiring substantial storage space on your device.

Requirements

  • An iOS or iPadOS device; Apple Silicon is recommended for best performance.
  • Xcode for building and running the app.

Getting Started

  1. Clone the repository
  2. Open doc-bot.xcodeproj in Xcode
  3. Build and run on your device or simulator
  4. Download a model from the Models tab (Qwen2.5-0.5B Instruct or Qwen2.5-1.5B Instruct recommended for best results)
  5. Import a PDF from the Documents tab
  6. Start chatting with your document!
  7. Create multiple conversations using the conversation drawer for different topics
  8. Customize your experience in the Settings tab with themes and language preferences

Assets & Screenshots

App and simulator screenshots are available in the repository.

▶️ Watch Demo Video (Google Drive)

Architecture

  • SwiftUI for UI with component-based architecture and reusable views
  • PDFKit for PDF text extraction with progress tracking
  • NaturalLanguage for chunking, embedding, and similarity search
  • CoreData for persistence of documents, conversations, and messages with proper relationship management
  • llama.cpp (via Swift bindings) for LLM and embedding inference with 2026 edge-optimized models (Qwen2.5-0.5B Instruct default for optimal mobile performance)
  • JSON (in App Support via FileManager) for vector storage
  • Combine/Factory for dependency injection and state management
  • Modular Design: Separated view components, repositories, and infrastructure layers for maintainability
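The JSON-in-App-Support vector storage can be sketched with Codable and FileManager. The type and method names here (`EmbeddingRecord`, `EmbeddingStore`) are hypothetical, chosen for illustration rather than taken from the app's actual API.

```swift
import Foundation

// One stored chunk: its text plus its embedding vector.
struct EmbeddingRecord: Codable {
    let chunk: String
    let vector: [Double]
}

// Persists per-document embeddings as JSON files under Application Support.
struct EmbeddingStore {
    let directory: URL

    init() throws {
        let support = try FileManager.default.url(
            for: .applicationSupportDirectory,
            in: .userDomainMask,
            appropriateFor: nil,
            create: true
        )
        directory = support.appendingPathComponent("Embeddings", isDirectory: true)
        try FileManager.default.createDirectory(
            at: directory, withIntermediateDirectories: true)
    }

    func save(_ records: [EmbeddingRecord], forDocument id: String) throws {
        let url = directory.appendingPathComponent("\(id).json")
        try JSONEncoder().encode(records).write(to: url, options: .atomic)
    }

    func load(forDocument id: String) throws -> [EmbeddingRecord] {
        let url = directory.appendingPathComponent("\(id).json")
        return try JSONDecoder().decode([EmbeddingRecord].self,
                                        from: Data(contentsOf: url))
    }
}
```

Writing with `.atomic` avoids leaving a half-written JSON file behind if the app is interrupted mid-save.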

New in This Version

  • Multiple Conversations: Create and manage multiple conversation threads per document
  • 🌍 Internationalization: Support for English, Spanish, and Portuguese (Brazil)
  • 🎨 Theme Support: Light, dark, and system-adaptive themes
  • 🗂️ Better Organization: Restructured UI components for better maintainability
  • 📱 Enhanced UX: Improved animations, loading states, and user feedback
  • 🔄 Conversation Switching: Side drawer for easy conversation navigation
  • 📊 Progress Tracking: Real-time progress for imports and downloads
  • 🏗️ Repository Pattern: Better data management with repository abstraction
  • 🧪 Expanded Testing: Comprehensive test coverage including integration and performance tests

Extending & Customizing

  • Add New Models: Update the Models list to include additional GGUF models for LLM and embedding
  • Custom Themes: Extend the theme system with additional color schemes
  • New Languages: Add more localizations by creating new .lproj folders
  • UI Components: Leverage the modular component architecture to add new features
  • Repository Extensions: Implement additional repositories for new data types
  • Embedding Models: Swap out embedding or LLM models as needed for different use cases
  • Chunking Strategies: Extend chunking or retrieval logic for specific document types
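As one way to extend the chunking logic for specific document types, a pluggable strategy could be introduced behind a protocol. This is a hypothetical sketch; the protocol and type names are assumptions, not existing code in the project.

```swift
import Foundation

// A pluggable chunking strategy: swap implementations per document type.
protocol ChunkingStrategy {
    func chunks(from text: String) -> [String]
}

// Example strategy: split on blank lines, i.e. paragraph boundaries.
struct ParagraphChunker: ChunkingStrategy {
    func chunks(from text: String) -> [String] {
        text.components(separatedBy: "\n\n")
            .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
            .filter { !$0.isEmpty }
    }
}
```

The import pipeline would then depend on `ChunkingStrategy` rather than a concrete chunker, so a table-aware or heading-aware splitter can be added without touching the embedding or retrieval code.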

License

MIT License. See LICENSE file for details.


doc-bot: Your offline, private PDF AI chat companion.
