A Streamlit-based application for chatting with PDF documents using local Large Language Models via Ollama. Features intelligent citation highlighting and multi-document chat sessions.

## Quick Start

### Prerequisites
- **Python 3.8+**
- **Ollama**: Install from [https://ollama.ai](https://ollama.ai)

### Local Setup

1. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

2. **Start Ollama and pull a model** (a sanity-check sketch follows this list)
   ```bash
   ollama serve
   ollama pull olmo2:7b   # or olmo2:13b for better answer quality (slower)
   ```

3. **Run the application**
   ```bash
   streamlit run app.py
   ```

4. **Open your browser** to `http://localhost:8501`

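To sanity-check the model outside the app, you can query it with the `ollama` Python client directly. A minimal sketch, assuming the `olmo2:7b` model pulled in step 2:

```python
import ollama

# Ask the locally pulled model for a short reply to confirm it responds.
response = ollama.chat(
    model="olmo2:7b",  # the model pulled in step 2
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
)
print(response["message"]["content"])
```
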
### Docker Setup

1. **Start Ollama with a Docker-compatible configuration**
   ```bash
   OLLAMA_HOST=0.0.0.0:11434 ollama serve
   ```

2. **Run with Docker Compose**
   ```bash
   docker-compose up -d --build
   ```

## Usage

1. **Upload a PDF** using the file uploader
2. **Ask questions** about the document
3. **View citations** highlighted directly in the PDF viewer (see the sketch below)
4. **Manage multiple chats** via the sidebar

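Under the hood, highlights like those in step 3 can be produced with PyMuPDF, which the app uses for PDF processing and highlighting. A minimal sketch with a placeholder file name and quote, not the app's exact implementation:

```python
import fitz  # PyMuPDF

doc = fitz.open("document.pdf")             # placeholder input file
cited_text = "the quoted evidence snippet"  # placeholder citation

for page in doc:
    # search_for returns the bounding rectangle of every match on the page
    for rect in page.search_for(cited_text):
        page.add_highlight_annot(rect)

doc.save("document_highlighted.pdf")
```
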
## Testing

Run the regression test suite:

```bash
python -m pytest tests/ -v
```

The test suite includes:
- Environment detection and configuration
- PDF processing and text extraction
- Model integration and response handling
- Citation extraction and matching
- End-to-end workflow validation

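For illustration, an environment-level check in that suite might look like the sketch below; the test is hypothetical, not copied from `tests/`:

```python
# Hypothetical shape of a regression test, not an actual file from tests/.
import pytest

ollama = pytest.importorskip("ollama")  # skip if the client isn't installed

def test_ollama_server_is_reachable():
    # The suite assumes a local Ollama server; this fails fast if it is down.
    ollama.list()  # raises a connection error when the server is unreachable
```
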
## Configuration

The application automatically detects its environment:
- **Direct execution**: Uses `http://localhost:11434`
- **Docker**: Uses `http://host.docker.internal:11434`

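A minimal sketch of how such detection can be implemented; the helper name is illustrative, and the `/.dockerenv` check is a common Docker convention rather than the app's confirmed mechanism:

```python
import os

def detect_ollama_host() -> str:
    """Illustrative helper: choose the Ollama URL based on the runtime."""
    # Docker containers conventionally contain a /.dockerenv marker file.
    if os.path.exists("/.dockerenv"):
        return "http://host.docker.internal:11434"
    return "http://localhost:11434"
```
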
## Troubleshooting

### Connection Issues
```bash
# Verify Ollama is running
curl http://localhost:11434/api/version

# For Docker: ensure Ollama accepts external connections
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

### Common Solutions
| 83 | +- **"No models found"**: Pull a model with `ollama pull olmo2:7b` |
| 84 | +- **"Can't connect"**: Restart Ollama with correct host settings |
| 85 | +- **Upload fails**: Use "🗑️ Clear Upload" button to reset file state |
185 | 86 |
|
## Notes

- Chat history is stored in memory (lost on restart)
- First model load may take 30+ seconds
- Tested with Ollama 0.7.x and Python 3.8+