
Commit 3ecda95

Merge pull request #2 from clstaudt/rag
+ RAG system
2 parents 5abd886 + 606ec66 commit 3ecda95

File tree

11 files changed

+2017
-326
lines changed


RAG_IMPLEMENTATION.md

Lines changed: 286 additions & 0 deletions
@@ -0,0 +1,286 @@
# RAG System Implementation

This document describes the Retrieval-Augmented Generation (RAG) system implemented in Ragnarok to solve the large-document context window issue.

## Problem Solved

**Issue**: Large documents exceed the model's context window, causing the AI to "forget" document content and answer without using the document's information.

**Solution**: A RAG system that chunks documents, stores the chunks in a vector database, and retrieves only the most relevant chunks for each query.

## Architecture

### Components
1. **Document Chunking**: Uses LlamaIndex's `SentenceSplitter` to break documents into overlapping chunks
2. **Vector Embeddings**: Uses Ollama's embedding models (default: `nomic-embed-text`)
3. **Vector Storage**: ChromaDB for persistent vector storage
4. **Semantic Retrieval**: Retrieves the most relevant chunks based on query similarity
5. **Response Generation**: Uses the retrieved chunks as context for the LLM

### Flow

```
Document Upload → Chunking → Embeddings → Vector DB → Query → Retrieval → Response
```

## Features

### Automatic Chunking

- Configurable chunk size (default: 512 tokens)
- Configurable overlap (default: 50 tokens)
- Preserves context across chunk boundaries (see the sketch below)
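
For illustration, a minimal chunking sketch using LlamaIndex's `SentenceSplitter`; the import path assumes a recent `llama-index` release, and the values mirror the defaults above:

```python
# Minimal chunking sketch; chunk_size/chunk_overlap mirror the defaults above.
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)

with open("document.txt") as f:  # hypothetical extracted document text
    text = f.read()

chunks = splitter.split_text(text)
print(f"Created {len(chunks)} chunks")

# The 50-token overlap means the tail of one chunk reappears at the
# head of the next, preserving context across chunk boundaries.
print(chunks[0][-100:])
print(chunks[1][:100])
```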

### Semantic Search

- Uses vector similarity for chunk retrieval
- Configurable similarity threshold (default: 0.7)
- Configurable number of retrieved chunks (default: 5)
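
A sketch of how threshold-filtered retrieval can work with ChromaDB and Ollama embeddings; the collection name, persistence path, and cosine index are illustrative choices, not necessarily Ragnarok's internals:

```python
# Threshold-filtered semantic retrieval sketch (illustrative). With a
# cosine index, similarity = 1 - distance, so the 0.7 threshold
# applies directly to the converted scores.
import chromadb
import ollama

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(
    "document_chunks", metadata={"hnsw:space": "cosine"}
)

def retrieve(question: str, top_k: int = 5, threshold: float = 0.7):
    embedding = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    result = collection.query(query_embeddings=[embedding], n_results=top_k)
    pairs = zip(result["documents"][0], result["distances"][0])
    # Keep only chunks whose cosine similarity clears the threshold.
    return [(doc, 1 - dist) for doc, dist in pairs if 1 - dist >= threshold]
```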

### Fallback Support

- Graceful fallback to traditional full-document processing
- Error handling and recovery
- User notification of the processing method used
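
The pattern is roughly the following sketch; `notify` and `llm_answer_with_full_document` are hypothetical stand-ins for the app's own helpers, not Ragnarok's actual internals:

```python
# Fallback pattern sketch. notify() and llm_answer_with_full_document()
# are hypothetical stand-ins, not Ragnarok's actual helpers.
def answer(question: str, full_text: str) -> str:
    try:
        result = rag.query_document(question)
        notify("Response generated using RAG (semantic search)")
        return result["response"]
    except Exception as exc:
        # On any RAG failure, fall back to full-document processing
        # and tell the user which path produced the answer.
        notify(f"RAG system failed ({exc}), using traditional processing")
        return llm_answer_with_full_document(question, full_text)
```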

### Configuration Options

- Chunk size and overlap
- Similarity threshold
- Number of retrieved chunks
- Embedding model selection
- Enable/disable RAG processing

## Installation

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Install Required Ollama Models

```bash
# Embedding model (required)
ollama pull nomic-embed-text

# Alternative embedding models
ollama pull mxbai-embed-large
ollama pull all-minilm

# LLM models (if not already installed)
ollama pull llama3.1:8b
ollama pull mistral:latest
```

### 3. Start Ollama

```bash
ollama serve
```

## Usage

### Basic Usage

1. **Enable RAG**: Check "Enable RAG (Semantic Search)" in the sidebar
2. **Upload Document**: Upload a PDF document as usual
3. **Wait for Processing**: The system automatically chunks and processes the document
4. **Ask Questions**: Questions use semantic search to find relevant chunks

### Configuration

Access RAG settings in the sidebar under "🔍 RAG Settings":

- **Chunk Size**: Size of text chunks (256-1024 tokens)
- **Chunk Overlap**: Overlap between chunks (0-200 tokens)
- **Similarity Threshold**: Minimum similarity for retrieval (0.0-1.0)
- **Max Retrieved Chunks**: Number of chunks to retrieve (1-10)
- **Embedding Model**: Model used for generating embeddings

### Visual Feedback

The system provides clear feedback about the processing method:

- ✅ **RAG Processing**: "Response generated using RAG (semantic search)"
- 📄 **Traditional Processing**: "Response generated using full document"
- ⚠️ **Fallback**: "RAG system failed, using traditional processing"

### Retrieved Chunks Display

When using RAG, you can view the retrieved chunks:

- Expandable section showing relevant chunks
- Similarity scores for each chunk
- Chunk content preview
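
Programmatically, the same information is available through `get_retrieval_info` (see the API reference below). A sketch, assuming each returned record carries the chunk text and a similarity score; the exact return shape is not documented here:

```python
# Inspect retrieved chunks; the 'text' and 'score' keys are assumptions
# about the return shape of get_retrieval_info.
chunks = rag.get_retrieval_info("What are the key findings?")

for i, chunk in enumerate(chunks, start=1):
    print(f"Chunk {i} (similarity: {chunk['score']:.2f})")
    print(chunk["text"][:200])
```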

## Testing

### Quick Test

```bash
python experiments/test_rag.py
```

This will:

1. Check dependencies
2. Verify Ollama is running
3. Test document processing
4. Test query retrieval
5. Verify responses
### Manual Testing

1. Upload a large document (>10,000 words)
2. Enable RAG in settings
3. Ask specific questions about different parts of the document
4. Verify responses use relevant information
5. Check retrieved chunks for relevance

## Performance Benefits

### Memory Efficiency

- Only relevant chunks are loaded into context
- Supports documents of any size
- Consistent memory usage regardless of document size

### Response Quality

- More focused responses using relevant content
- Better handling of multi-topic documents
- Reduced hallucination from irrelevant context

### Scalability

- Persistent vector storage
- Fast similarity search
- Supports multiple documents (future enhancement)

## Configuration Examples

### For Large Documents (>50 pages)

```python
{
    "chunk_size": 1024,
    "chunk_overlap": 100,
    "similarity_threshold": 0.6,
    "top_k": 7
}
```

### For Precise Retrieval

```python
{
    "chunk_size": 256,
    "chunk_overlap": 25,
    "similarity_threshold": 0.8,
    "top_k": 3
}
```

### For Comprehensive Coverage

```python
{
    "chunk_size": 512,
    "chunk_overlap": 75,
    "similarity_threshold": 0.5,
    "top_k": 10
}
```
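
These keys match the keyword arguments of `create_rag_system` (see the API reference below), so a preset can be applied by unpacking it; a small sketch:

```python
# Apply a preset by unpacking it into create_rag_system.
from ragnarok import create_rag_system

large_document_preset = {
    "chunk_size": 1024,
    "chunk_overlap": 100,
    "similarity_threshold": 0.6,
    "top_k": 7,
}

rag = create_rag_system(
    ollama_base_url="http://localhost:11434",
    embedding_model="nomic-embed-text",
    **large_document_preset,
)
```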

## Troubleshooting

### Common Issues

**RAG System Not Available**

- Check that Ollama is running: `ollama serve`
- Verify the embedding model is installed: `ollama pull nomic-embed-text`
- Check dependencies: `pip install -r requirements.txt`

**Poor Retrieval Quality**

- Lower the similarity threshold (0.5-0.6)
- Increase the number of retrieved chunks
- Try a different embedding model
- Adjust chunk size for your document type

**Slow Processing**

- Reduce chunk overlap
- Use a smaller embedding model
- Increase chunk size (fewer chunks)

**Memory Issues**

- Reduce the number of retrieved chunks
- Use a smaller chunk size
- Clear old documents from the vector DB

### Debug Mode

Enable debug logging to see detailed RAG operations:

```python
from loguru import logger

logger.add("rag_debug.log", level="DEBUG")
```

## Future Enhancements

### Planned Features

- Multi-document support
- Hybrid search (keyword + semantic)
- Document summarization
- Chunk re-ranking
- Custom embedding fine-tuning

### Advanced Configuration

- Custom chunking strategies
- Multiple vector stores
- Query expansion
- Response fusion

## API Reference

### RAGSystem Class

```python
from ragnarok import RAGSystem, create_rag_system

# Create system
rag = create_rag_system(
    ollama_base_url="http://localhost:11434",
    embedding_model="nomic-embed-text",
    chunk_size=512,
    chunk_overlap=50,
    similarity_threshold=0.7,
    top_k=5
)

# Process document
stats = rag.process_document(text, document_id)

# Query document
result = rag.query_document(question)

# Get retrieval info
chunks = rag.get_retrieval_info(question)

# Cleanup
rag.cleanup()
```

### Key Methods

- `process_document(text, doc_id)`: Process and store a document
- `query_document(question)`: Get an AI response with retrieval
- `get_retrieval_info(question)`: Get chunk information only
- `clear_document(doc_id)`: Remove a document from storage
- `get_system_info()`: Get configuration details
- `cleanup()`: Clean up resources
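
A brief sketch of the two methods not exercised in the block above; the contents of the `get_system_info` result are an assumption, since this document only says it returns configuration details:

```python
# Inspect the current configuration; the exact keys in the returned
# details are an assumption.
info = rag.get_system_info()
print(info)

# Remove a previously processed document from the vector store by its ID.
rag.clear_document("my_document")
```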

## Contributing

When contributing to the RAG system:

1. **Test thoroughly** with various document types and sizes
2. **Maintain backward compatibility** with traditional processing
3. **Add appropriate error handling** and user feedback
4. **Update documentation** for new features
5. **Consider the performance impact** of changes

## License

Same as the main Ragnarok project.

README.md

Lines changed: 76 additions & 1 deletion
@@ -6,6 +6,11 @@ A powerful PDF processing system with high-quality text extraction and structure
- **High-Quality Text Extraction**: Uses PyMuPDF4LLM for superior structure preservation
- **Automatic Structure Detection**: Headers, tables, lists, and formatting automatically detected
- **RAG System**: Advanced Retrieval-Augmented Generation for large documents
  - Document chunking with configurable overlap
  - Vector embeddings using Ollama models
  - Semantic search and retrieval with ChromaDB
  - Handles documents of any size without context window issues
- **LLM/RAG Optimized**: Specifically designed for AI applications
- **Local Processing**: All processing happens locally, no external service calls
- **Citation Highlighting**: Smart PDF highlighting for AI-generated citations
@@ -40,9 +45,66 @@ conda env create -f environment.yml
conda activate ragnarok
```

3. **Install Ollama Models for RAG**:

```bash
# Required for embeddings
ollama pull nomic-embed-text

# Optional alternative embedding models
ollama pull mxbai-embed-large
ollama pull all-minilm

# LLM models (if not already installed)
ollama pull llama3.1:8b
ollama pull mistral:latest
```

## Quick Start

### Web Application (Recommended)

```bash
# Start the web application
streamlit run app.py
```

Then:

1. Upload a PDF document
2. Enable RAG in the sidebar settings
3. Ask questions about your document

### RAG System (Programmatic)

```python
from ragnarok import create_rag_system

# Create RAG system
rag = create_rag_system(
    ollama_base_url="http://localhost:11434",
    embedding_model="nomic-embed-text",
    chunk_size=512,
    chunk_overlap=50
)

# Read the PDF
with open('document.pdf', 'rb') as f:
    pdf_bytes = f.read()

# Extract text
from ragnarok import EnhancedPDFProcessor
processor = EnhancedPDFProcessor(pdf_bytes)
text = processor.extract_full_text()

# Process with RAG
stats = rag.process_document(text, "my_document")
print(f"Created {stats['total_chunks']} chunks")

# Query the document
result = rag.query_document("What is this document about?")
print(result['response'])
```

### Basic PDF Processing

```python
from ragnarok.enhanced_pdf_processor import EnhancedPDFProcessor
```

@@ -65,6 +127,14 @@ for section_name, content in sections.items():

```python
    print(content[:200] + "...")
```

### Test the RAG System

```bash
python experiments/test_rag.py
```

This will test the complete RAG pipeline, including dependencies, the Ollama connection, and document processing.

### Test the Extraction

Run the demo script to see the extraction in action:
@@ -151,6 +221,11 @@ PDF Input → PyMuPDF4LLM → Structured Markdown → Sections/TOC

MIT License - see LICENSE file for details.

## Documentation

- **[RAG System Implementation](RAG_IMPLEMENTATION.md)**: Detailed guide to the RAG system, configuration, and troubleshooting
- **[PDF Extraction Summary](PDF_EXTRACTION_SUMMARY.md)**: Technical details about PDF processing methods

## Testing

```bash