A retrieval-augmented generation (RAG) system designed for academic research papers. It combines intelligent text chunking, vector search, and a local LLM to provide context-aware responses to research queries.
- Intelligent Text Chunking: Section-aware paper splitting using research paper structure detection
- Vector Search: Pinecone integration for semantic similarity search
- Local LLM Integration: Ollama-powered language model with RAG augmentation
- Modern Chat Interface: Full-screen black & white UI with RAG toggle
- ArXiv Integration: Direct paper search and PDF processing
- Smart Context Management: Automatic relevance scoring and chunk selection
pipeline/
├── ragpipe/ # Core RAG pipeline
│ ├── services/ # Business logic services
│ │ ├── arxiv_service.py # ArXiv API integration
│ │ ├── pdf_service.py # PDF processing & text extraction
│ │ ├── rag_service.py # RAG orchestration
│ │ ├── section_chunker.py # Intelligent text chunking
│ │ └── vector_service.py # Pinecone vector operations
│ ├── models/ # Data models
│ │ └── paper.py # Academic paper representation
│ ├── config/ # Configuration management
│ │ └── settings.py # App-wide settings
│ └── utils/ # Utility functions
│ └── text_cleaner.py # Text preprocessing
├── llm_integration/ # LLM service layer
│ ├── services/ # LLM services
│ │ ├── llm_service.py # Ollama integration
│ │ ├── rag_orchestrator.py # RAG + LLM coordination
│ │ └── prompt_builder.py # Context-aware prompts
│ ├── models/ # Conversation models
│ ├── config/ # LLM configuration
│ └── main.py # Main application entry
├── chat_app.py # Flask web interface
├── templates/ # HTML templates
└── requirements.txt # Python dependencies
- Python 3.8+
- Ollama with llama2:7b model
- Pinecone account and API key
- ArXiv API access
- Clone the repository

  ```bash
  git clone <repository-url>
  cd pipeline
  ```

- Create and activate a conda environment

  ```bash
  conda create -n pipeline python=3.9
  conda activate pipeline
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Install Ollama and pull the model

  ```bash
  # Install Ollama (macOS)
  curl -fsSL https://ollama.ai/install.sh | sh

  # Pull the required model
  ollama pull llama2:7b
  ```

- Set up environment variables

  ```bash
  cp .env.example .env
  # Edit .env with your API keys
  ```
Create a .env file in the project root:
```env
# Pinecone Configuration
PINECONE_API_KEY=your_pinecone_api_key
ENVIRONMENT=your_pinecone_environment
INDEX=your_index_name

# LLM Configuration
LLM_MODEL=llama2:7b
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_TIMEOUT=120

# RAG Settings
MAX_CONTEXT_LENGTH=4000
RAG_SIMILARITY_THRESHOLD=0.7
DEFAULT_SIMILARITY_THRESHOLD=0.4
```

- `OLLAMA_TIMEOUT`: increased to 120 s to handle RAG context processing
- `DEFAULT_SIMILARITY_THRESHOLD`: set to 0.4 for academic paper relevance
- `MAX_CONTEXT_LENGTH`: limits context size to prevent LLM timeouts
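As a minimal sketch, `ragpipe/config/settings.py` might read these variables along the following lines. The `Settings` dataclass and `load_settings` helper are illustrative names, not the project's actual API; the defaults mirror the values above:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Illustrative settings container; fields mirror the .env keys above."""
    pinecone_api_key: str
    pinecone_environment: str = ""
    index_name: str = ""
    llm_model: str = "llama2:7b"
    ollama_base_url: str = "http://localhost:11434"
    ollama_timeout: int = 120
    max_context_length: int = 4000
    rag_similarity_threshold: float = 0.7
    default_similarity_threshold: float = 0.4

def load_settings() -> Settings:
    """Read configuration from the environment, falling back to the defaults above."""
    env = os.environ
    return Settings(
        pinecone_api_key=env["PINECONE_API_KEY"],  # required, no default
        pinecone_environment=env.get("ENVIRONMENT", ""),
        index_name=env.get("INDEX", ""),
        llm_model=env.get("LLM_MODEL", "llama2:7b"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_timeout=int(env.get("OLLAMA_TIMEOUT", "120")),
        max_context_length=int(env.get("MAX_CONTEXT_LENGTH", "4000")),
        rag_similarity_threshold=float(env.get("RAG_SIMILARITY_THRESHOLD", "0.7")),
        default_similarity_threshold=float(env.get("DEFAULT_SIMILARITY_THRESHOLD", "0.4")),
    )
```

Failing fast on a missing `PINECONE_API_KEY` (via `env[...]` rather than `.get`) surfaces misconfiguration at startup instead of at the first vector query.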
Start the services:

```bash
ollama serve        # in one terminal
python chat_app.py  # in another terminal
```

The application will start on http://localhost:5001.
- RAG Mode ON: Queries search your paper database and provide context-augmented responses
- RAG Mode OFF: Direct LLM responses without paper context
- Toggle: Use the RAG toggle switch in the chat header
```bash
# Test the RAG pipeline
python -m ragpipe.main

# Test section chunking
python test_section_chunker.py

# Test chunked RAG integration
python test_chunked_rag.py
```

Intelligent text splitting based on academic paper structure:
- Detects numbered sections (1. Introduction, 2.1 Background)
- Identifies common academic headers (Abstract, Methods, Results)
- Filters out tiny chunks (< 50 characters)
- Configurable chunk sizes (default: 2000 chars max, 50 chars min)
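As a rough sketch, the header detection above can be implemented with a regular expression for numbered sections plus a set of common academic headers. The function below is illustrative (it covers detection and the minimum-size filter, not the 2000-char maximum); the real `section_chunker.py` may differ:

```python
import re

# "1. Introduction", "2.1 Background", etc. (pattern is illustrative)
NUMBERED = re.compile(r"^\d+(\.\d+)*\.?\s+\S")
# Common academic headers (list is illustrative, not exhaustive)
HEADERS = {"abstract", "introduction", "methods", "results",
           "discussion", "conclusion", "references"}

def split_sections(text: str, min_chars: int = 50) -> list[str]:
    """Split paper text at detected section headers, dropping tiny chunks."""
    chunks, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        is_header = bool(NUMBERED.match(stripped)) or stripped.lower() in HEADERS
        if is_header and current:
            chunks.append("\n".join(current))  # close the previous section
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return [c for c in chunks if len(c) >= min_chars]  # filter tiny chunks
```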
Coordinates the complete retrieval pipeline:
- Vector database search with quality assessment
- ArXiv paper discovery when needed
- PDF processing and text extraction
- Intelligent chunk selection and relevance scoring
Local language model with RAG capabilities:
- Ollama integration for privacy and performance
- Context-aware prompt building
- Conversation history management
- Automatic RAG/LLM routing based on query type
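A context-aware prompt in the spirit of `prompt_builder.py` might be assembled as below. The template wording, separator, and function name are assumptions, not the project's actual implementation; the length budget corresponds to `MAX_CONTEXT_LENGTH`:

```python
def build_rag_prompt(query: str, chunks: list[str],
                     max_context_length: int = 4000) -> str:
    """Join retrieved chunks into a context block, truncated to the length budget."""
    context, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_context_length:
            break  # stay under the context budget to avoid LLM timeouts
        context.append(chunk)
        used += len(chunk)
    context_block = "\n\n---\n\n".join(context)
    return (
        "You are a research assistant. Answer using only the excerpts below.\n\n"
        f"Excerpts:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )
```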
- User Query → Chat interface
- RAG Processing → Vector search + paper retrieval
- Text Chunking → Section-aware splitting
- Relevance Scoring → Semantic similarity + header matching
- Context Building → Top-N relevant chunks
- LLM Generation → Context-augmented response
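The relevance-scoring step above (semantic similarity plus header matching) could be sketched as follows. The cosine-similarity formulation is standard, but the header bonus and its weight are assumptions about how the two signals might be combined:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def score_chunk(query_vec: list[float], chunk_vec: list[float],
                chunk_header: str, query: str,
                header_bonus: float = 0.1) -> float:
    """Semantic similarity, plus a small bonus when a query term hits the header."""
    score = cosine(query_vec, chunk_vec)
    header = chunk_header.lower()
    if any(term in header for term in query.lower().split()):
        score += header_bonus  # illustrative weight, not the project's value
    return score
```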
The project includes test scripts:
- `test_section_chunker.py`: validates text chunking quality
- `test_chunked_rag.py`: tests RAG pipeline integration
- `test_pdf_content.py`: verifies PDF text extraction
- `ragpipe/main.py`: end-to-end pipeline testing
- Ollama Connection Failed
  - Ensure `ollama serve` is running
  - Check `OLLAMA_BASE_URL` in configuration
- Pinecone Connection Error
  - Verify API key and environment settings
  - Check index name configuration
- PDF Processing Issues
  - Ensure `PyPDF2` is installed
  - Check download directory permissions
- Timeout Errors
  - Increase `OLLAMA_TIMEOUT` for complex queries
  - Reduce `max_chunks` in RAG service
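To narrow down the first failure mode, a quick connectivity check against Ollama's model-listing endpoint (`GET /api/tags`, part of Ollama's HTTP API) can confirm the server is up before debugging the pipeline itself. The helper below is illustrative:

```python
import json
import urllib.error
import urllib.request

def ollama_reachable(base_url: str = "http://localhost:11434",
                     timeout: float = 3.0) -> bool:
    """Return True if the Ollama server answers on its model-listing endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            models = json.load(resp).get("models", [])
            print(f"Ollama is up; {len(models)} model(s) installed.")
            return True
    except (urllib.error.URLError, OSError):
        return False  # server not running, wrong URL, or network issue
```

If this returns `False`, fix `ollama serve` / `OLLAMA_BASE_URL` first; if it returns `True` but the pulled model is missing from the listing, run `ollama pull llama2:7b`.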