# Agentic RAG System

## Introduction

An intelligent Retrieval-Augmented Generation (RAG) system that uses an LLM agent to make decisions about information retrieval and response generation. The system processes PDF documents and intelligently decides which knowledge base to query based on the user's question.

The system has the following features:

- Intelligent query routing
- PDF processing with Docling for accurate text extraction and chunking
- Persistent vector storage with ChromaDB (separate collections for PDFs and websites)
- Smart context retrieval and response generation
- FastAPI-based REST API for document upload and querying
- Support for either OpenAI-based agents or local, transformer-based agents (`Mistral-7B` by default)
- Optional Chain of Thought (CoT) reasoning for more detailed and structured responses

## 0. Prerequisites and Setup

### Prerequisites

- Python 3.8 or higher
- OpenAI API key (only needed for the OpenAI-based agent)
- HuggingFace token (only needed for the local Mistral model, which is the default)

### Hardware Requirements

- For the OpenAI agent: a standard CPU machine
- For the local agent:
  - Minimum 16 GB RAM (more than 24 GB recommended)
  - GPU with 8 GB VRAM recommended for better performance
  - Runs on CPU if no GPU is available, but significantly slower

### Setup

1. Clone the repository and install dependencies:

    ```bash
    git clone https://github.com/oracle-devrel/devrel-labs.git
    cd devrel-labs/agentic-rag
    pip install -r requirements.txt
    ```

2. Authenticate with HuggingFace:

    The system uses `Mistral-7B` by default, which requires authentication with HuggingFace:

    a. Create a HuggingFace account [here](https://huggingface.co/join), if you don't have one yet.

    b. Accept the Mistral-7B model terms & conditions [here](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).

    c. Create an access token [here](https://huggingface.co/settings/tokens).

    d. Create a `config.yaml` file (you can copy from `config_example.yaml`) and add your HuggingFace token:

    ```yaml
    HUGGING_FACE_HUB_TOKEN: your_token_here
    ```

3. (Optional) To use the OpenAI-based agent instead of the default local model, create a `.env` file with your OpenAI API key:

    ```bash
    OPENAI_API_KEY=your-api-key-here
    ```

    If no API key is provided, the system automatically downloads and uses `Mistral-7B-Instruct-v0.2` for text generation. No additional configuration is needed.

## 1. Getting Started

You can launch this solution in three ways:

### 1. Using the Complete REST API

Start the API server:

```bash
python main.py
```

The API will be available at `http://localhost:8000`. You can then use the API endpoints described in the API Endpoints annex below.

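For example, once the server is running you can exercise the `/query` endpoint from Python (a minimal sketch using the `requests` package):

```python
import requests

# Send a question through the agentic RAG pipeline
response = requests.post(
    "http://localhost:8000/query",
    json={"query": "your question here"},
)
print(response.json())
```
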
### 2. Using the Gradio Interface

The system provides a user-friendly web interface using Gradio, which allows you to:
- Upload and process PDF documents
- Process web content from URLs
- Chat with your documents using either local or OpenAI models
- Toggle Chain of Thought reasoning

To launch the interface:

```bash
python gradio_app.py
```

This will start the Gradio server and automatically open the interface in your default browser at `http://localhost:7860`. The interface has two main tabs:

1. **Document Processing**:
   - Upload PDFs using the file uploader
   - Process web content by entering URLs
   - View processing status and results

2. **Chat Interface**:
   - Select between the local (Mistral) and OpenAI models
   - Toggle Chain of Thought reasoning for more detailed responses
   - Chat with your documents using natural language
   - Clear the chat history as needed

Note: the interface automatically detects available models based on your configuration:
- The local Mistral model requires a HuggingFace token in `config.yaml`
- The OpenAI model requires an API key in `.env`

### 3. Using Individual Python Components via Command Line

#### Process PDFs

To process a PDF file and save the chunks to a JSON file, run:

```bash
# Process a single PDF
python pdf_processor.py --input path/to/document.pdf --output chunks.json

# Process multiple PDFs in a directory
python pdf_processor.py --input path/to/pdf/directory --output chunks.json

# Process a single PDF from a URL
python pdf_processor.py --input https://example.com/document.pdf --output chunks.json
# Sample PDF: https://arxiv.org/pdf/2203.06605
```

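The chunk schema is defined by `pdf_processor.py`; without assuming its exact fields, you can sanity-check the output from Python:

```python
import json

# Peek at the generated chunks file without assuming its exact schema
with open("chunks.json") as f:
    chunks = json.load(f)

print(f"Top-level type: {type(chunks).__name__}")
if isinstance(chunks, list) and chunks:
    print(f"{len(chunks)} chunks; first chunk fields: {list(chunks[0])}")
```
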
#### Process Websites with Trafilatura

Process a single website and save the content to a JSON file:

```bash
python web_processor.py --input https://example.com --output docs/web_content.json
```

Or, process multiple URLs from a file and save them into a single JSON file:

```bash
python web_processor.py --input urls.txt --output docs/web_content.json
```

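Assuming `urls.txt` contains one URL per line (the format is an assumption, not spelled out here), you could create it like this:

```python
# Write a urls.txt file, assuming a one-URL-per-line format
urls = [
    "https://example.com",
    "https://example.org/article",
]
with open("urls.txt", "w") as f:
    f.write("\n".join(urls) + "\n")
```
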
#### Manage Vector Store

To add documents to the vector store and query them, run:

```bash
# Add documents from a chunks file (stored in the pdf_collection by default)
python store.py --add chunks.json
# For websites, use the --add-web flag
python store.py --add-web docs/web_content.json

# Query the vector store directly (searches both the PDF and web collections)
python store.py --query "your search query"
# Or query through the agent, which lets the LLM decide which collection to use
python local_rag_agent.py --query "your search query"
```

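Under the hood this is standard ChromaDB; a minimal sketch of the equivalent operations (the persistence path and metadata fields here are assumptions, not the repository's exact configuration):

```python
import chromadb

# Persistent client; the on-disk path is an assumption
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("pdf_collection")

# Add a chunk with metadata, then run a similarity search
collection.add(
    ids=["chunk-0"],
    documents=["Example chunk text extracted from a PDF."],
    metadatas=[{"source": "document.pdf", "page": 1}],
)
results = collection.query(query_texts=["your search query"], n_results=3)
print(results["documents"])
```
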
#### Use RAG Agent

To query documents using either OpenAI or a local model, run:

```bash
# Using OpenAI (requires API key in .env)
python rag_agent.py --query "Can you explain the DaGAN Approach proposed in the Depth-Aware Generative Adversarial Network for Talking Head Video Generation article?"

# Using the local Mistral model
python local_rag_agent.py --query "Can you explain the DaGAN Approach proposed in the Depth-Aware Generative Adversarial Network for Talking Head Video Generation article?"
```

### 4. Complete Pipeline Example

The following end-to-end example processes a document, adds it to the vector store, and then queries the knowledge base with the local model to see the RAG system in action.

```bash
# 1. Process the PDF
python pdf_processor.py --input example.pdf --output chunks.json
# Or process it from a URL:
# python pdf_processor.py --input https://arxiv.org/pdf/2203.06605 --output chunks.json

# 2. Add to the vector store
python store.py --add chunks.json

# 3. Query using the local model
python local_rag_agent.py --query "Can you explain the DaGAN Approach proposed in the Depth-Aware Generative Adversarial Network for Talking Head Video Generation article?"

# Or using OpenAI (requires API key):
python rag_agent.py --query "Can you explain the DaGAN Approach proposed in the Depth-Aware Generative Adversarial Network for Talking Head Video Generation article?"
```

## 2. Chain of Thought (CoT) Support

The system implements an advanced multi-agent Chain of Thought system, allowing complex queries to be broken down and processed through multiple specialized agents. This feature enhances the reasoning capabilities of both local and cloud-based models.

### Multi-Agent System

The CoT system consists of four specialized agents, composed as sketched below:

1. **Planner Agent**: Breaks down complex queries into clear, manageable steps
2. **Research Agent**: Gathers and analyzes relevant information from knowledge bases
3. **Reasoning Agent**: Applies logical analysis to information and draws conclusions
4. **Synthesis Agent**: Combines multiple pieces of information into a coherent response

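A schematic sketch of how these agents compose (stub functions stand in for the repository's real agent classes; the names are illustrative, not the actual API):

```python
from typing import List

def planner(query: str) -> List[str]:
    """Stub: break the query into manageable steps."""
    return [f"Analyze: {query}"]

def research(step: str) -> str:
    """Stub: gather relevant information for one step."""
    return f"Findings for '{step}'"

def reasoning(query: str, findings: List[str]) -> str:
    """Stub: apply logical analysis and draw conclusions."""
    return "Conclusions drawn from the findings"

def synthesis(query: str, conclusions: str) -> str:
    """Stub: combine everything into a coherent answer."""
    return "Final synthesized answer"

def chain_of_thought(query: str) -> str:
    steps = planner(query)                    # 1. plan the approach
    findings = [research(s) for s in steps]   # 2. research each step
    conclusions = reasoning(query, findings)  # 3. reason over the findings
    return synthesis(query, conclusions)      # 4. synthesize the final answer
```
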
### Using CoT

You can activate the multi-agent CoT system in several ways:

1. **Command Line**:
```bash
# Using local Mistral model (default)
python local_rag_agent.py --query "your query" --use-cot

# Using OpenAI model
python rag_agent.py --query "your query" --use-cot
```

2. **Testing the System**:
```bash
# Test with local model (default)
python tests/test_new_cot.py

# Test with OpenAI model
python tests/test_new_cot.py --model openai
```

3. **API Endpoint**:
```http
POST /query
Content-Type: application/json

{
    "query": "your query",
    "use_cot": true
}
```

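The same request from Python, for convenience (a minimal sketch using `requests`):

```python
import requests

# Query with multi-agent Chain of Thought enabled
response = requests.post(
    "http://localhost:8000/query",
    json={"query": "your query", "use_cot": True},
)
print(response.json())
```
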
### Example Output

When CoT is enabled, the system will show:
- The initial plan for answering the query
- Research findings for each step
- Reasoning process and conclusions
- Final synthesized answer
- Sources used from the knowledge base

Example:
```
Step 1: Planning
- Break down the technical components
- Identify key features
- Analyze implementation details

Step 2: Research
[Research findings for each step...]

Step 3: Reasoning
[Logical analysis and conclusions...]

Final Answer:
[Comprehensive response synthesized from all steps...]

Sources used:
- document.pdf (pages: 1, 2, 3)
- implementation.py
```

### Benefits

The multi-agent CoT approach offers several advantages:
- More structured and thorough analysis of complex queries
- Better integration with knowledge bases
- Transparent reasoning process
- Improved answer quality through specialized agents
- Works with both local and cloud-based models

## Annex: API Endpoints

### Upload PDF

```http
POST /upload/pdf
Content-Type: multipart/form-data

file: <pdf-file>
```

This endpoint uploads and processes a PDF file, storing its contents in the vector database.

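For example, from Python (a minimal sketch using `requests`; the file path is illustrative):

```python
import requests

# Upload a PDF as multipart/form-data
with open("path/to/document.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/upload/pdf",
        files={"file": f},
    )
print(response.json())
```
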
### Query

```http
POST /query
Content-Type: application/json

{
    "query": "your question here"
}
```

This endpoint processes a query through the agentic RAG pipeline and returns a response with context.

## Annex: Architecture

The system consists of several key components:

1. **PDF Processor**: Uses Docling to extract and chunk text from PDF documents
2. **Vector Store**: Manages document embeddings and similarity search using ChromaDB
3. **RAG Agent**: Makes intelligent decisions about query routing and response generation
   - OpenAI Agent: Uses `gpt-4-turbo-preview` for high-quality responses, but requires an OpenAI API key
   - Local Agent: Uses `Mistral-7B` as an open-source alternative
4. **FastAPI Server**: Provides REST API endpoints for document upload and querying

The RAG Agent flow is the following (sketched schematically after the list):

1. Analyze the query type.
2. Try to find relevant PDF context, regardless of query type.
3. If PDF context is found, use it to generate a response.
4. If no PDF context is found, or if it's a general knowledge query, use the pre-trained LLM directly.
5. Fall back to a "no information" response only in edge cases.

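A schematic sketch of this routing logic (stub functions stand in for the real components; this is illustrative, not the repository's actual code):

```python
from typing import List, Optional

def analyze_query(query: str) -> str:
    """Stub: classify the query, e.g. 'document' vs. 'general'."""
    return "general"

def find_pdf_context(query: str) -> List[str]:
    """Stub: similarity search over the vector store."""
    return []

def generate(query: str, context: Optional[List[str]] = None) -> str:
    """Stub: LLM call (OpenAI or local Mistral-7B)."""
    return "..."

def answer(query: str) -> str:
    query_type = analyze_query(query)           # 1. analyze the query type
    context = find_pdf_context(query)           # 2. always look for PDF context
    if context:
        return generate(query, context)         # 3. ground the response in it
    if query_type == "general":
        return generate(query)                  # 4. use the pre-trained LLM directly
    return "I don't have information on that."  # 5. edge-case fallback
```
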
## Contributing

This project is open source. Please contribute by forking this repository and submitting a pull request! Oracle appreciates any contributions made by the open source community.

## License

Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](../LICENSE) for more details.

ORACLE AND ITS AFFILIATES DO NOT PROVIDE ANY WARRANTY WHATSOEVER, EXPRESS OR IMPLIED, FOR ANY SOFTWARE, MATERIAL OR CONTENT OF ANY KIND CONTAINED OR PRODUCED WITHIN THIS REPOSITORY, AND IN PARTICULAR SPECIFICALLY DISCLAIM ANY AND ALL IMPLIED WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. FURTHERMORE, ORACLE AND ITS AFFILIATES DO NOT REPRESENT THAT ANY CUSTOMARY SECURITY REVIEW HAS BEEN PERFORMED WITH RESPECT TO ANY SOFTWARE, MATERIAL OR CONTENT CONTAINED OR PRODUCED WITHIN THIS REPOSITORY. IN ADDITION, AND WITHOUT LIMITING THE FOREGOING, THIRD PARTIES MAY HAVE POSTED SOFTWARE, MATERIAL OR CONTENT TO THIS REPOSITORY WITHOUT ANY REVIEW. USE AT YOUR OWN RISK.