| 1 | +# RAG Report Generator |
| 2 | + |
| 3 | +An enterprise-grade Retrieval-Augmented Generation (RAG) system for generating comprehensive business reports from multiple document sources using Oracle Cloud Infrastructure (OCI) Generative AI services. |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- **Multi-Document Processing**: Ingest and process PDF and XLSX documents |
| 8 | +- **Multiple Embedding Models**: Support for Cohere multilingual and v4.0 embeddings |
| 9 | +- **Advanced LLM Support**: Integration with OCI models (Grok-3, Grok-4, Llama 3.3, Cohere Command) |
| 10 | +- **Agentic Workflows**: Multi-agent system for intelligent report generation |
| 11 | +- **Hierarchical Report Structure**: Automatically organizes content based on user queries |
| 12 | +- **Citation Tracking**: Source attribution with references |
| 13 | +- **Multi-Language Support**: Generate reports in English, Arabic, Spanish, and French |
| 14 | +- **Visual Analytics**: Automatic chart and table generation from data |
| 15 | + |
| 16 | +## Prerequisites |
| 17 | + |
| 18 | +- Python 3.11+ |
| 19 | +- OCI Account with Generative AI service access |
| 20 | +- OCI CLI configured with appropriate credentials |
| 21 | + |
| 22 | +## Installation |
| 23 | + |
| 24 | +1. Clone the repository: |
| 25 | +```bash |
| 26 | +git clone <repository-url> |
| 27 | +cd agentic_rag |
| 28 | +``` |
| 29 | + |
| 30 | +2. Create a virtual environment: |
| 31 | +```bash |
| 32 | +python -m venv venv |
| 33 | +source venv/bin/activate # On Windows: venv\Scripts\activate |
| 34 | +``` |
| 35 | + |
| 36 | +3. Install dependencies: |
| 37 | +```bash |
| 38 | +pip install -r requirements.txt |
| 39 | +``` |
| 40 | + |
| 41 | +4. Configure OCI credentials: |
| 42 | +```bash |
| 43 | +# Create OCI config directory if it doesn't exist |
| 44 | +mkdir -p ~/.oci |
| 45 | + |
| 46 | +# Add your OCI configuration to ~/.oci/config |
| 47 | +# See: https://docs.oracle.com/en-us/iaas/Content/API/Concepts/sdkconfig.htm |
| 48 | +``` |
| 49 | + |
| 50 | +5. Set up environment variables: |
| 51 | +```bash |
| 52 | +# Create .env file with your configuration |
| 53 | +cat > .env << EOF |
| 54 | +# OCI Configuration |
| 55 | +OCI_COMPARTMENT_ID=your-compartment-id |
| 56 | +COMPARTMENT_ID_DAC=your-dac-compartment-id # If using dedicated cluster |
| 57 | +
|
| 58 | +# Model IDs (get from OCI Console) |
| 59 | +OCI_GROK_3_MODEL_ID=your-grok3-model-id |
| 60 | +OCI_GROK_4_MODEL_ID=your-grok4-model-id |
| 61 | +OCI_LLAMA_3_3_MODEL_ID=your-llama-model-id |
| 62 | +OCI_COHERE_COMMAND_A_MODEL_ID=your-cohere-model-id |
| 63 | +
|
| 64 | +# Default Models (optional) |
| 65 | +DEFAULT_EMBEDDING_MODEL=cohere-embed-multilingual-v3.0 |
| 66 | +DEFAULT_LLM_MODEL=grok-3 |
| 67 | +EOF |
| 68 | +``` |
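The `.env` file from step 5 can be sanity-checked before launching the app. The following is a minimal sketch using only the standard library; `parse_env_text`, `missing_keys`, and the choice of `OCI_COMPARTMENT_ID` as the only required key are assumptions made for illustration, not part of the project.

```python
from typing import Dict, Iterable, List


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments.

    Hypothetical helper: the project may load its settings differently.
    """
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "value # note".
        value = value.split(" #", 1)[0].strip()
        settings[key.strip()] = value
    return settings


# Assumption: only the compartment ID is treated as mandatory here.
REQUIRED_KEYS = ("OCI_COMPARTMENT_ID",)


def missing_keys(settings: Dict[str, str],
                 required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]
```

Reading the file with `parse_env_text(open(".env").read())` and passing the result to `missing_keys` gives a quick list of settings that still need values.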
| 69 | + |
| 70 | +## Quick Start |
| 71 | + |
| 72 | +1. Launch the Gradio interface: |
| 73 | +```bash |
| 74 | +python gradio_app.py |
| 75 | +``` |
| 76 | + |
| 77 | +2. Open your browser to `http://localhost:7863` |
| 78 | + |
| 79 | +3. Follow these steps in the interface: |
| 80 | + - **Document Processing Tab**: Upload and process your documents (PDF/XLSX); sample files are in the `sample_data` folder |
| 81 | + - **Vector Store Viewer Tab**: View and manage your document collections |
| 82 | + - **Inference & Query Tab**: Enter queries and generate reports; sample queries are in the `sample_queries` folder |
| 83 | + |
| 84 | +## Usage Guide |
| 85 | + |
| 86 | +### Document Processing |
| 87 | + |
| 88 | +1. Select an embedding model (e.g., cohere-embed-multilingual-v3.0) |
| 89 | +2. Upload documents: |
| 90 | + - **XLSX**: Financial data, ESG metrics, structured data |
| 91 | + - **PDF**: Reports, policies, unstructured documents |
| 92 | +3. Specify the entity name for each document, i.e., the bank's or institution's name |
| 93 | +4. Click "Process" to ingest into the vector store |
| 94 | + |
| 95 | +### Generating Reports |
| 96 | + |
| 97 | +1. In the **Inference & Query** tab: |
| 98 | + - Enter your query (can be structured with numbered sections) |
| 99 | + - Select LLM model (Grok-3 recommended for reports) |
| 100 | + - Choose data sources (PDF/XLSX collections) |
| 101 | + - Enable "Agentic Workflow" for comprehensive multi-agent reports |
| 102 | + - Click "Run Query" |
| 103 | + |
| 104 | +2. Example structured query: |
| 105 | +``` |
| 106 | +Prepare a comprehensive ESG comparison report between Company A and Company B: |
| 107 | +
|
| 108 | +1) Climate Impact & Emissions |
| 109 | + - Net-zero commitments and targets |
| 110 | + - Scope 1, 2, and 3 emissions |
| 111 | + |
| 112 | +2) Social & Governance |
| 113 | + - Diversity targets |
| 114 | + - Board oversight |
| 115 | + |
| 116 | +3) Financial Performance |
| 117 | + - Revenue and profitability |
| 118 | + - ESG investments |
| 119 | +``` |
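To illustrate how a structured query like the one above maps onto a hierarchical outline, here is a minimal sketch that splits `N) Title` lines and their `- ` sub-points into sections. The `split_sections` helper and its regular expressions are hypothetical; the project's own query handling may work quite differently.

```python
import re
from typing import List

# Assumed conventions: sections look like "1) Title", sub-points like "- text".
SECTION_RE = re.compile(r"^\s*(\d+)\)\s+(.*\S)\s*$")
BULLET_RE = re.compile(r"^\s*-\s+(.*\S)\s*$")


def split_sections(query: str) -> List[dict]:
    """Return one dict per numbered section, with its bullet points attached."""
    sections: List[dict] = []
    for line in query.splitlines():
        section = SECTION_RE.match(line)
        if section:
            sections.append({
                "number": int(section.group(1)),
                "title": section.group(2),
                "points": [],
            })
            continue
        bullet = BULLET_RE.match(line)
        # Bullets that appear before any numbered section are ignored.
        if bullet and sections:
            sections[-1]["points"].append(bullet.group(1))
    return sections
```

Applied to the example query, this would yield three sections, each carrying two sub-points, which is the kind of outline a report writer could then fill in section by section.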
| 120 | + |
| 121 | +### Report Features |
| 122 | + |
| 123 | +Generated reports include: |
| 124 | +- Executive summary addressing your specific query |
| 125 | +- Hierarchically organized sections |
| 126 | +- Data tables and visualizations |
| 127 | +- Source citations [1], [2] for traceability |
| 128 | +- References section with full source details |
| 129 | +- Professional formatting (Times New Roman, black headings) |
| 130 | + |
| 131 | +## Project Structure |
| 132 | + |
| 133 | +``` |
| 134 | +agentic_rag/ |
| 135 | +├── gradio_app.py # Main application interface |
| 136 | +├── local_rag_agent.py # Core RAG system logic |
| 137 | +├── vector_store.py # Vector database management |
| 138 | +├── oci_embedding_handler.py # OCI embedding services |
| 139 | +├── agents/ |
| 140 | +│ ├── agent_factory.py # Agent creation and management |
| 141 | +│ └── report_writer_agent.py # Report generation logic |
| 142 | +├── handlers/ |
| 143 | +│ ├── query_handler.py # Query processing |
| 144 | +│ ├── pdf_handler.py # PDF document processing |
| 145 | +│ ├── xlsx_handler.py # Excel document processing |
| 146 | +│ └── vector_handler.py # Vector store operations |
| 147 | +├── ingest_pdf.py # PDF ingestion pipeline |
| 148 | +├── ingest_xlsx.py # Excel ingestion pipeline |
| 149 | +├── sample_data/ # Sample documents for testing |
| 150 | +├── sample_queries/ # Example queries for reports |
| 151 | +└── utils/ |
| 152 | + └── demo_logger.py # Logging utilities |
| 153 | +``` |
| 154 | + |
| 155 | +## Advanced Configuration |
| 156 | + |
| 157 | +### Embedding Models |
| 158 | + |
| 159 | +Available embedding models: |
| 160 | +- `cohere-embed-multilingual-v3.0` (1024 dimensions) |
| 161 | +- `cohere-embed-v4.0` (1024 dimensions) |
| 162 | +- `chromadb-default` (384 dimensions, local) |
| 163 | + |
| 164 | +### LLM Models |
| 165 | + |
| 166 | +Supported OCI Generative AI models: |
| 167 | +- **Grok-3**: Best for comprehensive reports (16K output tokens) |
| 168 | +- **Grok-4**: Advanced reasoning (120K output tokens) |
| 169 | +- **Llama 3.3**: Fast inference (4K output tokens) |
| 170 | +- **Cohere Command**: Instruction following (4K output tokens) |
| 171 | + |
| 172 | +### Vector Store Management |
| 173 | + |
| 174 | +- Collections are automatically created per embedding model |
| 175 | +- Switch between models without data loss |
| 176 | +- Delete collections via the Vector Store Viewer tab |
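One way to picture the per-model separation is a lookup from embedding model name to its own storage directory. The sketch below assumes the `embed-<model-name>` directory pattern that appears in the API usage example, and takes the dimensions from the list of embedding models; `persist_directory_for` is a hypothetical helper, not a function in the project.

```python
# Dimensions as listed under "Embedding Models".
EMBEDDING_DIMENSIONS = {
    "cohere-embed-multilingual-v3.0": 1024,
    "cohere-embed-v4.0": 1024,
    "chromadb-default": 384,
}


def persist_directory_for(model_name: str) -> str:
    """Map an embedding model to its own storage directory.

    Assumption: directories follow the "embed-<model>" pattern seen in the
    API usage example; the real naming logic may differ.
    """
    if model_name not in EMBEDDING_DIMENSIONS:
        raise ValueError(f"Unknown embedding model: {model_name}")
    return f"embed-{model_name}"
```

Keeping each model's vectors in a separate directory is what would make switching models non-destructive: vectors of different dimensionality never share a collection.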
| 177 | + |
| 178 | +## Troubleshooting |
| 179 | + |
| 180 | +### Common Issues |
| 181 | + |
| 182 | +1. **OCI Authentication Error** |
| 183 | + - Verify ~/.oci/config is properly configured |
| 184 | + - Check compartment ID in .env file |
| 185 | + - Ensure your user has appropriate IAM policies |
| 186 | + |
| 187 | +2. **Embedding Model Errors** |
| 188 | + - Verify model IDs in .env file |
| 189 | + - Check OCI service limits and quotas |
| 190 | + - Ensure embedding service is enabled in your region |
| 191 | + |
| 192 | +3. **Memory Issues** |
| 193 | + - For large documents, process in smaller batches |
| 194 | + - Adjust chunk size in ingestion settings |
| 195 | + - Consider using pagination for large result sets |
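For the batching advice under Memory Issues, a minimal sketch of splitting any sequence of chunks or files into fixed-size batches is shown here. The `batched` helper is illustrative only; the ingestion scripts may already expose their own batch or chunk-size settings.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly shorter, batch.
    if batch:
        yield batch
```

Because the function is a generator, only one batch is held in memory at a time, so a loop such as `for group in batched(chunks, 50): ...` keeps peak memory bounded by the batch size rather than by the document size.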
| 196 | + |
| 197 | +### Logs |
| 198 | + |
| 199 | +Check `logs/app.log` for detailed debugging information. |
| 200 | + |
| 201 | +## API Usage (Optional) |
| 202 | + |
| 203 | +For programmatic access: |
| 204 | + |
| 205 | +```python |
| 206 | +from local_rag_agent import RAGSystem |
| 207 | +from vector_store import EnhancedVectorStore |
| 208 | + |
| 209 | +# Initialize system |
| 210 | +vector_store = EnhancedVectorStore( |
| 211 | + persist_directory="embed-cohere-embed-multilingual-v3.0", |
| 212 | + embedding_model="cohere-embed-multilingual-v3.0" |
| 213 | +) |
| 214 | + |
| 215 | +rag_system = RAGSystem( |
| 216 | + vector_store=vector_store, |
| 217 | + model_name="grok-3", |
| 218 | + use_cot=True |
| 219 | +) |
| 220 | + |
| 221 | +# Process query |
| 222 | +response = rag_system.process_query("Your query here") |
| 223 | +print(response["answer"]) |
| 224 | +``` |
| 225 | + |
| 226 | +## Contributing |
| 227 | + |
| 228 | +1. Fork the repository |
| 229 | +2. Create a feature branch |
| 230 | +3. Make your changes |
| 231 | +4. Run tests: `python -m pytest tests/` |
| 232 | +5. Submit a pull request |
| 233 | + |
| 234 | +## License |
| 235 | + |
| 236 | +[Your License Here] |
| 237 | + |
| 238 | +## Support |
| 239 | + |
| 240 | +For issues and questions: |
| 241 | +- Check the logs in `logs/app.log` |
| 242 | +- Review the troubleshooting section |
| 243 | +- Open an issue on GitHub |
| 244 | + |
| 245 | +## Acknowledgments |
| 246 | + |
| 247 | +- Oracle Cloud Infrastructure for Generative AI services |
| 248 | +- Gradio for the web interface |
| 249 | +- ChromaDB for vector storage |
| 250 | +- The open-source community |