oracle-devrel · WSPluta · Apr 11, 2025 · Apr 10, 2025
diff --git a/agentic_rag/OraDBVectorStore.py b/agentic_rag/OraDBVectorStore.py
diff --git a/agentic_rag/README.md b/agentic_rag/README.md
@@ -10,7 +10,7 @@ The system has the following features:
 
 - Intelligent query routing
 - PDF processing using Docling for accurate text extraction and chunking
-- Persistent vector storage with ChromaDB and Oracle Database 23ai (PDF and Websites)
+- Persistent vector storage with Oracle Database 23ai (PDF and Websites)
 - Smart context retrieval and response generation
 - FastAPI-based REST API for document upload and querying
 - Support for both OpenAI-based agents or local, transformer-based agents (`Mistral-7B` by default)
@@ -206,6 +206,43 @@ python store.py --query "your search query"
 python local_rag_agent.py --query "your search query"
 ```
 
+#### Test Oracle DB Vector Store
+
+The system includes a test script to verify Oracle DB connectivity and examine the contents of your collections. This is useful for:
+- Checking if Oracle DB is properly configured
+- Viewing statistics about your collections
+- Inspecting the content stored in each collection
+- Testing basic vector search functionality
+
+To run the test:
+
+```bash
+# Basic test - checks connection and runs a test query
+python test_oradb.py
+
+# Show only collection statistics without inserting test data
+python test_oradb.py --stats-only
+
+# Specify a custom query for testing
+python test_oradb.py --query "artificial intelligence"
+```
+
+The script will:
+1. Verify Oracle DB credentials in your `config.yaml` file
+2. Test connection to the Oracle DB
+3. Display the total number of chunks in each collection (PDF, Web, Repository, General Knowledge)
+4. Show content and metadata from the most recently inserted chunk in each collection
+5. Unless running with `--stats-only`, insert test data and run a sample vector search
+
+Requirements:
+- Oracle DB credentials properly configured in `config.yaml`:
+  ```yaml
+  ORACLE_DB_USERNAME: ADMIN
+  ORACLE_DB_PASSWORD: your_password_here
+  ORACLE_DB_DSN: your_connection_string_here
+  ```
+- The `oracledb` Python package installed
+
 #### Use RAG Agent
 
 To query documents using either OpenAI or a local model, run:
@@ -358,7 +395,7 @@ The system consists of several key components:
 1. **PDF Processor**: we use `docling` to extract and chunk text from PDF documents
 2. **Web Processor**: we use `trafilatura` to extract and chunk text from websites
 3. **GitHub Repository Processor**: we use `gitingest` to extract and chunk text from repositories
-4. **Vector Store**: Manages document embeddings and similarity search using `ChromaDB` and `Oracle Database 23ai`
+4. **Vector Store**: Manages document embeddings and similarity search using `Oracle Database 23ai` (default) or `ChromaDB` (fallback)
 5. **RAG Agent**: Makes intelligent decisions about query routing and response generation
    - OpenAI Agent: Uses `gpt-4-turbo-preview` for high-quality responses, but requires an OpenAI API key
    - Local Agent: Uses `Mistral-7B` as an open-source alternative
@@ -373,6 +410,50 @@ The RAG Agent flow is the following:
 4. If no PDF context is found OR if it's a general knowledge query, use the pre-trained LLM directly
 5. Fall back to a "no information" response only in edge cases.
 
+## Annex: Command Line Usage
+
+You can run the system from the command line using:
+
+```bash
+python local_rag_agent.py --query "Your question here" [options]
+```
+
+### Command Line Arguments
+
+| Argument | Description | Default |
+| --- | --- | --- |
+| `--query` | The query to process | *Required* |
+| `--embeddings` | Select embeddings backend (`oracle` or `chromadb`) | `oracle` |
+| `--model` | Model to use for inference | `mistralai/Mistral-7B-Instruct-v0.2` |
+| `--collection` | Collection to query (PDF, Repository, Web, General) | Auto-determined |
+| `--use-cot` | Enable Chain of Thought reasoning | `False` |
+| `--store-path` | Path to ChromaDB store (if using ChromaDB) | `embeddings` |
+| `--skip-analysis` | Skip query analysis step | `False` |
+| `--verbose` | Show full content of sources | `False` |
+| `--quiet` | Disable verbose logging | `False` |
+
+### Examples
+
+Query using Oracle DB (default):
+```bash
+python local_rag_agent.py --query "How does vector search work?"
+```
+
+Force the ChromaDB backend:
+```bash
+python local_rag_agent.py --query "How does vector search work?" --embeddings chromadb
+```
+
+Query with Chain of Thought reasoning:
+```bash
+python local_rag_agent.py --query "Explain the difference between RAG and fine-tuning" --use-cot
+```
+
+Query a specific collection:
+```bash
+python local_rag_agent.py --query "How to implement a queue?" --collection "Repository Collection"
+```
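Review note: a minimal sketch of how the argument table above could map onto `argparse`. The flag names and defaults are taken from the table; the help strings, the `choices` restriction on `--embeddings`, and the `build_parser` helper name are assumptions, and the actual parser in `local_rag_agent.py` may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the README table (illustrative, not the project's code)."""
    parser = argparse.ArgumentParser(description="Query the local RAG agent")
    parser.add_argument("--query", required=True, help="The query to process")
    parser.add_argument(
        "--embeddings",
        choices=["oracle", "chromadb"],  # assumption: only these two backends are accepted
        default="oracle",
        help="Select embeddings backend",
    )
    parser.add_argument(
        "--model",
        default="mistralai/Mistral-7B-Instruct-v0.2",
        help="Model to use for inference",
    )
    # None stands in for "auto-determined" in this sketch.
    parser.add_argument("--collection", default=None, help="Collection to query")
    parser.add_argument("--use-cot", action="store_true", help="Enable Chain of Thought reasoning")
    parser.add_argument("--store-path", default="embeddings", help="Path to ChromaDB store")
    parser.add_argument("--skip-analysis", action="store_true", help="Skip query analysis step")
    parser.add_argument("--verbose", action="store_true", help="Show full content of sources")
    parser.add_argument("--quiet", action="store_true", help="Disable verbose logging")
    return parser


# Parse an explicit argument list so the sketch runs without touching sys.argv.
args = build_parser().parse_args(
    ["--query", "How does vector search work?", "--embeddings", "chromadb"]
)
print(args.embeddings, args.use_cot, args.store_path)  # prints: chromadb False embeddings
```

Using `store_true` for the boolean flags gives the `False` defaults listed in the table without requiring a value on the command line.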
+
 ## Contributing
 
 This project is open source. Please submit your contributions by forking this repository and submitting a pull request! Oracle appreciates any contributions that are made by the open source community.

diff --git a/agentic_rag/config_example.yaml b/agentic_rag/config_example.yaml
@@ -1 +1,10 @@
-HUGGING_FACE_HUB_TOKEN: your_token_here
+HUGGING_FACE_HUB_TOKEN: your_token_here
+
+# Oracle DB Configuration
+ORACLE_DB_USERNAME: ADMIN
+ORACLE_DB_PASSWORD: your_password_here
+ORACLE_DB_DSN: >-
+  (description= (retry_count=20)(retry_delay=3)
+  (address=(protocol=tcps)(port=1522)(host=your-oracle-db-host.com))
+  (connect_data=(service_name=your-service-name))
+  (security=(ssl_server_dn_match=yes)))
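Review note: the `>-` folded scalar joins the four DSN lines into a single space-separated string with no trailing newline, so the value can be passed on as one connection descriptor. Below is a minimal sketch of a pre-flight check for the three Oracle keys. The `missing_oracle_keys` helper is hypothetical, and the sketch assumes the file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); a literal dict is used here so it runs without extra packages.

```python
REQUIRED_KEYS = ("ORACLE_DB_USERNAME", "ORACLE_DB_PASSWORD", "ORACLE_DB_DSN")


def missing_oracle_keys(config: dict) -> list:
    """Return the required Oracle keys that are absent or blank in a parsed config."""
    return [key for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()]


# Stand-in for the parsed contents of config.yaml, with the DSN left out on purpose.
example_config = {
    "HUGGING_FACE_HUB_TOKEN": "your_token_here",
    "ORACLE_DB_USERNAME": "ADMIN",
    "ORACLE_DB_PASSWORD": "your_password_here",
}

print(missing_oracle_keys(example_config))  # prints: ['ORACLE_DB_DSN']
```

Reporting every missing key at once, rather than failing on the first, makes it easier to fix a partially filled `config.yaml` in one pass.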
diff --git a/agentic_rag/docs/oracle_db_integration.md b/agentic_rag/docs/oracle_db_integration.md
@@ -0,0 +1,102 @@
+# Oracle DB 23ai Integration
+
+The Agentic RAG system now supports Oracle DB 23ai as a vector store backend, providing enhanced performance, scalability, and enterprise-grade database features.
+
+## Overview
+
+Oracle Database 23ai is used as the default vector storage system when available, with ChromaDB serving as a fallback option. This integration leverages Oracle's vector database capabilities for efficient semantic search and retrieval.
+
+## Requirements
+
+To use the Oracle DB integration, you need:
+
+1. **Oracle Database 23ai**: With AI Vector Search available (the `VECTOR` data type is built into 23ai)
+2. **Python Packages**:
+   - `oracledb`: For database connectivity
+   - `sentence-transformers`: For generating embeddings
+
+## Installation
+
+1. Install the required packages:
+
+```bash
+pip install oracledb sentence-transformers
+```
+
+2. Configure your Oracle Database connection in `config.yaml`:
+
+```yaml
+# Oracle DB Configuration
+ORACLE_DB_USERNAME: ADMIN
+ORACLE_DB_PASSWORD: your_password_here
+ORACLE_DB_DSN: >-
+  (description= (retry_count=20)(retry_delay=3)
+  (address=(protocol=tcps)(port=1522)(host=your-oracle-db-host.com))
+  (connect_data=(service_name=your-service-name))
+  (security=(ssl_server_dn_match=yes)))
+```
+
+The system reads these credentials from your `config.yaml` file automatically. If they are not found, the Oracle DB setup raises an error and the system then falls back to ChromaDB.
+
+## How It Works
+
+The system automatically determines which database to use:
+
+1. First tries to connect to Oracle DB 23ai
+2. If connection succeeds, uses Oracle for all vector operations
+3. If Oracle DB is unavailable, falls back to ChromaDB
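Review note: the three steps above amount to a try-first, fall-back-second selection. A minimal sketch follows, assuming the two backends are created by caller-supplied factory callables; `select_vector_store`, `make_oracle`, and `make_chroma` are hypothetical names, not the project's API.

```python
from typing import Any, Callable, Tuple


def select_vector_store(
    make_oracle: Callable[[], Any],
    make_chroma: Callable[[], Any],
) -> Tuple[str, Any]:
    """Try the Oracle-backed store first; fall back to ChromaDB if it cannot be created."""
    try:
        return "oracle", make_oracle()
    except Exception as exc:  # e.g. missing credentials, unreachable host, missing package
        print(f"Oracle DB unavailable ({exc}); falling back to ChromaDB")
        return "chromadb", make_chroma()


def failing_oracle() -> Any:
    """Simulate an Oracle DB that cannot be reached."""
    raise ConnectionError("cannot reach database")


backend, store = select_vector_store(failing_oracle, lambda: "chroma-store")
print(backend)  # prints: chromadb
```

Returning the backend name alongside the store lets callers, such as a UI status banner, report which system is active.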
+
+## Database Structure
+
+The Oracle DB integration creates the following tables:
+
+- `PDFCollection`: Stores chunks from PDF documents
+- `WebCollection`: Stores chunks from web content
+- `RepoCollection`: Stores chunks from code repositories
+- `GeneralCollection`: Stores general knowledge chunks
+
+Each table has the following structure:
+- `id`: Primary key identifier
+- `text`: The text content of the chunk
+- `metadata`: JSON string containing metadata (source, page, etc.)
+- `embedding`: Vector representation of the text
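Review note: a hedged sketch of what this structure could look like as SQL, built as plain strings so it runs without a database. The table and column names come from the lists above. The column types (`VARCHAR2(4000)`, `CLOB`), the cosine metric, the bind names, and both helper functions are assumptions, and the statements actually issued by `OraDBVectorStore.py` may differ. `VECTOR` and `VECTOR_DISTANCE` are Oracle Database 23ai features.

```python
COLLECTION_TABLES = ("PDFCollection", "WebCollection", "RepoCollection", "GeneralCollection")


def _check_table(table: str) -> str:
    """Allow only the known collection tables, since table names cannot be bound as parameters."""
    if table not in COLLECTION_TABLES:
        raise ValueError(f"unknown collection table: {table}")
    return table


def create_table_sql(table: str) -> str:
    """DDL for one collection table; column types are illustrative guesses."""
    return (
        f"CREATE TABLE {_check_table(table)} ("
        "id VARCHAR2(4000) PRIMARY KEY, "
        "text CLOB, "
        "metadata CLOB, "
        "embedding VECTOR)"
    )


def similarity_query_sql(table: str) -> str:
    """Nearest-neighbour query ordered by cosine distance to a bound query vector."""
    return (
        f"SELECT id, text, metadata FROM {_check_table(table)} "
        "ORDER BY VECTOR_DISTANCE(embedding, :query_vec, COSINE) "
        "FETCH FIRST :top_k ROWS ONLY"
    )


for name in COLLECTION_TABLES:
    print(create_table_sql(name))
print(similarity_query_sql("PDFCollection"))
```

Keeping the query vector and result count as bind variables means only the table name is interpolated, and that value is checked against a fixed list.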
+
+## Testing
+
+You can test the Oracle DB integration using:
+
+```bash
+python test_oradb.py
+```
+
+Or test both systems using:
+
+```bash
+./test_db_systems.sh
+```
+
+## Switching Between Databases
+
+You can force the system to use ChromaDB instead of Oracle DB by setting the `use_oracle_db` parameter to `False`:
+
+```python
+agent = LocalRAGAgent(use_oracle_db=False)
+```
+
+## Gradio Interface
+
+The Gradio web interface displays which database system is active at the top of the page:
+
+- Green banner: Oracle DB 23ai is active
+- Red banner: ChromaDB is being used (Oracle DB not available)
+
+## Troubleshooting
+
+If you encounter database connection issues:
+
+1. Verify your Oracle DB credentials and connection string
+2. Check that the Oracle DB 23ai instance is running
+3. Ensure you have the required Python packages installed
+4. Check network connectivity to the database server
+
+If the Oracle DB connection fails, the system automatically falls back to ChromaDB without requiring any user intervention.