459 changes: 459 additions & 0 deletions agentic_rag/OraDBVectorStore.py

Large diffs are not rendered by default.

85 changes: 83 additions & 2 deletions agentic_rag/README.md
@@ -10,7 +10,7 @@ The system has the following features:

- Intelligent query routing
- PDF processing using Docling for accurate text extraction and chunking
- Persistent vector storage with Oracle Database 23ai (PDF and Websites)
- Smart context retrieval and response generation
- FastAPI-based REST API for document upload and querying
- Support for either OpenAI-based agents or local, transformer-based agents (`Mistral-7B` by default)
@@ -206,6 +206,43 @@ python store.py --query "your search query"
python local_rag_agent.py --query "your search query"
```

#### Test Oracle DB Vector Store

The system includes a test script to verify Oracle DB connectivity and examine the contents of your collections. This is useful for:
- Checking if Oracle DB is properly configured
- Viewing statistics about your collections
- Inspecting the content stored in each collection
- Testing basic vector search functionality

To run the test:

```bash
# Basic test - checks connection and runs a test query
python test_oradb.py

# Show only collection statistics without inserting test data
python test_oradb.py --stats-only

# Specify a custom query for testing
python test_oradb.py --query "artificial intelligence"
```

The script will:
1. Verify Oracle DB credentials in your `config.yaml` file
2. Test connection to the Oracle DB
3. Display the total number of chunks in each collection (PDF, Web, Repository, General Knowledge)
4. Show content and metadata from the most recently inserted chunk in each collection
5. Unless running with `--stats-only`, insert test data and run a sample vector search
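Step 3 boils down to one count query per collection table. The sketch below is an assumption, not the code in `test_oradb.py`: it uses the table names listed in `docs/oracle_db_integration.md` and the generic Python DB-API calls (`cursor()`, `execute()`, `fetchone()`) that both `oracledb` and the standard-library `sqlite3` provide. An in-memory SQLite database stands in for Oracle so the sketch runs anywhere; the helper name `collection_stats` is hypothetical.

```python
import sqlite3

# Table names as listed in docs/oracle_db_integration.md (assumption: the
# real test script may map collections to tables differently).
COLLECTION_TABLES = {
    "PDF": "PDFCollection",
    "Web": "WebCollection",
    "Repository": "RepoCollection",
    "General Knowledge": "GeneralCollection",
}


def collection_stats(conn):
    """Return {collection label: row count} for any DB-API 2.0 connection."""
    stats = {}
    cursor = conn.cursor()
    for label, table in COLLECTION_TABLES.items():
        # Table names cannot be bind variables, so they come only from the
        # fixed mapping above, never from user input.
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        (count,) = cursor.fetchone()
        stats[label] = count
    cursor.close()
    return stats


if __name__ == "__main__":
    # SQLite stand-in so the sketch runs without an Oracle instance.
    conn = sqlite3.connect(":memory:")
    for table in COLLECTION_TABLES.values():
        conn.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY, text TEXT)")
    conn.execute("INSERT INTO PDFCollection VALUES ('c1', 'hello')")
    for label, count in collection_stats(conn).items():
        print(f"{label}: {count} chunks")
```

With `oracledb`, the same function would take the connection returned by `oracledb.connect(...)` unchanged, since it relies only on DB-API methods.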

Requirements:
- Oracle DB credentials properly configured in `config.yaml`:
  ```yaml
  ORACLE_DB_USERNAME: ADMIN
  ORACLE_DB_PASSWORD: your_password_here
  ORACLE_DB_DSN: your_connection_string_here
  ```
- The `oracledb` Python package installed
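A credential check like step 1 can be as small as the following sketch. The helper name `missing_oracle_keys` and the placeholder set are hypothetical; the key names come from the `config.yaml` example above. The mapping itself would typically be loaded with PyYAML's `yaml.safe_load`, which is an assumption about how the project reads its config.

```python
from typing import List, Mapping

REQUIRED_ORACLE_KEYS = ("ORACLE_DB_USERNAME", "ORACLE_DB_PASSWORD", "ORACLE_DB_DSN")

# Placeholder values from the README example; treat them as "not set".
PLACEHOLDERS = {"your_password_here", "your_connection_string_here"}


def missing_oracle_keys(config: Mapping) -> List[str]:
    """Return required Oracle keys that are absent, empty, or still placeholders."""
    missing = []
    for key in REQUIRED_ORACLE_KEYS:
        value = str(config.get(key) or "").strip()
        if not value or value in PLACEHOLDERS:
            missing.append(key)
    return missing
```

Treating the example placeholders as unset catches the common case of a copied but unedited example file.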

#### Use RAG Agent

To query documents using either OpenAI or a local model, run:
@@ -358,7 +395,7 @@ The system consists of several key components:
1. **PDF Processor**: we use `docling` to extract and chunk text from PDF documents
2. **Web Processor**: we use `trafilatura` to extract and chunk text from websites
3. **GitHub Repository Processor**: we use `gitingest` to extract and chunk text from repositories
4. **Vector Store**: Manages document embeddings and similarity search using `Oracle Database 23ai` (default) or `ChromaDB` (fallback)
5. **RAG Agent**: Makes intelligent decisions about query routing and response generation
- OpenAI Agent: Uses `gpt-4-turbo-preview` for high-quality responses, but requires an OpenAI API key
- Local Agent: Uses `Mistral-7B` as an open-source alternative
@@ -373,6 +410,50 @@ The RAG Agent flow is the following:
4. If no PDF context is found OR if it's a general knowledge query, use the pre-trained LLM directly
5. Fall back to a "no information" response only in edge cases.

## Annex: Command Line Usage

You can run the system from the command line using:

```bash
python local_rag_agent.py --query "Your question here" [options]
```

### Command Line Arguments

| Argument | Description | Default |
| --- | --- | --- |
| `--query` | The query to process | *Required* |
| `--embeddings` | Select embeddings backend (`oracle` or `chromadb`) | `oracle` |
| `--model` | Model to use for inference | `mistralai/Mistral-7B-Instruct-v0.2` |
| `--collection` | Collection to query (PDF, Repository, Web, General) | Auto-determined |
| `--use-cot` | Enable Chain of Thought reasoning | `False` |
| `--store-path` | Path to ChromaDB store (if using ChromaDB) | `embeddings` |
| `--skip-analysis` | Skip query analysis step | `False` |
| `--verbose` | Show full content of sources | `False` |
| `--quiet` | Disable verbose logging | `False` |
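The table maps naturally onto `argparse`. This is a hypothetical reconstruction from the table, not the parser in `local_rag_agent.py`; in particular, the accepted `--collection` values are left unconstrained because the table does not pin them down.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="Query the agentic RAG system")
    parser.add_argument("--query", required=True, help="The query to process")
    parser.add_argument("--embeddings", choices=["oracle", "chromadb"],
                        default="oracle", help="Embeddings backend")
    parser.add_argument("--model", default="mistralai/Mistral-7B-Instruct-v0.2",
                        help="Model to use for inference")
    # None means "let the agent decide" (auto-determined collection).
    parser.add_argument("--collection", default=None, help="Collection to query")
    parser.add_argument("--use-cot", action="store_true",
                        help="Enable Chain of Thought reasoning")
    parser.add_argument("--store-path", default="embeddings",
                        help="Path to ChromaDB store (if using ChromaDB)")
    parser.add_argument("--skip-analysis", action="store_true",
                        help="Skip query analysis step")
    parser.add_argument("--verbose", action="store_true",
                        help="Show full content of sources")
    parser.add_argument("--quiet", action="store_true",
                        help="Disable verbose logging")
    return parser
```

`action="store_true"` yields the `False` defaults shown in the table, and `argparse` exposes dashed flags as underscored attributes (`args.use_cot`, `args.store_path`).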

### Examples

Query using Oracle DB (default):
```bash
python local_rag_agent.py --query "How does vector search work?"
```

Force the use of ChromaDB:
```bash
python local_rag_agent.py --query "How does vector search work?" --embeddings chromadb
```

Query with Chain of Thought reasoning:
```bash
python local_rag_agent.py --query "Explain the difference between RAG and fine-tuning" --use-cot
```

Query a specific collection:
```bash
python local_rag_agent.py --query "How to implement a queue?" --collection "Repository Collection"
```

## Contributing

This project is open source. Please submit your contributions by forking this repository and submitting a pull request! Oracle appreciates any contributions that are made by the open source community.
11 changes: 10 additions & 1 deletion agentic_rag/config_example.yaml
@@ -1 +1,10 @@
HUGGING_FACE_HUB_TOKEN: your_token_here

# Oracle DB Configuration
ORACLE_DB_USERNAME: ADMIN
ORACLE_DB_PASSWORD: your_password_here
ORACLE_DB_DSN: >-
  (description= (retry_count=20)(retry_delay=3)
  (address=(protocol=tcps)(port=1522)(host=your-oracle-db-host.com))
  (connect_data=(service_name=your-service-name))
  (security=(ssl_server_dn_match=yes)))

102 changes: 102 additions & 0 deletions agentic_rag/docs/oracle_db_integration.md
@@ -0,0 +1,102 @@
# Oracle DB 23ai Integration

The Agentic RAG system supports Oracle Database 23ai as a vector store backend, so chunk text, metadata, and embeddings are stored in an Oracle database, with its scalability and enterprise features, rather than in a local ChromaDB store.

## Overview

Oracle Database 23ai is the default vector store whenever it is reachable; ChromaDB serves as the fallback. The integration uses Oracle's vector search capabilities to retrieve the chunks most semantically similar to a query.

## Requirements

To use the Oracle DB integration, you need:

1. **Oracle Database 23ai**: With vector extensions enabled
2. **Python Packages**:
   - `oracledb`: For database connectivity
   - `sentence-transformers`: For generating embeddings

## Installation

1. Install the required packages:

```bash
pip install oracledb sentence-transformers
```

2. Configure your Oracle Database connection in `config.yaml`:

```yaml
# Oracle DB Configuration
ORACLE_DB_USERNAME: ADMIN
ORACLE_DB_PASSWORD: your_password_here
ORACLE_DB_DSN: >-
  (description= (retry_count=20)(retry_delay=3)
  (address=(protocol=tcps)(port=1522)(host=your-oracle-db-host.com))
  (connect_data=(service_name=your-service-name))
  (security=(ssl_server_dn_match=yes)))
```

The system reads these credentials from your `config.yaml` file. If they are missing or the connection fails, Oracle DB initialization raises an error and the system falls back to ChromaDB.

## How It Works

The system automatically determines which database to use:

1. First tries to connect to Oracle DB 23ai
2. If connection succeeds, uses Oracle for all vector operations
3. If Oracle DB is unavailable, falls back to ChromaDB
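The fallback order can be sketched as a small selection function. The factory callables are hypothetical stand-ins for whatever constructs the Oracle store (`OraDBVectorStore.py`) and the ChromaDB store, whose real signatures are not shown in this diff. The `use_oracle_db` flag mirrors the `LocalRAGAgent` parameter described under "Switching Between Databases".

```python
from typing import Any, Callable, Tuple


def select_vector_store(
    make_oracle: Callable[[], Any],
    make_chroma: Callable[[], Any],
    use_oracle_db: bool = True,
) -> Tuple[str, Any]:
    """Return (backend_name, store), preferring Oracle DB, else ChromaDB."""
    if use_oracle_db:
        try:
            # Step 1: try Oracle; success means Oracle handles all vector ops.
            return "oracle", make_oracle()
        except Exception as exc:  # e.g. missing credentials, failed connection
            print(f"Oracle DB unavailable ({exc}); falling back to ChromaDB")
    # Step 3: Oracle disabled or unavailable.
    return "chromadb", make_chroma()
```

Catching a broad `Exception` matches the "any failure means fall back" behavior described above; a stricter variant might catch only connection and configuration errors so that genuine bugs still surface.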

## Database Structure

The Oracle DB integration creates the following tables:

- `PDFCollection`: Stores chunks from PDF documents
- `WebCollection`: Stores chunks from web content
- `RepoCollection`: Stores chunks from code repositories
- `GeneralCollection`: Stores general knowledge chunks

Each table has the following structure:
- `id`: Primary key identifier
- `text`: The text content of the chunk
- `metadata`: JSON string containing metadata (source, page, etc.)
- `embedding`: Vector representation of the text
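To make the four-column layout and the similarity search concrete, here is a self-contained sketch that uses SQLite as a stand-in. It is not the Oracle implementation: Oracle Database 23ai offers a native `VECTOR` column type and in-database distance functions, whereas this sketch stores embeddings as JSON arrays and ranks rows by cosine similarity in Python. All function names are hypothetical.

```python
import json
import math
import sqlite3


def create_collection(conn, table):
    # Same four columns as described above; the embedding is a JSON array
    # here only because SQLite has no vector type.
    conn.execute(
        f"CREATE TABLE {table} "
        "(id TEXT PRIMARY KEY, text TEXT, metadata TEXT, embedding TEXT)"
    )


def add_chunk(conn, table, chunk_id, text, metadata, embedding):
    # SQLite uses '?' placeholders; oracledb would use ':1'-style binds.
    conn.execute(
        f"INSERT INTO {table} (id, text, metadata, embedding) VALUES (?, ?, ?, ?)",
        (chunk_id, text, json.dumps(metadata), json.dumps(embedding)),
    )


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn, table, query_embedding, k=3):
    """Brute-force top-k: (score, id, text, metadata) sorted by similarity."""
    rows = conn.execute(f"SELECT id, text, metadata, embedding FROM {table}").fetchall()
    scored = [
        (cosine(query_embedding, json.loads(emb)), cid, text, json.loads(meta))
        for cid, text, meta, emb in rows
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

A real vector database avoids this full scan by computing distances inside the database (and optionally using a vector index), but the ranking idea is the same.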

## Testing

You can test the Oracle DB integration using:

```bash
python test_oradb.py
```

Or test both systems using:

```bash
./test_db_systems.sh
```

## Switching Between Databases

You can force the system to use ChromaDB instead of Oracle DB by setting the `use_oracle_db` parameter to `False`:

```python
from local_rag_agent import LocalRAGAgent

agent = LocalRAGAgent(use_oracle_db=False)
```

## Gradio Interface

The Gradio web interface displays which database system is active at the top of the page:

- Green banner: Oracle DB 23ai is active
- Red banner: ChromaDB is being used (Oracle DB not available)
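The banner logic amounts to choosing a color and message from the active backend. The helper below is a hypothetical sketch that returns an HTML string, which a Gradio app could render with `gr.HTML(...)`; the exact colors and wording in the real interface may differ.

```python
def db_banner_html(oracle_active: bool) -> str:
    """Status banner: green for Oracle DB 23ai, red for the ChromaDB fallback."""
    if oracle_active:
        color, message = "#2e7d32", "Using Oracle DB 23ai"
    else:
        color, message = "#c62828", "Using ChromaDB (Oracle DB not available)"
    return (
        f'<div style="background:{color};color:white;padding:8px;'
        f'text-align:center;font-weight:bold">{message}</div>'
    )
```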

## Troubleshooting

If you encounter database connection issues:

1. Verify your Oracle DB credentials and connection string
2. Check that the Oracle DB 23ai instance is running
3. Ensure you have the required Python packages installed
4. Check network connectivity to the database server

If Oracle DB connection fails, the system will automatically fall back to ChromaDB without requiring any user intervention.
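For item 4, a quick TCP reachability test can rule out firewall or host problems before you debug credentials. This sketch is an assumption, not part of the project: it only parses the `(host=...)` and `(port=...)` entries of a connect descriptor like the one in `config.yaml` (Easy Connect strings are not handled), and a successful TCP connect does not prove that TLS or authentication will succeed. The helper names are hypothetical.

```python
import re
import socket


def dsn_host_port(dsn):
    """Extract (host, port) from a connect descriptor like the config example."""
    host = re.search(r"\(host=([^)\s]+)\)", dsn, re.IGNORECASE)
    port = re.search(r"\(port=(\d+)\)", dsn, re.IGNORECASE)
    if not host or not port:
        raise ValueError("DSN has no (host=...) or (port=...) entry")
    return host.group(1), int(port.group(1))


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures, and timeouts
        return False
```

For example, `can_reach(*dsn_host_port(dsn))` returning `False` points at networking (firewall, wrong host, or port 1522 blocked) rather than at the database credentials.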