Commit 65785a6

Merge pull request #29 from oracle-devrel/update
2 parents f5a539e + 7fb6f12 commit 65785a6

10 files changed

+1103
-56
lines changed

agentic_rag/OraDBVectorStore.py

Lines changed: 459 additions & 0 deletions
Large diffs are not rendered by default.

agentic_rag/README.md

Lines changed: 83 additions & 2 deletions
@@ -10,7 +10,7 @@ The system has the following features:
1010

1111
- Intelligent query routing
1212
- PDF processing using Docling for accurate text extraction and chunking
13-
- Persistent vector storage with ChromaDB and Oracle Database 23ai (PDF and Websites)
13+
- Persistent vector storage with Oracle Database 23ai (PDF and Websites)
1414
- Smart context retrieval and response generation
1515
- FastAPI-based REST API for document upload and querying
1616
- Support for either OpenAI-based agents or local, transformer-based agents (`Mistral-7B` by default)
@@ -206,6 +206,43 @@ python store.py --query "your search query"
206206
python local_rag_agent.py --query "your search query"
207207
```
208208
209+
#### Test Oracle DB Vector Store
210+
211+
The system includes a test script to verify Oracle DB connectivity and examine the contents of your collections. This is useful for:
212+
- Checking if Oracle DB is properly configured
213+
- Viewing statistics about your collections
214+
- Inspecting the content stored in each collection
215+
- Testing basic vector search functionality
216+
217+
To run the test:
218+
219+
```bash
220+
# Basic test - checks connection and runs a test query
221+
python test_oradb.py
222+
223+
# Show only collection statistics without inserting test data
224+
python test_oradb.py --stats-only
225+
226+
# Specify a custom query for testing
227+
python test_oradb.py --query "artificial intelligence"
228+
```
229+
230+
The script will:
231+
1. Verify Oracle DB credentials in your `config.yaml` file
232+
2. Test connection to the Oracle DB
233+
3. Display the total number of chunks in each collection (PDF, Web, Repository, General Knowledge)
234+
4. Show content and metadata from the most recently inserted chunk in each collection
235+
5. Unless running with `--stats-only`, insert test data and run a sample vector search
236+
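As a rough illustration of step 3, the per-collection chunk counts could be gathered with one `COUNT(*)` per table. This is only a sketch: the helper names `collection_counts` and `demo_counts` are hypothetical, the table names are taken from the integration notes in this commit, and an in-memory SQLite database stands in for Oracle so the example runs without a server. The actual `test_oradb.py` may work differently.

```python
import sqlite3

# Table names follow the integration notes in this commit; the real schema
# lives in OraDBVectorStore.py, which this diff view does not render.
COLLECTION_TABLES = (
    "PDFCollection",
    "WebCollection",
    "RepoCollection",
    "GeneralCollection",
)


def collection_counts(cursor, tables=COLLECTION_TABLES):
    """Return {table_name: row_count} using any DB-API 2.0 cursor."""
    counts = {}
    for table in tables:
        # Names come from a fixed allow-list, never from user input,
        # so interpolating them into the statement is safe here.
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cursor.fetchone()[0]
    return counts


def demo_counts():
    """Exercise collection_counts against an in-memory SQLite stand-in."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    for table in COLLECTION_TABLES:
        cur.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY, text TEXT)")
    cur.execute("INSERT INTO PDFCollection VALUES ('1', 'hello')")
    result = collection_counts(cur)
    conn.close()
    return result
```

Because the helper only relies on the DB-API cursor interface, the same function should also accept a cursor obtained from an `oracledb` connection.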
237+
Requirements:
238+
- Oracle DB credentials properly configured in `config.yaml`:
239+
```yaml
240+
ORACLE_DB_USERNAME: ADMIN
241+
ORACLE_DB_PASSWORD: your_password_here
242+
ORACLE_DB_DSN: your_connection_string_here
243+
```
244+
- The `oracledb` Python package installed
245+
209246
#### Use RAG Agent
210247
211248
To query documents using either OpenAI or a local model, run:
@@ -358,7 +395,7 @@ The system consists of several key components:
358395
1. **PDF Processor**: we use `docling` to extract and chunk text from PDF documents
359396
2. **Web Processor**: we use `trafilatura` to extract and chunk text from websites
360397
3. **GitHub Repository Processor**: we use `gitingest` to extract and chunk text from repositories
361-
4. **Vector Store**: Manages document embeddings and similarity search using `ChromaDB` and `Oracle Database 23ai`
398+
4. **Vector Store**: Manages document embeddings and similarity search using `Oracle Database 23ai` (default) or `ChromaDB` (fallback)
362399
5. **RAG Agent**: Makes intelligent decisions about query routing and response generation
363400
- OpenAI Agent: Uses `gpt-4-turbo-preview` for high-quality responses, but requires an OpenAI API key
364401
- Local Agent: Uses `Mistral-7B` as an open-source alternative
@@ -373,6 +410,50 @@ The RAG Agent flow is the following:
373410
4. If no PDF context is found OR if it's a general knowledge query, use the pre-trained LLM directly
374411
5. Fall back to a "no information" response only in edge cases.
375412

413+
## Annex: Command Line Usage
414+
415+
You can run the system from the command line using:
416+
417+
```bash
418+
python local_rag_agent.py --query "Your question here" [options]
419+
```
420+
421+
### Command Line Arguments
422+
423+
| Argument | Description | Default |
424+
| --- | --- | --- |
425+
| `--query` | The query to process | *Required* |
426+
| `--embeddings` | Select embeddings backend (`oracle` or `chromadb`) | `oracle` |
427+
| `--model` | Model to use for inference | `mistralai/Mistral-7B-Instruct-v0.2` |
428+
| `--collection` | Collection to query (PDF, Repository, Web, General) | Auto-determined |
429+
| `--use-cot` | Enable Chain of Thought reasoning | `False` |
430+
| `--store-path` | Path to ChromaDB store (if using ChromaDB) | `embeddings` |
431+
| `--skip-analysis` | Skip query analysis step | `False` |
432+
| `--verbose` | Show full content of sources | `False` |
433+
| `--quiet` | Disable verbose logging | `False` |
434+
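The argument table can be mirrored with `argparse`. The sketch below is an assumption about how such a parser might be declared, not a copy of `local_rag_agent.py`; the function name `build_parser` and the help strings are illustrative, while the flags and defaults come from the table.

```python
import argparse


def build_parser():
    """Build a parser whose flags and defaults match the documented table."""
    parser = argparse.ArgumentParser(description="Query the agentic RAG system")
    parser.add_argument("--query", required=True, help="The query to process")
    parser.add_argument(
        "--embeddings",
        choices=["oracle", "chromadb"],
        default="oracle",
        help="Embeddings backend",
    )
    parser.add_argument(
        "--model",
        default="mistralai/Mistral-7B-Instruct-v0.2",
        help="Model to use for inference",
    )
    # None stands for "auto-determined" in this sketch.
    parser.add_argument("--collection", default=None, help="Collection to query")
    parser.add_argument("--use-cot", action="store_true", help="Enable Chain of Thought")
    parser.add_argument("--store-path", default="embeddings", help="Path to ChromaDB store")
    parser.add_argument("--skip-analysis", action="store_true", help="Skip query analysis")
    parser.add_argument("--verbose", action="store_true", help="Show full source content")
    parser.add_argument("--quiet", action="store_true", help="Disable verbose logging")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Boolean flags use `store_true`, so they default to `False` and need no value on the command line.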
435+
### Examples
436+
437+
Query using Oracle DB (default):
438+
```bash
439+
python local_rag_agent.py --query "How does vector search work?"
440+
```
441+
442+
Force the ChromaDB backend:
443+
```bash
444+
python local_rag_agent.py --query "How does vector search work?" --embeddings chromadb
445+
```
446+
447+
Query with Chain of Thought reasoning:
448+
```bash
449+
python local_rag_agent.py --query "Explain the difference between RAG and fine-tuning" --use-cot
450+
```
451+
452+
Query a specific collection:
453+
```bash
454+
python local_rag_agent.py --query "How to implement a queue?" --collection "Repository Collection"
455+
```
456+
376457
## Contributing
377458

378459
This project is open source. Please submit your contributions by forking this repository and submitting a pull request! Oracle appreciates any contributions that are made by the open source community.

agentic_rag/config_example.yaml

Lines changed: 10 additions & 1 deletion
@@ -1 +1,10 @@
1-
HUGGING_FACE_HUB_TOKEN: your_token_here
1+
HUGGING_FACE_HUB_TOKEN: your_token_here
2+
3+
# Oracle DB Configuration
4+
ORACLE_DB_USERNAME: ADMIN
5+
ORACLE_DB_PASSWORD: your_password_here
6+
ORACLE_DB_DSN: >-
7+
(description= (retry_count=20)(retry_delay=3)
8+
(address=(protocol=tcps)(port=1522)(host=your-oracle-db-host.com))
9+
(connect_data=(service_name=your-service-name))
10+
(security=(ssl_server_dn_match=yes)))
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
1+
# Oracle DB 23ai Integration
2+
3+
The Agentic RAG system now supports Oracle DB 23ai as a vector store backend, providing enhanced performance, scalability, and enterprise-grade database features.
4+
5+
## Overview
6+
7+
Oracle Database 23ai is used as the default vector storage system when available, with ChromaDB serving as a fallback option. This integration leverages Oracle's vector database capabilities for efficient semantic search and retrieval.
8+
9+
## Requirements
10+
11+
To use the Oracle DB integration, you need:
12+
13+
1. **Oracle Database 23ai**: With vector extensions enabled
14+
2. **Python Packages**:
15+
- `oracledb`: For database connectivity
16+
- `sentence-transformers`: For generating embeddings
17+
18+
## Installation
19+
20+
1. Install the required packages:
21+
22+
```bash
23+
pip install oracledb sentence-transformers
24+
```
25+
26+
2. Configure your Oracle Database connection in `config.yaml`:
27+
28+
```yaml
29+
# Oracle DB Configuration
30+
ORACLE_DB_USERNAME: ADMIN
31+
ORACLE_DB_PASSWORD: your_password_here
32+
ORACLE_DB_DSN: >-
33+
(description= (retry_count=20)(retry_delay=3)
34+
(address=(protocol=tcps)(port=1522)(host=your-oracle-db-host.com))
35+
(connect_data=(service_name=your-service-name))
36+
(security=(ssl_server_dn_match=yes)))
37+
```
38+
39+
The system automatically looks for these credentials in your `config.yaml` file. If they are missing, it reports an error and falls back to ChromaDB.
40+
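A minimal sketch of the credential lookup, assuming the file is parsed with PyYAML and the connection is opened with `oracledb.connect`. The helper names `oracle_credentials` and `connect_from_config` are hypothetical; the code in `OraDBVectorStore.py` is not rendered in this diff and may differ.

```python
REQUIRED_KEYS = ("ORACLE_DB_USERNAME", "ORACLE_DB_PASSWORD", "ORACLE_DB_DSN")


def oracle_credentials(config):
    """Validate the three settings and map them to connect() keyword arguments."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("Missing Oracle DB settings: " + ", ".join(missing))
    return {
        "user": config["ORACLE_DB_USERNAME"],
        "password": config["ORACLE_DB_PASSWORD"],
        "dsn": config["ORACLE_DB_DSN"],
    }


def connect_from_config(path="config.yaml"):
    """Open a connection using the settings in config.yaml."""
    # Imported lazily so the validation helper works without these packages.
    import oracledb
    import yaml  # PyYAML; an assumption, the source does not name the parser

    with open(path, encoding="utf-8") as handle:
        config = yaml.safe_load(handle) or {}
    return oracledb.connect(**oracle_credentials(config))
```

Raising `ValueError` for missing keys gives the caller a single place to catch the problem and switch to ChromaDB.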
41+
## How It Works
42+
43+
The system automatically determines which database to use:
44+
45+
1. First tries to connect to Oracle DB 23ai
46+
2. If connection succeeds, uses Oracle for all vector operations
47+
3. If Oracle DB is unavailable, falls back to ChromaDB
48+
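The three steps above amount to a try-then-fall-back pattern. The sketch below shows one way to express it; `select_vector_store` and the two factory callables are hypothetical names, not the project's API.

```python
def select_vector_store(oracle_factory, chroma_factory):
    """Return (backend_name, store), preferring Oracle and falling back to ChromaDB.

    Both arguments are zero-argument callables that build a store; the Oracle
    one is expected to raise if credentials are missing or the connection fails.
    """
    try:
        return "oracle", oracle_factory()
    except Exception as exc:  # broad on purpose: any failure triggers the fallback
        print(f"Oracle DB unavailable ({exc}); falling back to ChromaDB")
        return "chromadb", chroma_factory()
```

Returning the backend name alongside the store lets a UI report which database is active without inspecting the store object.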
49+
## Database Structure
50+
51+
The Oracle DB integration creates the following tables:
52+
53+
- `PDFCollection`: Stores chunks from PDF documents
54+
- `WebCollection`: Stores chunks from web content
55+
- `RepoCollection`: Stores chunks from code repositories
56+
- `GeneralCollection`: Stores general knowledge chunks
57+
58+
Each table has the following structure:
59+
- `id`: Primary key identifier
60+
- `text`: The text content of the chunk
61+
- `metadata`: JSON string containing metadata (source, page, etc.)
62+
- `embedding`: Vector representation of the text
63+
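For illustration, a table with these four columns and a nearest-neighbour query could be expressed with Oracle Database 23ai's `VECTOR` type and `VECTOR_DISTANCE` function. The column types, the cosine metric, the embedding dimension, and the helper names `create_table_sql` and `top_k_sql` are all assumptions; the statements actually issued by `OraDBVectorStore.py` are not shown in this diff.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def _checked(table):
    """Reject anything that is not a plain identifier before interpolating it."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"Unsafe table name: {table!r}")
    return table


def create_table_sql(table, dim):
    """DDL for one collection table; column types are assumed, not confirmed."""
    return (
        f"CREATE TABLE {_checked(table)} ("
        "id VARCHAR2(255) PRIMARY KEY, "
        "text CLOB, "
        "metadata CLOB, "
        f"embedding VECTOR({int(dim)}, FLOAT32))"
    )


def top_k_sql(table, k):
    """Query returning the k rows closest to the bound :query_vec (cosine distance)."""
    return (
        f"SELECT id, text, metadata FROM {_checked(table)} "
        "ORDER BY VECTOR_DISTANCE(embedding, :query_vec, COSINE) "
        f"FETCH FIRST {int(k)} ROWS ONLY"
    )
```

The dimension passed to `create_table_sql` has to match the output size of whichever embedding model is used, and the query vector is supplied as the `:query_vec` bind value at execution time.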
64+
## Testing
65+
66+
You can test the Oracle DB integration using:
67+
68+
```bash
69+
python test_oradb.py
70+
```
71+
72+
Or test both systems using:
73+
74+
```bash
75+
./test_db_systems.sh
76+
```
77+
78+
## Switching Between Databases
79+
80+
You can force the system to use ChromaDB instead of Oracle DB by setting the `use_oracle_db` parameter to `False`:
81+
82+
```python
83+
agent = LocalRAGAgent(use_oracle_db=False)
84+
```
85+
86+
## Gradio Interface
87+
88+
The Gradio web interface displays which database system is active at the top of the page:
89+
90+
- Green banner: Oracle DB 23ai is active
91+
- Red banner: ChromaDB is being used (Oracle DB not available)
92+
93+
## Troubleshooting
94+
95+
If you encounter database connection issues:
96+
97+
1. Verify your Oracle DB credentials and connection string
98+
2. Check that the Oracle DB 23ai instance is running
99+
3. Ensure you have the required Python packages installed
100+
4. Check network connectivity to the database server
101+
102+
If the Oracle DB connection fails, the system falls back to ChromaDB automatically.
