96 changes: 91 additions & 5 deletions README.md
@@ -2,19 +2,77 @@

A Python package for building RAG (Retrieval-Augmented Generation) applications using PDFs, ChromaDB, and Ollama.

## Project Structure

```
.
├── pdf_rag
│   ├── document_processor.py
│   ├── __init__.py
│   ├── llm_interface.py
│   ├── main.py
│   └── vector_store.py
├── README.md
├── requirements.txt
├── setup.py
├── test_package.py
└── test.py

2 directories, 10 files
```

## Installation

1. **Create and activate a virtual environment:**

```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows
venv\Scripts\activate
# On Unix or macOS
source venv/bin/activate
```

2. **Install the package:**

```bash
pip install -e .
```

3. **Install Ollama on Linux:**

Follow the steps below to install Ollama on a Linux system.

```bash
# Download and run the Ollama install script
curl -fsSL https://ollama.com/install.sh | sh

# Verify the installation
ollama --version
```

4. **Download models in Ollama:**

To download specific models such as `llama3` and `deepseek-r1`, use the following commands:

```bash
# Download the llama3 model
ollama pull llama3

# Download the deepseek-r1 model
ollama pull deepseek-r1
```

## Basic Usage

```python
from pdf_rag import PDFRAGApplication

# Initialize the application
rag = PDFRAGApplication(model_name="deepseek-r1")

# Load a PDF
rag.load_pdf("your_document.pdf")
@@ -23,3 +81,31 @@ rag.load_pdf("your_document.pdf")
response = rag.query("What is this document about?")
print(response)
```
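
The `query()` call above follows the standard retrieve-then-generate pattern: the question is matched against the stored PDF chunks, and the best-matching text becomes grounding context in the model's prompt. A minimal pure-Python sketch of that flow (the helper names below are illustrative only and are not part of the `pdf_rag` API):

```python
# Illustrative sketch of retrieve-then-generate; these helpers are
# hypothetical and do not exist in pdf_rag.

def retrieve(chunks, question):
    # Toy retrieval: pick the chunk sharing the most words with the question.
    q_words = set(question.lower().split())
    return max(chunks, key=lambda c: len(q_words & set(c.lower().split())))

def build_prompt(context, question):
    # The retrieved chunk becomes grounding context for the LLM.
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "The report covers quarterly revenue growth.",
    "Appendix B lists all contributing authors.",
]
question = "How did revenue grow?"
prompt = build_prompt(retrieve(chunks, question), question)
print(prompt)
```

In the real package, the word-overlap step is replaced by ChromaDB's embedding search, and the prompt is sent to the Ollama model.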

## Testing

Run the `test.py` script to see how the module works with ChromaDB:

```python
import chromadb
chroma_client = chromadb.Client()

# switch `create_collection` to `get_or_create_collection` to avoid creating a new collection every time
collection = chroma_client.get_or_create_collection(name="my_collection")

# switch `add` to `upsert` to avoid adding the same documents every time
collection.upsert(
    documents=[
        "This is a document about pineapple",
        "This is a document about oranges"
    ],
    ids=["id1", "id2"]
)

results = collection.query(
    query_texts=["This is a query document about hawaii"],  # Chroma will embed this for you
    n_results=2  # how many results to return
)

print(results)
```
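
Behind `collection.query`, Chroma embeds the query text and ranks the stored documents by vector similarity. The idea can be sketched with cosine similarity over toy 2-D vectors (real embeddings have hundreds of dimensions; these numbers are made up for illustration):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 2-D "embeddings": one per topic.
docs = {
    "pineapple": [1.0, 0.0],
    "oranges":   [0.0, 1.0],
    "hawaii":    [0.7, 0.7],
}
query_vec = [0.6, 0.8]  # pretend embedding of "a query document about hawaii"

ranked = sorted(docs, key=lambda name: cosine(query_vec, docs[name]), reverse=True)
print(ranked)  # most similar topic first
```

The `n_results` parameter in the Chroma call above simply truncates this ranking to the top matches.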
Binary file removed pdf_rag/__pycache__/__init__.cpython-310.pyc
Binary file removed pdf_rag/__pycache__/llm_interface.cpython-310.pyc
Binary file removed pdf_rag/__pycache__/main.cpython-310.pyc
Binary file removed pdf_rag/__pycache__/vector_store.cpython-310.pyc
2 changes: 0 additions & 2 deletions setup.py
@@ -9,8 +9,6 @@
         "chromadb",
         "requests"
     ],
-    author="Your Name",
-    author_email="your.email@example.com",
     description="A RAG application for PDF documents using ChromaDB and Ollama",
     long_description=open("README.md").read(),
     long_description_content_type="text/markdown",
19 changes: 11 additions & 8 deletions test_package.py
@@ -1,12 +1,15 @@
-## Usage
 from pdf_rag import PDFRAGApplication

-# Initialize the application
-rag = PDFRAGApplication(model_name="llama3")
+def test_pdf_rag():
+    # Initialize the application
+    rag = PDFRAGApplication(model_name="deepseek-r1")

-# Load a PDF
-rag.load_pdf("resume.pdf")
+    # Load a PDF
+    rag.load_pdf("your_document.pdf")

-# Query the system
-response = rag.query("What is this document about?")
-print(response)
+    # Query the system
+    response = rag.query("What is this document about?")
+    print(response)
+
+if __name__ == "__main__":
+    test_pdf_rag()