
PR Resolver

An intelligent, scalable PR analysis platform powered by RAG (Retrieval Augmented Generation).

PR Resolver uses semantic search + LLM reasoning to analyze pull requests by finding contextually similar code changes across your repository history. Deployed as containerized microservices on Kubernetes for enterprise-grade scalability.

🎯 Purpose

Webhook-driven analysis of PRs from GitHub, Bitbucket, Azure DevOps, and other version control systems. For each PR:

  1. Fetches diffs from the repository
  2. Chunks & indexes code changes with semantic embeddings
  3. Searches for similar historical changes
  4. Analyzes with Gemini LLM for intelligent insights and recommendations
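The four steps can be sketched end-to-end in Python. Every function below is a hypothetical stand-in for the real services, stubbed for illustration; this is not the project's actual API:

```python
# Illustrative sketch of the analysis pipeline; all names are
# hypothetical stand-ins, not the project's real API.

def fetch_diffs(pr_url: str) -> list[str]:
    # 1. Fetch diffs from the repository (stubbed here)
    return ["- old line\n+ new line"]

def chunk_and_index(diff: str, size: int = 1000) -> list[str]:
    # 2. Split the diff into index-ready chunks
    return [diff[i:i + size] for i in range(0, len(diff), size)]

def search_similar(chunks: list[str]) -> list[str]:
    # 3. Semantic search over historical changes (stubbed here)
    return []

def analyze(diff: str, similar: list[str]) -> str:
    # 4. LLM analysis of the diff in context (stubbed here)
    return f"{len(similar)} similar historical changes found"

def analyze_pr(pr_url: str) -> list[str]:
    # Glue the four steps together, one result per diff in the PR
    results = []
    for diff in fetch_diffs(pr_url):
        chunks = chunk_and_index(diff)
        results.append(analyze(diff, search_similar(chunks)))
    return results
```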

πŸ—οΈ Architecture

VCS Webhooks (GitHub, Bitbucket, Azure DevOps)
        ↓
    Webhook Service (multiple replicas)
        ↓
    Repo Ingestor (clones & diffs)
        ↓
    DiffChunker (index-ready format)
        ↓
    ChromaDB Vector Store (persistent)
        ↓
    RAG Retriever + Gemini LLM
        ↓
    PR Analysis & Insights

πŸ“¦ Components

Core Services

  • services/webhook/ - Receives events from version control platforms
  • services/repo_ingestor/ - Clones repositories and generates diffs
  • services/rag/ - Semantic search and LLM analysis

RAG Module (services/rag/)

  • initializer.py - Creates & caches expensive resources:

    • Ollama embeddings model
    • ChromaDB vector store
    • Google Gemini LLM
  • db_repo_ingestor.py - Fetches diffs and converts to index-ready format:

    • RepoIngestorClient - API client for repo ingestor service
    • DiffChunker - Splits diffs into overlapping chunks with metadata & IDs
  • retriever.py - Queries the vector store:

    • Semantic search for similar diffs
    • Automatic query embedding
    • Collection statistics
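The overlapping chunking DiffChunker performs can be sketched as follows; the sliding-window logic and the ID scheme in the comment are assumptions, not the real implementation:

```python
def chunk_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    # Consecutive chunks share chunk_overlap characters so that context
    # spanning a chunk boundary is not lost (a sketch of the idea, not
    # the actual DiffChunker code).
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Per-chunk IDs could then be derived, e.g. f"{commit_hash}:{file_path}:{n}"
# (hypothetical scheme).
```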

Common Models

  • common/models/filediff.py - Diff data structures
  • common/models/commit.py - Commit metadata
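A plausible shape for these models, inferred from the metadata keys used in the search example below; the field names are assumptions, not the actual definitions:

```python
from dataclasses import dataclass

@dataclass
class Commit:
    # Minimal commit metadata (hypothetical shape of common/models/commit.py)
    commit_hash: str
    message: str
    author: str

@dataclass
class FileDiff:
    # One file's change within a commit (hypothetical shape of
    # common/models/filediff.py)
    file_path: str
    diff_text: str
    commit: Commit
```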

πŸš€ Quick Start

Prerequisites

  • Python 3.9+
  • Docker & Docker Compose
  • Kubernetes cluster (for production)
  • Google API key (for Gemini LLM)

Local Setup

  1. Clone the repository

    git clone https://github.com/adsdemaybe/pr_resolver.git
    cd pr_resolver
  2. Create virtual environment

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
    pip install langchain-google-genai
  4. Set environment variables

    export GOOGLE_API_KEY="your-api-key"
    export OLLAMA_API_URL="http://localhost:11434"
  5. Start services (using Docker Compose)

    cd services/repo_ingestor
    docker-compose up -d
  6. Initialize RAG

    from services.rag.initializer import ChromaDBRAGStore
    from services.rag.retriever import ChromaDBRetriever
    
    # Create & cache expensive resources
    store = ChromaDBRAGStore()
    
    # Use the vector store for queries
    retriever = ChromaDBRetriever(vector_store=store.vector_store)

πŸ“š Usage Examples

Ingest Repository Diffs

from services.rag.db_repo_ingestor import RepoIngestorClient, DiffChunker

# Fetch diffs from repo_ingestor API
client = RepoIngestorClient("http://localhost:8000")
diffs = await client.preview_diffs(
    repo_url="https://github.com/myorg/myrepo.git",
    branch="main",
    max_commits=50
)

# Convert to index-ready format
chunker = DiffChunker(chunk_size=1000, chunk_overlap=100)
texts, metadatas, ids = chunker.diffs_to_index_format(diffs)

Add Diffs to Vector Store

from services.rag.initializer import ChromaDBRAGStore

store = ChromaDBRAGStore()
num_added = await store.add(diffs)
print(f"Added {num_added} diffs to ChromaDB")

Search for Similar Diffs

retriever = ChromaDBRetriever(vector_store=store.vector_store)

results = await retriever.search(
    query="fixed bug in authentication module",
    k=5,
    similarity_threshold=0.7
)

for doc, score in results:
    print(f"Score: {score}")
    print(f"File: {doc.metadata['file_path']}")
    print(f"Commit: {doc.metadata['commit_hash']}")

Get Collection Stats

stats = await retriever.get_stats()
print(f"Documents indexed: {stats['document_count']}")

🐳 Docker Deployment

Build Services

# Repo Ingestor
cd services/repo_ingestor
docker build -t pr-resolver/repo-ingestor:latest .

# Webhook Service (if available)
cd services/webhook
docker build -t pr-resolver/webhook:latest .

Run with Docker Compose

docker-compose -f docker-compose.yml up -d

☸️ Kubernetes Deployment

Deploy to Cluster

kubectl apply -f k8s/repo-ingestor-deployment.yaml
kubectl apply -f k8s/webhook-deployment.yaml
kubectl apply -f k8s/rag-service.yaml

Scale Services

# Scale repo ingestor to 3 replicas
kubectl scale deployment repo-ingestor --replicas=3

# Scale webhook listener to 5 replicas
kubectl scale deployment webhook --replicas=5

βš™οΈ Configuration

Environment Variables

Variable            Default                   Description
GOOGLE_API_KEY      (required)                Google Gemini API key
OLLAMA_API_URL      http://localhost:11434    Ollama embeddings service URL
CHROMA_DB_PATH      ./chroma_db               ChromaDB persistence directory
REPO_INGESTOR_URL   http://localhost:8000     Repo ingestor service URL
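These variables can be read in one place with the defaults above; the load_config helper is illustrative, not part of the codebase:

```python
import os

def load_config() -> dict:
    # GOOGLE_API_KEY has no default and raises KeyError if unset;
    # the rest fall back to the documented defaults.
    return {
        "google_api_key": os.environ["GOOGLE_API_KEY"],
        "ollama_api_url": os.environ.get("OLLAMA_API_URL", "http://localhost:11434"),
        "chroma_db_path": os.environ.get("CHROMA_DB_PATH", "./chroma_db"),
        "repo_ingestor_url": os.environ.get("REPO_INGESTOR_URL", "http://localhost:8000"),
    }
```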

RAG Configuration

store = ChromaDBRAGStore(
    collection_name="pr_resolver_diffs",
    persist_directory="./chroma_db",
    embedding_model="nomic-embed-text",
    embedding_api_url="http://localhost:11434",
    llm_model="gemini-pro",
    google_api_key="your-key"
)

πŸ“Š Performance Tuning

Chunking Strategy

# Smaller chunks = more precise search, higher latency
chunker = DiffChunker(chunk_size=500, chunk_overlap=50)

# Larger chunks = faster search, less precision
chunker = DiffChunker(chunk_size=2000, chunk_overlap=200)

Search Parameters

# Higher k = more results to analyze
results = await retriever.search(query, k=10)

# Higher threshold = stricter relevance filtering
results = await retriever.search(query, k=5, similarity_threshold=0.8)

πŸ”„ Webhook Integration

Configure webhooks in your VCS:

  • GitHub: Repository Settings β†’ Webhooks β†’ Add webhook

    • Payload URL: https://your-domain/webhooks/github
  • Bitbucket: Repository Settings β†’ Webhooks β†’ Create trigger

    • URL: https://your-domain/webhooks/bitbucket
  • Azure DevOps: Project Settings β†’ Service hooks β†’ Create subscription

    • URL: https://your-domain/webhooks/azure-devops
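Whichever platform you use, verify webhook authenticity before processing events. For GitHub, the X-Hub-Signature-256 header carries an HMAC-SHA256 of the request body keyed by your webhook secret; a minimal verification sketch (how the webhook service actually validates payloads is not shown in this repo):

```python
import hashlib
import hmac

def verify_github_signature(secret: bytes, payload: bytes, signature_header: str) -> bool:
    # GitHub sends "sha256=<hex digest>" in the X-Hub-Signature-256 header,
    # where the digest is HMAC-SHA256(secret, raw request body).
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest performs a constant-time comparison to avoid
    # timing side channels.
    return hmac.compare_digest(expected, signature_header)
```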

πŸ§ͺ Testing

# Run tests
pytest tests/

# Run with coverage
pytest --cov=services tests/

πŸ“ License

MIT

🀝 Contributing

Contributions welcome! Please open an issue or submit a PR.

πŸ“§ Contact

For questions or support, reach out to the development team.
