An intelligent, scalable PR analysis platform powered by RAG (Retrieval Augmented Generation).
PR Resolver uses semantic search + LLM reasoning to analyze pull requests by finding contextually similar code changes across your repository history. Deployed as containerized microservices on Kubernetes for enterprise-grade scalability.
Webhook-driven analysis of PRs from GitHub, Bitbucket, Azure DevOps, and other version control systems. For each PR:
- Fetches diffs from the repository
- Chunks & indexes code changes with semantic embeddings
- Searches for similar historical changes
- Analyzes with Gemini LLM for intelligent insights and recommendations
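The chunking step above splits each diff into overlapping windows so that context at chunk boundaries is not lost. A simplified sketch of the idea (this is an illustration of overlapping chunking, not the service's actual `DiffChunker` implementation):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks.

    Consecutive chunks share `overlap` characters, so content that
    straddles a chunk boundary still appears whole in one chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, max(len(text), 1), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk is then embedded and stored alongside metadata (file path, commit hash) so that retrieval can point back to the original change.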
```
VCS Webhooks (GitHub, Bitbucket, Azure DevOps)
                ↓
Webhook Service (multiple replicas)
                ↓
Repo Ingestor (clones & diffs)
                ↓
DiffChunker (index-ready format)
                ↓
ChromaDB Vector Store (persistent)
                ↓
RAG Retriever + Gemini LLM
                ↓
PR Analysis & Insights
```
- `services/webhook/` - Receives events from version control platforms
- `services/repo_ingestor/` - Clones repositories and generates diffs
- `services/rag/` - Semantic search and LLM analysis
- `initializer.py` - Creates & caches expensive resources:
  - Ollama embeddings model
  - ChromaDB vector store
  - Google Gemini LLM
- `db_repo_ingestor.py` - Fetches diffs and converts them to index-ready format:
  - `RepoIngestorClient` - API client for the repo ingestor service
  - `DiffChunker` - Splits diffs into overlapping chunks with metadata & IDs
- `retriever.py` - Queries the vector store:
  - Semantic search for similar diffs
  - Automatic query embedding
  - Collection statistics
- `common/models/filediff.py` - Diff data structures
- `common/models/commit.py` - Commit metadata
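The shared diff model can be pictured roughly as below. The field names here are hypothetical, inferred only from the metadata keys (`file_path`, `commit_hash`) used in the retrieval examples later in this README; the real `filediff.py` may differ:

```python
from dataclasses import dataclass


@dataclass
class FileDiff:
    """Illustrative shape of a single-file diff record.

    Hypothetical fields, inferred from the metadata this README's
    retrieval examples read back from the vector store.
    """
    file_path: str    # path of the changed file within the repo
    commit_hash: str  # commit that introduced the change
    diff_text: str    # unified diff content for this file
```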
- Python 3.9+
- Docker & Docker Compose
- Kubernetes cluster (for production)
- Google API key (for Gemini LLM)
1. Clone the repository

   ```bash
   git clone https://github.com/adsdemaybe/pr_resolver.git
   cd pr_resolver
   ```

2. Create a virtual environment

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   pip install langchain-google-genai
   ```

4. Set environment variables

   ```bash
   export GOOGLE_API_KEY="your-api-key"
   export OLLAMA_API_URL="http://localhost:11434"
   ```

5. Start services (using Docker Compose)

   ```bash
   cd services/repo_ingestor
   docker-compose up -d
   ```

6. Initialize RAG

   ```python
   from services.rag.initializer import ChromaDBRAGStore
   from services.rag.retriever import ChromaDBRetriever

   # Create & cache expensive resources
   store = ChromaDBRAGStore()

   # Use the vector store for queries
   retriever = ChromaDBRetriever(vector_store=store.vector_store)
   ```
Fetch and chunk diffs:

```python
from services.rag.db_repo_ingestor import RepoIngestorClient, DiffChunker

# Fetch diffs from the repo_ingestor API
client = RepoIngestorClient("http://localhost:8000")
diffs = await client.preview_diffs(
    repo_url="https://github.com/myorg/myrepo.git",
    branch="main",
    max_commits=50
)

# Convert to index-ready format
chunker = DiffChunker(chunk_size=1000, chunk_overlap=100)
texts, metadatas, ids = chunker.diffs_to_index_format(diffs)
```

Index diffs:

```python
from services.rag.initializer import ChromaDBRAGStore

store = ChromaDBRAGStore()
num_added = await store.add(diffs)
print(f"Added {num_added} diffs to ChromaDB")
```

Search for similar changes:

```python
retriever = ChromaDBRetriever(vector_store=store.vector_store)
results = await retriever.search(
    query="fixed bug in authentication module",
    k=5,
    similarity_threshold=0.7
)

for doc, score in results:
    print(f"Score: {score}")
    print(f"File: {doc.metadata['file_path']}")
    print(f"Commit: {doc.metadata['commit_hash']}")
```

Get collection statistics:

```python
stats = await retriever.get_stats()
print(f"Documents indexed: {stats['document_count']}")
```

Build the Docker images:

```bash
# Repo Ingestor
cd services/repo_ingestor
docker build -t pr-resolver/repo-ingestor:latest .

# Webhook Service (if available)
cd services/webhook
docker build -t pr-resolver/webhook:latest .
```

Run with Docker Compose:

```bash
docker-compose -f docker-compose.yml up -d
```

Deploy to Kubernetes:

```bash
kubectl apply -f k8s/repo-ingestor-deployment.yaml
kubectl apply -f k8s/webhook-deployment.yaml
kubectl apply -f k8s/rag-service.yaml
```

Scale services:

```bash
# Scale repo ingestor to 3 replicas
kubectl scale deployment repo-ingestor --replicas=3

# Scale webhook listener to 5 replicas
kubectl scale deployment webhook --replicas=5
```

| Variable | Default | Description |
|---|---|---|
| `GOOGLE_API_KEY` | Required | Google Gemini API key |
| `OLLAMA_API_URL` | `http://localhost:11434` | Ollama embeddings service URL |
| `CHROMA_DB_PATH` | `./chroma_db` | ChromaDB persistence directory |
| `REPO_INGESTOR_URL` | `http://localhost:8000` | Repo ingestor service URL |
Customize the RAG store:

```python
store = ChromaDBRAGStore(
    collection_name="pr_resolver_diffs",
    persist_directory="./chroma_db",
    embedding_model="nomic-embed-text",
    embedding_api_url="http://localhost:11434",
    llm_model="gemini-pro",
    google_api_key="your-key"
)
```

Tune chunking:

```python
# Smaller chunks = more precise search, higher latency
chunker = DiffChunker(chunk_size=500, chunk_overlap=50)

# Larger chunks = faster search, less precision
chunker = DiffChunker(chunk_size=2000, chunk_overlap=200)
```

Tune retrieval:

```python
# Higher k = more results to analyze
results = await retriever.search(query, k=10)

# Higher threshold = stricter relevance filtering
results = await retriever.search(query, k=5, similarity_threshold=0.8)
```

Configure webhooks in your VCS:
- **GitHub**: Repository Settings → Webhooks → Add webhook
  - Payload URL: `https://your-domain/webhooks/github`
- **Bitbucket**: Repository Settings → Webhooks → Create trigger
  - URL: `https://your-domain/webhooks/bitbucket`
- **Azure DevOps**: Project Settings → Service hooks → Create subscription
  - URL: `https://your-domain/webhooks/azure-devops`
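Incoming webhook deliveries should be authenticated before processing. GitHub, for example, signs each payload with HMAC-SHA256 using your shared secret and sends the result in the `X-Hub-Signature-256` header. A sketch of the check the webhook service could perform (an illustration, not the project's actual handler code):

```python
import hashlib
import hmac


def verify_github_signature(payload: bytes, secret: str, signature_header: str) -> bool:
    """Validate a GitHub webhook delivery.

    Recomputes the HMAC-SHA256 of the raw request body with the shared
    secret and compares it to the X-Hub-Signature-256 header value,
    using a constant-time comparison to resist timing attacks.
    """
    expected = "sha256=" + hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

Bitbucket and Azure DevOps use different mechanisms (e.g. secrets in the URL or basic auth on the subscription), so each endpoint would verify deliveries in its platform's own way.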
```bash
# Run tests
pytest tests/

# Run with coverage
pytest --cov=services tests/
```

License: MIT
Contributions welcome! Please open an issue or submit a PR.
For questions or support, reach out to the development team.