
Repo Intelligence Engine

An LLM-powered system for understanding, exploring, and querying software repositories using retrieval-augmented generation (RAG).

This project enables users to ask natural language questions about a codebase and receive context-grounded answers by combining document ingestion, embeddings, vector search, and conversational reasoning.


🚀 What This Project Does

  • Clones a public GitHub repository
  • Parses source files and documentation
  • Splits code into semantic chunks
  • Generates vector embeddings
  • Stores embeddings in a vector database
  • Enables conversational Q&A grounded in repository context
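The chunking step above can be sketched in plain Python. This is an illustrative fixed-size splitter with overlap (the project itself likely delegates to a LangChain text splitter); the chunk size and overlap values are assumptions chosen for readability:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, so context that
    spans a chunk boundary is not lost. Sizes here are illustrative."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

# A 450-character document yields three overlapping chunks:
# [0:200], [150:350], [300:450]
pieces = chunk_text("x" * 450)
print(len(pieces))  # → 3
```

Overlap matters because a function signature and its body, or a sentence and its continuation, often straddle a boundary; the duplicated margin keeps both halves retrievable together.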

This allows questions such as:

  • "What does this project do?"
  • "How is authentication handled?"
  • "Where is the main training loop defined?"
  • "Explain the data flow in this codebase."

🧠 System Architecture

GitHub Repo
     ↓
File Loader & Parser
     ↓
Text Chunking
     ↓
Embedding Model
     ↓
Vector Store (Chroma)
     ↓
Retriever
     ↓
LLM (Conversational QA)
     ↓
Streamlit UI
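The retrieval path through this pipeline can be illustrated end to end with deliberately tiny stand-ins: a bag-of-words "embedding" in place of a real embedding model, cosine similarity in place of Chroma, and prompt assembly in place of the LLM call. Everything below is a hypothetical sketch of the flow, not the project's actual implementation:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts stand in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every stored chunk by similarity to the query; keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # The retrieved chunks ground the LLM's answer in repository content.
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

chunks = [
    "def train(): runs the main training loop over epochs",
    "authentication is handled via OAuth tokens in auth.py",
    "the README describes installation steps",
]
top = retrieve("how is authentication handled", chunks, k=1)
print(top[0])  # the auth.py chunk ranks highest
```

The real system swaps each stand-in for its production counterpart (OpenAI embeddings, Chroma's nearest-neighbor search, a GPT model), but the data flow is the same.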

πŸ› οΈ Tech Stack

Layer Technology
Language Python
UI Streamlit
LLM OpenAI GPT models
Framework LangChain
Embeddings OpenAI Embeddings
Vector Store Chroma
Version Control Git
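Given this stack, requirements.txt plausibly lists a dependency set along these lines. This is an inferred sketch, not the file's verified contents:

```
streamlit
langchain
langchain-community
langchain-openai
openai
chromadb
python-dotenv
GitPython
```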

📂 Repository Structure

.
├── app_updated.py          # Main Streamlit application
├── requirements.txt        # Python dependencies
├── Projreport_OpenAI.docx  # Project report and documentation
└── data/                   # Vector store persistence (generated)

πŸ”¬οΈ What's Implemented

  • βœ… GitHub repository cloning
  • βœ… File loading and preprocessing
  • βœ… Chunk-based embedding generation
  • βœ… Vector search using Chroma
  • βœ… Conversational retrieval QA
  • βœ… Interactive Streamlit interface

⚠️ Scope & Limitations

  • Designed for demonstration and experimentation
  • Not optimized for very large repositories
  • Embedding persistence is local
  • Security hardening and sandboxing are out of scope

This project intentionally prioritizes clarity and correctness over production scaling.


🚧 Planned Enhancements

  • Per-repository isolated vector stores
  • Language-aware file parsing
  • Support for multiple embedding models
  • Repository summarization and dependency graphs
  • Deployment-ready API layer (FastAPI)
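The first planned item, per-repository isolated vector stores, could be as simple as deriving a distinct persist directory from each repo URL. The `data/` base path and the hashing scheme below are assumptions for illustration:

```python
import hashlib
from pathlib import Path

def store_dir_for_repo(repo_url: str, base: str = "data") -> Path:
    """Map each repository URL to its own vector-store directory so
    embeddings from different repos never mix."""
    digest = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()[:12]
    name = repo_url.rstrip("/").split("/")[-1].removesuffix(".git")
    return Path(base) / f"{name}-{digest}"

print(store_dir_for_repo("https://github.com/garg-khushi/repo-intelligence-engine.git"))
# e.g. data/repo-intelligence-engine-<hash>
```

Keeping the human-readable repo name alongside the hash makes the directories debuggable while the hash guarantees uniqueness even for same-named repos under different owners.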

πŸ‘¨β€πŸ’» How to Run Locally

git clone https://github.com/garg-khushi/repo-intelligence-engine.git
cd repo-intelligence-engine
pip install -r requirements.txt
streamlit run app_updated.py

Create a .env file with:

OPENAI_API_KEY=your_api_key_here
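A minimal sketch of how the app might read that key at startup. The actual app may use python-dotenv; this version reads the environment directly and fails fast with a clear message when the key is missing, rather than surfacing a cryptic API error later:

```python
import os

def get_openai_api_key() -> str:
    """Read OPENAI_API_KEY from the environment, failing fast if unset."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Create a .env file or export it "
            "before launching the app."
        )
    return key
```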

🎯 Why This Project Matters

This project demonstrates:

  • Practical use of LLMs beyond chat
  • Retrieval-augmented generation pipelines
  • Vector databases and semantic search
  • Developer tooling and code intelligence systems

It reflects real-world patterns used in modern AI-powered developer platforms.


📃 License

MIT
