An LLM-powered system for understanding, exploring, and querying software repositories using retrieval-augmented generation (RAG).
This project enables users to ask natural language questions about a codebase and receive context-grounded answers by combining document ingestion, embeddings, vector search, and conversational reasoning.
- Clones a public GitHub repository
- Parses source files and documentation
- Splits code into semantic chunks
- Generates vector embeddings
- Stores embeddings in a vector database
- Enables conversational Q&A grounded in repository context
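The chunking step above can be sketched as a sliding window with overlap, so that context spanning a chunk boundary is not lost. This is a simplified stand-in for a text splitter; `chunk_text` and its parameters are illustrative, not the app's actual code:

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap`
    characters, so adjacent chunks share boundary context."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Real splitters are usually language-aware (splitting on function or class boundaries rather than raw character offsets), but the overlap idea is the same.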
This makes it possible to ask questions such as:
- "What does this project do?"
- "How is authentication handled?"
- "Where is the main training loop defined?"
- "Explain the data flow in this codebase."
```
GitHub Repo
    ↓
File Loader & Parser
    ↓
Text Chunking
    ↓
Embedding Model
    ↓
Vector Store (Chroma)
    ↓
Retriever
    ↓
LLM (Conversational QA)
    ↓
Streamlit UI
```
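The Retriever stage finds the stored chunks most similar to the user's question. In the app this is handled by Chroma, but the core operation is cosine similarity over embedding vectors. A minimal, dependency-free sketch with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec: list[float], store: dict, k: int = 2) -> list[str]:
    """Return the ids of the k chunks whose embeddings best match the query."""
    ranked = sorted(store.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

# Toy store: chunk id -> embedding (names and vectors are invented)
store = {
    "auth.py":  [0.9, 0.1, 0.0],
    "train.py": [0.1, 0.9, 0.1],
    "README":   [0.4, 0.4, 0.4],
}
```

The retrieved chunks are then placed in the LLM's prompt, which is what grounds the conversational answers in the repository's own content.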
| Layer | Technology |
|---|---|
| Language | Python |
| UI | Streamlit |
| LLM | OpenAI GPT models |
| Framework | LangChain |
| Embeddings | OpenAI Embeddings |
| Vector Store | Chroma |
| Version Control | Git |
```
.
├── app_updated.py          # Main Streamlit application
├── requirements.txt        # Python dependencies
├── Projreport_OpenAI.docx  # Project report and documentation
└── data/                   # Vector store persistence (generated)
```
- ✅ GitHub repository cloning
- ✅ File loading and preprocessing
- ✅ Chunk-based embedding generation
- ✅ Vector search using Chroma
- ✅ Conversational retrieval QA
- ✅ Interactive Streamlit interface
- Designed for demonstration and experimentation
- Not optimized for very large repositories
- Embedding persistence is local
- Security hardening and sandboxing are out of scope
This project intentionally prioritizes clarity and correctness over production scaling.
- Per-repository isolated vector stores
- Language-aware file parsing
- Support for multiple embedding models
- Repository summarization and dependency graphs
- Deployment-ready API layer (FastAPI)
```bash
git clone https://github.com/garg-khushi/repo-intelligence-engine.git
cd repo-intelligence-engine
pip install -r requirements.txt
streamlit run app_updated.py
```

Create a `.env` file with:

```
OPENAI_API_KEY=your_api_key_here
```
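The application reads this key from the environment at startup. Projects like this commonly use the `python-dotenv` package for that; its behaviour can be sketched with the standard library alone (`load_dotenv_minimal` is illustrative, not the app's actual code):

```python
import os

def load_dotenv_minimal(path: str = ".env") -> None:
    """Minimal .env loader: copy KEY=value lines into os.environ,
    without overwriting variables that are already set."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Keeping the key in `.env` (and out of version control) avoids accidentally committing credentials to the repository.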
This project demonstrates:
- Practical use of LLMs beyond chat
- Retrieval-augmented generation pipelines
- Vector databases and semantic search
- Developer tooling and code intelligence systems
It reflects real-world patterns used in modern AI-powered developer platforms.
MIT