RAG-based GitHub Repo Analysis Platform
Analyze any public GitHub repository with LLM-powered chat and semantic code search.
As a participant in open-source competitions and project exhibitions (EPICS, university projects), I often struggled to understand large codebases in depth, especially when onboarding onto repositories from group members or exploring unfamiliar open-source projects. Sifting through thousands of files, dependencies, and scattered documentation was tedious, and it made even basic questions hard to answer: "Where is X implemented?" or "How does this module work?"
I needed a platform that would let me:
- Instantly chat with any GitHub repo to ask questions about code, architecture, or logic.
- Quickly visualize and explore repo structure, file contents, and metadata.
- Perform semantic code search (not just by filename/text).
- Support multiple users and projects securely for my team and in competitions.
I independently designed and built gitRAG, an end-to-end, multi-tenant platform that ingests any public GitHub repo, chunks and indexes its code using embeddings and vector search, and lets users interactively chat with, search, and analyze codebases using an LLM (via LangChain and the OpenAI API).
- Built a secure, scalable backend using FastAPI, PostgreSQL (Aiven), PineconeDB, and LangChain.
- Developed a modern React frontend with a hierarchical file explorer, real-time AI chat, and repo analytics.
- Integrated Google/GitHub OAuth2 for authentication and per-user encrypted API key management for privacy.
- Engineered ingestion pipelines to chunk, embed, and index codebases of 50 MB+ with 10,000+ files.
- Tested and deployed the platform on multiple real-world repos for open-source events and university project groups.
- Significantly reduced onboarding time for new repositories: context, explanations, and code Q&A are now available in seconds.
- Enabled my team and myself to confidently tackle larger, more complex projects in hackathons and coursework.
- gitRAG is now a robust, reusable tool for anyone needing rapid understanding of unfamiliar codebases.
- LLM-powered code chat: Ask questions about repo structure, functions, or files—get contextual, AI-driven answers.
- Semantic code search: Find relevant code snippets using meaning, not just keywords.
- Hierarchical file explorer: Browse and preview the full repo tree with metadata and analytics.
- Multi-user & multi-repo support: Secure, per-user data isolation with Google/GitHub OAuth2.
- Repo analytics: Visualize language breakdown, file types, contributors, and more.
- Encrypted API key management: User API keys are encrypted and never exposed.
- Fast retrieval: Vector search and chunk retrieval respond in under a second.
- Modern UI: Built with React, TailwindCSS, and Three.js (for 3D hero effect).
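
To make the semantic search feature concrete, here is a minimal sketch of ranking code chunks by cosine similarity between embedding vectors. It is an illustration, not gitRAG's actual code: the vectors are hand-written toy values, and `cosine_similarity`, `top_k`, and the chunk names are hypothetical. In the platform itself, embeddings come from the OpenAI API and the nearest-neighbor lookup is done by the vector database rather than a Python loop.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed_chunks, k=2):
    """Return the names of the k chunks most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in indexed_chunks.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Toy 3-dimensional "embeddings" (real ones have hundreds of dimensions).
index = {
    "auth/login.py::verify_token": [0.9, 0.1, 0.0],
    "auth/oauth.py::exchange_code": [0.8, 0.3, 0.1],
    "ui/theme.css": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of "show me auth logic"
results = top_k(query, index, k=2)
print(results)
```

Because the ranking depends on vector direction rather than shared words, a query can match a chunk that never contains the query's literal keywords.
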
- Frontend: React.js, TailwindCSS, Vite, Three.js
- Backend: FastAPI (Python), LangChain, PostgreSQL (Aiven), PineconeDB
- AI/Vector Search: OpenAI API, PineconeDB, LangChain
- Auth: Google OAuth2, GitHub OAuth2
- Integrations: GitHub API (repo fetching, metadata), Node.js (utility scripts)





- Log in with Google or GitHub OAuth2 (secure, per-user).
- Paste any public GitHub repo URL and your OpenAI API key (the key is stored encrypted).
- Ingestion:
  - Fetches repo files via GitHub API
  - Chunks code using custom logic (by file type/size)
  - Generates vector embeddings (LangChain + OpenAI API)
  - Stores chunks and metadata in PineconeDB and PostgreSQL
- Analysis & Chat:
  - Use AI chat to ask any question about the repo (“What does X function do?” “Show me auth logic”)
  - Semantic search finds and retrieves the most relevant code chunks
  - LLM (via LangChain) generates contextual, accurate answers using retrieved code
- Explore:
  - Hierarchical explorer shows real file tree, lets you preview content and metadata
  - Repo analytics panel for high-level insights
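
The chunking step in the ingestion flow above can be sketched as follows. This README does not describe the custom logic in detail, so this is only one plausible approach: a per-extension size limit with line-aligned splits. The limits, the `chunk_file` name, and the metadata fields are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical per-extension limits (characters per chunk); values are illustrative.
MAX_CHARS_BY_EXT = {".py": 800, ".js": 800, ".md": 1200}
DEFAULT_MAX_CHARS = 1000

def chunk_file(path, text, limits=MAX_CHARS_BY_EXT, default=DEFAULT_MAX_CHARS):
    """Split a file into line-aligned chunks, sized by file type.

    A single line longer than the limit is kept whole as its own chunk.
    """
    max_chars = limits.get(PurePosixPath(path).suffix.lower(), default)
    chunks, current, size, start_line = [], [], 0, 1
    for line_no, line in enumerate(text.splitlines(keepends=True), start=1):
        # Close the current chunk when adding this line would exceed the limit.
        if current and size + len(line) > max_chars:
            chunks.append(
                {"path": path, "start_line": start_line, "text": "".join(current)}
            )
            current, size, start_line = [], 0, line_no
        current.append(line)
        size += len(line)
    if current:
        chunks.append(
            {"path": path, "start_line": start_line, "text": "".join(current)}
        )
    return chunks

# Usage: a synthetic 300-line Python file is split into several chunks.
sample = "".join(f"line {i}\n" for i in range(300))
chunks = chunk_file("src/app.py", sample)
print(len(chunks), chunks[0]["start_line"], chunks[1]["start_line"])
```

Keeping the path and starting line with each chunk means a retrieved chunk can be traced back to its place in the file tree, which is the kind of metadata the ingestion step stores alongside the vectors.
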

- Hackathons/open-source events: Instantly understand any team repo or competition project.
- University coursework: Quickly onboard and analyze group project submissions.
- Personal learning: Explore popular open-source projects by chatting and searching their code.
- Team code reviews: Get instant explanations and context for PRs and legacy code.