Codebase Companion is a full‑stack AI app that lets you chat with any public GitHub repository. It clones a repo, chunks & embeds the content, stores vectors in Astra DB, and answers questions using a RAG pipeline powered by Hugging Face embeddings and Groq (Llama 3 8B).
- 📚 Multi‑Repository Support – Index multiple repos; switch chat sessions instantly.
- 🧠 Intelligent Q&A – Ask about logic, structure, or purpose in natural language.
- 🔁 Streaming Responses – Word‑by‑word streaming for a ChatGPT‑like feel.
- 📎 Source Citing – Each answer lists the code files used as context.
- 🧩 Modern RAG Pipeline – Accurate, grounded answers using retrieve‑then‑read.
- 🔍 Code Location Search – Surface exact files/paths relevant to your query.
- ⚡ Fast Inference – Groq Llama 3 8B for low‑latency responses.
- Frontend: React, Vite, Tailwind CSS
- Backend: Node.js, Express.js
- AI & Data Processing:
  - Embedding Model: `BAAI/bge-small-en-v1.5` (Hugging Face)
  - Vector Database: Astra DB (DataStax)
  - LLM: Groq – Llama 3 8B
- Tools: `simple-git`, `cors`, `dotenv`, `concurrently`
- Input: User submits a public GitHub repo URL.
- Clone & Parse: Backend clones the repo and walks the file tree.
- Chunking: Code/docs are split into semantic chunks.
- Embedding: Chunks are embedded via `BAAI/bge-small-en-v1.5`.
- Storage: Vectors + metadata are saved to Astra DB (collections created dynamically).
- Semantic Retrieval: Top‑k chunks fetched from Astra DB.
- Context Assembly: Relevant snippets + paths composed.
- Answer Generation: Groq Llama 3 8B produces the final, cited answer.
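The semantic-retrieval step above can be sketched in plain JavaScript. In the real app Astra DB performs the vector search server‑side; this is only an illustration of top‑k cosine ranking, and the `path`/`vector` field names are assumptions, not the project's actual schema:

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks against the query embedding and keep the top k.
function topK(queryVec, chunks, k = 5) {
  return chunks
    .map((c) => ({ ...c, score: cosine(queryVec, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The returned chunks (with their file paths) are what gets assembled into the prompt context and cited in the answer.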
- Node.js v18+
- npm
- Accounts/keys for Hugging Face, Groq, and Astra DB
git clone https://github.com/kartik0905/codebase-companion.git
cd codebase-companion
If using a single repo with shared root scripts:
npm install
# Frontend
cd client && npm install
# Backend
cd ../server && npm install
Create a `.env` in the backend root (`server/.env` if split; project root if monorepo) with:
# Hugging Face
HF_TOKEN="hf_..." # used for BAAI/bge-small-en-v1.5
# Groq
GROQ_API_KEY="gsk_..." # Llama 3 8B
# Astra DB (DataStax)
ASTRA_DB_APPLICATION_TOKEN="AstraCS:..."
ASTRA_DB_API_ENDPOINT="https://..." # REST endpoint for your DB keyspace
ASTRA_DB_COLLECTION="codebase_chunks" # app may create collections dynamically
Keep keys private. Do not commit `.env`.
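A quick fail-fast check at server startup can catch a missing key before the first request. This is a minimal sketch, assuming `dotenv` (already in the tools list) has populated `process.env`; the helper name is illustrative:

```javascript
// Assumes require("dotenv").config() has already run.
const REQUIRED_KEYS = [
  "HF_TOKEN",
  "GROQ_API_KEY",
  "ASTRA_DB_APPLICATION_TOKEN",
  "ASTRA_DB_API_ENDPOINT",
];

// Returns the names of required keys that are absent or blank in `env`.
function missingKeys(env, required = REQUIRED_KEYS) {
  return required.filter(
    (k) => typeof env[k] !== "string" || env[k].trim() === ""
  );
}

const missing = missingKeys(process.env);
if (missing.length > 0) {
  console.error(`Missing env vars: ${missing.join(", ")}. Check your .env file.`);
}
```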
npm run dev
# Backend: http://localhost:3001
# Frontend: http://localhost:5173
Terminal 1 — Backend
cd server
npm run dev
# http://localhost:3001
Terminal 2 — Frontend
cd client
npm run dev
# http://localhost:5173
codebase-companion/
├─ client/
│ ├─ src/
│ └─ package.json
├─ server/
│ ├─ routes/
│ ├─ services/
│ ├─ rag/
│ │ ├─ chunking.js
│ │ ├─ embed.js
│ │ └─ retrieve.js
│ ├─ server.js
│ └─ package.json
├─ README.md
└─ ...
POST /api/index
Body: { repoUrl: string }
→ clones, chunks, embeds, and stores vectors.
POST /api/chat
Body: { repoId: string, question: string }
→ streams an answer + cites files.
Endpoint names are placeholders; adjust to match your actual routes.
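A client consuming the streamed `/api/chat` response might look like the sketch below. It assumes the endpoint streams plain text word by word (as the features list describes); the endpoint path and body shape follow the placeholders above:

```javascript
// Read a streamed response body chunk by chunk, invoking onChunk as text arrives.
// Works in browsers and Node 18+ (both expose ReadableStream / TextDecoder).
async function readStream(stream, onChunk) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text);
  }
  return full;
}

// Hypothetical usage against the placeholder endpoint:
// const res = await fetch("http://localhost:3001/api/chat", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify({ repoId: "my-repo", question: "What does server.js do?" }),
// });
// const answer = await readStream(res.body, (t) => process.stdout.write(t));
```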
- Ignore large/binary folders (`.git`, `node_modules`, `dist`, images) during indexing.
- Tune chunk size/overlap for your languages to maximize retrieval quality.
- Persist per‑repo metadata so users can switch sessions quickly.
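The first two tips can be sketched together. The ignore list and the chunk size/overlap defaults below are illustrative starting points, not the project's actual values:

```javascript
// Folders to skip entirely during indexing.
const IGNORED_DIRS = new Set([".git", "node_modules", "dist"]);

// True if no path segment is an ignored directory.
function shouldIndex(relPath) {
  return !relPath.split("/").some((part) => IGNORED_DIRS.has(part));
}

// Fixed-size sliding-window chunking with overlap, so context that spans
// a chunk boundary still appears intact in at least one chunk.
function chunkText(text, size = 800, overlap = 100) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

Semantic chunking (splitting on function or section boundaries) usually retrieves better than fixed windows, but a sliding window is a reasonable baseline to tune from.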
- 🔐 User Authentication to associate repos with users
- 🔒 Private Repos via GitHub OAuth
- ☁️ Cloud Deploy (Vercel + Render/Fly/Railway)
- 📈 Analytics (query quality, hit‑rate, latency)
- 🧪 Eval Suite for retrieval precision/recall