🤖 AI Chatbot for Open-Source Repos

A sophisticated repository-aware AI assistant that understands your GitHub projects and answers questions about code, issues, documentation, and development history. Built with modern technologies and featuring a clean black & white UI, real-time progress tracking, and comprehensive error handling.

Features

  • Repository Intelligence: Deep understanding of code, issues, commits, and documentation
  • Context-Aware Chat: Get relevant answers with confidence scores and source citations
  • Clean UI Design: Modern black & white theme with excellent readability
  • Real-Time Progress: Watch repository indexing with live updates and detailed status
  • Clear Chat Functionality: Reset conversations with keyboard shortcut (Ctrl+K/Cmd+K)
  • Multiple AI Providers: OpenAI GPT, Hugging Face transformers, or AWS Bedrock
  • Vector Search: ChromaDB for fast, semantic code and document retrieval
  • Smart Error Handling: Detailed error messages with troubleshooting guidance in UI
  • Modern Interface: Responsive Next.js frontend with TypeScript and excellent UX
  • Easy Setup: Automated installation with comprehensive SSL issue resolution
  • Type Safety: Full TypeScript implementation with enhanced validation

Architecture

graph TB
    A[Next.js Frontend] --> B[FastAPI Backend]
    B --> C[GitHub Crawler]
    B --> D[Vector Store ChromaDB]
    B --> E[Embedding Service]
    B --> F[LLM Service]
    
    C --> G[GitHub API]
    E --> H[Hugging Face Models]
    F --> I[OpenAI GPT]
    F --> J[AWS Bedrock]
    F --> K[Hugging Face]
    
    D --> L[Document Storage]
    E --> L
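
In plain terms: the frontend posts a question to the FastAPI backend, which embeds it, looks up relevant chunks in ChromaDB, and asks the configured LLM to answer from that context. The sketch below illustrates this flow with stubbed services; all names are illustrative, not the project's actual modules.

# Minimal sketch of the request flow shown above, with stubbed services.
# All names here are illustrative; the real implementation lives in the backend.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Chunk:
    text: str
    source: str  # file path, issue URL, commit hash, ...

def embed(text: str) -> List[float]:
    # Embedding Service: in the real app, a sentence-transformers model.
    return [float(len(text))]

def search(repo: str, vector: List[float]) -> List[Chunk]:
    # Vector Store: in the real app, a ChromaDB similarity query scoped to the repo.
    return [Chunk(text="README excerpt ...", source=f"{repo}/README.md")]

def generate(question: str, context: List[Chunk]) -> str:
    # LLM Service: OpenAI GPT, Hugging Face, or AWS Bedrock, per LLM_PROVIDER.
    return f"Answer to {question!r} grounded in {len(context)} retrieved chunk(s)."

def answer_question(question: str, repo: str) -> Dict[str, object]:
    chunks = search(repo, embed(question))
    return {"answer": generate(question, chunks), "sources": [c.source for c in chunks]}

print(answer_question("What does this project do?", "microsoft/vscode"))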

Quick Start

Prerequisites

  • Python 3.8+ (for backend)
  • Node.js 16+ (for frontend)
  • GitHub Personal Access Token (create one at https://github.com/settings/tokens)
  • OpenAI API Key (recommended) or other LLM provider

Automated Setup

Run the setup script to install everything automatically:

python setup.py

This will:

  • Check all prerequisites
  • Create Python virtual environment
  • Install backend dependencies (including pydantic-settings)
  • Install frontend dependencies
  • Generate configuration files
  • Create startup scripts

Configuration

  1. Edit the .env file (created by setup):

    # Required: GitHub access
    GITHUB_TOKEN=your_github_token_here
    
    # Recommended: OpenAI for best responses
    LLM_PROVIDER=openai
    OPENAI_API_KEY=your_openai_api_key_here
    
    # Or use free Hugging Face (basic responses)
    LLM_PROVIDER=huggingface
    
    # Or use AWS Bedrock (with optional session token for temporary credentials)
    LLM_PROVIDER=bedrock
    AWS_ACCESS_KEY_ID=your_aws_key
    AWS_SECRET_ACCESS_KEY=your_aws_secret
    AWS_SESSION_TOKEN=your_session_token  # Optional, for temporary credentials
    
    # SSL bypass for development (if needed)
    PYTHONHTTPSVERIFY=0
    HF_HUB_DISABLE_SSL_VERIFY=1
    CURL_CA_BUNDLE=
    REQUESTS_CA_BUNDLE=
  2. Configure API keys: add your GitHub token and the credentials for your chosen LLM provider to .env before starting the services (a sketch of how these values are loaded is shown below).
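
The .env values above are typically loaded with pydantic-settings (installed by the setup script). Below is a minimal sketch of such a settings class; the field names mirror the variables above, but the project's actual settings module may differ.

# A minimal sketch of loading the .env values with pydantic-settings.
# Field names follow the variables above; this is not the project's real config module.
from typing import Optional
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    github_token: str
    llm_provider: str = "openai"                    # openai | huggingface | bedrock
    openai_api_key: Optional[str] = None
    aws_access_key_id: Optional[str] = None
    aws_secret_access_key: Optional[str] = None
    aws_session_token: Optional[str] = None         # optional, for temporary credentials

settings = Settings()
print(settings.llm_provider)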

Start the Application

Windows:

# Start both services
start-all.bat

# Or start individually
start-backend.bat    # http://localhost:8000
start-frontend.bat   # http://localhost:3000

Mac/Linux:

# Start individually
./start-backend.sh   # http://localhost:8000
./start-frontend.sh  # http://localhost:3000

Usage

1. Index a Repository

  • Open http://localhost:3000
  • Go to "Repositories" tab
  • Enter a GitHub repository (e.g., microsoft/vscode)
  • Click "Start Indexing"
  • Watch real-time progress with detailed status updates

2. Chat with Your Code

  • Switch to "Chat" tab
  • Select your indexed repository
  • Ask questions like:
    • "What does this project do?"
    • "How do I contribute to this repository?"
    • "What are the main components?"
    • "Show me recent issues about authentication"
    • "How does the user login work?"

3. Advanced Features

  • Clear Chat: Use "Clear Chat" button or press Ctrl+K (Cmd+K) to reset conversation
  • View API Details: Click debug info on responses for confidence scores and processing details
  • Error Troubleshooting: Get detailed error messages with solutions directly in the UI
  • Source Citations: See which documents informed each response with clickable links
  • Multiple Repositories: Index and switch between different projects seamlessly

UI Features

Modern Black & White Design

  • Clean Aesthetic: Professional black and white theme throughout
  • High Contrast: Excellent readability with proper text contrast
  • Consistent Icons: All UI elements follow the monochrome design
  • Responsive Layout: Works perfectly on desktop and mobile
  • Smart Focus States: Clear visual feedback for all interactive elements

Enhanced User Experience

  • Real-time Status: Live progress tracking during repository indexing
  • Error Display: Helpful error messages shown directly in chat interface
  • Loading States: Smooth loading indicators matching the theme
  • Keyboard Shortcuts: Quick actions with intuitive key combinations

Development

Manual Setup (Alternative)

If the automated setup doesn't work, you can set up manually:

  1. Backend Setup:

    cd backend
    python -m venv venv
    
    # Windows
    venv\Scripts\activate
    
    # Mac/Linux
    source venv/bin/activate
    
    pip install -r requirements.txt
    pip install pydantic-settings==2.10.1 openai transformers torch sentence-transformers
  2. Frontend Setup:

    cd frontend
    npm install
    echo "NEXT_PUBLIC_API_URL=http://localhost:8000" > .env.local
  3. Environment Configuration:

    cp env.example .env
    # Edit .env with your API keys

Running for Development

Backend:

cd backend
source venv/bin/activate  # or venv\Scripts\activate on Windows
python main.py

Frontend:

cd frontend
npm run dev

Tech Stack

Backend

  • FastAPI - Modern Python web framework with async support
  • ChromaDB - Vector database for embeddings with metadata sanitization
  • Sentence Transformers - Hugging Face embeddings (all-MiniLM-L6-v2)
  • OpenAI SDK - GPT integration with latest API
  • PyGithub - GitHub API client with rate limiting
  • Uvicorn - ASGI server with high performance
  • Pydantic v2 - Data validation with pydantic-settings
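
As a rough illustration of how the embedding and vector pieces fit together, the sketch below pairs sentence-transformers with a persistent ChromaDB collection; the collection name, documents, and metadata are invented for the example and are not the project's own.

# A minimal sketch of sentence-transformers + ChromaDB semantic retrieval.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")             # embedding model named above
client = chromadb.PersistentClient(path="./data/chromadb")  # matches CHROMADB_PATH
collection = client.get_or_create_collection("repo_docs")   # illustrative collection name

docs = ["FastAPI backend entry point", "Contributing guide: open a pull request"]
collection.add(
    ids=["doc-1", "doc-2"],
    documents=docs,
    embeddings=model.encode(docs).tolist(),
    metadatas=[{"source": "backend/main.py"}, {"source": "CONTRIBUTING.md"}],
)

query = "How do I contribute?"
results = collection.query(query_embeddings=model.encode([query]).tolist(), n_results=1)
print(results["documents"][0], results["metadatas"][0])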

Frontend

  • Next.js 14 - React framework with TypeScript
  • TypeScript - Type safety throughout
  • Tailwind CSS - Modern styling with black/white theme
  • React Markdown - Message rendering with syntax highlighting
  • React Hot Toast - User notifications
  • Heroicons - Consistent icon library

AI & ML

  • OpenAI GPT-3.5/4 - Chat completions (recommended for quality)
  • Hugging Face Transformers - Free embeddings and LLM with timeout handling
  • AWS Bedrock - Enterprise AI models with session token support
  • ChromaDB - Vector similarity search with improved scoring

API Documentation

Once the backend is running, the interactive API docs are available at http://localhost:8000/docs (Swagger UI) and http://localhost:8000/redoc (ReDoc).

Key Endpoints

POST /chat              # Send chat messages with repository context
POST /crawl             # Start repository indexing with real-time status
GET  /crawl/status/{owner}/{repo}  # Check indexing progress (10s timeout)
GET  /repositories      # List indexed repositories
DELETE /repositories/{owner}/{repo}  # Remove repository
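
A minimal way to drive these endpoints from Python is sketched below; the JSON field names are assumptions, since the exact request and response schemas are published at /docs.

# Sketch of calling the key endpoints with requests; payload fields are assumed.
import requests

BASE = "http://localhost:8000"

# Start indexing a repository (field names are illustrative)
requests.post(f"{BASE}/crawl", json={"owner": "microsoft", "repo": "vscode"}).raise_for_status()

# Poll indexing progress
status = requests.get(f"{BASE}/crawl/status/microsoft/vscode", timeout=10).json()
print(status)

# Ask a question once indexing has finished
reply = requests.post(
    f"{BASE}/chat",
    json={"repository": "microsoft/vscode", "message": "What does this project do?"},
).json()
print(reply)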

Configuration Options

LLM Providers

Provider        Cost   Quality      Setup Difficulty   Features
OpenAI          $$     ⭐⭐⭐⭐⭐   Easy               Production-ready, fast
Hugging Face    Free   ⭐⭐⭐       Easy               Offline, timeout handling
AWS Bedrock     $$$    ⭐⭐⭐⭐⭐   Medium             Enterprise, session tokens
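
Provider switching is driven by the LLM_PROVIDER variable. The sketch below shows one common way such a selection might be implemented; it is purely illustrative and not the project's actual factory.

# Illustrative provider selection based on LLM_PROVIDER; not the project's code.
import os
from typing import Optional

def make_llm_client(provider: Optional[str] = None):
    provider = (provider or os.getenv("LLM_PROVIDER", "openai")).lower()
    if provider == "openai":
        from openai import OpenAI
        return OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    if provider == "bedrock":
        import boto3
        return boto3.client("bedrock-runtime")  # picks up AWS_* env vars, incl. session token
    if provider == "huggingface":
        from transformers import pipeline
        return pipeline("text-generation")       # local and free, but slower
    raise ValueError(f"Unknown LLM_PROVIDER: {provider}")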

Environment Variables

# LLM Configuration
LLM_PROVIDER=openai|huggingface|bedrock
OPENAI_API_KEY=sk-...
LLM_MODEL_ID=gpt-3.5-turbo
LLM_TEMPERATURE=0.1
LLM_MAX_TOKENS=1000

# Embedding Configuration  
EMBEDDING_MODEL=huggingface
HUGGINGFACE_MODEL=all-MiniLM-L6-v2
HUGGINGFACE_USE_LOCAL=true

# Database
VECTOR_DB_TYPE=chromadb
CHROMADB_PATH=./data/chromadb

# GitHub
GITHUB_TOKEN=ghp_...

# AWS (with session token support)
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_SESSION_TOKEN=...  # Optional for temporary credentials

# SSL Configuration (for development)
PYTHONHTTPSVERIFY=0
HF_HUB_DISABLE_SSL_VERIFY=1
CURL_CA_BUNDLE=
REQUESTS_CA_BUNDLE=
HF_HUB_DISABLE_SYMLINKS_WARNING=1

# Server
DEBUG=false
LOG_LEVEL=INFO
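
To show how the LLM_* variables map onto an actual completion call, here is a minimal sketch using the modern OpenAI Python SDK; it is an assumption about usage, not the project's code.

# Sketch: LLM_* settings applied to an OpenAI chat completion (illustrative only).
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
    model=os.getenv("LLM_MODEL_ID", "gpt-3.5-turbo"),
    temperature=float(os.getenv("LLM_TEMPERATURE", "0.1")),
    max_tokens=int(os.getenv("LLM_MAX_TOKENS", "1000")),
    messages=[
        {"role": "system", "content": "Answer questions about the indexed repository."},
        {"role": "user", "content": "What are the main components?"},
    ],
)
print(response.choices[0].message.content)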

Getting Help

  1. Check the logs: Backend logs show detailed error information
  2. UI Error Messages: The chat interface shows helpful error details
  3. API Documentation: Visit http://localhost:8000/docs
  4. Issues: Create an issue in this repository
  5. Clear Chat: Use Ctrl+K (Cmd+K) to reset if chat gets stuck

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request
