Bootstrap new developers into open source projects by helping them learn large codebases through historical GitHub/GitLab pull requests.
Open Bootstrap fetches historical PRs, analyzes them using LLM classification, and helps you select the right onboarding exercises for new developers based on difficulty, suitability, and learning value. Developers familiarize themselves with a project by implementing past merged pull requests in a time-rewound fork.
- Python 3.11 or higher
- uv package manager
- API credentials (see Setup section)
- Clone the repository:
  git clone <repository-url>
  cd openbootstrap
- Install dependencies using uv:
  uv sync
- Set up environment variables:
  cp .env.example .env # Edit .env with your actual credentials
- (Optional) Run tests to verify setup:
  uv run pytest
The tool requires several API credentials:
- GitHub Token: Personal access token with repo scope (for GitHub repositories)
- GitLab Token: Personal access token with read_api scope (for GitLab repositories)
- Supabase: Database credentials for storing PR/MR data
- LLM API: OpenAI or Anthropic API key for classification
- Google Sheets (optional): Service account credentials for export
Note: You only need the token for the platform(s) you're using. At least one platform token is required.
See .env.example for the complete list of required environment variables.
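The "at least one platform token" rule can be checked before running anything expensive. The sketch below is only an illustration: `GITHUB_TOKEN` and `GITLAB_TOKEN` are assumed variable names (the real ones are listed in .env.example), and `credential_problems` is a hypothetical helper, not part of the project.

```python
from collections.abc import Mapping

# Assumed names for illustration; .env.example is the authoritative list.
PLATFORM_TOKENS = ("GITHUB_TOKEN", "GITLAB_TOKEN")
ALWAYS_REQUIRED = ("DATABASE_URL",)


def credential_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = [f"{name} is not set" for name in ALWAYS_REQUIRED if not env.get(name)]
    # Only one of the platform tokens has to be present.
    if not any(env.get(name) for name in PLATFORM_TOKENS):
        problems.append("set at least one of " + ", ".join(PLATFORM_TOKENS))
    return problems
```

Calling `credential_problems(os.environ)` after loading .env would return an empty list when the configuration satisfies both rules.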
Before using the tool, you need to set up the Supabase database schema:
- Get your DATABASE_URL from Supabase:
  - Dashboard → Project Settings → Database → Connection pooling
  - Copy the URI connection string
  - Add to .env: DATABASE_URL=postgresql://...
- Run the setup script:
  uv run python setup/setup_database.py
- Verify the setup:
  uv run python setup/setup_database.py --verify
The schema supports a two-phase workflow:
- Phase 1 (Index): Store basic PR metadata (cheap, reliable)
- Phase 2 (Enrichment): Add files, diffs, and linked issues (expensive, can fail per-PR)
See setup/README.md for detailed documentation.
openbootstrap/
├── backend/ # FastAPI REST API
├── frontend/ # React + TypeScript web UI
├── fetchers/ # GitHub/GitLab API clients
├── storage/ # Supabase database client
├── classifier/ # LLM classification logic
├── models/ # Data models (Pydantic)
├── utils/ # Logging, configuration, helpers
├── setup/ # Database setup scripts
└── main.py # CLI entrypoint (fetch/classify)
Fetch and classify PRs/MRs from GitHub or GitLab repositories:
# Short format (GitHub only - backward compatible)
uv run python main.py fetch facebook/react
# Full URL format (also works)
uv run python main.py fetch https://github.com/facebook/react
# Fetch with limits and options
uv run python main.py fetch facebook/react --limit 500
uv run python main.py fetch microsoft/vscode --limit 5000
uv run python main.py fetch facebook/react --no-enrich
uv run python main.py fetch facebook/react --enrich-only
# Classify and export
uv run python main.py classify facebook/react
uv run python main.py export facebook/react
Note: GitLab repositories must use full URL format.
# Fetch 1000 merged MRs (default) from GitLab
uv run python main.py fetch https://gitlab.com/gitlab-org/gitlab
# Fetch with limits and options
uv run python main.py fetch https://gitlab.com/gitlab-org/gitlab --limit 500
uv run python main.py fetch https://gitlab.com/gitlab-org/gitlab --no-enrich
uv run python main.py fetch https://gitlab.com/gitlab-org/gitlab --enrich-only
# Classify and export
uv run python main.py classify https://gitlab.com/gitlab-org/gitlab
uv run python main.py export https://gitlab.com/gitlab-org/gitlab
# Enrich ALL pending/failed PRs/MRs across all repositories (both GitHub and GitLab)
uv run python main.py fetch --enrich-only
How it works:
- Phase 1 (Index): Fetches PR/MR metadata from GitHub/GitLab and stores it in the database
- Phase 2 (Enrichment): Adds files, diffs, linked issues, and comments/notes to each PR/MR
- Idempotent: Safe to run multiple times, skips already-enriched items
- Resumable: If interrupted, just run again to continue from where it left off
- Rate limiting: Automatically waits when API rate limit is hit
- GitHub: 5000 requests/hour
- GitLab: 2000 requests/minute (more generous!)
- Default limit: 1000 PRs/MRs (automatically handles pagination)
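The idempotent and resumable behaviour described above can be pictured as a loop over per-PR status values. This is a minimal sketch under assumptions, not the project's implementation: the `PullRequest` class, the status strings, and `enrich_pending` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PullRequest:
    number: int
    status: str = "pending"  # assumed states: pending, enriched, failed


def enrich_pending(prs: Iterable[PullRequest],
                   enrich: Callable[[PullRequest], None]) -> int:
    """Enrich every PR that is not already enriched; return how many succeeded."""
    done = 0
    for pr in prs:
        if pr.status == "enriched":
            continue  # idempotent: finished work is never repeated
        try:
            enrich(pr)
        except Exception:
            pr.status = "failed"  # a per-PR failure does not stop the run
        else:
            pr.status = "enriched"
            done += 1
    return done
```

Because progress lives in the status field rather than in the running process, calling the function again after an interruption picks up only the pending and failed items.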
Flags:
- --limit N: Fetch up to N PRs/MRs (default: 1000)
- --no-enrich: Skip Phase 2, only fetch and index
- --enrich-only: Skip Phase 1, only enrich PRs/MRs already in database (pending/failed). If repository is omitted, enriches all repositories.
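To make the interaction between these flags concrete, here is one way the `fetch` options could be modelled with `argparse`. It is a sketch, not the parser in main.py: treating `--no-enrich` and `--enrich-only` as mutually exclusive is an assumption, and `build_fetch_parser` is a hypothetical name.

```python
import argparse


def build_fetch_parser() -> argparse.ArgumentParser:
    """Model the documented fetch flags; the real CLI may differ."""
    parser = argparse.ArgumentParser(prog="main.py fetch")
    # Zero repositories is allowed so that a bare --enrich-only covers all of them.
    parser.add_argument("repositories", nargs="*", help="owner/repo or full URL")
    parser.add_argument("--limit", type=int, default=1000, metavar="N")
    parser.add_argument("--repo-list", metavar="FILE")
    # Assumption: skipping Phase 1 and skipping Phase 2 cannot be combined.
    phase = parser.add_mutually_exclusive_group()
    phase.add_argument("--no-enrich", action="store_true")
    phase.add_argument("--enrich-only", action="store_true")
    return parser
```

For example, `build_fetch_parser().parse_args(["--enrich-only"])` yields an empty repository list, which corresponds to the "enrich all repositories" case.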
# Process multiple repositories (mix of GitHub and GitLab)
uv run python main.py fetch \
facebook/react \
https://github.com/microsoft/vscode \
https://gitlab.com/gitlab-org/gitlab
# Or from a file (one repository per line)
uv run python main.py fetch --repo-list repos.txt
repos.txt example:
facebook/react
https://github.com/microsoft/vscode
https://gitlab.com/gitlab-org/gitlab
https://github.com/vercel/next.js
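A repository list like this mixes the short GitHub form with full URLs, so each line has to be resolved to a platform first. The following sketch shows one possible way to do that. `parse_repo` and `parse_repo_list` are hypothetical helpers, and recognising only github.com and gitlab.com is an assumption; the tool's own handling of other hosts is not documented here.

```python
from urllib.parse import urlparse

# Assumption: only the two public hosts are recognised in this sketch.
KNOWN_HOSTS = {"github.com": "github", "gitlab.com": "gitlab"}


def parse_repo(spec: str) -> tuple[str, str]:
    """Return (platform, path) for 'owner/repo' or a full repository URL."""
    spec = spec.strip()
    if "://" not in spec:
        # The short format is GitHub only.
        return "github", spec.strip("/")
    url = urlparse(spec)
    platform = KNOWN_HOSTS.get(url.netloc.lower())
    if platform is None:
        raise ValueError(f"unsupported host: {url.netloc}")
    return platform, url.path.strip("/").removesuffix(".git")


def parse_repo_list(text: str) -> list[tuple[str, str]]:
    """Parse one repository per line, ignoring blank lines."""
    return [parse_repo(line) for line in text.splitlines() if line.strip()]
```

Passing the contents of repos.txt to `parse_repo_list` gives a list of `(platform, path)` pairs that a fetcher could dispatch on.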
After fetching and classifying PRs, you can browse them using the web interface:
Start the FastAPI backend server:
# Method 1: Direct Python execution
python backend/server.py
# Method 2: Using uvicorn directly
uvicorn backend.app:app --reload
# Method 3: Python module
python -m backend.server
# Custom host/port
python backend/server.py --host 0.0.0.0 --port 8080
# Production mode (no auto-reload)
python backend/server.py --no-reload
The API will be available at:
- API endpoint: http://127.0.0.1:8000
- Interactive docs: http://127.0.0.1:8000/docs
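FastAPI applications serve a machine-readable schema at `/openapi.json` by default, which is what the interactive docs page renders. Assuming this backend keeps that default, the available routes can be listed from a script. Both function names below are illustrative.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_routes(schema: dict) -> list[str]:
    """Flatten an OpenAPI schema into 'METHOD /path' strings, sorted by path."""
    routes = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        for method in operations:
            if method in HTTP_METHODS:  # skip keys such as "parameters"
                routes.append(f"{method.upper()} {path}")
    return routes


def fetch_schema(base_url: str = "http://127.0.0.1:8000") -> dict:
    """Download the schema; assumes the default /openapi.json route is enabled."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)
```

With the server running, `print("\n".join(list_routes(fetch_schema())))` would show whichever endpoints the backend actually exposes.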
In a separate terminal, start the React frontend:
cd frontend
npm install # First time only
npm run dev
The web UI will be available at http://localhost:5173
Features:
- Browse classified PRs with pagination
- Filter by repository, difficulty, suitability
- View PR details: files changed, linked issues, comments
- Inspect exact LLM prompts used for classification
- Mark PRs as favorites for learning exercises
- Persistent repository tracking in database
- Progress tracking for learning exercises
- Export learning paths to various formats
🚧 Currently in development - See milestones.md for implementation progress.
The project uses pytest for testing:
# Run all tests
uv run pytest
# Run with verbose output
uv run pytest -v
# Run specific test file
uv run pytest tests/test_config.py
# Run specific test
uv run pytest tests/test_config.py::TestConfig::test_valid_config
- backend/ - FastAPI REST API server
- frontend/ - React + TypeScript web UI
- models/ - Pydantic data models for validation
- utils/ - Configuration loading and logging utilities
- fetchers/ - API clients for GitHub and GitLab
- storage/ - Supabase database client
- classifier/ - LLM classification logic
- tests/ - Test suite