A searchable data store for GitHub PR data from Prebid repositories with custom tagging and AI-generated summaries.
# Install & setup
uv sync
cp .env.example .env # Add GITHUB_TOKEN
uv run alembic upgrade head
# CLI
uv run ghactivity --help
uv run ghactivity github rate-limit --all # Check rate limits
uv run ghactivity sync pr owner/repo 1234 # Sync single PR
uv run ghactivity sync repo owner/repo --since 2024-10-01 # Bulk sync
uv run ghactivity sync all --since 2024-10-01 # Multi-repo sync
# Development
uv run mypy src/ tests/ # Type check
uv run ruff check src/ # Lint
uv run pytest # Test (573+ tests)
uv run pre-commit run --all-files # All quality checks

| Document | Description |
|---|---|
| Contributing | Quality standards, code style, PR process |
| Architecture | Project structure, tech stack, module overview |
| Database | Schema, tables, migrations, usage examples |
| Data Model | PR fields, sync behavior, tagging systems |
| Development | Setup, commands, configuration, troubleshooting |
| Testing | Testing strategy, infrastructure, patterns, coverage |
| Roadmap | Implementation phases and future plans |
Implemented:
- Pull Request data from 9 Prebid repositories
- Custom user tags via CLI
- Agent-generated classifications (`classify_tags`) and summaries (`ai_summary`)
- Rate limit monitoring with proactive tracking
- Request pacing with token bucket algorithm and priority queue scheduler
- Single and bulk PR ingestion pipelines with 2-week grace period for merged PRs
- Multi-repository sync orchestration for all tracked Prebid repositories
- Batch commit resilience to prevent data loss during sync failures
- CLI commands: `ghactivity sync pr`, `ghactivity sync repo`, and `ghactivity sync all`
Out of Scope (Future):
- GitHub Issues
- Webhooks / real-time sync
- Web UI
| Component | Technology |
|---|---|
| Runtime | Python 3.12+ |
| Database | SQLite + SQLAlchemy 2.0 + aiosqlite |
| Validation | Pydantic 2.0 |
| GitHub API | githubkit |
| CLI | typer + rich |
| Migrations | alembic |
| Package Manager | uv |
Async GitHub API client with integrated pacing and rate limit tracking. Every API method automatically applies pacing delays before requests and feeds response headers back to the pacer for adaptive throttling. Provides both eager (`list_pull_requests`) and lazy (`iter_pull_requests`) iteration for efficient pagination.
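The difference between eager and lazy pagination can be sketched as follows. This is a minimal illustration, not the project's client: `fetch_page`, `PAGES`, `calls`, and the function bodies are hypothetical stand-ins, and only the names `list_pull_requests` and `iter_pull_requests` come from the description above.

```python
import asyncio
from collections.abc import AsyncIterator

# Hypothetical stand-in for paginated API responses (newest first).
PAGES: list[list[int]] = [[105, 104], [103, 102], [101]]
calls: list[int] = []  # records which pages were actually requested


async def fetch_page(page: int) -> list[int]:
    """Pretend to request one page of PR numbers."""
    calls.append(page)
    await asyncio.sleep(0)  # yield control, as a real request would
    return PAGES[page] if page < len(PAGES) else []


async def list_pull_requests() -> list[int]:
    """Eager: fetch every page before returning."""
    results: list[int] = []
    page = 0
    while batch := await fetch_page(page):
        results.extend(batch)
        page += 1
    return results


async def iter_pull_requests() -> AsyncIterator[int]:
    """Lazy: fetch the next page only when the caller asks for more."""
    page = 0
    while batch := await fetch_page(page):
        for number in batch:
            yield number
        page += 1


async def first_two_lazily() -> list[int]:
    seen: list[int] = []
    async for number in iter_pull_requests():
        seen.append(number)
        if len(seen) == 2:
            break  # stop early; later pages are never requested
    return seen


lazy_result = asyncio.run(first_two_lazily())
lazy_calls = list(calls)
calls.clear()
eager_result = asyncio.run(list_pull_requests())
eager_calls = list(calls)
```

In this sketch the lazy path requests only the first page, while the eager path requests every page plus the empty one that ends the loop. That is the property a date-filtered bulk sync can exploit.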
- `RateLimitMonitor`: Tracks rate limit state from response headers (zero API cost)
- `RateLimitStatus`: State machine (HEALTHY → WARNING → CRITICAL → EXHAUSTED)
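As a rough sketch of how a status could be derived from response headers without spending an API call: the header names below are the ones GitHub's REST API returns, but the `classify` helper and the 20% and 5% thresholds are assumptions made for illustration, not the project's actual values.

```python
from enum import Enum


class RateLimitStatus(Enum):
    HEALTHY = "healthy"
    WARNING = "warning"
    CRITICAL = "critical"
    EXHAUSTED = "exhausted"


def classify(headers: dict[str, str]) -> RateLimitStatus:
    """Derive a status from rate limit headers on any API response.

    The 20% and 5% thresholds are illustrative assumptions.
    """
    limit = int(headers["x-ratelimit-limit"])
    remaining = int(headers["x-ratelimit-remaining"])
    if remaining <= 0:
        return RateLimitStatus.EXHAUSTED
    fraction = remaining / limit
    if fraction <= 0.05:
        return RateLimitStatus.CRITICAL
    if fraction <= 0.20:
        return RateLimitStatus.WARNING
    return RateLimitStatus.HEALTHY
```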
- `RequestPacer`: Token bucket algorithm with adaptive throttling (integrated into `GitHubClient`)
- `RequestScheduler`: Priority queue with semaphore-controlled concurrency for bulk operations
- `BatchExecutor`: Coordinates batch operations with progress tracking
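A token bucket pacer can be sketched roughly as below. The `TokenBucket` class and its fields are hypothetical, and this variant lets the balance go negative so that queued requests receive increasing delays; the project's `RequestPacer` may work differently. Time is passed in explicitly to keep the sketch deterministic, where a real pacer would likely read `time.monotonic()`.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float     # maximum burst size
    refill_rate: float  # tokens added per second
    tokens: float       # current balance (may go negative as "debt")
    updated_at: float   # time of the last refill, in seconds

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now

    def delay_for(self, now: float) -> float:
        """Reserve one token and return how long the caller should wait first."""
        self._refill(now)
        self.tokens -= 1.0
        if self.tokens >= 0.0:
            return 0.0
        # Time needed for the refill to pay back the debt.
        return -self.tokens / self.refill_rate
```

Adaptive throttling could then be approximated by lowering `refill_rate` as the rate limit status worsens, though that coupling is an assumption here.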
- `PRIngestionService`: Single PR fetch → transform → store pipeline
- `BulkPRIngestionService`: Multi-PR import using lazy iteration for efficient date filtering
- `MultiRepoOrchestrator`: Coordinates syncing all tracked Prebid repositories
- `CommitManager`: Batch commit boundaries for database resilience
- `PRIngestionResult` / `BulkIngestionResult` / `MultiRepoSyncResult`: Structured results
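The idea behind batch commit boundaries can be illustrated with a small sketch. It uses the standard library `sqlite3` module instead of the project's async SQLAlchemy stack, and the `ingest` function, table name, and simulated failure are all hypothetical.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterable


def ingest(
    conn: sqlite3.Connection,
    pr_numbers: Iterable[int],
    batch_size: int = 25,
    fail_on: int | None = None,
) -> int:
    """Insert PR numbers, committing every `batch_size` rows.

    Returns the number of rows that were durably committed.
    """
    committed = 0
    pending = 0
    try:
        for number in pr_numbers:
            if number == fail_on:
                raise RuntimeError(f"simulated failure on PR {number}")
            conn.execute("INSERT INTO pull_requests (number) VALUES (?)", (number,))
            pending += 1
            if pending == batch_size:
                conn.commit()  # batch boundary: these rows are now safe
                committed += pending
                pending = 0
        conn.commit()  # flush the final partial batch
        committed += pending
    except RuntimeError:
        conn.rollback()  # only the uncommitted partial batch is lost
    return committed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pull_requests (number INTEGER PRIMARY KEY)")
conn.commit()
committed = ingest(conn, range(1, 11), batch_size=4, fail_on=10)
stored = conn.execute("SELECT COUNT(*) FROM pull_requests").fetchone()[0]
```

Here a failure on the tenth PR loses only the one uncommitted row from the third batch; the first eight rows survive.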
- `RepositoryRepository`: CRUD for the Repository table with `get_or_create`
- `PullRequestRepository`: CRUD for the PullRequest table with frozen state handling
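One plausible reading of "frozen state" is that a merged PR stops being re-synced once the grace period has elapsed. The sketch below assumes exactly that. The `is_frozen` helper is hypothetical, and details such as whether the boundary is inclusive or how closed-but-unmerged PRs are treated are not specified here.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

MERGE_GRACE_PERIOD_DAYS = 14  # mirrors the SYNC__MERGE_GRACE_PERIOD_DAYS default


def is_frozen(
    merged_at: datetime | None,
    now: datetime,
    grace_days: int = MERGE_GRACE_PERIOD_DAYS,
) -> bool:
    """Return True when a merged PR is past its grace period.

    Assumption: unmerged PRs are never frozen.
    """
    if merged_at is None:
        return False
    return now - merged_at > timedelta(days=grace_days)


now = datetime(2024, 10, 20, tzinfo=timezone.utc)
old_merge = datetime(2024, 10, 1, tzinfo=timezone.utc)     # 19 days ago
recent_merge = datetime(2024, 10, 10, tzinfo=timezone.utc)  # 10 days ago
```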
| Variable | Required | Default | Description |
|---|---|---|---|
| `GITHUB_TOKEN` | Yes | - | GitHub personal access token |
| `DATABASE_URL` | No | `sqlite+aiosqlite:///./github_activity.db` | Database connection |
| `LOG_LEVEL` | No | `INFO` | Logging level |
| `ENVIRONMENT` | No | `development` | App environment |
| `SYNC__MERGE_GRACE_PERIOD_DAYS` | No | `14` | Days after merge before PR is frozen |
| `SYNC__COMMIT_BATCH_SIZE` | No | `25` | PRs to commit per batch (limits data loss on failure) |
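To make the table concrete, here is a standard-library sketch of reading these variables with their documented defaults. The `Settings` dataclass and `load_settings` function are hypothetical; the project may well use a settings library instead, and the `SYNC__` prefix hints at a nested-settings delimiter, though that is an assumption.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    github_token: str
    database_url: str
    log_level: str
    environment: str
    merge_grace_period_days: int
    commit_batch_size: int


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build settings from environment variables, applying documented defaults."""
    env = os.environ if env is None else env
    token = env.get("GITHUB_TOKEN")
    if not token:
        raise ValueError("GITHUB_TOKEN is required")
    return Settings(
        github_token=token,
        database_url=env.get("DATABASE_URL", "sqlite+aiosqlite:///./github_activity.db"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        environment=env.get("ENVIRONMENT", "development"),
        merge_grace_period_days=int(env.get("SYNC__MERGE_GRACE_PERIOD_DAYS", "14")),
        commit_batch_size=int(env.get("SYNC__COMMIT_BATCH_SIZE", "25")),
    )
```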
This project enforces strict type safety through mypy strict mode and custom quality gates.
- All public functions must have type annotations
- All class attributes must have type annotations
- Use `from __future__ import annotations` for forward references
| Use Case | Allowed | Alternative |
|---|---|---|
| JSON/dict serialization | `dict[str, Any]` | N/A |
| Pydantic validators | `v: Any` | N/A |
| ORM factories | `from_orm(obj: Any)` | N/A |
| Log context | `**context: Any` | N/A |
| Lazy initialization | No | Use `T \| None` with TYPE_CHECKING |
| Avoiding generics | No | Use TYPE_CHECKING block |
| Third-party without types | No | Document reason, add stub or TYPE_CHECKING |
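The lazy-initialization row can be illustrated with a short sketch. The `Store` class is hypothetical; it shows an attribute typed as `T | None` with the type imported under `TYPE_CHECKING`, rather than falling back to `Any`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for the type checker; not executed at runtime.
    import sqlite3


class Store:
    """Opens its connection on first use instead of at construction."""

    def __init__(self, path: str) -> None:
        self._path = path
        # Precisely typed as "connection or not yet created", not Any.
        self._conn: sqlite3.Connection | None = None

    def connection(self) -> sqlite3.Connection:
        if self._conn is None:
            import sqlite3  # deferred runtime import

            self._conn = sqlite3.connect(self._path)
        return self._conn
```

Because `from __future__ import annotations` keeps annotations unevaluated, the `sqlite3.Connection` references are resolved by the type checker alone.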
`type: ignore` comments:
- Must include error code: `# type: ignore[arg-type]`
- Must have justification comment if not obvious
- Prefer fixing the underlying type issue
- Tracked: Pre-commit reports count (target: <=5)

`noqa` comments:
- Centralize in shared utility functions (e.g., `cli/common.py`)
- Document why the rule doesn't apply
- Tracked: Pre-commit reports count (target: <=3)

Pre-commit hooks track:
- `type: ignore` count in src/ (target: <=5)
- `noqa` count in src/ (target: <=3)
- Blocks new `SomeName = Any` type aliases (use TYPE_CHECKING instead)
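A gate of this kind might be approximated with a few regular expressions. The `audit` and `passes` helpers below are hypothetical and far cruder than a real hook (they do not parse Python, so matches inside strings would be counted); only the targets are taken from the list above.

```python
import re

# "# type: ignore" with an optional bracketed error code.
TYPE_IGNORE = re.compile(r"#\s*type:\s*ignore(\[[^\]]+\])?")
NOQA = re.compile(r"#\s*noqa\b")
# A bare alias such as "SomeName = Any".
ANY_ALIAS = re.compile(r"^\s*\w+\s*=\s*Any\s*(#.*)?$")


def audit(source: str) -> dict[str, int]:
    """Count suppression comments and Any aliases in one source string."""
    counts = {"type_ignore": 0, "bare_type_ignore": 0, "noqa": 0, "any_alias": 0}
    for line in source.splitlines():
        match = TYPE_IGNORE.search(line)
        if match:
            counts["type_ignore"] += 1
            if match.group(1) is None:
                counts["bare_type_ignore"] += 1  # missing error code
        if NOQA.search(line):
            counts["noqa"] += 1
        if ANY_ALIAS.match(line):
            counts["any_alias"] += 1
    return counts


def passes(counts: dict[str, int]) -> bool:
    """Apply the documented targets: <=5 type: ignore, <=3 noqa, no Any aliases."""
    return (
        counts["type_ignore"] <= 5
        and counts["bare_type_ignore"] == 0
        and counts["noqa"] <= 3
        and counts["any_alias"] == 0
    )


SAMPLE = """\
from typing import Any
JsonValue = Any
x = compute()  # type: ignore[arg-type]
y = other()  # type: ignore
import os  # noqa: F401
"""
sample_counts = audit(SAMPLE)
```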