Discover and download AGENTS.md and CLAUDE.md files from GitHub repositories.
Note: For learning and inspiration. Downloaded files retain their original licenses; respect those terms.
- get_repos.py: Find repos via the GitHub Search API
- get_agentsmd.py: Download their AGENTS.md and CLAUDE.md files
Searches recent, non-archived GitHub repos sorted by stars (default: 50,000 repos max). Default language: Python. Configurable via config.yaml.
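As a rough illustration of the search described above, the sketch below builds query parameters for the GitHub repository search endpoint. The function name `build_search_params`, the `pushed_within_days` parameter, and the choice of the `pushed:` qualifier to mean "recent" are assumptions made for this example; the actual logic in get_repos.py may differ.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Union


def build_search_params(
    language: str = "Python",
    pushed_within_days: int = 365,  # assumed meaning of "recent"
    today: Optional[date] = None,
) -> Dict[str, Union[str, int]]:
    """Build query parameters for a repository search (illustrative only)."""
    today = today or date.today()
    cutoff = today - timedelta(days=pushed_within_days)
    qualifiers = [
        f"language:{language}",
        "archived:false",  # skip archived repos
        f"pushed:>={cutoff.isoformat()}",  # only recently updated repos
    ]
    return {
        "q": " ".join(qualifiers),
        "sort": "stars",  # most-starred first
        "order": "desc",
        "per_page": 100,  # largest page size the Search API accepts
    }
```

The returned dictionary could be sent as the query string of a GET request to https://api.github.com/search/repositories.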
git clone https://github.com/yourusername/github-get-agents.git
cd github-get-agents
pip3 install uv && uv sync

All settings are centralized in config.yaml. Edit this file to customize:
- Repository search: Language, date ranges, star bins, max repos
- API settings: Timeouts, retries, backoff strategies
- Download settings: Delays, output directories
Default values work well for most use cases. CLI arguments override config values when specified.
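One common way to let CLI arguments override config values is sketched below. The config keys (`max_repos`, `dry_run`), the long option name `--max-repos`, and the function `resolve_settings` are hypothetical; only the `-n` and `--dry-run` flags appear in this README. The config is passed in as an already-loaded dictionary so the example needs nothing beyond the standard library.

```python
import argparse
from typing import Any, Dict, List


def resolve_settings(config: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
    """Merge config values with CLI flags; flags win only when given."""
    parser = argparse.ArgumentParser()
    # default=None lets us tell "flag omitted" apart from "flag given".
    parser.add_argument("-n", "--max-repos", type=int, default=None)
    parser.add_argument("--dry-run", action="store_true", default=None)
    args = parser.parse_args(argv)

    settings = dict(config)  # copy, so the loaded config is not mutated
    for key, value in vars(args).items():
        if value is not None:
            settings[key] = value
    return settings
```

With this approach, `resolve_settings({"max_repos": 50000}, ["-n", "1000"])` yields a `max_repos` of 1000, while an empty argument list leaves the config untouched.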
Create a Personal Access Token with the repo and read:user scopes:
export GITHUB_TOKEN="ghp_..."

uv run python get_repos.py # Use defaults from config.yaml
uv run python get_repos.py -n 1000 # Limit to 1000 repos
uv run python get_repos.py --dry-run # Preview query partitions without fetching

Output: repos_YYYY-MM-DD_HHMMSS.jsonl
uv run python get_agentsmd.py # Auto-detect newest repos file
uv run python get_agentsmd.py -w 8 # Use 8 parallel workers (faster)
uv run python get_agentsmd.py -r # Resume interrupted download
uv run python get_agentsmd.py -r -w 8 # Resume with parallel workers

Output: agents_md_YYYY-MM-DD_HHMMSS/org/repo/AGENTS.md + download_results.jsonl
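Resuming an interrupted download typically means skipping repos that already have an entry in the results file. The sketch below shows one way this could work, assuming each line of download_results.jsonl is a JSON object with a `repo` field. That field name and both function names are hypothetical; the real file layout is not documented here.

```python
import json
from typing import Iterable, List, Set


def completed_repos(result_lines: Iterable[str]) -> Set[str]:
    """Collect repo names already recorded in a results file."""
    done: Set[str] = set()
    for line in result_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # A run killed mid-write can leave a truncated line; skip it.
            continue
        if isinstance(record, dict) and "repo" in record:
            done.add(record["repo"])
    return done


def pending_repos(all_repos: List[str], result_lines: Iterable[str]) -> List[str]:
    """Return the repos that still need downloading, in original order."""
    done = completed_repos(result_lines)
    return [name for name in all_repos if name not in done]
```

An open file handle can be passed directly as `result_lines`, since iterating over a text file yields its lines.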
| Issue | Solution |
|---|---|
| ERROR: set GITHUB_TOKEN | export GITHUB_TOKEN="..." |
| 403 Forbidden | Regenerate token with repo and read:user scopes |
| Rate limit | Scripts auto-wait; run during off-peak hours for large jobs |
| Empty repos.jsonl | Adjust filters in get_repos.py or verify token works |
Verify token:
curl -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/user | jq -r .login

GitHub Search API returns max 1,000 results per query. To get more:
Method 1: Edit star bins in config.yaml to partition queries:
star_bins:
- [10000, null]
- [5000, 9999] # Uncomment for 5k-10k stars
- [2000, 4999] # Uncomment for 2k-5k stars
# ... more bins available in config

Method 2: Edit date ranges or other filters in config.yaml
Method 3: Use GitHub on BigQuery for exhaustive queries
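To make the star-bin partitioning in Method 1 concrete, the sketch below turns each bin into a `stars:` search qualifier, so that every bin is fetched by its own query with its own 1,000-result cap. The function names are hypothetical, and the mapping from bins to qualifiers is an assumption about how get_repos.py uses `star_bins`.

```python
from typing import List, Optional, Sequence, Tuple

SEARCH_RESULT_CAP = 1000  # results per query, as noted above

StarBin = Tuple[int, Optional[int]]  # (low, high); high=None means open-ended


def star_qualifiers(star_bins: Sequence[StarBin]) -> List[str]:
    """Convert star bins into GitHub search qualifiers, one per query."""
    qualifiers = []
    for low, high in star_bins:
        if high is None:
            qualifiers.append(f"stars:>={low}")  # open-ended top bin
        else:
            qualifiers.append(f"stars:{low}..{high}")  # inclusive range
    return qualifiers


def max_reachable(star_bins: Sequence[StarBin]) -> int:
    """Upper bound on results when each bin is a separate query."""
    return len(star_bins) * SEARCH_RESULT_CAP
```

Three bins therefore allow at most 3,000 results. A bin that matches more than 1,000 repos is still truncated, which is why narrower bins (or the date-range split in Method 2) help.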
| Resource | Limit | Notes |
|---|---|---|
| Search API | 30 req/min | Used by get_repos.py |
| File downloads | N/A | 0.1s delay in get_agentsmd.py |
Both scripts handle rate limits with automatic retry and backoff.
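A minimal sketch of retry with exponential backoff is shown below. It is not taken from the scripts: the `RateLimited` exception, the `with_backoff` helper, and the default delays are all hypothetical, and the real retry settings live in config.yaml.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by the caller's request function."""


def with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), doubling the wait after each rate-limit error."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries; surface the error
            # Waits grow as 1s, 2s, 4s, ... and are capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the waiting behavior can be checked in tests without actually pausing.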
MIT License