Terence is a Python package that makes it easy to scan and analyze GitHub repositories. It simplifies the GitHub API and processes the repo contents into a simple flat dictionary that can be accessed by file path.
pip install terence

Create a personal access token at: https://github.com/settings/tokens
- New token (classic)
- Only permission required: repo -> public_repo
- Additional permissions are optional
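Rather than pasting the token into source code, it can be read from the environment. The following is a minimal sketch, assuming the token is stored in an environment variable named `GITHUB_TOKEN`. Both that name and the `load_token` helper are hypothetical and are not part of Terence.

```python
import os


def load_token(env=None, var_name="GITHUB_TOKEN"):
    """Return the GitHub token stored in `var_name`, or raise if it is missing.

    `var_name` is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set {var_name} to a GitHub personal access token")
    return token
```

The returned string can then be passed to `terence.auth(...)` in place of a hard-coded token.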
from terence import Terence
# Initialize a new Terence instance
terence = Terence()
# Authenticate Terence
terence.auth("ghp_your_token_here")
# Scan a repository
terence.scan_repository("https://github.com/user/repo_name")
# Access repo contents
print(f"Found {len(terence.results)} files")
for file_path, content in terence.results.items():
    print(f"{file_path}: {len(content)} characters")

You must authenticate Terence with your GitHub API token before scanning any repository.
terence = Terence()
terence.auth("ghp_your_token_here")

# Scan entire repository
terence.scan_repository("https://github.com/user/repo_name")

You can also scan only specific file types by passing a list of extensions as the second argument.
The leading "." on an extension is optional ("py" and ".py" are both accepted).
# Scan only Python files
terence.scan_repository("https://github.com/user/repo_name", ["py"])
# Scan multiple file types
terence.scan_repository("https://github.com/user/repo_name", ["py", "js", "html"])

You can scan the contents of a specific branch, tag, or commit rather than the default main/master branch.
# Scan a specific branch name
terence.branch("develop")
terence.scan_repository("https://github.com/user/repo_name")
# Scan a specific tag
terence.branch("v2.0.0")
terence.scan_repository("https://github.com/user/repo_name")
# Scan a specific commit (can chain methods)
terence.branch("abc123def456").scan_repository("https://github.com/user/repo_name")

To reset to the default branch, clear the results and scan again.
# Reset to default branch
terence.clear_results()
terence.scan_repository("https://github.com/user/repo_name")

Once a scan is performed, the repository's file contents are stored in a flat dictionary in terence.results.
results = terence.results
# List all files:
for path in results.keys():
    print(f" - {path}")
# Print the first 200 characters of a specific file
if "frontend/app/page.tsx" in results:
    print(results["frontend/app/page.tsx"][:200])
# Search content across files
for file_path, content in results.items():
    if "def main" in content:
        print(f"Found 'def main' in: {file_path}")

terence.results is a flat dictionary: each key is the path to a file (including the file name) and each value is the raw contents of that file.
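Because the results are held in an ordinary dictionary keyed by path, standard dictionary operations apply to them. The sketch below groups paths by file extension. It uses a hand-written dictionary as a stand-in for `terence.results`, and `group_by_extension` is an illustrative helper rather than part of the package.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_extension(results):
    """Map each file extension (e.g. ".py") to a sorted list of paths."""
    groups = defaultdict(list)
    for path in results:
        groups[PurePosixPath(path).suffix].append(path)
    return {ext: sorted(paths) for ext, paths in groups.items()}


# Hand-written stand-in for terence.results
sample_results = {
    "frontend/app/index.html": "<!DOCTYPE html>\n<html></html>",
    "frontend/app/styles/globals.css": "body {\n  font-family: Arial;\n}",
    "backend/main.py": "def main():\n    pass\n",
}

for ext, paths in group_by_extension(sample_results).items():
    print(ext, paths)
```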
terence.results = {
'frontend/app/index.html': '<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n</head></html>...',
'frontend/app/styles/globals.css': 'body {\n font-family: Arial, sans-serif;\n...}\nh1 {\n color: #333;\n}'
}

terence.scan_repository("https://github.com/user/repo_name")
repo_info = terence.get_repo_info()
# Example return value
repo_info = {
'owner': 'user',
'repo': 'repo_name',
'url': 'https://github.com/user/repo_name'
}

The GitHub API allows 5,000 requests per hour for an authenticated token and 60 per hour for unauthenticated requests.
Terence raises a RateLimitException when the remaining rate limit is too low to start a new repository scan.
rate = terence.get_rate_limit()
rate = {
'remaining': 4102,
'limit': 5000, # GitHub limit
# Date format yyyy-mm-dd hr:min:sec+00:00 timezone
'reset': datetime.datetime(2025, 12, 4, 18, 30, 0, tzinfo=datetime.timezone.utc)
}

# Clear results but stay authenticated
terence.clear_results()
# Clear everything (deauthenticate)
terence.clear_all()

from terence import Terence, RateLimitException
terence = Terence().auth("ghp_your_token_here")
try:
    terence.scan_repository("https://github.com/user/repo_name")
    print(f"Success! Found {len(terence.results)} files")
except RateLimitException as e:
    print(f"Rate limit reached: {e}")
    # Wait until reset time or use different token
except ValueError as e:
    print(f"Invalid input: {e}")
    # Check URL format or extension list
except Exception as e:
    print(f"Error: {e}")
    # Handle authentication, repo not found, etc.

By default, Terence scans these file types:
- Python: .py
- JavaScript/TypeScript: .js, .jsx, .ts, .tsx
- Web: .html, .htm, .css, .scss, .sass, .vue, .svelte
- Java: .java
- C/C++: .c, .cpp, .h, .hpp, .cc
- Other: .go, .rs, .rb, .php, .swift, .kt, .cs
The following directories are automatically excluded:
node_modules/, .git/, venv/, env/, .venv/, __pycache__/, dist/, build/, .next/, .nuxt/, target/, bin/, obj/, test/, tests/, .pytest_cache/, coverage/
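To make the combined effect of the extension list and the excluded directories concrete, here is a rough sketch of how such a filter could work. It is not Terence's actual implementation: `normalize_extensions`, `should_scan`, and the two constants are hypothetical names, and the sets contain only a subset of the lists above.

```python
from pathlib import PurePosixPath

# Small subsets of the lists above, chosen for illustration only
EXCLUDED_DIRS = {"node_modules", ".git", "venv", "__pycache__", "dist", "build", "tests"}
DEFAULT_EXTENSIONS = {".py", ".js", ".ts", ".html", ".css"}


def normalize_extensions(extensions):
    """Lower-case each extension and add a leading dot if it is missing."""
    return {"." + ext.lower().lstrip(".") for ext in extensions}


def should_scan(path, extensions=None):
    """Return True if `path` has a wanted extension and no excluded directory."""
    wanted = DEFAULT_EXTENSIONS if extensions is None else normalize_extensions(extensions)
    parts = PurePosixPath(path).parts
    # Every component except the last one is a directory name
    if any(part in EXCLUDED_DIRS for part in parts[:-1]):
        return False
    return PurePosixPath(path).suffix.lower() in wanted
```

With these definitions, `should_scan("src/app.py")` is true, while `should_scan("node_modules/lib/index.js")` is false because of the excluded directory.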
RateLimitException is raised when the GitHub API rate limit is too low (fewer than 10 requests remaining).
from terence import RateLimitException
try:
    terence.scan_repository(url)
except RateLimitException as e:
    print(f"Rate limit reached: {e}")

ValueError is raised when:
- Invalid GitHub URL format
- Extension not in allowed extensions list
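As an illustration of the kind of URL check that leads to a ValueError, the sketch below splits a repository URL into the same owner, repo, and url fields that `get_repo_info()` returns. It is an assumption about how such validation could look, not Terence's own code, and `parse_github_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def parse_github_url(url):
    """Split a GitHub repository URL into owner and repo, or raise ValueError."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or parsed.netloc.lower() != "github.com":
        raise ValueError(f"Not a GitHub URL: {url!r}")
    # Drop empty segments produced by leading or trailing slashes
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Expected https://github.com/<owner>/<repo>, got {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return {"owner": owner, "repo": repo, "url": f"https://github.com/{owner}/{repo}"}
```

Checking the URL this way before calling `scan_repository` lets malformed input be reported without spending an API request.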
A generic Exception is raised for:
- Not authenticated
- Invalid GitHub token
- Repository not found (or private)
- Other GitHub API errors
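When a RateLimitException is caught, the dictionary returned by `get_rate_limit()` can be used to decide how long to wait before retrying. The sketch below assumes a dictionary with the `remaining` and `reset` keys shown earlier. `seconds_until_reset` and `can_scan` are hypothetical helpers, and the threshold of 10 mirrors the limit stated above.

```python
import datetime


def seconds_until_reset(rate, now=None):
    """Return the seconds left until the limit resets, rounded down (0 if past)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    remaining = (rate["reset"] - now).total_seconds()
    return max(0, int(remaining))


def can_scan(rate, minimum=10):
    """Return True if at least `minimum` requests remain in the current window."""
    return rate["remaining"] >= minimum


# Hand-written stand-in for the value returned by terence.get_rate_limit()
sample_rate = {
    "remaining": 4,
    "limit": 5000,
    "reset": datetime.datetime(2025, 12, 4, 18, 30, 0, tzinfo=datetime.timezone.utc),
}
```

A caller could sleep for `seconds_until_reset(rate)` seconds before scanning again, or switch to a different token as suggested above.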
git clone https://github.com/yourusername/terence.git
cd terence
pip install -e ".[dev]"

# Run all tests
pytest tests/test_client.py -v
# Run specific test
pytest tests/test_client.py::TestTerence::test_auth -v
# Run with coverage
pytest tests/test_client.py --cov=terence --cov-report=html

- Python 3.7+
- PyGithub >= 2.1.1
- python-dotenv >= 1.0.0
MIT License - see LICENSE file for details
Contributions are welcome! Feel free to fork and submit a pull request.
For any questions or concerns, please reach out to me at louieyin6@gmail.com
Created by Louie Yin (GarfieldFluffJr)
