A powerful Repository Context Packager CLI tool that analyzes local git repositories and creates comprehensive text files containing repository content optimized for sharing with Large Language Models (LLMs).
We welcome contributions! Please see our Contributing Guide for details on:
- Setting up the development environment with uv
- Running tests and code quality checks
- Submitting pull requests
- Code style guidelines
- Project structure and development workflow
# Install from PyPI
pip install repo-contextr
# Or use pipx (recommended for CLI tools)
pipx install repo-contextr
# Package your current repository
repo-contextr .
# Save to file
repo-contextr . -o my-project-context.txt

When developers want to get help from ChatGPT, Claude, or other LLMs about their code, they often struggle with how to share their codebase effectively. repo-contextr solves this by automatically collecting and formatting repository content into a single, well-structured text file that provides rich context to LLMs, enabling them to give much better assistance with your code.
- Git Integration: Extracts commit SHA, branch, author, and date information
- Project Structure: Generates a clear directory tree visualization
- Token Counting: Estimate and visualize token distribution across your codebase (NEW!)
- File Content Packaging: Includes file contents with syntax highlighting
- Smart File Discovery: Recursively scans directories with configurable filtering
- Binary File Detection: Automatically skips binary files
- Recent Changes Mode: Focus on files modified in the last 7 days with git timestamps
- File Type Statistics: Shows breakdown of file types with counts (e.g., .py (8), .md (3))
- Size Analytics: Displays largest file and average file size information
- Git Timestamps: Shows last modified dates for files using git history
- Pattern Matching: Include/exclude files using glob patterns
- Error Handling: Gracefully handles permission errors and provides helpful messages
- Flexible Output: Write to stdout or save to a file
# Install globally with pipx (recommended)
pipx install repo-contextr
# Or install with pip
pip install repo-contextr
# Verify installation
repo-contextr --version

# Clone the repository
git clone https://github.com/dharamghevariya/repo-contextr.git
cd repo-contextr
# Install in development mode
pip install -e .

# Package current directory
repo-contextr .
# Package specific directory
repo-contextr /path/to/your/project
# Package specific files
repo-contextr src/main.py src/utils.py
# Save output to file
repo-contextr . -o my-project-context.txt
# Include only Python files
repo-contextr . --include "*.py"
# Include only recent changes (last 7 days)
repo-contextr . --recent
# Combine filters
repo-contextr . --recent --include "*.py" -o recent-python.txt
# Show token counts in structure and summary
repo-contextr . --token-count-tree -o context-with-tokens.txt
# Filter tree to show only high-token files
repo-contextr . --token-count-tree --token-threshold 1000
# Just get total token count
repo-contextr . --tokens

| Option | Short | Description | Example |
|---|---|---|---|
| paths | - | File or directory paths to analyze | repo-contextr src/ docs/ |
| --output | -o | Output file path (default: stdout) | -o context.txt |
| --include | - | Pattern to include files (glob pattern) | --include "*.py" |
| --recent | -r | Only files modified in last 7 days | --recent |
| --token-count-tree | - | Show token counts in structure | --token-count-tree |
| --token-threshold | - | Minimum token count to include | --token-threshold 1000 |
| --tokens | - | Show estimated total token count | --tokens |
| --version | -v | Show version and exit | -v |
| --help | -h | Show help message | -h |
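
To make the `--include` glob option more concrete, here is a minimal Python sketch of how such a pattern could be matched against file paths. `expand_braces` and `matches_include` are hypothetical helpers written for illustration, not repo-contextr's actual implementation; the real tool may treat `**` or brace alternatives differently.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b} alternatives, e.g. '*.{js,ts}' -> ['*.js', '*.ts']."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if not match:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    results = []
    for option in match.group(1).split(","):
        results.extend(expand_braces(head + option + tail))
    return results


def matches_include(path, pattern):
    """Return True if the full path or its file name matches the pattern.

    Assumption: paths use forward slashes and matching is case-sensitive.
    """
    name = path.rsplit("/", 1)[-1]
    return any(
        fnmatch.fnmatchcase(path, candidate) or fnmatch.fnmatchcase(name, candidate)
        for candidate in expand_braces(pattern)
    )


print(matches_include("src/main.py", "*.py"))
print(matches_include("app.ts", "*.{js,ts}"))
print(matches_include("README.md", "*.py"))
```

Python's `fnmatch` has no brace syntax of its own, which is why the sketch expands `{...}` groups before matching.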
# Get help with a Python project
repo-contextr . --include "*.py" -o python-context.txt
# Share recent changes for code review
repo-contextr . --recent -o recent-changes.txt
# Package documentation files
repo-contextr . --include "*.md" -o docs-context.txt
# Full project context for LLM assistance
repo-contextr . -o full-project.txt
# Focus on backend code only
repo-contextr backend/ --include "*.{py,sql,yaml}" -o backend-context.txt
# Estimate token usage for LLM context planning
repo-contextr . --tokens
# Identify token-heavy files for optimization
repo-contextr . --token-count-tree --token-threshold 500 -o high-token-files.txt

The tool generates a structured text file with the following sections:
File System Location: Absolute path to the repository being analyzed

Git Info:
- Commit SHA
- Current branch
- Last commit author
- Last commit date

Structure: Directory tree showing the organization of included files
With Token Counting (when using --token-count-tree or --tokens):
- Shows token estimates for each file and directory
- Displays total tokens for the entire project
- Helps identify token-heavy files and directories
- Supports filtering with --token-threshold to focus on high-token files
Example with tokens:
## Structure
**Total Tokens:** 11,738
contextr/ (11,738 tokens)
├── commands/ (1,416 tokens)
│   ├── package.py (726 tokens)
│   └── token_commands.py (690 tokens)
├── formatters/ (3,542 tokens)
│   ├── report_formatter.py (2,157 tokens)
│   └── token_tree_formatter.py (1,385 tokens)
└── cli.py (1,068 tokens)
File Contents: each file's content, with:
- Clear file path headers with git timestamps (when available)
- Appropriate syntax highlighting language tags
- Truncation notices for large files

Recent Changes (when using --recent):
- Shows only files modified in the last 7 days with git timestamps
- Includes file contents and statistics for those files
- Adds a summary line indicating how many recent files were found

Summary:
- Total number of files processed
- Total lines of code
- Recent files count (last 7 days)
- Estimated tokens (when token counting is enabled)
- File type breakdown with counts
- Largest file with line count
- Average file size in lines
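
As a rough illustration of how summary figures like these can be derived, the following Python sketch computes them from a mapping of file path to line count. The `summarize` function and its input shape are assumptions made for this example; they are not taken from repo-contextr's source.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize(line_counts):
    """Build summary statistics from {file path: number of lines}.

    Hypothetical helper: assumes at least one file was processed.
    """
    if not line_counts:
        raise ValueError("no files to summarize")
    # Count files per extension, e.g. {'.py': 2, '.md': 1}
    types = Counter(
        PurePosixPath(path).suffix or "(no extension)" for path in line_counts
    )
    largest = max(line_counts, key=line_counts.get)
    total_lines = sum(line_counts.values())
    return {
        "total_files": len(line_counts),
        "total_lines": total_lines,
        "file_types": ", ".join(f"{ext} ({n})" for ext, n in types.most_common()),
        "largest_file": f"{largest} ({line_counts[largest]} lines)",
        "average_lines": round(total_lines / len(line_counts)),
    }


stats = summarize({"src/main.py": 4, "src/utils/helpers.py": 8, "README.md": 30})
for key, value in stats.items():
    print(f"{key}: {value}")
```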
repo-contextr includes built-in token estimation to help you optimize content for LLM context windows. This feature uses the common rule-of-thumb approximation of ~4 characters per token.
When working with LLMs like ChatGPT, Claude, or other AI assistants, you're limited by context windows (e.g., 8K, 32K, 128K tokens). Understanding your repository's token distribution helps you:
- Stay within context limits: Know if your repo fits in the LLM's context window
- Optimize file selection: Identify which files consume the most tokens
- Manage API costs: Estimate token usage for cost planning
- Make strategic decisions: Decide what to include, summarize, or exclude
Get just the total estimated tokens:
repo-contextr . --tokens
# Output: Estimated tokens: 24,515 (across 40 files)

See token counts for each file and directory in the structure:
repo-contextr . --token-count-tree -o context.txt

This adds token annotations to your regular repository output:
## Structure
**Total Tokens:** 11,738
src/ (11,738 tokens)
├── core/ (7,402 tokens)
│   ├── engine.py (4,250 tokens)
│   └── utils.py (3,152 tokens)
└── tests/ (4,336 tokens)
    └── test_engine.py (4,336 tokens)
Focus on token-heavy files only:
repo-contextr . --token-count-tree --token-threshold 1000

This shows only files and directories with ≥1000 tokens, helping you identify optimization targets.
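
The idea behind threshold filtering can be sketched in Python as follows. The sketch rolls per-file token counts up into directory totals and keeps only entries at or above the threshold. `directory_totals` and `apply_threshold` are hypothetical names, and whether repo-contextr computes directory totals before or after filtering is an assumption here.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def directory_totals(file_tokens):
    """Sum each file's tokens into every directory that contains it."""
    totals = defaultdict(int)
    for path, tokens in file_tokens.items():
        for parent in PurePosixPath(path).parents:
            if str(parent) != ".":  # skip the implicit current directory
                totals[str(parent)] += tokens
    return dict(totals)


def apply_threshold(file_tokens, threshold):
    """Keep files and directories with at least `threshold` tokens.

    Assumption: directory totals are computed from all files, then filtered.
    """
    files = {p: t for p, t in file_tokens.items() if t >= threshold}
    dirs = {d: t for d, t in directory_totals(file_tokens).items() if t >= threshold}
    return files, dirs


file_tokens = {
    "src/core/engine.py": 4250,
    "src/core/utils.py": 3152,
    "src/tests/test_engine.py": 4336,
}
files, dirs = apply_threshold(file_tokens, threshold=4000)
print(files)
print(dirs)
```

With the sample counts above, `src/core/utils.py` falls below the 4000-token threshold and is dropped, while every directory stays because its total is larger.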
- Context Planning: Determine if your project fits in a 128K-token context window (e.g., GPT-4 Turbo)
- File Prioritization: Identify which files to include for maximum value
- Cost Estimation: Estimate API costs before processing
- Optimization: Find files that could be summarized or split
- Debugging: Understand why you're hitting context limits
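
For context planning, the estimate from `--tokens` can be compared against a model's context window. The Python sketch below shows one way to do that arithmetic; `fits_in_context` and the `reserved_for_reply` allowance are illustrative assumptions, not part of repo-contextr.

```python
def fits_in_context(estimated_tokens, context_window, reserved_for_reply=4_000):
    """Return (fits, headroom) for a given estimate and context window.

    reserved_for_reply is a hypothetical allowance for the prompt
    and the model's answer; adjust it to your own usage.
    """
    budget = context_window - reserved_for_reply
    return estimated_tokens <= budget, budget - estimated_tokens


# 24,515 is the example estimate shown earlier for `repo-contextr . --tokens`
for window in (8_000, 32_000, 128_000):
    fits, headroom = fits_in_context(24_515, window)
    print(f"{window:>7} tokens: fits={fits}, headroom={headroom}")
```

A negative headroom indicates roughly how many tokens would need to be cut, for example by narrowing `--include` or using `--recent`.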
The tool uses ~4 characters per token, which is a reasonable approximation for:
- English code and comments
- Common programming languages (Python, JavaScript, Java, etc.)
- Documentation in English
May be less accurate for:
- Non-English text
- Heavily compressed/minified code
- Special characters and unicode
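
The ~4 characters per token heuristic amounts to a one-line calculation. The Python sketch below is a minimal version of it; the function body and the choice to round up are assumptions for illustration, and the tool's own rounding may differ.

```python
import math

CHARS_PER_TOKEN = 4  # the ~4 characters per token rule of thumb


def estimate_tokens(text):
    """Estimate tokens from character count (rounding up is an assumption)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


print(estimate_tokens(""))
print(estimate_tokens("def main():"))
print(estimate_tokens("print('Hello, World!')"))
```

Because the estimate depends only on character count, text that a real tokenizer splits into many small pieces (non-English text, minified code, unusual symbols) is likely to be undercounted.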
When you run repo-contextr . --include "*.py", the output looks like this:
# Repository Context
## File System Location
/home/user/my-project
## Git Info
- Commit: a1b2c3d4e5f6789...
- Branch: main
- Author: John Doe <john@example.com>
- Date: Thu Sep 25 14:30:15 2025 -0400
## Structure
```
├── src/
│   ├── main.py
│   └── utils/
│       └── helpers.py
├── pyproject.toml
└── README.md
```
## Recent Changes
### File: src/main.py (Modified: 2025-09-25 14:30:22)
```python
#!/usr/bin/env python3
"""Main entry point for the application."""

def main():
    print("Hello, World!")

if __name__ == "__main__":
    main()
```
## File Contents
### File: src/utils/helpers.py (Modified: 2025-09-24 10:15:33)
```python
"""Utility functions for the application."""

def format_output(data):
    """Format data for display."""
    return str(data)
```
## Summary
- Total files: 2
- Total lines: 12
- Recent files (last 7 days): 1
- Estimated tokens: 3,142
- File types: .py (2)
- Largest file: src/utils/helpers.py (8 lines)
- Average file size: 6 lines

The tool includes most text files but automatically excludes:
- `.git`, `.svn`, `.hg` (version control)
- `__pycache__`, `.pytest_cache` (Python cache)
- `node_modules`, `.npm` (Node.js)
- `.vscode`, `.idea` (IDE directories)
- `build`, `dist`, `target` (build directories)
- `.env`, `venv`, `.venv` (virtual environments)

File handling rules:
- Binary files: Automatically detected and skipped
- Large files: Files larger than 16KB are truncated with notice
- Permission errors: Skipped with warning message to stderr
- Text files: All readable text files are included by default
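
The binary and large-file rules above could be implemented along these lines. This Python sketch is an assumption about one common approach (treating a NUL byte near the start of a file as a sign of binary content), not a description of repo-contextr's actual detection logic; `prepare_content` and `SNIFF_BYTES` are hypothetical names.

```python
MAX_BYTES = 16 * 1024  # files larger than 16KB are truncated
SNIFF_BYTES = 8192     # hypothetical: how much of the file to inspect


def prepare_content(data: bytes):
    """Return text to package, or None if the data looks binary.

    Assumed heuristic: a NUL byte in the first SNIFF_BYTES means binary.
    """
    if b"\x00" in data[:SNIFF_BYTES]:
        return None
    if len(data) > MAX_BYTES:
        size_kb = len(data) // 1024
        # errors="replace" covers a multi-byte character cut at the boundary
        text = data[:MAX_BYTES].decode("utf-8", errors="replace")
        return f"{text}\n[File truncated - original size: {size_kb}KB]"
    return data.decode("utf-8", errors="replace")


print(prepare_content(b"\x89PNG\x00\x01"))
print(prepare_content(b"print('hi')\n"))
print(prepare_content(b"a" * 25 * 1024)[-40:])
```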
Use the --include option to filter files:
repo-contextr . --include "*.py" # Only Python files
repo-contextr . --include "*.{js,ts}" # JavaScript and TypeScript
repo-contextr . --include "*.md" # Only Markdown files
repo-contextr . --include "src/**/*.py" # Python files in src/

The tool handles errors gracefully:
| Error Type | Behavior | Example |
|---|---|---|
| Permission errors | Skipped with warning | Warning: Permission denied: /restricted/file.txt |
| Binary files | Automatically skipped | .exe, .jpg, .pdf files ignored |
| Large files | Truncated with notice | [File truncated - original size: 25KB] |
| Invalid paths | Clear error messages | Error: Path does not exist: /invalid/path |
| Non-git repositories | Works fine | Shows "Not a git repository" in output |
| Network issues | Graceful fallback | Git info shows as unavailable |
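
The skip-and-warn behavior in the table can be pictured with a short Python sketch. `read_files` is a hypothetical function; the message texts mirror the examples in the table, but treating a missing path as a skipped entry rather than a fatal error is an assumption of this sketch.

```python
import sys


def read_files(paths):
    """Read each path, reporting problems on stderr instead of crashing."""
    contents = {}
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                contents[path] = handle.read()
        except FileNotFoundError:
            print(f"Error: Path does not exist: {path}", file=sys.stderr)
        except PermissionError:
            print(f"Warning: Permission denied: {path}", file=sys.stderr)
    return contents


result = read_files(["/no/such/dir/definitely-missing.txt"])
print(result)
```

Writing warnings to stderr keeps them out of the packaged output when it is sent to stdout or redirected to a file.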
This project uses pytest for testing, with code coverage reporting.
Quick Start:
# Run all tests
uv run pytest
# Run with coverage report
uv run pytest --cov=src --cov-report=term-missing
# Generate interactive HTML coverage report
uv run pytest --cov=src --cov-report=html
# Open the report (Windows; use "open" on macOS or "xdg-open" on Linux)
start htmlcov/index.html

Advanced Testing Features:
Run specific tests instead of the entire suite:
# Run a single test function
uv run pytest tests/unit/test_token_counter.py::TestEstimateTokens::test_estimate_tokens_empty_string -v
# Run all tests in a class
uv run pytest tests/unit/test_token_counter.py::TestEstimateTokens -v
# Run specific test file
uv run pytest tests/unit/test_token_counter.py -v
# Run tests matching a pattern
uv run pytest -k "token" -v

Track which code is tested and identify missing test cases:
# Coverage with missing lines highlighted
uv run pytest --cov=src --cov-report=term-missing
# Generate HTML coverage report (interactive, visual)
uv run pytest --cov=src --cov-report=html
# Open the report (Windows; use "open" on macOS or "xdg-open" on Linux)
start htmlcov/index.html
# Coverage for specific module
uv run pytest tests/unit/test_token_counter.py --cov=src.contextr.statistics.token_counter --cov-report=term-missing

Coverage Status:
- file_reader.py: 96% coverage
- file_stats.py: 94% coverage
- token_counter.py: 95% coverage
- Core modules: 93%+ coverage
- Total: 104 comprehensive tests
Understanding Coverage Reports:
- Terminal Report: Shows percentage and missing line numbers
- HTML Report: Interactive, color-coded line-by-line view
  - 🟢 Green = Covered by tests
  - 🔴 Red = Not covered (write tests for these!)
  - ⚪ Gray = Non-executable (comments, blank lines)

Debugging options:
# Show print statements in tests
uv run pytest -s tests/unit/test_token_counter.py
# Stop on first failure
uv run pytest -x
# Run only failed tests from last run
uv run pytest --lf
# Very verbose output
uv run pytest -vv

Testing Documentation:
- TESTING.md - Comprehensive testing guide
- COVERAGE.md - Code coverage guide
Test Structure:
tests/
├── conftest.py                 # Shared fixtures (git repos, temp dirs, mock files)
├── test_smoke.py               # Basic smoke tests
├── unit/                       # Unit tests for individual modules
│   ├── test_config.py          # Configuration and TOML tests
│   ├── test_file_reader.py     # File reading and processing tests
│   ├── test_file_stats.py      # File statistics tests
│   └── test_token_counter.py   # Token counting tests
├── integration/                # End-to-end workflow tests
└── fixtures/                   # Test data and samples
Available Fixtures:
- temp_dir - Temporary directory for isolated tests
- sample_git_repo - Initialized git repository with sample files
- non_git_dir - Non-git directory for testing fallback behavior
- mock_files_dir - Directory with various file types (Python, JS, MD, binary)
- empty_dir - Empty directory
- recent_files_repo - Git repo with recent commits for timestamp testing
- sample_python_file - Sample Python file with functions and classes
Continuous Integration: Tests run automatically on push/PR via GitHub Actions on Ubuntu, Windows, and macOS.
This project uses Ruff for both code formatting and linting, which provides fast and comprehensive code quality checks.
Running the formatter manually:
uv run ruff format .

Running the linter:

uv run ruff check .

Auto-fix linting issues:

uv run ruff check --fix .

Pre-commit hooks: The project uses pre-commit hooks that automatically format and check your code before each commit. To install the hooks:

uv run pre-commit install

Once installed, Ruff will automatically format your code and run linting checks whenever you commit changes.
Running all checks (like CI does):
# Linting
uv run ruff check .
uv run ruff format --check .
# Type checking
uv run mypy src
# Tests with coverage
uv run pytest --cov=src --cov-report=term-missing

Perfect for these scenarios:
- AI Assistance: Get better help from ChatGPT, Claude, or GitHub Copilot
- LLM Context Optimization: Estimate and manage token usage to stay within context limits
- Code Reviews: Share complete project context with team members
- Documentation: Create comprehensive project snapshots
- Onboarding: Help new team members understand project structure
- Debugging: Share complete context when asking for help
- Learning: Analyze and understand other projects' structure
- Cost Management: Estimate API costs for LLM processing before submission
This project is licensed under the MIT License. See the LICENSE file for details.
- PyPI Package: https://pypi.org/project/repo-contextr/
- GitHub Repository: https://github.com/dharamghevariya/repo-contextr
- Issue Tracker: https://github.com/dharamghevariya/repo-contextr/issues
- Contributing Guide: CONTRIBUTING.md
- Documentation: This README