Purpose: Assess repositories against agent-ready best practices and generate actionable reports.
Last Updated: 2026-03-03
AgentReady is a Python CLI tool that evaluates repositories against a comprehensive set of carefully researched attributes that make codebases more effective for AI-assisted development. It generates interactive HTML reports, version-control friendly Markdown reports, and machine-readable JSON output.
Current Status: v2.29.5 - Core assessment engine complete, most essential assessors implemented, LLM-powered learning, research report management
Self-Assessment Score: 80.0/100 (Gold) - See examples/self-assessment/
For User Documentation: See README.md for installation, usage examples, and feature tutorials.
```
src/agentready/
├── models/                  # Data models (Repository, Attribute, Finding, Assessment)
├── services/                # Scanner orchestration and language detection
│   ├── llm_cache.py         # LLM response caching (7-day TTL)
│   ├── research_loader.py   # Research report loading and validation
│   └── research_formatter.py # Research report formatting utilities
├── assessors/               # Attribute assessment implementations
│   ├── base.py              # BaseAssessor abstract class
│   ├── documentation.py     # CLAUDE.md, README assessors
│   ├── code_quality.py      # Type annotations, complexity
│   ├── testing.py           # Test coverage, pre-commit hooks
│   ├── structure.py         # Standard layout, gitignore
│   ├── repomix.py           # Repomix configuration assessor
│   └── stub_assessors.py    # Remaining assessors in development
├── learners/                # Pattern extraction and LLM enrichment
│   ├── pattern_extractor.py # Heuristic skill extraction
│   ├── skill_generator.py   # SKILL.md generation
│   ├── code_sampler.py      # Repository code sampling
│   ├── llm_enricher.py      # Claude API integration
│   └── prompt_templates.py  # LLM prompt engineering
├── reporters/               # Report generation (HTML, Markdown, JSON)
│   ├── html.py              # Interactive HTML with Jinja2
│   └── markdown.py          # GitHub-Flavored Markdown
├── prompts/                 # LLM prompt .md templates; load_prompt(name) from loader.py
├── templates/               # Jinja2 templates
│   └── report.html.j2       # Self-contained HTML report (73KB)
└── cli/                     # Click-based CLI
    ├── main.py              # assess, research-version, generate-config commands
    ├── learn.py             # Continuous learning loop with LLM enrichment
    └── research.py          # Research report management commands
```
```
Repository → Scanner → Assessors → Findings → Assessment → Reporters → Reports
                ↓
       Language Detection
         (git ls-files)
```
Tier-Based Weighting (50/30/15/5 distribution):
- Tier 1 (Essential): 50% of total score
- Tier 2 (Critical): 30% of total score
- Tier 3 (Important): 15% of total score
- Tier 4 (Advanced): 5% of total score

Attribute Scoring: Each attribute returns a score from 0 to 100.

Weighted Aggregation: `final_score = Σ(attribute_score × weight)`
Certification Levels:
- Platinum: 90-100
- Gold: 75-89
- Silver: 60-74
- Bronze: 40-59
- Needs Improvement: 0-39
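The weighting and certification rules above can be sketched in a few lines. The dict-of-scores input shape here is a simplified stand-in for the real models in `src/agentready/models/`:

```python
# Tier weights from the 50/30/15/5 distribution described above.
TIER_WEIGHTS = {1: 0.50, 2: 0.30, 3: 0.15, 4: 0.05}

# Certification bands, checked highest-first.
CERTIFICATION_BANDS = [
    (90, "Platinum"),
    (75, "Gold"),
    (60, "Silver"),
    (40, "Bronze"),
    (0, "Needs Improvement"),
]


def final_score(scores_by_tier: dict[int, list[float]]) -> float:
    """Weighted aggregation: each tier's average attribute score times its weight."""
    total = 0.0
    for tier, scores in scores_by_tier.items():
        if scores:
            total += (sum(scores) / len(scores)) * TIER_WEIGHTS[tier]
    return total


def certification(score: float) -> str:
    for threshold, level in CERTIFICATION_BANDS:
        if score >= threshold:
            return level
    return "Needs Improvement"


# A repository averaging 80 in Tier 1 and 60 in Tier 2, with nothing else
# assessed, scores 0.5*80 + 0.3*60 = 58 (Bronze).
print(certification(final_score({1: [80.0], 2: [60.0], 3: [], 4: []})))
```

A perfect score in every tier yields exactly 100, since the four weights sum to 1.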
```bash
# Create virtual environment
uv venv
source .venv/bin/activate

# Install dependencies
uv pip install -e .

# Install development tools
uv pip install pytest black isort ruff
```

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src/agentready --cov-report=html

# Run specific test file
pytest tests/unit/test_models.py -v
```

Current Coverage: 37% (focused on core logic, targeting >80%)

```bash
# Pre-push linting workflow
black src/ tests/ && isort src/ tests/ && ruff check src/ tests/
```

Enable context-free agent handoff by creating self-contained prompts in plans/ (gitignored).
Include: Requirements, implementation approach, code patterns, test guidance, dependencies
Workflow: Planning → plans/feature-name.md → GitHub issue (copy prompt) → Future agent implements
Benefits: Asynchronous development, complete context without chat history, standardized knowledge transfer
- Expand a stub assessor in `src/agentready/assessors/stub_assessors.py`
- Create a new assessor class inheriting from `BaseAssessor`
- Implement the required methods: the `attribute_id` property, the `assess(repository)` method, and optionally the `is_applicable(repository)` method
- Add tests in `tests/unit/test_assessors_*.py`
- Register the assessor in the scanner's assessor list
Example:

```python
class MyAssessor(BaseAssessor):
    @property
    def attribute_id(self) -> str:
        return "my_attribute_id"

    def assess(self, repository: Repository) -> Finding:
        # Implement assessment logic
        if condition_met:
            return Finding.create_pass(self.attribute, ...)
        else:
            return Finding.create_fail(self.attribute, ...)
```

Reference Implementations:
- Simple: `CLAUDEmdAssessor` (file existence check)
- Complex: `TypeAnnotationsAssessor` (proportional scoring)
- Language-aware: `TestCoverageAssessor` (conditional logic)
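For the optional `is_applicable(repository)` hook, a minimal language-aware sketch in the spirit of `TestCoverageAssessor`; the `Repository` type and the assessor class here are simplified stand-ins, not the real models:

```python
from dataclasses import dataclass


@dataclass
class Repository:
    """Simplified stand-in for the real Repository model."""
    languages: set[str]


class CoverageStyleAssessor:
    @property
    def attribute_id(self) -> str:
        return "test_coverage"

    def is_applicable(self, repository: Repository) -> bool:
        # Only assess repositories containing a language we can measure;
        # the scanner skips this assessor otherwise.
        return "python" in repository.languages


repo = Repository(languages={"python", "shell"})
print(CoverageStyleAssessor().is_applicable(repo))  # True
```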
```
agentready/
├── src/agentready/      # Source code
├── tests/               # Test suite
│   ├── unit/            # Unit tests
│   └── integration/     # End-to-end tests
├── examples/            # Example reports
│   └── self-assessment/ # AgentReady's own assessment
├── specs/               # Feature specifications
├── plans/               # Cold-start prompts (gitignored)
├── experiments/         # SWE-bench validation studies
├── contracts/           # Data schemas and validation rules
├── pyproject.toml       # Python package configuration
├── CLAUDE.md            # This file (developer guide)
├── README.md            # User-facing documentation
├── BACKLOG.md           # Future features and enhancements
└── GITHUB_ISSUES.md     # GitHub-ready issue templates
```
Empirically measure Claude Code performance impact of .claude/agents/doubleagent.md using Harbor's Terminal-Bench.
The Harbor comparison feature automates A/B testing by running Terminal-Bench tasks with/without agent files, calculating deltas and statistical significance, and generating comprehensive reports (JSON, Markdown, HTML).
```bash
# Install Harbor
uv tool install harbor

# Run comparison (3 tasks, ~30-60 min)
agentready harbor compare \
  -t adaptive-rejection-sampler \
  -t async-http-client \
  -t terminal-file-browser \
  --verbose \
  --open-dashboard
```

Metrics reported:
- Success Rate: Percentage of tasks completed successfully
- Duration: Average time to complete tasks
- Statistical Significance: T-tests (p<0.05) and Cohen's d effect sizes
- Per-Task Impact: Individual task improvements/regressions
Results stored in .agentready/harbor_comparisons/ (gitignored):
- JSON: Machine-readable comparison data
- Markdown: GitHub-friendly report (commit this for PRs)
- HTML: Interactive dashboard with Chart.js visualizations
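The comparison statistics named above can be sketched with the standard library. This is an illustration, not the comparer's actual implementation; the p-value lookup against a t-distribution (e.g. via `scipy.stats.ttest_ind`) is omitted:

```python
import math
import statistics


def cohens_d(a: list[float], b: list[float]) -> float:
    """Effect size: mean difference over the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(
        ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b))
        / (na + nb - 2)
    )
    return (statistics.mean(a) - statistics.mean(b)) / pooled


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic (does not assume equal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b)
    )


# Hypothetical per-task success rates for the two runs.
with_agent = [0.9, 0.8, 1.0, 0.85, 0.95]
without_agent = [0.6, 0.7, 0.65, 0.55, 0.7]

d = cohens_d(with_agent, without_agent)
print(f"Cohen's d = {d:.2f}")  # >= 0.8 counts as a large effect
```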
Compare:

```bash
agentready harbor compare -t task1 -t task2 [--verbose] [--open-dashboard]
```

List comparisons:

```bash
agentready harbor list
```

View comparison:

```bash
agentready harbor view .agentready/harbor_comparisons/comparison_latest.json
```

| Component | Location | Purpose |
|---|---|---|
| Data Models | `models/harbor.py` | `HarborTaskResult`, `HarborRunMetrics`, `HarborComparison` |
| Services | `services/harbor/` | `HarborRunner`, `AgentFileToggler`, `ResultParser`, `HarborComparer` |
| Reporters | `reporters/` | `HarborMarkdownReporter`, `DashboardGenerator` |
- Significance: P-value < 0.05 (t-test) AND Cohen's d effect size (Small: 0.2-0.5, Medium: 0.5-0.8, Large: ≥0.8)
- Sample Sizes: Minimum 3 tasks, Recommended 5-10, Comprehensive 20+
- User Guide: `docs/harbor-comparison-guide.md`
- Implementation Plan: `.claude/plans/vivid-knitting-codd.md`
- Harbor Docs: https://harborframework.com/docs
- Python 3.11+ (only N and N-1 versions supported)
- Click - CLI framework
- Jinja2 - HTML template engine
- Anthropic - Claude API client (for LLM enrichment)
- Pytest - Testing framework
- Black - Code formatter
- isort - Import sorter
- Ruff - Fast Python linter
- Harbor - Evaluation framework (optional, for benchmarks)
AgentReady validates dependencies before running benchmarks:
- Harbor CLI: Checked automatically before Terminal-Bench runs
- Interactive installation: Prompts the user with `uv tool install harbor` (or `pip install harbor` as a fallback)
- Opt-out: Use the `--skip-preflight` flag to bypass checks (advanced users)
- Package manager fallback: Prefers `uv`, falls back to `pip` if `uv` is not available
- Security: Uses `safe_subprocess_run()` with a 5-minute timeout
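A minimal sketch of such a preflight check. The function name and flow here are illustrative; `shutil.which` plus a timed `subprocess.run` stands in for the real `safe_subprocess_run()` wrapper:

```python
import shutil
import subprocess


def preflight_harbor(skip: bool = False, timeout: int = 300) -> bool:
    """Return True if the harbor CLI is usable (or checks are skipped)."""
    if skip:  # --skip-preflight
        return True
    if shutil.which("harbor") is None:
        # Prefer uv, fall back to pip, as described above.
        installer = (
            "uv tool install harbor" if shutil.which("uv") else "pip install harbor"
        )
        print(f"harbor not found; install it with: {installer}")
        return False
    # Confirm the binary actually runs, guarded by a 5-minute timeout.
    result = subprocess.run(
        ["harbor", "--version"], capture_output=True, timeout=timeout
    )
    return result.returncode == 0


print(preflight_harbor(skip=True))  # True
```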
Implementation:
- Module: `src/agentready/utils/preflight.py`
- Tests: `tests/unit/utils/test_preflight.py` (100% coverage)
- Integration: `src/agentready/cli/benchmark.py`
Usage Examples:
```bash
# Normal usage (preflight check runs automatically)
agentready benchmark --subset smoketest

# Skip preflight (advanced users)
agentready benchmark --subset smoketest --skip-preflight
```

Contribution workflow:
- Create a feature branch from `main`
- Implement changes with tests
- Run linters: `black . && isort . && ruff check .`
- Run tests: `pytest`
- Commit with conventional commit messages
- Push and create a PR
feat: Add new assessor for dependency freshness
fix: Correct type annotation detection in Python 3.12
docs: Update CLAUDE.md with architecture details
test: Add integration test for HTML report generation
refactor: Extract common assessor logic to base class
chore: Update dependencies
- All new assessors must have unit tests
- Integration tests for new reporters
- Maintain >80% coverage for new code
- All tests must pass before merge
GitHub Actions:
- Run tests on PR
- Run linters (black, isort, ruff)
- Generate coverage report
- Run AgentReady self-assessment
- Post assessment results as PR comment
Current: Manual workflow (tests run locally before push)
- Stub Assessors: 9/31 assessors still return "not_applicable" - need implementation
- No Lock File: Intentionally excluded for library project (assessed as deliberate choice)
- Test Coverage: Currently at ~37%, targeting >80% for production readiness
- In Progress: Expand remaining stub assessors
- In Progress: Improve test coverage to >80%
- Planned: Bootstrap command (automated remediation)
- Planned: Align command (automated alignment)
- Customizable HTML themes with dark/light toggle
- Organization-wide dashboards
- Historical trend analysis
- AI-powered assessors with deeper code analysis
See BACKLOG.md for complete feature list.
When working on AgentReady:
- Read before modifying: Always read existing assessors before implementing new ones
- Follow patterns: Use `CLAUDEmdAssessor` and `READMEAssessor` as reference implementations
- Test thoroughly: Add unit tests for all new assessors
- Maintain backwards compatibility: Don't change the Assessment model without a schema version bump
- Stub assessors first: Check whether the attribute already has a stub before creating a new class
- Proportional scoring: Use `calculate_proportional_score()` for partial compliance
- Graceful degradation: Return "skipped" if tools are missing; don't crash
- Rich remediation: Provide actionable steps, tools, commands, examples, and citations
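Proportional scoring as described above can be sketched as follows; the real `calculate_proportional_score()` helper exists in the codebase, but its exact signature is an assumption here:

```python
def calculate_proportional_score(matched: int, total: int) -> float:
    """Score 0-100 proportional to how many checks passed (assumed signature)."""
    if total == 0:
        return 0.0
    return 100.0 * matched / total


# e.g. 12 of 16 functions carry type annotations -> a partial score of 75.0
print(calculate_proportional_score(12, 16))  # 75.0
```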
Key Principles:
- Library-first architecture (no global state)
- Strategy pattern for assessors (each is independent)
- Fail gracefully (missing tools → skip, don't crash)
- User-focused (actionable remediation over theoretical guidance)
Command Reference:
- User tutorials → See `README.md`
- CLI help → Run `agentready --help`
- SWE-bench experiments → See `experiments/README.md`
- Research report schema → See `contracts/research-report-schema.md`
CRITICAL: Proactive Documentation Agent Usage
Use the github-pages-docs agent for documentation updates after:
| Trigger | Action |
|---|---|
| New features implemented | Update user guide, developer guide, API reference, examples |
| Source-of-truth files modified | Cascade updates: CLAUDE.md → developer-guide.md, attributes.md → attributes.md |
| Bugs fixed or issues addressed | Update troubleshooting, known issues, migration guides |
| Project status changes | Update certification levels, versions, roadmap, milestones |
Documentation Sources of Truth (priority order):
- CLAUDE.md - Project guide
- RESEARCH_REPORT.md - Research report
- contracts/ - Schemas and validation
- specs/ - Feature specifications
- Source code - Actual implementation
Automation: Manual trigger via .github/workflows/update-docs.yml (workflow_dispatch)
- .github/CLAUDE_INTEGRATION.md - Dual Claude integration guide (automated + interactive)
- BACKLOG.md - Future features and enhancements
- GITHUB_ISSUES.md - GitHub-ready issue templates
- README.md - User-facing documentation
- specs/ - Feature specifications and design documents
- experiments/README.md - SWE-bench validation workflow
- examples/self-assessment/ - AgentReady's own assessment (80.0/100 Gold)
Last Updated: 2026-03-03 by Jeremy Eder
AgentReady Version: 2.29.5
Self-Assessment: 80.0/100 (Gold) ✨
- ALWAYS run actionlint and fix any issues before pushing changes to GitHub Actions workflows
- All workflows must pass actionlint validation with zero errors/warnings
- Use proper shell quoting and combined redirects for efficiency