
Commit 1d32bc7

semantic-release-bot authored and jeremyeder committed

chore(release): 2.10.0 [skip ci]
* disable attestations for Test PyPI to avoid conflict ([ambient-code#155](https://github.com/jeremyeder/agentready/issues/155)) ([a33e3cd](a33e3cd)), closes [pypa/#action-pypi-publish](https://github.com/jeremyeder/agentready/issues/action-pypi-publish)
* leaderboard workflow and SSH URL support ([ambient-code#147](https://github.com/jeremyeder/agentready/issues/147)) ([de28cd0](de28cd0))
* resolve 45 test failures across CLI, services, and assessors ([#4](#4)) ([3405142](3405142)), closes [ambient-code#178](https://github.com/jeremyeder/agentready/issues/178) [ambient-code#178](https://github.com/jeremyeder/agentready/issues/178)
* resolve broken links and workflow failures ([ambient-code#160](https://github.com/jeremyeder/agentready/issues/160)) ([fbf5cf7](fbf5cf7))
* skip PR comments for external forks to prevent permission errors ([ambient-code#163](https://github.com/jeremyeder/agentready/issues/163)) ([2a29fb8](2a29fb8))
* add ambient-code/agentready to leaderboard ([ambient-code#148](https://github.com/jeremyeder/agentready/issues/148)) ([621152e](621152e))
* add quay/quay to leaderboard ([ambient-code#162](https://github.com/jeremyeder/agentready/issues/162)) ([d6e8df0](d6e8df0))
* Add weekly research update skill and automation ([ambient-code#145](https://github.com/jeremyeder/agentready/issues/145)) ([7ba17a6](7ba17a6))
* automate PyPI publishing with trusted publishing (OIDC) ([ambient-code#154](https://github.com/jeremyeder/agentready/issues/154)) ([71f4632](71f4632)), closes [pypa/#action-pypi-publish](https://github.com/jeremyeder/agentready/issues/action-pypi-publish)
* implement lazy loading for heavy CLI commands ([ambient-code#151](https://github.com/jeremyeder/agentready/issues/151)) ([6a7cd4e](6a7cd4e))
1 parent 16132e2 · commit 1d32bc7

File tree

3 files changed: +17 −148 lines changed


CHANGELOG.md

Lines changed: 12 additions & 16 deletions
```diff
@@ -1,30 +1,26 @@
-# [2.15.0](https://github.com/ambient-code/agentready/compare/v2.14.1...v2.15.0) (2025-12-09)
+# [2.10.0](https://github.com/jeremyeder/agentready/compare/v2.9.0...v2.10.0) (2025-12-08)
 
 
 ### Bug Fixes
 
-* resolve all test suite failures - achieve zero failures ([#180](https://github.com/ambient-code/agentready/issues/180)) ([990fa2d](https://github.com/ambient-code/agentready/commit/990fa2d4725842df60af151d1ba058cd43a90d3c)), closes [#148](https://github.com/ambient-code/agentready/issues/148) [#147](https://github.com/ambient-code/agentready/issues/147) [#145](https://github.com/ambient-code/agentready/issues/145)
-* resolve YAML syntax error in update-docs workflow and add actionlint ([#173](https://github.com/ambient-code/agentready/issues/173)) ([97b06af](https://github.com/ambient-code/agentready/commit/97b06af1d2adc17ec385d658310f3562f19b1a95))
+* disable attestations for Test PyPI to avoid conflict ([#155](https://github.com/jeremyeder/agentready/issues/155)) ([a33e3cd](https://github.com/jeremyeder/agentready/commit/a33e3cd2d86d4a461701e906070ab3eae8ca8082)), closes [pypa/#action-pypi-publish](https://github.com/jeremyeder/agentready/issues/action-pypi-publish)
+* leaderboard workflow and SSH URL support ([#147](https://github.com/jeremyeder/agentready/issues/147)) ([de28cd0](https://github.com/jeremyeder/agentready/commit/de28cd0a6037a0951ba370aa73832553c088cfb8))
+* resolve 45 test failures across CLI, services, and assessors ([#4](https://github.com/jeremyeder/agentready/issues/4)) ([3405142](https://github.com/jeremyeder/agentready/commit/340514251d40f283afa24d5c3068f294727fd839)), closes [#178](https://github.com/jeremyeder/agentready/issues/178) [#178](https://github.com/jeremyeder/agentready/issues/178)
+* resolve broken links and workflow failures ([#160](https://github.com/jeremyeder/agentready/issues/160)) ([fbf5cf7](https://github.com/jeremyeder/agentready/commit/fbf5cf7a1fdcb65ef4d3943a2d84e46aa831d337))
+* skip PR comments for external forks to prevent permission errors ([#163](https://github.com/jeremyeder/agentready/issues/163)) ([2a29fb8](https://github.com/jeremyeder/agentready/commit/2a29fb84485a1ac6beff1675131bf50c1b702585))
 
 
 ### Features
 
-* replace markdown-link-check with lychee for link validation ([#177](https://github.com/ambient-code/agentready/issues/177)) ([f1a4545](https://github.com/ambient-code/agentready/commit/f1a4545e4718b735df3e1fa7e0b60eba9ed0173b))
-* Terminal-Bench eval harness (MVP Phase 1) ([#178](https://github.com/ambient-code/agentready/issues/178)) ([d06bab4](https://github.com/ambient-code/agentready/commit/d06bab42848847df26d83c7a44e5ee0e84ae0445)), closes [#171](https://github.com/ambient-code/agentready/issues/171)
+* add ambient-code/agentready to leaderboard ([#148](https://github.com/jeremyeder/agentready/issues/148)) ([621152e](https://github.com/jeremyeder/agentready/commit/621152e46bd8e9505e3bc1775d2cd61a80af5a62))
+* add quay/quay to leaderboard ([#162](https://github.com/jeremyeder/agentready/issues/162)) ([d6e8df0](https://github.com/jeremyeder/agentready/commit/d6e8df0e9d92c4ec82004c5e62c798986feb1000))
+* Add weekly research update skill and automation ([#145](https://github.com/jeremyeder/agentready/issues/145)) ([7ba17a6](https://github.com/jeremyeder/agentready/commit/7ba17a6b045251cbc9f26b5c2f4a0ec31d89dd11))
+* automate PyPI publishing with trusted publishing (OIDC) ([#154](https://github.com/jeremyeder/agentready/issues/154)) ([71f4632](https://github.com/jeremyeder/agentready/commit/71f4632cb188d8c9db377c9f216c047e20727f99)), closes [pypa/#action-pypi-publish](https://github.com/jeremyeder/agentready/issues/action-pypi-publish)
 
-## [2.14.1](https://github.com/ambient-code/agentready/compare/v2.14.0...v2.14.1) (2025-12-05)
 
+### Performance Improvements
 
-### Bug Fixes
-
-* resolve YAML syntax error in continuous-learning workflow ([#172](https://github.com/ambient-code/agentready/issues/172)) ([3d40fcc](https://github.com/ambient-code/agentready/commit/3d40fcccd4e8d722303d322716454869ca7db9d0))
-
-# [2.14.0](https://github.com/ambient-code/agentready/compare/v2.13.0...v2.14.0) (2025-12-05)
-
-
-### Features
-
-* container support ([#171](https://github.com/ambient-code/agentready/issues/171)) ([c6874ea](https://github.com/ambient-code/agentready/commit/c6874ea035775ac86ef5012bbfdf52e7b96f556f))
+* implement lazy loading for heavy CLI commands ([#151](https://github.com/jeremyeder/agentready/issues/151)) ([6a7cd4e](https://github.com/jeremyeder/agentready/commit/6a7cd4e147ebfdfc95921b86599a5b650db76153))
 
 
 # [2.13.0](https://github.com/ambient-code/agentready/compare/v2.12.3...v2.13.0) (2025-12-04)
```
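The "implement lazy loading for heavy CLI commands" entry refers to a common CLI pattern: deferring expensive imports until a subcommand is actually invoked, so `--help` and cheap commands start quickly. Below is a minimal sketch of the idea, not AgentReady's actual implementation; the command names and stand-in modules are illustrative only.

```python
import importlib

# Map each subcommand to the module that implements it. Nothing is
# imported at startup, so the CLI entry point stays fast. The stdlib
# modules here are cheap stand-ins for heavy real dependencies.
_LAZY_COMMANDS = {
    "report": "json",     # stand-in for a heavy reporting module
    "assess": "difflib",  # stand-in for a heavy assessment module
}


def run_command(name: str) -> str:
    """Import the implementing module only when the command runs."""
    module = importlib.import_module(_LAZY_COMMANDS[name])
    # A real CLI would now call something like module.main(args);
    # returning the module name keeps this sketch self-contained.
    return module.__name__
```

The dispatch table is the whole trick: startup cost is a dict lookup, and the import bill is paid only by the command that needs it.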

CLAUDE.md

Lines changed: 4 additions & 131 deletions
````diff
@@ -2,15 +2,15 @@
 
 **Purpose**: Assess repositories against agent-ready best practices and generate actionable reports.
 
-**Last Updated**: 2025-12-09
+**Last Updated**: 2025-12-08
 
 ---
 
 ## Overview
 
 AgentReady is a Python CLI tool that evaluates repositories against a comprehensive set of carefully researched attributes that make codebases more effective for AI-assisted development. It generates interactive HTML reports, version-control friendly Markdown reports, and machine-readable JSON output.
 
-**Current Status**: v2.15.0 - Core assessment engine complete, most essential assessors implemented, LLM-powered learning, research report management
+**Current Status**: v2.10.0 - Core assessment engine complete, most essential assessors implemented, LLM-powered learning, research report management
 
 **Self-Assessment Score**: 80.0/100 (Gold) - See `examples/self-assessment/`
 
@@ -192,133 +192,6 @@ class MyAssessor(BaseAssessor):
 
 ---
 
-## Terminal-Bench Eval Harness
-
-**Purpose**: Empirically measure the impact of AgentReady assessors on Terminal-Bench performance through systematic A/B testing.
-
-### Overview
-
-The eval harness tests each assessor independently to measure its specific impact on agentic development benchmarks. This provides evidence-based validation of AgentReady's recommendations.
-
-**Architecture**:
-1. **Baseline**: Run Terminal-Bench on unmodified repository (5 iterations)
-2. **Per-Assessor Test**: Apply single assessor remediation → measure delta
-3. **Aggregate**: Rank assessors by impact, calculate tier statistics
-4. **Dashboard**: Generate interactive visualization for GitHub Pages
-
-**Components**:
-- `src/agentready/services/eval_harness/` - Core services (TbenchRunner, BaselineEstablisher, AssessorTester, ResultsAggregator, DashboardGenerator)
-- `src/agentready/models/eval_harness.py` - Data models (TbenchResult, BaselineMetrics, AssessorImpact, EvalSummary)
-- `src/agentready/cli/eval_harness.py` - CLI commands (baseline, test-assessor, run-tier, summarize, dashboard)
-- `docs/tbench.md` - Interactive dashboard with Chart.js
-- `docs/tbench/methodology.md` - Detailed statistical methodology
-
-### Running Evaluations
-
-```bash
-# 1. Establish baseline (run Terminal-Bench 5 times on unmodified repo)
-agentready eval-harness baseline --repo . --iterations 5
-
-# 2. Test single assessor
-agentready eval-harness test-assessor \
-  --assessor-id claude_md_file \
-  --iterations 5
-
-# 3. Test all Tier 1 assessors
-agentready eval-harness run-tier --tier 1 --iterations 5
-
-# 4. Aggregate results (rank by impact, calculate statistics)
-agentready eval-harness summarize --verbose
-
-# 5. Generate dashboard data files for GitHub Pages
-agentready eval-harness dashboard --verbose
-```
-
-### File Structure
-
-```
-.agentready/eval_harness/      # Results storage (gitignored)
-├── baseline/
-│   ├── run_001.json           # Individual tbench runs
-│   ├── run_002.json
-│   ├── ...
-│   └── summary.json           # BaselineMetrics
-├── assessors/
-│   ├── claude_md_file/
-│   │   ├── finding.json       # Assessment result
-│   │   ├── fixes_applied.log  # Remediation log
-│   │   ├── run_001.json       # Post-remediation runs
-│   │   ├── ...
-│   │   └── impact.json        # AssessorImpact metrics
-│   └── ...
-└── summary.json               # EvalSummary (ranked impacts)
-
-docs/_data/tbench/             # Dashboard data (committed)
-├── summary.json
-├── ranked_assessors.json
-├── tier_impacts.json
-├── baseline.json
-└── stats.json
-```
-
-### Statistical Methods
-
-**Significance Criteria** (both required):
-- **P-value < 0.05**: 95% confidence (two-sample t-test)
-- **|Cohen's d| > 0.2**: Meaningful effect size
-
-**Effect Size Interpretation**:
-- **0.2 ≤ |d| < 0.5**: Small effect
-- **0.5 ≤ |d| < 0.8**: Medium effect
-- **|d| ≥ 0.8**: Large effect
-
-### Current Status
-
-**Phase 1 (MVP)**: Mocked Terminal-Bench integration ✅
-- All core services implemented and tested
-- CLI commands functional
-- Dashboard with Chart.js visualizations
-- 6 CLI unit tests + 5 integration tests passing
-
-**Phase 2 (Planned)**: Real Terminal-Bench integration
-- Harbor framework client
-- Actual benchmark submissions
-- Leaderboard integration
-
-### Testing
-
-```bash
-# Run eval harness tests
-pytest tests/unit/test_eval_harness*.py -v
-pytest tests/integration/test_eval_harness_e2e.py -v
-```
-
-**Test Coverage**:
-- Models: 90-95%
-- Services: 85-90%
-- CLI: 100% (help commands validated)
-- Integration: End-to-end workflow tested
-
-### Troubleshooting
-
-**Issue**: `FileNotFoundError: Baseline directory not found`
-**Solution**: Run `agentready eval-harness baseline` first
-
-**Issue**: `No assessor results found`
-**Solution**: Run `agentready eval-harness test-assessor` or `run-tier` first
-
-**Issue**: Mocked scores seem unrealistic
-**Solution**: This is expected in Phase 1 (mocked mode) - real integration coming in Phase 2
-
-### Documentation
-
-- **User Guide**: `docs/eval-harness-guide.md` - Step-by-step tutorials
-- **Methodology**: `docs/tbench/methodology.md` - Statistical methods explained
-- **Dashboard**: `docs/tbench.md` - Interactive results visualization
-- **Plan**: `.claude/plans/quirky-squishing-plum.md` - Implementation roadmap
-
----
-
 ## Project Structure
 
 ```
@@ -517,6 +390,6 @@ Use the @agent-github-pages-docs to [action] based on:
 
 ---
 
-**Last Updated**: 2025-12-09 by Jeremy Eder
-**AgentReady Version**: 2.15.0
+**Last Updated**: 2025-12-08 by Jeremy Eder
+**AgentReady Version**: 2.10.0
 **Self-Assessment**: 80.0/100 (Gold) ✨
````
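The Statistical Methods section removed in this diff pairs a two-sample t-test with Cohen's d thresholds (0.2 small, 0.5 medium, 0.8 large). The effect-size half of that criterion can be sketched in plain Python; this is an illustrative reimplementation with made-up run scores, not AgentReady's ResultsAggregator code, and a full significance check would add the p-value (e.g. via `scipy.stats.ttest_ind`).

```python
from statistics import mean, variance


def cohens_d(baseline, treated):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(baseline), len(treated)
    # statistics.variance() is the sample variance (ddof=1),
    # which is what the pooled formula expects.
    pooled_var = (
        (n1 - 1) * variance(baseline) + (n2 - 1) * variance(treated)
    ) / (n1 + n2 - 2)
    return (mean(treated) - mean(baseline)) / pooled_var**0.5


def effect_label(d):
    """Thresholds from the removed section: 0.2 / 0.5 / 0.8."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"


# Five baseline runs vs. five post-remediation runs (invented scores)
baseline = [10, 12, 11, 13, 12]
treated = [15, 16, 14, 17, 15]
d = cohens_d(baseline, treated)  # ≈ 3.33, a large effect
```

Requiring both a small p-value and a non-trivial |d| guards against the two failure modes of each test alone: a tiny but statistically significant delta, or a big delta from too few noisy runs.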

pyproject.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 [project]
 name = "agentready"
-version = "2.15.0"
+version = "2.10.0"
 description = "Assess and bootstrap git repositories for AI-assisted development with automated remediation and continuous learning"
 authors = [{name = "Jeremy Eder", email = "[email protected]"}]
 readme = "README.md"
```

0 commit comments
