
Commit 1d32bc7

semantic-release-bot authored and jeremyeder committed

chore(release): 2.10.0 [skip ci]
* disable attestations for Test PyPI to avoid conflict ([ambient-code#155](https://github.com/jeremyeder/agentready/issues/155)) ([a33e3cd](a33e3cd)), closes [pypa/#action-pypi-publish](https://github.com/jeremyeder/agentready/issues/action-pypi-publish)
* leaderboard workflow and SSH URL support ([ambient-code#147](https://github.com/jeremyeder/agentready/issues/147)) ([de28cd0](de28cd0))
* resolve 45 test failures across CLI, services, and assessors ([#4](#4)) ([3405142](3405142)), closes [ambient-code#178](https://github.com/jeremyeder/agentready/issues/178) [ambient-code#178](https://github.com/jeremyeder/agentready/issues/178)
* resolve broken links and workflow failures ([ambient-code#160](https://github.com/jeremyeder/agentready/issues/160)) ([fbf5cf7](fbf5cf7))
* skip PR comments for external forks to prevent permission errors ([ambient-code#163](https://github.com/jeremyeder/agentready/issues/163)) ([2a29fb8](2a29fb8))
* add ambient-code/agentready to leaderboard ([ambient-code#148](https://github.com/jeremyeder/agentready/issues/148)) ([621152e](621152e))
* add quay/quay to leaderboard ([ambient-code#162](https://github.com/jeremyeder/agentready/issues/162)) ([d6e8df0](d6e8df0))
* Add weekly research update skill and automation ([ambient-code#145](https://github.com/jeremyeder/agentready/issues/145)) ([7ba17a6](7ba17a6))
* automate PyPI publishing with trusted publishing (OIDC) ([ambient-code#154](https://github.com/jeremyeder/agentready/issues/154)) ([71f4632](71f4632)), closes [pypa/#action-pypi-publish](https://github.com/jeremyeder/agentready/issues/action-pypi-publish)
* implement lazy loading for heavy CLI commands ([ambient-code#151](https://github.com/jeremyeder/agentready/issues/151)) ([6a7cd4e](6a7cd4e))
1 parent 16132e2 · commit 1d32bc7

File tree

3 files changed: +17 −148 lines changed


CHANGELOG.md

Lines changed: 12 additions & 16 deletions
```diff
@@ -1,30 +1,26 @@
-# [2.15.0](https://github.com/ambient-code/agentready/compare/v2.14.1...v2.15.0) (2025-12-09)
+# [2.10.0](https://github.com/jeremyeder/agentready/compare/v2.9.0...v2.10.0) (2025-12-08)
 
 
 ### Bug Fixes
 
-* resolve all test suite failures - achieve zero failures ([#180](https://github.com/ambient-code/agentready/issues/180)) ([990fa2d](https://github.com/ambient-code/agentready/commit/990fa2d4725842df60af151d1ba058cd43a90d3c)), closes [#148](https://github.com/ambient-code/agentready/issues/148) [#147](https://github.com/ambient-code/agentready/issues/147) [#145](https://github.com/ambient-code/agentready/issues/145)
-* resolve YAML syntax error in update-docs workflow and add actionlint ([#173](https://github.com/ambient-code/agentready/issues/173)) ([97b06af](https://github.com/ambient-code/agentready/commit/97b06af1d2adc17ec385d658310f3562f19b1a95))
+* disable attestations for Test PyPI to avoid conflict ([#155](https://github.com/jeremyeder/agentready/issues/155)) ([a33e3cd](https://github.com/jeremyeder/agentready/commit/a33e3cd2d86d4a461701e906070ab3eae8ca8082)), closes [pypa/#action-pypi-publish](https://github.com/jeremyeder/agentready/issues/action-pypi-publish)
+* leaderboard workflow and SSH URL support ([#147](https://github.com/jeremyeder/agentready/issues/147)) ([de28cd0](https://github.com/jeremyeder/agentready/commit/de28cd0a6037a0951ba370aa73832553c088cfb8))
+* resolve 45 test failures across CLI, services, and assessors ([#4](https://github.com/jeremyeder/agentready/issues/4)) ([3405142](https://github.com/jeremyeder/agentready/commit/340514251d40f283afa24d5c3068f294727fd839)), closes [#178](https://github.com/jeremyeder/agentready/issues/178) [#178](https://github.com/jeremyeder/agentready/issues/178)
+* resolve broken links and workflow failures ([#160](https://github.com/jeremyeder/agentready/issues/160)) ([fbf5cf7](https://github.com/jeremyeder/agentready/commit/fbf5cf7a1fdcb65ef4d3943a2d84e46aa831d337))
+* skip PR comments for external forks to prevent permission errors ([#163](https://github.com/jeremyeder/agentready/issues/163)) ([2a29fb8](https://github.com/jeremyeder/agentready/commit/2a29fb84485a1ac6beff1675131bf50c1b702585))
 
 
 ### Features
 
-* replace markdown-link-check with lychee for link validation ([#177](https://github.com/ambient-code/agentready/issues/177)) ([f1a4545](https://github.com/ambient-code/agentready/commit/f1a4545e4718b735df3e1fa7e0b60eba9ed0173b))
-* Terminal-Bench eval harness (MVP Phase 1) ([#178](https://github.com/ambient-code/agentready/issues/178)) ([d06bab4](https://github.com/ambient-code/agentready/commit/d06bab42848847df26d83c7a44e5ee0e84ae0445)), closes [#171](https://github.com/ambient-code/agentready/issues/171)
+* add ambient-code/agentready to leaderboard ([#148](https://github.com/jeremyeder/agentready/issues/148)) ([621152e](https://github.com/jeremyeder/agentready/commit/621152e46bd8e9505e3bc1775d2cd61a80af5a62))
+* add quay/quay to leaderboard ([#162](https://github.com/jeremyeder/agentready/issues/162)) ([d6e8df0](https://github.com/jeremyeder/agentready/commit/d6e8df0e9d92c4ec82004c5e62c798986feb1000))
+* Add weekly research update skill and automation ([#145](https://github.com/jeremyeder/agentready/issues/145)) ([7ba17a6](https://github.com/jeremyeder/agentready/commit/7ba17a6b045251cbc9f26b5c2f4a0ec31d89dd11))
+* automate PyPI publishing with trusted publishing (OIDC) ([#154](https://github.com/jeremyeder/agentready/issues/154)) ([71f4632](https://github.com/jeremyeder/agentready/commit/71f4632cb188d8c9db377c9f216c047e20727f99)), closes [pypa/#action-pypi-publish](https://github.com/jeremyeder/agentready/issues/action-pypi-publish)
 
-## [2.14.1](https://github.com/ambient-code/agentready/compare/v2.14.0...v2.14.1) (2025-12-05)
 
+### Performance Improvements
 
-### Bug Fixes
-
-* resolve YAML syntax error in continuous-learning workflow ([#172](https://github.com/ambient-code/agentready/issues/172)) ([3d40fcc](https://github.com/ambient-code/agentready/commit/3d40fcccd4e8d722303d322716454869ca7db9d0))
-
-# [2.14.0](https://github.com/ambient-code/agentready/compare/v2.13.0...v2.14.0) (2025-12-05)
-
-
-### Features
-
-* container support ([#171](https://github.com/ambient-code/agentready/issues/171)) ([c6874ea](https://github.com/ambient-code/agentready/commit/c6874ea035775ac86ef5012bbfdf52e7b96f556f))
+* implement lazy loading for heavy CLI commands ([#151](https://github.com/jeremyeder/agentready/issues/151)) ([6a7cd4e](https://github.com/jeremyeder/agentready/commit/6a7cd4e147ebfdfc95921b86599a5b650db76153))
 
 
 # [2.13.0](https://github.com/ambient-code/agentready/compare/v2.12.3...v2.13.0) (2025-12-04)
```
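The "implement lazy loading for heavy CLI commands" entry refers to a common CLI pattern: deferring expensive imports until a subcommand is actually invoked, so `--help` and cheap commands start quickly. Below is a minimal sketch of the idea, not AgentReady's actual implementation; the command names and stand-in modules are illustrative only.

```python
import importlib

# Map each subcommand to the module that implements it. Nothing is
# imported at startup, so the CLI entry point stays fast. The stdlib
# modules here are cheap stand-ins for heavy real dependencies.
_LAZY_COMMANDS = {
    "report": "json",     # stand-in for a heavy reporting module
    "assess": "difflib",  # stand-in for a heavy assessment module
}


def run_command(name: str) -> str:
    """Import the implementing module only when the command runs."""
    module = importlib.import_module(_LAZY_COMMANDS[name])
    # A real CLI would now call something like module.main(args);
    # returning the module name keeps this sketch self-contained.
    return module.__name__
```

The dispatch table is the whole trick: startup cost is a dict lookup, and the import bill is paid only by the command that needs it.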

CLAUDE.md

Lines changed: 4 additions & 131 deletions
````diff
@@ -2,15 +2,15 @@
 
 **Purpose**: Assess repositories against agent-ready best practices and generate actionable reports.
 
-**Last Updated**: 2025-12-09
+**Last Updated**: 2025-12-08
 
 ---
 
 ## Overview
 
 AgentReady is a Python CLI tool that evaluates repositories against a comprehensive set of carefully researched attributes that make codebases more effective for AI-assisted development. It generates interactive HTML reports, version-control friendly Markdown reports, and machine-readable JSON output.
 
-**Current Status**: v2.15.0 - Core assessment engine complete, most essential assessors implemented, LLM-powered learning, research report management
+**Current Status**: v2.10.0 - Core assessment engine complete, most essential assessors implemented, LLM-powered learning, research report management
 
 **Self-Assessment Score**: 80.0/100 (Gold) - See `examples/self-assessment/`
 
@@ -192,133 +192,6 @@ class MyAssessor(BaseAssessor):
 
 ---
 
-## Terminal-Bench Eval Harness
-
-**Purpose**: Empirically measure the impact of AgentReady assessors on Terminal-Bench performance through systematic A/B testing.
-
-### Overview
-
-The eval harness tests each assessor independently to measure its specific impact on agentic development benchmarks. This provides evidence-based validation of AgentReady's recommendations.
-
-**Architecture**:
-1. **Baseline**: Run Terminal-Bench on unmodified repository (5 iterations)
-2. **Per-Assessor Test**: Apply single assessor remediation → measure delta
-3. **Aggregate**: Rank assessors by impact, calculate tier statistics
-4. **Dashboard**: Generate interactive visualization for GitHub Pages
-
-**Components**:
-- `src/agentready/services/eval_harness/` - Core services (TbenchRunner, BaselineEstablisher, AssessorTester, ResultsAggregator, DashboardGenerator)
-- `src/agentready/models/eval_harness.py` - Data models (TbenchResult, BaselineMetrics, AssessorImpact, EvalSummary)
-- `src/agentready/cli/eval_harness.py` - CLI commands (baseline, test-assessor, run-tier, summarize, dashboard)
-- `docs/tbench.md` - Interactive dashboard with Chart.js
-- `docs/tbench/methodology.md` - Detailed statistical methodology
-
-### Running Evaluations
-
-```bash
-# 1. Establish baseline (run Terminal-Bench 5 times on unmodified repo)
-agentready eval-harness baseline --repo . --iterations 5
-
-# 2. Test single assessor
-agentready eval-harness test-assessor \
-  --assessor-id claude_md_file \
-  --iterations 5
-
-# 3. Test all Tier 1 assessors
-agentready eval-harness run-tier --tier 1 --iterations 5
-
-# 4. Aggregate results (rank by impact, calculate statistics)
-agentready eval-harness summarize --verbose
-
-# 5. Generate dashboard data files for GitHub Pages
-agentready eval-harness dashboard --verbose
-```
-
-### File Structure
-
-```
-.agentready/eval_harness/      # Results storage (gitignored)
-├── baseline/
-│   ├── run_001.json           # Individual tbench runs
-│   ├── run_002.json
-│   ├── ...
-│   └── summary.json           # BaselineMetrics
-├── assessors/
-│   ├── claude_md_file/
-│   │   ├── finding.json       # Assessment result
-│   │   ├── fixes_applied.log  # Remediation log
-│   │   ├── run_001.json       # Post-remediation runs
-│   │   ├── ...
-│   │   └── impact.json        # AssessorImpact metrics
-│   └── ...
-└── summary.json               # EvalSummary (ranked impacts)
-
-docs/_data/tbench/             # Dashboard data (committed)
-├── summary.json
-├── ranked_assessors.json
-├── tier_impacts.json
-├── baseline.json
-└── stats.json
-```
-
-### Statistical Methods
-
-**Significance Criteria** (both required):
-- **P-value < 0.05**: 95% confidence (two-sample t-test)
-- **|Cohen's d| > 0.2**: Meaningful effect size
-
-**Effect Size Interpretation**:
-- **0.2 ≤ |d| < 0.5**: Small effect
-- **0.5 ≤ |d| < 0.8**: Medium effect
-- **|d| ≥ 0.8**: Large effect
-
-### Current Status
-
-**Phase 1 (MVP)**: Mocked Terminal-Bench integration ✅
-- All core services implemented and tested
-- CLI commands functional
-- Dashboard with Chart.js visualizations
-- 6 CLI unit tests + 5 integration tests passing
-
-**Phase 2 (Planned)**: Real Terminal-Bench integration
-- Harbor framework client
-- Actual benchmark submissions
-- Leaderboard integration
-
-### Testing
-
-```bash
-# Run eval harness tests
-pytest tests/unit/test_eval_harness*.py -v
-pytest tests/integration/test_eval_harness_e2e.py -v
-```
-
-**Test Coverage**:
-- Models: 90-95%
-- Services: 85-90%
-- CLI: 100% (help commands validated)
-- Integration: End-to-end workflow tested
-
-### Troubleshooting
-
-**Issue**: `FileNotFoundError: Baseline directory not found`
-**Solution**: Run `agentready eval-harness baseline` first
-
-**Issue**: `No assessor results found`
-**Solution**: Run `agentready eval-harness test-assessor` or `run-tier` first
-
-**Issue**: Mocked scores seem unrealistic
-**Solution**: This is expected in Phase 1 (mocked mode) - real integration coming in Phase 2
-
-### Documentation
-
-- **User Guide**: `docs/eval-harness-guide.md` - Step-by-step tutorials
-- **Methodology**: `docs/tbench/methodology.md` - Statistical methods explained
-- **Dashboard**: `docs/tbench.md` - Interactive results visualization
-- **Plan**: `.claude/plans/quirky-squishing-plum.md` - Implementation roadmap
-
----
-
 ## Project Structure
 
 ```
@@ -517,6 +390,6 @@ Use the @agent-github-pages-docs to [action] based on:
 
 ---
 
-**Last Updated**: 2025-12-09 by Jeremy Eder
-**AgentReady Version**: 2.15.0
+**Last Updated**: 2025-12-08 by Jeremy Eder
+**AgentReady Version**: 2.10.0
 **Self-Assessment**: 80.0/100 (Gold) ✨
````
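The Statistical Methods section removed in this diff pairs a two-sample t-test with Cohen's d thresholds (0.2 small, 0.5 medium, 0.8 large). The effect-size half of that criterion can be sketched in plain Python; this is an illustrative reimplementation with made-up run scores, not AgentReady's ResultsAggregator code, and a full significance check would add the p-value (e.g. via `scipy.stats.ttest_ind`).

```python
from statistics import mean, variance


def cohens_d(baseline, treated):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(baseline), len(treated)
    # statistics.variance() is the sample variance (ddof=1),
    # which is what the pooled formula expects.
    pooled_var = (
        (n1 - 1) * variance(baseline) + (n2 - 1) * variance(treated)
    ) / (n1 + n2 - 2)
    return (mean(treated) - mean(baseline)) / pooled_var**0.5


def effect_label(d):
    """Thresholds from the removed section: 0.2 / 0.5 / 0.8."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"


# Five baseline runs vs. five post-remediation runs (invented scores)
baseline = [10, 12, 11, 13, 12]
treated = [15, 16, 14, 17, 15]
d = cohens_d(baseline, treated)  # ≈ 3.33, a large effect
```

Requiring both a small p-value and a non-trivial |d| guards against the two failure modes of each test alone: a tiny but statistically significant delta, or a big delta from too few noisy runs.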

pyproject.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 [project]
 name = "agentready"
-version = "2.15.0"
+version = "2.10.0"
 description = "Assess and bootstrap git repositories for AI-assisted development with automated remediation and continuous learning"
 authors = [{name = "Jeremy Eder", email = "[email protected]"}]
 readme = "README.md"
```

0 commit comments
