Commit c5e7312

docs: Add v1.8.2 release notes to ROAD-TO-V2-Overhaul.md
1 parent 14efc13 commit c5e7312

1 file changed: +9 -0 lines changed

docs/ROAD-TO-V2-Overhaul.md

Lines changed: 9 additions & 0 deletions
@@ -134,6 +134,13 @@ Recent `1.8.x` work focused on separating model failures from provider/auth/infr

The practical outcome is cleaner benchmarking: weighted scores reflect task-solving ability, while external instability is tracked separately and recoverably.

+## What changed in v1.8.2 — prompt integration and workspace cleanup
+
+Two focused improvements to eval ergonomics:
+
+- **MCP tool guidance integrated into task prompt sections.** Instead of appending a separate `MCP TOOLS:` block, MCP guidance is now woven into the existing `ENVIRONMENT`, `YOUR TASK`, `IMPORTANT`, and `RULES` sections when `--use-mcp-tools` is enabled. This produces a more natural prompt structure and makes MCP tool usage a first-class instruction rather than an afterthought. Agent-specific `mcp_prompt` text is no longer appended separately.
+- **Workspace cleanup preserves eval artifacts.** Previously, `--keep-workspaces=false` (the default) removed the entire workspace directory after validation, which also deleted `agent.log`, `validation.log`, and integrity artifacts since the workspace dir doubles as the task output dir. Cleanup now selectively removes only source files while preserving harness-produced outputs (`agent.log`, `validation.log`, `integrity.json`, `integrity-files/`, `integrity-diff/`).
+
## Compatibility and comparing old runs

`1.7.x` is intentionally not identical to `v1.6.1` behavior. If you are comparing against historical leaderboard-era runs, use legacy mode:
@@ -167,3 +174,5 @@ Notable commits in range (chronological):
- `8b8f55d`
- `6f61a69`
- `d027b85`
+- `1b26446`
+- `14efc13`
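
The selective workspace cleanup described in the v1.8.2 notes can be illustrated with a short Python sketch. This is not the harness's actual code: the function name `clean_workspace`, its `keep_workspaces` parameter, and the assumption that the preserved artifacts sit at the top level of the workspace directory are all hypothetical. Only the artifact names come from the release note.

```python
import shutil
from pathlib import Path

# Artifact names listed in the v1.8.2 release note. Treating them as
# top-level entries of the workspace directory is an assumption.
PRESERVED = frozenset({
    "agent.log",
    "validation.log",
    "integrity.json",
    "integrity-files",
    "integrity-diff",
})


def clean_workspace(workspace: Path, keep_workspaces: bool = False) -> list[str]:
    """Delete everything in `workspace` except harness outputs.

    Hypothetical helper: returns the names of the removed top-level entries.
    """
    if keep_workspaces:
        # Mirrors --keep-workspaces=true: nothing is touched.
        return []

    removed = []
    for entry in sorted(workspace.iterdir()):
        if entry.name in PRESERVED:
            continue  # harness-produced output, keep it
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # source directory
        else:
            entry.unlink()  # source file or symlink
        removed.append(entry.name)
    return removed
```

An allowlist of preserved names, rather than a list of source patterns to delete, keeps the sketch simple: anything the harness did not produce is treated as source and removed.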
