fix(baa-dev): route session analysis to session-analyst; ground architect reviews in tool evidence #255
Merged
…tect reviews in tool evidence

Two targeted, structural fixes to the behavioral-anchor-amplifier-dev experiment bundle, ahead of a verifying measurement sweep.

FIX 1 (session-analyst routing): register `foundation:session-analyst` as a delegable agent and add a "Delegate session analysis" principle to the system prompt. Closes a model-agnostic defect where session-analysis / events.jsonl tasks were not routed to the specialist (base BAA scored 2/2/2 vs amplifier-dev 10/9/9 across opus-4.7/opus-4.8/gpt-5.5).

FIX 2 (architect verification): grant the architect agent `tool-web` and turn its cite-evidence rule into an evidence gate, so PR/code reviews must fetch and read what they assert rather than confabulating (addresses the pr185 fabricated-review failure on opus-4.7).

Deliberately scoped: kernel-vocab phrasing, anti-paralysis, and DTU-persistence changes were evaluated and EXCLUDED as not worth the cost / not reproducible defects.

Impact to be confirmed by a measurement run after merge.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
e36cee1 to fdaf2f9
Note: This PR intentionally includes both the bundle-add commit (257c8f9) and this fix commit (e36cee1), so that it delivers the complete, fixed bundle into the experiments folder. After approval, a measurement sweep will validate both fixes.
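
To illustrate the "evidence gate" idea behind FIX 2, here is a minimal Python sketch. It is not the bundle's implementation, which lives in the agent's prompt and tool configuration. `Claim`, `evidence_gate`, and the example URLs are hypothetical names introduced only for this illustration. The assumed rule is that a review claim passes only if it cites at least one source and every cited source was actually fetched.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One assertion made in a review (hypothetical structure)."""
    text: str
    # URLs or paths the claim cites as evidence (may be empty).
    sources: list[str] = field(default_factory=list)


def evidence_gate(claims: list[Claim], fetched: set[str]) -> list[Claim]:
    """Return the claims that fail the gate.

    A claim fails if it cites nothing, or if any source it cites
    was not actually fetched during the review.
    """
    return [
        c for c in claims
        if not c.sources or any(s not in fetched for s in c.sources)
    ]


# Hypothetical example data: one fetched source, three review claims.
fetched = {"https://example.com/pr/diff"}
claims = [
    Claim("The diff adds a new agent entry.", ["https://example.com/pr/diff"]),
    Claim("All tests pass.", []),
    Claim("The helper is unused.", ["https://example.com/other/file"]),
]

for failed in evidence_gate(claims, fetched):
    print("unsupported:", failed.text)
```

In this sketch the second claim fails because it cites nothing, and the third fails because its source was never fetched. Only the first claim would be allowed into the review.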