Commit 9b6f7a0

fix: harden debate prompts and fix consult model/flag issues (#226)
* fix: remove auto-generated header from adapter files

  The `<!-- AUTO-GENERATED ... -->` HTML comment before frontmatter prevented tools like agnix from parsing YAML frontmatter on line 1. Removed the header entirely; the adapters/ directory is self-explanatory. Drops agnix errors from 83 to 8.

* fix: apply debate findings and enhance analysis to consult and debate plugins

  Debate skill hardened based on its own first debate's findings:
  - Universal evidence standard for both proposer AND challenger
  - Proposer prompts now require cited evidence (was challenger-only)
  - Challenger follow-up reordered: anti-convergence guard first
  - Minimum-disagreement requirement per round added
  - Context summarization criteria specified (500-800 tokens)
  - Rigor indicator and Debate Quality rating in synthesis output

  Consult skill fixes from enhance analysis:
  - Gemini section: added missing Session ID extraction line
  - Codex: removed invalid `-a suggest` flag (`codex exec` doesn't support it)
  - Codex: added `-c model_reasoning_effort` to safe command patterns
  - Gemini models: replaced all `-preview` suffixes with stable names

  Also updates README, test strategy doc, and all adapters.
1 parent 0f0bcf6 commit 9b6f7a0

File tree

11 files changed (+100 −77 lines)


README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -651,7 +651,7 @@ agent-knowledge/
 | Tool | Default Model (high) | Reasoning Control |
 |------|---------------------|-------------------|
 | Claude | opus | max-turns |
-| Gemini | gemini-3-pro-preview | built-in |
+| Gemini | gemini-3-pro | built-in |
 | Codex | gpt-5.3-codex | model_reasoning_effort |
 | OpenCode | github-copilot/claude-opus-4-6 | --variant |
 | Copilot | (default) | none |
````

adapters/codex/skills/consult/SKILL.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -11,7 +11,7 @@ You are executing the /consult command. Your job is to parse the user's request
 
 - NEVER expose API keys in commands or output
 - NEVER run with permission-bypassing flags (`--dangerously-skip-permissions`, `bypassPermissions`)
-- MUST use safe-mode defaults (`-a suggest` for Codex, `--allowedTools "Read,Glob,Grep"` for Claude)
+- MUST use safe-mode defaults (`--allowedTools "Read,Glob,Grep"` for Claude, `-c model_reasoning_effort` for Codex)
 - MUST enforce 120s timeout on all tool executions
 - MUST validate tool names against allow-list: gemini, codex, claude, opencode, copilot (reject all others)
 - MUST validate `--context=file=PATH` is within the project directory (reject absolute paths outside cwd)
````
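The allow-list rule above can be sketched as a small guard. This is a hypothetical helper under an assumed name (`validateTool`); it is not part of the skill's actual code.

```javascript
// Hypothetical sketch of the tool allow-list check required above.
// The five tool names come from the skill's MUST rule; anything else is rejected.
const ALLOWED_TOOLS = ['gemini', 'codex', 'claude', 'opencode', 'copilot'];

function validateTool(tool) {
  if (!ALLOWED_TOOLS.includes(tool)) {
    throw new Error(`Tool not in allow-list: ${tool}`);
  }
  return tool;
}

console.log(validateTool('gemini')); // gemini
```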

adapters/opencode/agents/consult-agent.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -120,9 +120,9 @@ Run N Bash commands **in parallel** (multiple Bash tool calls in a single messag
 
 Example for 3 parallel Codex calls:
 ```
-Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-1.tmp")" --json -m "gpt-5.3-codex" -a suggest
-Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-2.tmp")" --json -m "gpt-5.3-codex" -a suggest
-Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-3.tmp")" --json -m "gpt-5.3-codex" -a suggest
+Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-1.tmp")" --json -m "gpt-5.3-codex" -c model_reasoning_effort="high"
+Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-2.tmp")" --json -m "gpt-5.3-codex" -c model_reasoning_effort="high"
+Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-3.tmp")" --json -m "gpt-5.3-codex" -c model_reasoning_effort="high"
 ```
 
 #### 4d. Parse and Format Results
````
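The corrected invocation pattern can be sanity-checked by assembling (not running) one of the command strings. This is a sketch only: the `AI_STATE_DIR` value and question text below are illustrative stand-ins, not the platform-substituted values.

```shell
# Hypothetical sketch: assemble one of the parallel Codex command strings
# without executing it, so the flag layout can be inspected.
# AI_STATE_DIR here is an illustrative stand-in, not the real platform value.
AI_STATE_DIR="$(mktemp -d)"
mkdir -p "${AI_STATE_DIR}/consult"
QUESTION_FILE="${AI_STATE_DIR}/consult/question-1.tmp"
printf 'What tradeoffs does this design have?' > "${QUESTION_FILE}"

# Note: the question is read from a platform-controlled temp file, never
# interpolated directly from user input.
CMD="codex exec \"\$(cat \"${QUESTION_FILE}\")\" --json -m \"gpt-5.3-codex\" -c model_reasoning_effort=\"high\""
echo "${CMD}"
```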

adapters/opencode/commands/consult.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -17,7 +17,7 @@ You are executing the /consult command. Your job is to parse the user's request
 
 - NEVER expose API keys in commands or output
 - NEVER run with permission-bypassing flags (`--dangerously-skip-permissions`, `bypassPermissions`)
-- MUST use safe-mode defaults (`-a suggest` for Codex, `--allowedTools "Read,Glob,Grep"` for Claude)
+- MUST use safe-mode defaults (`--allowedTools "Read,Glob,Grep"` for Claude, `-c model_reasoning_effort` for Codex)
 - MUST enforce 120s timeout on all tool executions
 - MUST validate tool names against allow-list: gemini, codex, claude, opencode, copilot (reject all others)
 - MUST validate `--context=file=PATH` is within the project directory (reject absolute paths outside cwd)
````

adapters/opencode/skills/consult/SKILL.md

Lines changed: 8 additions & 7 deletions

````diff
@@ -66,22 +66,23 @@ Command: gemini -p "QUESTION" --output-format json -m "MODEL"
 Session resume: --resume "SESSION_ID"
 ```
 
-Models: gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3-pro-preview
+Models: gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash, gemini-3-pro
 
 | Effort | Model |
 |--------|-------|
 | low | gemini-2.5-flash |
-| medium | gemini-2.5-pro |
-| high | gemini-3-flash-preview |
-| max | gemini-3-pro-preview |
+| medium | gemini-3-flash |
+| high | gemini-3-pro |
+| max | gemini-3-pro |
 
 **Parse output**: `JSON.parse(stdout).response`
+**Session ID**: `JSON.parse(stdout).session_id`
 **Continuable**: Yes (via `--resume`)
 
 ### Codex
 
 ```
-Command: codex exec "QUESTION" --json -m "MODEL" -a suggest -c model_reasoning_effort="LEVEL"
+Command: codex exec "QUESTION" --json -m "MODEL" -c model_reasoning_effort="LEVEL"
 Session resume: codex exec resume SESSION_ID "QUESTION" --json
 Session resume (latest): codex exec resume --last "QUESTION" --json
 ```
@@ -193,7 +194,7 @@ User-provided question text MUST NOT be interpolated into shell command strings.
 | Claude (resume) | `claude -p - --output-format json --model "MODEL" --max-turns TURNS --allowedTools "Read,Glob,Grep" --resume "SESSION_ID" < "{AI_STATE_DIR}/consult/question.tmp"` |
 | Gemini | `gemini -p - --output-format json -m "MODEL" < "{AI_STATE_DIR}/consult/question.tmp"` |
 | Gemini (resume) | `gemini -p - --output-format json -m "MODEL" --resume "SESSION_ID" < "{AI_STATE_DIR}/consult/question.tmp"` |
-| Codex | `codex exec "$(cat "{AI_STATE_DIR}/consult/question.tmp")" --json -m "MODEL" -a suggest` (Codex exec lacks stdin mode -- cat reads from platform-controlled path, not user input) |
+| Codex | `codex exec "$(cat "{AI_STATE_DIR}/consult/question.tmp")" --json -m "MODEL" -c model_reasoning_effort="LEVEL"` (Codex exec lacks stdin mode -- cat reads from platform-controlled path, not user input) |
 | Codex (resume) | `codex exec resume SESSION_ID "$(cat "{AI_STATE_DIR}/consult/question.tmp")" --json -m "MODEL"` |
 | Codex (resume latest) | `codex exec resume --last "$(cat "{AI_STATE_DIR}/consult/question.tmp")" --json -m "MODEL"` |
 | OpenCode | `opencode run - --format json --model "MODEL" --variant "VARIANT" < "{AI_STATE_DIR}/consult/question.tmp"` |
@@ -266,7 +267,7 @@ Return a plain JSON object to stdout (no markers or wrappers):
 ```json
 {
   "tool": "gemini",
-  "model": "gemini-3-pro-preview",
+  "model": "gemini-3-pro",
   "effort": "high",
   "duration_ms": 12300,
   "response": "The AI's response text here...",
````
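The Gemini output contract in this diff (`response` plus the newly documented `session_id`) can be exercised with a small parser. `parseGeminiOutput` is an illustrative name, not a confirmed export of the plugin.

```javascript
// Hypothetical sketch of the parse step documented above: Gemini's
// --output-format json emits a JSON object on stdout; the skill reads
// .response for the answer and .session_id for later --resume calls.
function parseGeminiOutput(stdout) {
  const parsed = JSON.parse(stdout);
  return { response: parsed.response, sessionId: parsed.session_id };
}

// stdout shaped like the JSON contract shown in this skill file
const stdout = JSON.stringify({
  response: "The AI's response text here...",
  session_id: 'session-xyz-789',
});

const result = parseGeminiOutput(stdout);
console.log(result.sessionId); // session-xyz-789
```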

adapters/opencode/skills/debate/SKILL.md

Lines changed: 29 additions & 31 deletions

````diff
@@ -26,6 +26,10 @@ Parse from `$ARGUMENTS`:
 - **--model-proposer**: Specific model for proposer (optional)
 - **--model-challenger**: Specific model for challenger (optional)
 
+## Universal Rules
+
+ALL participants (proposer AND challenger) MUST support claims with specific evidence (file path, code pattern, benchmark, or documented behavior). Unsupported claims from either side will be flagged by the other participant and noted in the verdict. This applies to every round.
+
 ## Prompt Templates
 
 ### Round 1: Proposer Opening
@@ -35,7 +39,9 @@ You are participating in a structured debate as the PROPOSER.
 
 Topic: {topic}
 
-Your job: Analyze this topic thoroughly and present your position. Be specific, cite concrete reasons, and consider tradeoffs. Do not hedge excessively - take a clear stance.
+Your job: Analyze this topic thoroughly and present your position. Take a clear stance. Do not hedge excessively.
+
+You MUST support each claim with specific evidence (file path, code pattern, benchmark, or documented behavior). Unsupported claims will be challenged. "I think" or "generally speaking" without evidence is not acceptable.
 
 Provide your analysis:
 ```
@@ -60,6 +66,9 @@ Rules:
 - Lead with what's WRONG or MISSING, then acknowledge what's right
 - If you genuinely agree on a point, explain what RISK remains despite the agreement
 - Propose at least one concrete alternative approach
+- You MUST address at least these categories: correctness, security implications, and developer experience
+- Do NOT agree with ANY claim unless you can cite specific evidence (file path, code pattern, or documented behavior) that supports the agreement. Unsupported agreement is not allowed.
+- If the proposer makes a claim without evidence, call it out: "This claim is unsupported."
 
 Provide your challenge:
 ```
@@ -81,8 +90,10 @@ The CHALLENGER ({challenger_tool}) raised these points in round {round-1}:
 
 Your job: Address each challenge directly. For each point:
 - If they're right, concede explicitly and explain how your position evolves
-- If they're wrong, explain why with specific reasoning
-- If it's a tradeoff, acknowledge the tradeoff and explain why you still favor your approach
+- If they're wrong, explain why with specific evidence (file path, code pattern, benchmark, or documented behavior)
+- If it's a tradeoff, acknowledge the tradeoff and explain why you still favor your approach with evidence
+
+Every claim you make -- whether concession, rebuttal, or new argument -- MUST cite specific evidence. The challenger will reject unsupported claims.
 
 Do NOT simply restate your original position. Your response must show you engaged with the specific challenges raised.
 
@@ -91,29 +102,7 @@ Provide your defense:
 
 ### Round 2+: Challenger Follow-up
 
-```
-You are the CHALLENGER in round {round} of a structured debate.
-
-Topic: {topic}
-
-{context_summary}
-
-The PROPOSER ({proposer_tool}) responded to your challenges:
-
----
-{proposer_previous_response}
----
-
-Your job: Evaluate the proposer's defense. For each point they addressed:
-- Did they adequately address your concern? If so, acknowledge it
-- Did they dodge or superficially address it? Call it out specifically
-- Are there NEW weaknesses in their revised position?
-
-If you're genuinely convinced on a point, say so - but explain what convinced you.
-If you see new problems, raise them.
-
-Provide your follow-up:
-```
+*(JavaScript reference - not executable in OpenCode)*
 
 ## Context Assembly
 
@@ -148,11 +137,12 @@ Round {N-1} - Challenger ({challenger_tool}):
 {full response}
 ```
 
-The orchestrator agent (opus) generates the summary. It should preserve:
+The orchestrator agent (opus) generates the summary. Target: 500-800 tokens. MUST preserve:
 - Each side's core position
-- Points of agreement (resolved)
+- All concessions (verbatim quotes, not paraphrased)
+- All evidence citations that support agreements
 - Points of disagreement (unresolved)
-- Any concessions made
+- Any contradictions between rounds (e.g., proposer concedes in round 1 but walks it back in round 2 -- note both explicitly)
 
 ## Synthesis Format
 
@@ -165,14 +155,22 @@ After all rounds complete, the orchestrator produces this structured output:
 **Proposer**: {proposer_tool} ({proposer_model})
 **Challenger**: {challenger_tool} ({challenger_model})
 **Rounds**: {rounds_completed}
+**Rigor**: Structured perspective comparison (prompt-enforced adversarial rules, no deterministic verification)
 
 ### Verdict
 
 {winner_tool} had the stronger argument because: {specific reasoning citing debate evidence}
 
+### Debate Quality
+
+Rate the debate on these dimensions:
+- **Genuine disagreement**: Did the challenger maintain independent positions, or converge toward the proposer? (high/medium/low)
+- **Evidence quality**: Did both sides cite specific examples, or argue from generalities? (high/medium/low)
+- **Challenge depth**: Were the challenges substantive, or surface-level? (high/medium/low)
+
 ### Key Agreements
-- {agreed point 1}
-- {agreed point 2}
+- {agreed point 1} (evidence: {what supports this agreement})
+- {agreed point 2} (evidence: {what supports this agreement})
 
 ### Key Disagreements
 - {point}: {proposer_tool} argues {X}, {challenger_tool} argues {Y}
````
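The 500-800 token target for the context summary can be approximated with a rough character heuristic. This is a sketch only: the ~4-characters-per-token ratio is an assumption, not how the orchestrator actually counts tokens, and `summaryWithinBudget` is a hypothetical name.

```javascript
// Hypothetical budget check for the context summary described above.
// Assumes ~4 characters per token, which is a rough heuristic only.
function summaryWithinBudget(summary, minTokens = 500, maxTokens = 800) {
  const approxTokens = Math.ceil(summary.length / 4);
  return approxTokens >= minTokens && approxTokens <= maxTokens;
}

const tooShort = 'Proposer favors X; challenger disputes Y.';
console.log(summaryWithinBudget(tooShort)); // false

const okLength = 'x'.repeat(2600); // ~650 tokens under the heuristic
console.log(summaryWithinBudget(okLength)); // true
```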

docs/consult-command-test-strategy.md

Lines changed: 11 additions & 11 deletions

````diff
@@ -172,9 +172,9 @@ describe('Model Selection', () => {
   describe('Gemini models', () => {
     it('should map effort levels correctly', () => {
       expect(getGeminiModel('low')).toBe('gemini-2.5-flash');
-      expect(getGeminiModel('medium')).toBe('gemini-3-flash-preview');
-      expect(getGeminiModel('high')).toBe('gemini-3-pro-preview');
-      expect(getGeminiModel('max')).toBe('gemini-3-pro-preview');
+      expect(getGeminiModel('medium')).toBe('gemini-3-flash');
+      expect(getGeminiModel('high')).toBe('gemini-3-pro');
+      expect(getGeminiModel('max')).toBe('gemini-3-pro');
     });
   });
 
@@ -244,7 +244,7 @@ describe('Session Management', () => {
   it('should include question in saved session', () => {
     const session = {
       tool: 'gemini',
-      model: 'gemini-3-pro-preview',
+      model: 'gemini-3-pro',
       effort: 'medium',
       session_id: 'xyz-789',
       timestamp: new Date().toISOString(),
@@ -458,7 +458,7 @@ describe('Session Continuation', () => {
   it('should restore tool from saved session', () => {
     const session = {
       tool: 'gemini',
-      model: 'gemini-3-pro-preview',
+      model: 'gemini-3-pro',
       effort: 'medium',
       session_id: 'session-456',
       timestamp: new Date().toISOString(),
@@ -672,18 +672,18 @@ describe('Command Building', () => {
 
   describe('Gemini Command', () => {
     it('should build basic command', () => {
-      const { command, flags } = buildGeminiCommand('question', 'gemini-3-pro-preview');
+      const { command, flags } = buildGeminiCommand('question', 'gemini-3-pro');
       expect(command).toBe('gemini');
       expect(flags).toContain('-p');
       expect(flags).toContain('"question"');
       expect(flags).toContain('--output-format');
       expect(flags).toContain('json');
       expect(flags).toContain('-m');
-      expect(flags).toContain('gemini-3-pro-preview');
+      expect(flags).toContain('gemini-3-pro');
     });
 
     it('should append session resume for continuation', () => {
-      const { flags } = buildGeminiCommand('question', 'gemini-3-pro-preview', 'session-456', true);
+      const { flags } = buildGeminiCommand('question', 'gemini-3-pro', 'session-456', true);
       expect(flags).toContain('--resume');
       expect(flags).toContain('session-456');
     });
@@ -939,7 +939,7 @@ describe('Full Consultation Flow', () => {
   jest.spyOn(fs, 'readFileSync').mockReturnValueOnce(JSON.stringify({
     tool: 'gemini',
     session_id: 'session-456',
-    model: 'gemini-3-pro-preview',
+    model: 'gemini-3-pro',
     effort: 'medium',
     timestamp: new Date().toISOString(),
     question: 'continue',
@@ -1139,7 +1139,7 @@ describe('Mocked Tool Outputs', () => {
   const mockGeminiOutput = `=== CONSULT_RESULT ===
 {
   "tool": "gemini",
-  "model": "gemini-3-pro-preview",
+  "model": "gemini-3-pro",
   "effort": "medium",
   "duration_ms": 23400,
   "response": "Based on my analysis, the approach seems sound but could benefit from error handling for edge cases.",
@@ -1175,7 +1175,7 @@ describe('Mocked Tool Outputs', () => {
   it('should parse structured output correctly', () => {
     const result = parseMockOutput(mockGeminiOutput, 'gemini');
     expect(result.tool).toBe('gemini');
-    expect(result.model).toBe('gemini-3-pro-preview');
+    expect(result.model).toBe('gemini-3-pro');
     expect(result.duration_ms).toBe(23400);
     expect(result.session_id).toBe('session-xyz-789');
   });
````
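A minimal implementation consistent with the updated `getGeminiModel` expectations would be a plain lookup table. This is a sketch only; the real mapping lives in the consult plugin, and the error behavior for unknown effort levels is an assumption.

```javascript
// Hypothetical implementation consistent with the updated test expectations
// above: effort levels map onto the stable (non -preview) Gemini model names.
const GEMINI_MODELS = {
  low: 'gemini-2.5-flash',
  medium: 'gemini-3-flash',
  high: 'gemini-3-pro',
  max: 'gemini-3-pro',
};

function getGeminiModel(effort) {
  const model = GEMINI_MODELS[effort];
  if (!model) {
    throw new Error(`Unknown effort level: ${effort}`);
  }
  return model;
}

console.log(getGeminiModel('high')); // gemini-3-pro
```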

plugins/consult/agents/consult-agent.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -126,9 +126,9 @@ Run N Bash commands **in parallel** (multiple Bash tool calls in a single messag
 
 Example for 3 parallel Codex calls:
 ```
-Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-1.tmp")" --json -m "gpt-5.3-codex" -a suggest
-Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-2.tmp")" --json -m "gpt-5.3-codex" -a suggest
-Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-3.tmp")" --json -m "gpt-5.3-codex" -a suggest
+Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-1.tmp")" --json -m "gpt-5.3-codex" -c model_reasoning_effort="high"
+Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-2.tmp")" --json -m "gpt-5.3-codex" -c model_reasoning_effort="high"
+Bash: codex exec "$(cat "{AI_STATE_DIR}/consult/question-3.tmp")" --json -m "gpt-5.3-codex" -c model_reasoning_effort="high"
 ```
 
 #### 4d. Parse and Format Results
````

plugins/consult/commands/consult.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -14,7 +14,7 @@ You are executing the /consult command. Your job is to parse the user's request
 
 - NEVER expose API keys in commands or output
 - NEVER run with permission-bypassing flags (`--dangerously-skip-permissions`, `bypassPermissions`)
-- MUST use safe-mode defaults (`-a suggest` for Codex, `--allowedTools "Read,Glob,Grep"` for Claude)
+- MUST use safe-mode defaults (`--allowedTools "Read,Glob,Grep"` for Claude, `-c model_reasoning_effort` for Codex)
 - MUST enforce 120s timeout on all tool executions
 - MUST validate tool names against allow-list: gemini, codex, claude, opencode, copilot (reject all others)
 - MUST validate `--context=file=PATH` is within the project directory (reject absolute paths outside cwd)
````

plugins/consult/skills/consult/SKILL.md

Lines changed: 8 additions & 7 deletions

````diff
@@ -60,22 +60,23 @@ Command: gemini -p "QUESTION" --output-format json -m "MODEL"
 Session resume: --resume "SESSION_ID"
 ```
 
-Models: gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3-pro-preview
+Models: gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash, gemini-3-pro
 
 | Effort | Model |
 |--------|-------|
 | low | gemini-2.5-flash |
-| medium | gemini-2.5-pro |
-| high | gemini-3-flash-preview |
-| max | gemini-3-pro-preview |
+| medium | gemini-3-flash |
+| high | gemini-3-pro |
+| max | gemini-3-pro |
 
 **Parse output**: `JSON.parse(stdout).response`
+**Session ID**: `JSON.parse(stdout).session_id`
 **Continuable**: Yes (via `--resume`)
 
 ### Codex
 
 ```
-Command: codex exec "QUESTION" --json -m "MODEL" -a suggest -c model_reasoning_effort="LEVEL"
+Command: codex exec "QUESTION" --json -m "MODEL" -c model_reasoning_effort="LEVEL"
 Session resume: codex exec resume SESSION_ID "QUESTION" --json
 Session resume (latest): codex exec resume --last "QUESTION" --json
 ```
@@ -187,7 +188,7 @@ User-provided question text MUST NOT be interpolated into shell command strings.
 | Claude (resume) | `claude -p - --output-format json --model "MODEL" --max-turns TURNS --allowedTools "Read,Glob,Grep" --resume "SESSION_ID" < "{AI_STATE_DIR}/consult/question.tmp"` |
 | Gemini | `gemini -p - --output-format json -m "MODEL" < "{AI_STATE_DIR}/consult/question.tmp"` |
 | Gemini (resume) | `gemini -p - --output-format json -m "MODEL" --resume "SESSION_ID" < "{AI_STATE_DIR}/consult/question.tmp"` |
-| Codex | `codex exec "$(cat "{AI_STATE_DIR}/consult/question.tmp")" --json -m "MODEL" -a suggest` (Codex exec lacks stdin mode -- cat reads from platform-controlled path, not user input) |
+| Codex | `codex exec "$(cat "{AI_STATE_DIR}/consult/question.tmp")" --json -m "MODEL" -c model_reasoning_effort="LEVEL"` (Codex exec lacks stdin mode -- cat reads from platform-controlled path, not user input) |
 | Codex (resume) | `codex exec resume SESSION_ID "$(cat "{AI_STATE_DIR}/consult/question.tmp")" --json -m "MODEL"` |
 | Codex (resume latest) | `codex exec resume --last "$(cat "{AI_STATE_DIR}/consult/question.tmp")" --json -m "MODEL"` |
 | OpenCode | `opencode run - --format json --model "MODEL" --variant "VARIANT" < "{AI_STATE_DIR}/consult/question.tmp"` |
@@ -260,7 +261,7 @@ Return a plain JSON object to stdout (no markers or wrappers):
 ```json
 {
   "tool": "gemini",
-  "model": "gemini-3-pro-preview",
+  "model": "gemini-3-pro",
   "effort": "high",
   "duration_ms": 12300,
   "response": "The AI's response text here...",
````
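Gemini flag assembly can be sketched to match the model and `--resume` behavior shown in this file. This mirrors the `buildGeminiCommand` name used in the project's test-strategy doc but is not a confirmed export; note the skill's safe pattern pipes the question via stdin (`-p -`), while this sketch follows the simpler interpolated shape the tests use.

```javascript
// Hypothetical sketch of Gemini command assembly: base flags plus an
// optional --resume pair when continuing a saved session.
function buildGeminiCommand(question, model, sessionId, resume = false) {
  const flags = ['-p', `"${question}"`, '--output-format', 'json', '-m', model];
  if (resume && sessionId) {
    flags.push('--resume', sessionId);
  }
  return { command: 'gemini', flags };
}

const { command, flags } = buildGeminiCommand('question', 'gemini-3-pro', 'session-456', true);
console.log(command, flags.join(' '));
```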
