From 00fe691f2fd7e16fdcf265f1ef46217d8b1fe994 Mon Sep 17 00:00:00 2001 From: Avi Fenesh Date: Fri, 20 Feb 2026 16:37:50 +0200 Subject: [PATCH 1/5] feat(consult,debate): update Gemini 3.1 as default for high effort tier (#234) Update the Gemini model default for the `high` effort tier from `gemini-3-pro-preview` to `gemini-3.1-pro-preview` across all consult and debate configuration files. The `max` tier already uses `gemini-3.1-pro-preview` and is unchanged. Updated across 3 platforms (Claude Code plugins, OpenCode adapter, Codex adapter) in skill files, command files, and README. --- README.md | 2 +- adapters/codex/skills/consult/SKILL.md | 4 ++-- adapters/codex/skills/debate/SKILL.md | 2 +- adapters/opencode/commands/consult.md | 4 ++-- adapters/opencode/commands/debate.md | 2 +- adapters/opencode/skills/consult/SKILL.md | 8 ++++---- adapters/opencode/skills/debate/SKILL.md | 4 ++-- plugins/consult/commands/consult.md | 4 ++-- plugins/consult/skills/consult/SKILL.md | 8 ++++---- plugins/debate/commands/debate.md | 2 +- plugins/debate/skills/debate/SKILL.md | 4 ++-- 11 files changed, 22 insertions(+), 22 deletions(-) diff --git a/README.md b/README.md index 6aa4a5df..8fb84287 100644 --- a/README.md +++ b/README.md @@ -651,7 +651,7 @@ agent-knowledge/ | Tool | Default Model (high) | Reasoning Control | |------|---------------------|-------------------| | Claude | claude-opus-4-6 | max-turns | -| Gemini | gemini-3-pro-preview | built-in | +| Gemini | gemini-3.1-pro-preview | built-in | | Codex | o3 | model_reasoning_effort | | OpenCode | (user-selected or default) | --variant | | Copilot | (default) | none | diff --git a/adapters/codex/skills/consult/SKILL.md b/adapters/codex/skills/consult/SKILL.md index 271a7874..595ab146 100644 --- a/adapters/codex/skills/consult/SKILL.md +++ b/adapters/codex/skills/consult/SKILL.md @@ -169,7 +169,7 @@ request_user_input: - header: "Model" question: "Which Gemini model?" 
options: - - label: "gemini-3-pro" description: "Most capable, strong reasoning" + - label: "gemini-3.1-pro" description: "Most capable, strong reasoning" - label: "gemini-3-flash" description: "Fast, 78% SWE-bench" - label: "gemini-2.5-pro" description: "Previous gen pro model" - label: "gemini-2.5-flash" description: "Previous gen flash model" @@ -233,7 +233,7 @@ Invoke the `consult` skill directly using the Skill tool: Skill: consult Args: "[question]" --tool=[tool] --effort=[effort] --model=[model] [--context=[context]] [--continue=[session_id]] -Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3-pro +Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3.1-pro ``` The skill handles the full consultation lifecycle: model resolution, command building, context packaging, execution with 120s timeout, and returns a plain JSON result. diff --git a/adapters/codex/skills/debate/SKILL.md b/adapters/codex/skills/debate/SKILL.md index c4acd101..59d76007 100644 --- a/adapters/codex/skills/debate/SKILL.md +++ b/adapters/codex/skills/debate/SKILL.md @@ -291,7 +291,7 @@ Read the consult skill file to get the exact patterns and replacements. 
|--------|--------|--------|-------|----------|---------| | low | claude-haiku-4-5 (1 turn) | gemini-2.5-flash | o4-mini (low) | default (low) | no control | | medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | o4-mini (medium) | default (medium) | no control | -| high | claude-opus-4-6 (5 turns) | gemini-3-pro-preview | o3 (high) | default (high) | no control | +| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | o3 (high) | default (high) | no control | | max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | o3 (high) | default + --thinking | no control | ### Output Parsing diff --git a/adapters/opencode/commands/consult.md b/adapters/opencode/commands/consult.md index 4ec42ac0..5754fccc 100644 --- a/adapters/opencode/commands/consult.md +++ b/adapters/opencode/commands/consult.md @@ -177,7 +177,7 @@ AskUserQuestion: question: "Which Gemini model?" multiSelect: false options: - - label: "gemini-3-pro" description: "Most capable, strong reasoning" + - label: "gemini-3.1-pro" description: "Most capable, strong reasoning" - label: "gemini-3-flash" description: "Fast, 78% SWE-bench" - label: "gemini-2.5-pro" description: "Previous gen pro model" - label: "gemini-2.5-flash" description: "Previous gen flash model" @@ -241,7 +241,7 @@ Invoke the `consult` skill directly using the Skill tool: Skill: consult Args: "[question]" --tool=[tool] --effort=[effort] --model=[model] [--context=[context]] [--continue=[session_id]] -Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3-pro +Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3.1-pro ``` The skill handles the full consultation lifecycle: model resolution, command building, context packaging, execution with 120s timeout, and returns a plain JSON result. 
diff --git a/adapters/opencode/commands/debate.md b/adapters/opencode/commands/debate.md index 079cbd18..c8a2413d 100644 --- a/adapters/opencode/commands/debate.md +++ b/adapters/opencode/commands/debate.md @@ -295,7 +295,7 @@ Read the consult skill file to get the exact patterns and replacements. |--------|--------|--------|-------|----------|---------| | low | claude-haiku-4-5 (1 turn) | gemini-2.5-flash | o4-mini (low) | default (low) | no control | | medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | o4-mini (medium) | default (medium) | no control | -| high | claude-opus-4-6 (5 turns) | gemini-3-pro-preview | o3 (high) | default (high) | no control | +| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | o3 (high) | default (high) | no control | | max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | o3 (high) | default + --thinking | no control | ### Output Parsing diff --git a/adapters/opencode/skills/consult/SKILL.md b/adapters/opencode/skills/consult/SKILL.md index c0441e71..8dab2009 100644 --- a/adapters/opencode/skills/consult/SKILL.md +++ b/adapters/opencode/skills/consult/SKILL.md @@ -72,7 +72,7 @@ Models: gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3-pro-p |--------|-------| | low | gemini-2.5-flash | | medium | gemini-3-flash-preview | -| high | gemini-3-pro-preview | +| high | gemini-3.1-pro-preview | | max | gemini-3.1-pro-preview | **Parse output**: `JSON.parse(stdout).response` @@ -110,7 +110,7 @@ Session resume: opencode run "QUESTION" --format json --model "MODEL" --variant With thinking: add --thinking flag ``` -Models: 75+ via providers (format: provider/model). Top picks: claude-sonnet-4-6, claude-opus-4-6, gpt-5.2, o3, gemini-3-pro-preview, minimax-m2.1 +Models: 75+ via providers (format: provider/model). 
Top picks: claude-sonnet-4-6, claude-opus-4-6, gpt-5.2, o3, gemini-3.1-pro-preview, minimax-m2.1 | Effort | Model | Variant | |--------|-------|---------| @@ -277,7 +277,7 @@ Return a plain JSON object to stdout (no markers or wrappers): ```json { "tool": "gemini", - "model": "gemini-3-pro-preview", + "model": "gemini-3.1-pro-preview", "effort": "high", "duration_ms": 12300, "response": "The AI's response text here...", @@ -315,4 +315,4 @@ This skill is invoked by: - `consult-agent` for `/consult` command - Direct invocation: `Skill('consult', '"question" --tool=gemini --effort=high')` -Example: `Skill('consult', '"Is this approach correct?" --tool=gemini --effort=high --model=gemini-3-pro-preview')` +Example: `Skill('consult', '"Is this approach correct?" --tool=gemini --effort=high --model=gemini-3.1-pro-preview')` diff --git a/adapters/opencode/skills/debate/SKILL.md b/adapters/opencode/skills/debate/SKILL.md index 08d762b8..f8594b58 100644 --- a/adapters/opencode/skills/debate/SKILL.md +++ b/adapters/opencode/skills/debate/SKILL.md @@ -222,7 +222,7 @@ Save to `{AI_STATE_DIR}/debate/last-debate.json`: "id": "debate-{ISO timestamp}-{4 char random hex}", "topic": "original topic text", "proposer": {"tool": "claude", "model": "opus"}, - "challenger": {"tool": "gemini", "model": "gemini-3-pro"}, + "challenger": {"tool": "gemini", "model": "gemini-3.1-pro"}, "effort": "high", "rounds_completed": 2, "max_rounds": 2, @@ -279,7 +279,7 @@ Platform state directory: |--------|--------|--------|-------|----------|---------| | low | claude-haiku-4-5 (1 turn) | gemini-2.5-flash | o4-mini (low) | default (low) | no control | | medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | o4-mini (medium) | default (medium) | no control | -| high | claude-opus-4-6 (5 turns) | gemini-3-pro-preview | o3 (high) | default (high) | no control | +| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | o3 (high) | default (high) | no control | | max | claude-opus-4-6 (10 
turns) | gemini-3.1-pro-preview | o3 (high) | default + --thinking | no control | ### Output Parsing diff --git a/plugins/consult/commands/consult.md b/plugins/consult/commands/consult.md index b8402675..cb7f75b1 100644 --- a/plugins/consult/commands/consult.md +++ b/plugins/consult/commands/consult.md @@ -174,7 +174,7 @@ AskUserQuestion: question: "Which Gemini model?" multiSelect: false options: - - label: "gemini-3-pro" description: "Most capable, strong reasoning" + - label: "gemini-3.1-pro" description: "Most capable, strong reasoning" - label: "gemini-3-flash" description: "Fast, 78% SWE-bench" - label: "gemini-2.5-pro" description: "Previous gen pro model" - label: "gemini-2.5-flash" description: "Previous gen flash model" @@ -238,7 +238,7 @@ Invoke the `consult` skill directly using the Skill tool: Skill: consult Args: "[question]" --tool=[tool] --effort=[effort] --model=[model] [--context=[context]] [--continue=[session_id]] -Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3-pro +Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3.1-pro ``` The skill handles the full consultation lifecycle: model resolution, command building, context packaging, execution with 120s timeout, and returns a plain JSON result. 
diff --git a/plugins/consult/skills/consult/SKILL.md b/plugins/consult/skills/consult/SKILL.md index aa2ebc8b..39cc0c03 100644 --- a/plugins/consult/skills/consult/SKILL.md +++ b/plugins/consult/skills/consult/SKILL.md @@ -66,7 +66,7 @@ Models: gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3-pro-p |--------|-------| | low | gemini-2.5-flash | | medium | gemini-3-flash-preview | -| high | gemini-3-pro-preview | +| high | gemini-3.1-pro-preview | | max | gemini-3.1-pro-preview | **Parse output**: `JSON.parse(stdout).response` @@ -104,7 +104,7 @@ Session resume: opencode run "QUESTION" --format json --model "MODEL" --variant With thinking: add --thinking flag ``` -Models: 75+ via providers (format: provider/model). Top picks: claude-sonnet-4-6, claude-opus-4-6, gpt-5.2, o3, gemini-3-pro-preview, minimax-m2.1 +Models: 75+ via providers (format: provider/model). Top picks: claude-sonnet-4-6, claude-opus-4-6, gpt-5.2, o3, gemini-3.1-pro-preview, minimax-m2.1 | Effort | Model | Variant | |--------|-------|---------| @@ -271,7 +271,7 @@ Return a plain JSON object to stdout (no markers or wrappers): ```json { "tool": "gemini", - "model": "gemini-3-pro-preview", + "model": "gemini-3.1-pro-preview", "effort": "high", "duration_ms": 12300, "response": "The AI's response text here...", @@ -309,4 +309,4 @@ This skill is invoked by: - `consult-agent` for `/consult` command - Direct invocation: `Skill('consult', '"question" --tool=gemini --effort=high')` -Example: `Skill('consult', '"Is this approach correct?" --tool=gemini --effort=high --model=gemini-3-pro-preview')` +Example: `Skill('consult', '"Is this approach correct?" 
--tool=gemini --effort=high --model=gemini-3.1-pro-preview')` diff --git a/plugins/debate/commands/debate.md b/plugins/debate/commands/debate.md index 139c6d28..a8048f1b 100644 --- a/plugins/debate/commands/debate.md +++ b/plugins/debate/commands/debate.md @@ -298,7 +298,7 @@ Read the consult skill file to get the exact patterns and replacements. |--------|--------|--------|-------|----------|---------| | low | claude-haiku-4-5 (1 turn) | gemini-2.5-flash | o4-mini (low) | default (low) | no control | | medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | o4-mini (medium) | default (medium) | no control | -| high | claude-opus-4-6 (5 turns) | gemini-3-pro-preview | o3 (high) | default (high) | no control | +| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | o3 (high) | default (high) | no control | | max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | o3 (high) | default + --thinking | no control | ### Output Parsing diff --git a/plugins/debate/skills/debate/SKILL.md b/plugins/debate/skills/debate/SKILL.md index 852e118b..077a7735 100644 --- a/plugins/debate/skills/debate/SKILL.md +++ b/plugins/debate/skills/debate/SKILL.md @@ -216,7 +216,7 @@ Save to `{AI_STATE_DIR}/debate/last-debate.json`: "id": "debate-{ISO timestamp}-{4 char random hex}", "topic": "original topic text", "proposer": {"tool": "claude", "model": "opus"}, - "challenger": {"tool": "gemini", "model": "gemini-3-pro"}, + "challenger": {"tool": "gemini", "model": "gemini-3.1-pro"}, "effort": "high", "rounds_completed": 2, "max_rounds": 2, @@ -273,7 +273,7 @@ Platform state directory: |--------|--------|--------|-------|----------|---------| | low | claude-haiku-4-5 (1 turn) | gemini-2.5-flash | o4-mini (low) | default (low) | no control | | medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | o4-mini (medium) | default (medium) | no control | -| high | claude-opus-4-6 (5 turns) | gemini-3-pro-preview | o3 (high) | default (high) | no control | +| high | 
claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | o3 (high) | default (high) | no control | | max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | o3 (high) | default + --thinking | no control | ### Output Parsing From 36915a5114b2d3cc4ca5f891e93afebd25af59ae Mon Sep 17 00:00:00 2001 From: Avi Fenesh Date: Fri, 20 Feb 2026 16:46:00 +0200 Subject: [PATCH 2/5] fix(consult,debate): address review findings from gemini-3.1 update - Update Copilot picker labels from gemini-3-pro to gemini-3.1-pro in plugins/consult/commands/consult.md, adapters/opencode/commands/consult.md, adapters/codex/skills/consult/SKILL.md - Add gemini-3.1-pro-preview to expectedModels assertion in debate-command.test.js to catch regressions - Add gemini high-effort model assertion in debate-command.test.js for consult skill adapter sync - Update docs/consult-command-test-strategy.md stale model references --- __tests__/debate-command.test.js | 10 +++++++++- adapters/codex/skills/consult/SKILL.md | 2 +- adapters/opencode/commands/consult.md | 2 +- docs/consult-command-test-strategy.md | 20 ++++++++++---------- plugins/consult/commands/consult.md | 2 +- 5 files changed, 22 insertions(+), 14 deletions(-) diff --git a/__tests__/debate-command.test.js b/__tests__/debate-command.test.js index 5a99771a..be245c0b 100644 --- a/__tests__/debate-command.test.js +++ b/__tests__/debate-command.test.js @@ -696,7 +696,7 @@ describe('external tool quick reference (#232)', () => { }); test('current model names present in effort-to-model mapping of each skill copy', () => { - const expectedModels = ['claude-haiku-4-5', 'claude-sonnet-4-6', 'claude-opus-4-6', 'o4-mini', 'o3', 'gemini-2.5-flash']; + const expectedModels = ['claude-haiku-4-5', 'claude-sonnet-4-6', 'claude-opus-4-6', 'o4-mini', 'o3', 'gemini-2.5-flash', 'gemini-3.1-pro-preview']; for (const content of allDebateSkillContents()) { for (const model of expectedModels) { expect(content).toMatch(new RegExp(`Effort-to-Model 
Mapping[\\s\\S]*${model}`)); @@ -734,4 +734,12 @@ describe('consult skill opencode adapter sync (#232)', () => { expect(consultSkillContent).toContain('o4-mini'); expect(consultSkillContent).toContain('o3'); }); + + test('consult skill uses gemini-3.1-pro-preview as high-effort Gemini default (#234)', () => { + expect(consultSkillContent).toContain('gemini-3.1-pro-preview'); + expect(openCodeConsultSkillContent).toContain('gemini-3.1-pro-preview'); + // Ensure old model is not used as a default (it may still appear in the models list) + expect(consultSkillContent).not.toMatch(/\|\s*high\s*\|\s*gemini-3-pro-preview/); + expect(openCodeConsultSkillContent).not.toMatch(/\|\s*high\s*\|\s*gemini-3-pro-preview/); + }); }); diff --git a/adapters/codex/skills/consult/SKILL.md b/adapters/codex/skills/consult/SKILL.md index 595ab146..a9e6370a 100644 --- a/adapters/codex/skills/consult/SKILL.md +++ b/adapters/codex/skills/consult/SKILL.md @@ -214,7 +214,7 @@ request_user_input: - label: "claude-sonnet-4-5" description: "Default Copilot model" - label: "claude-opus-4-6" description: "Most capable Claude model" - label: "gpt-5.3-codex" description: "OpenAI GPT-5.3 Codex" - - label: "gemini-3-pro" description: "Google Gemini 3 Pro" + - label: "gemini-3.1-pro" description: "Google Gemini 3.1 Pro" ``` Map the user's choice to the model string (strip " (Recommended)" suffix if present). 
diff --git a/adapters/opencode/commands/consult.md b/adapters/opencode/commands/consult.md index 5754fccc..3f86b395 100644 --- a/adapters/opencode/commands/consult.md +++ b/adapters/opencode/commands/consult.md @@ -222,7 +222,7 @@ AskUserQuestion: - label: "claude-sonnet-4-5" description: "Default Copilot model" - label: "claude-opus-4-6" description: "Most capable Claude model" - label: "gpt-5.3-codex" description: "OpenAI GPT-5.3 Codex" - - label: "gemini-3-pro" description: "Google Gemini 3 Pro" + - label: "gemini-3.1-pro" description: "Google Gemini 3.1 Pro" ``` Map the user's choice to the model string (strip " (Recommended)" suffix if present). diff --git a/docs/consult-command-test-strategy.md b/docs/consult-command-test-strategy.md index 760b9a86..97bc2053 100644 --- a/docs/consult-command-test-strategy.md +++ b/docs/consult-command-test-strategy.md @@ -173,8 +173,8 @@ describe('Model Selection', () => { it('should map effort levels correctly', () => { expect(getGeminiModel('low')).toBe('gemini-2.5-flash'); expect(getGeminiModel('medium')).toBe('gemini-3-flash'); - expect(getGeminiModel('high')).toBe('gemini-3-pro'); - expect(getGeminiModel('max')).toBe('gemini-3-pro'); + expect(getGeminiModel('high')).toBe('gemini-3.1-pro'); + expect(getGeminiModel('max')).toBe('gemini-3.1-pro'); }); }); @@ -244,7 +244,7 @@ describe('Session Management', () => { it('should include question in saved session', () => { const session = { tool: 'gemini', - model: 'gemini-3-pro', + model: 'gemini-3.1-pro', effort: 'medium', session_id: 'xyz-789', timestamp: new Date().toISOString(), @@ -458,7 +458,7 @@ describe('Session Continuation', () => { it('should restore tool from saved session', () => { const session = { tool: 'gemini', - model: 'gemini-3-pro', + model: 'gemini-3.1-pro', effort: 'medium', session_id: 'session-456', timestamp: new Date().toISOString(), @@ -672,18 +672,18 @@ describe('Command Building', () => { describe('Gemini Command', () => { it('should build basic 
command', () => { - const { command, flags } = buildGeminiCommand('question', 'gemini-3-pro'); + const { command, flags } = buildGeminiCommand('question', 'gemini-3.1-pro'); expect(command).toBe('gemini'); expect(flags).toContain('-p'); expect(flags).toContain('"question"'); expect(flags).toContain('--output-format'); expect(flags).toContain('json'); expect(flags).toContain('-m'); - expect(flags).toContain('gemini-3-pro'); + expect(flags).toContain('gemini-3.1-pro'); }); it('should append session resume for continuation', () => { - const { flags } = buildGeminiCommand('question', 'gemini-3-pro', 'session-456', true); + const { flags } = buildGeminiCommand('question', 'gemini-3.1-pro', 'session-456', true); expect(flags).toContain('--resume'); expect(flags).toContain('session-456'); }); @@ -939,7 +939,7 @@ describe('Full Consultation Flow', () => { jest.spyOn(fs, 'readFileSync').mockReturnValueOnce(JSON.stringify({ tool: 'gemini', session_id: 'session-456', - model: 'gemini-3-pro', + model: 'gemini-3.1-pro', effort: 'medium', timestamp: new Date().toISOString(), question: 'continue', @@ -1139,7 +1139,7 @@ describe('Mocked Tool Outputs', () => { const mockGeminiOutput = `=== CONSULT_RESULT === { "tool": "gemini", - "model": "gemini-3-pro", + "model": "gemini-3.1-pro", "effort": "medium", "duration_ms": 23400, "response": "Based on my analysis, the approach seems sound but could benefit from error handling for edge cases.", @@ -1175,7 +1175,7 @@ describe('Mocked Tool Outputs', () => { it('should parse structured output correctly', () => { const result = parseMockOutput(mockGeminiOutput, 'gemini'); expect(result.tool).toBe('gemini'); - expect(result.model).toBe('gemini-3-pro'); + expect(result.model).toBe('gemini-3.1-pro'); expect(result.duration_ms).toBe(23400); expect(result.session_id).toBe('session-xyz-789'); }); diff --git a/plugins/consult/commands/consult.md b/plugins/consult/commands/consult.md index cb7f75b1..5d017ddb 100644 --- 
a/plugins/consult/commands/consult.md +++ b/plugins/consult/commands/consult.md @@ -219,7 +219,7 @@ AskUserQuestion: - label: "claude-sonnet-4-5" description: "Default Copilot model" - label: "claude-opus-4-6" description: "Most capable Claude model" - label: "gpt-5.3-codex" description: "OpenAI GPT-5.3 Codex" - - label: "gemini-3-pro" description: "Google Gemini 3 Pro" + - label: "gemini-3.1-pro" description: "Google Gemini 3.1 Pro" ``` Map the user's choice to the model string (strip " (Recommended)" suffix if present). From 0668724d13ad37242f86213dbfb5335c03cd7362 Mon Sep 17 00:00:00 2001 From: Avi Fenesh Date: Fri, 20 Feb 2026 16:56:40 +0200 Subject: [PATCH 3/5] fix(consult,debate): update stale Codex and Gemini low-tier model defaults - Codex: replace o4-mini/o3 with gpt-5.3-codex across all effort tiers in consult and debate skill files, command files, and adapters - Gemini low tier: replace gemini-2.5-flash with gemini-3-flash-preview (now consistent: low=gemini-3-flash-preview, medium=gemini-3-flash-preview, high/max=gemini-3.1-pro-preview) - Update model picker label for Gemini flash in consult command files - Update README, top picks, and test strategy doc - Fix debate-command.test.js expectedModels and consult adapter sync assertions to reflect current model names (remove o4-mini/o3/gemini-2.5-flash, add gpt-5.3-codex/gemini-3-flash-preview/gemini-3.1-pro-preview) --- README.md | 2 +- __tests__/debate-command.test.js | 17 ++++++++--------- adapters/codex/skills/consult/SKILL.md | 2 +- adapters/codex/skills/debate/SKILL.md | 8 ++++---- adapters/opencode/commands/consult.md | 2 +- adapters/opencode/commands/debate.md | 8 ++++---- adapters/opencode/skills/consult/SKILL.md | 14 +++++++------- adapters/opencode/skills/debate/SKILL.md | 8 ++++---- docs/consult-command-test-strategy.md | 4 ++-- plugins/consult/commands/consult.md | 2 +- plugins/consult/skills/consult/SKILL.md | 14 +++++++------- plugins/debate/commands/debate.md | 8 ++++---- 
plugins/debate/skills/debate/SKILL.md | 8 ++++---- 13 files changed, 48 insertions(+), 49 deletions(-) diff --git a/README.md b/README.md index 8fb84287..32395c5e 100644 --- a/README.md +++ b/README.md @@ -652,7 +652,7 @@ agent-knowledge/ |------|---------------------|-------------------| | Claude | claude-opus-4-6 | max-turns | | Gemini | gemini-3.1-pro-preview | built-in | -| Codex | o3 | model_reasoning_effort | +| Codex | gpt-5.3-codex | model_reasoning_effort | | OpenCode | (user-selected or default) | --variant | | Copilot | (default) | none | diff --git a/__tests__/debate-command.test.js b/__tests__/debate-command.test.js index be245c0b..4a3b5554 100644 --- a/__tests__/debate-command.test.js +++ b/__tests__/debate-command.test.js @@ -696,7 +696,7 @@ describe('external tool quick reference (#232)', () => { }); test('current model names present in effort-to-model mapping of each skill copy', () => { - const expectedModels = ['claude-haiku-4-5', 'claude-sonnet-4-6', 'claude-opus-4-6', 'o4-mini', 'o3', 'gemini-2.5-flash', 'gemini-3.1-pro-preview']; + const expectedModels = ['claude-haiku-4-5', 'claude-sonnet-4-6', 'claude-opus-4-6', 'gpt-5.3-codex', 'gemini-3-flash-preview', 'gemini-3.1-pro-preview']; for (const content of allDebateSkillContents()) { for (const model of expectedModels) { expect(content).toMatch(new RegExp(`Effort-to-Model Mapping[\\s\\S]*${model}`)); @@ -719,20 +719,19 @@ describe('consult skill opencode adapter sync (#232)', () => { expect(openCodeConsultSkillContent).toContain('claude-opus-4-6'); }); - test('opencode consult adapter has updated codex model names (no speculative gpt-5.x)', () => { - expect(openCodeConsultSkillContent).not.toContain('gpt-5.3-codex'); - expect(openCodeConsultSkillContent).not.toContain('gpt-5.2-codex'); - expect(openCodeConsultSkillContent).toContain('o4-mini'); - expect(openCodeConsultSkillContent).toContain('o3'); + test('opencode consult adapter has updated codex model names', () => { + 
expect(openCodeConsultSkillContent).toContain('gpt-5.3-codex'); + expect(openCodeConsultSkillContent).not.toContain('o4-mini'); + expect(openCodeConsultSkillContent).not.toMatch(/\|\s*(?:low|medium|high|max)\s*\|\s*o3\s*\|/); }); test('canonical consult skill has updated model names', () => { expect(consultSkillContent).toContain('claude-haiku-4-5'); expect(consultSkillContent).toContain('claude-sonnet-4-6'); expect(consultSkillContent).toContain('claude-opus-4-6'); - expect(consultSkillContent).not.toContain('gpt-5.3-codex'); - expect(consultSkillContent).toContain('o4-mini'); - expect(consultSkillContent).toContain('o3'); + expect(consultSkillContent).toContain('gpt-5.3-codex'); + expect(consultSkillContent).not.toContain('o4-mini'); + expect(consultSkillContent).not.toMatch(/\|\s*(?:low|medium|high|max)\s*\|\s*o3\s*\|/); }); test('consult skill uses gemini-3.1-pro-preview as high-effort Gemini default (#234)', () => { diff --git a/adapters/codex/skills/consult/SKILL.md b/adapters/codex/skills/consult/SKILL.md index a9e6370a..a2c56010 100644 --- a/adapters/codex/skills/consult/SKILL.md +++ b/adapters/codex/skills/consult/SKILL.md @@ -170,7 +170,7 @@ request_user_input: question: "Which Gemini model?" options: - label: "gemini-3.1-pro" description: "Most capable, strong reasoning" - - label: "gemini-3-flash" description: "Fast, 78% SWE-bench" + - label: "gemini-3-flash-preview" description: "Fast, efficient coding" - label: "gemini-2.5-pro" description: "Previous gen pro model" - label: "gemini-2.5-flash" description: "Previous gen flash model" ``` diff --git a/adapters/codex/skills/debate/SKILL.md b/adapters/codex/skills/debate/SKILL.md index 59d76007..3b52ef64 100644 --- a/adapters/codex/skills/debate/SKILL.md +++ b/adapters/codex/skills/debate/SKILL.md @@ -289,10 +289,10 @@ Read the consult skill file to get the exact patterns and replacements. 
| Effort | Claude | Gemini | Codex | OpenCode | Copilot | |--------|--------|--------|-------|----------|---------| -| low | claude-haiku-4-5 (1 turn) | gemini-2.5-flash | o4-mini (low) | default (low) | no control | -| medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | o4-mini (medium) | default (medium) | no control | -| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | o3 (high) | default (high) | no control | -| max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | o3 (high) | default + --thinking | no control | +| low | claude-haiku-4-5 (1 turn) | gemini-3-flash-preview | gpt-5.3-codex (low) | default (low) | no control | +| medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | gpt-5.3-codex (medium) | default (medium) | no control | +| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default (high) | no control | +| max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default + --thinking | no control | ### Output Parsing diff --git a/adapters/opencode/commands/consult.md b/adapters/opencode/commands/consult.md index 3f86b395..bb147f97 100644 --- a/adapters/opencode/commands/consult.md +++ b/adapters/opencode/commands/consult.md @@ -178,7 +178,7 @@ AskUserQuestion: multiSelect: false options: - label: "gemini-3.1-pro" description: "Most capable, strong reasoning" - - label: "gemini-3-flash" description: "Fast, 78% SWE-bench" + - label: "gemini-3-flash-preview" description: "Fast, efficient coding" - label: "gemini-2.5-pro" description: "Previous gen pro model" - label: "gemini-2.5-flash" description: "Previous gen flash model" ``` diff --git a/adapters/opencode/commands/debate.md b/adapters/opencode/commands/debate.md index c8a2413d..ae1e31c8 100644 --- a/adapters/opencode/commands/debate.md +++ b/adapters/opencode/commands/debate.md @@ -293,10 +293,10 @@ Read the consult skill file to get the exact patterns and replacements. 
| Effort | Claude | Gemini | Codex | OpenCode | Copilot | |--------|--------|--------|-------|----------|---------| -| low | claude-haiku-4-5 (1 turn) | gemini-2.5-flash | o4-mini (low) | default (low) | no control | -| medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | o4-mini (medium) | default (medium) | no control | -| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | o3 (high) | default (high) | no control | -| max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | o3 (high) | default + --thinking | no control | +| low | claude-haiku-4-5 (1 turn) | gemini-3-flash-preview | gpt-5.3-codex (low) | default (low) | no control | +| medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | gpt-5.3-codex (medium) | default (medium) | no control | +| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default (high) | no control | +| max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default + --thinking | no control | ### Output Parsing diff --git a/adapters/opencode/skills/consult/SKILL.md b/adapters/opencode/skills/consult/SKILL.md index 8dab2009..dd9dc5d4 100644 --- a/adapters/opencode/skills/consult/SKILL.md +++ b/adapters/opencode/skills/consult/SKILL.md @@ -70,7 +70,7 @@ Models: gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3-pro-p | Effort | Model | |--------|-------| -| low | gemini-2.5-flash | +| low | gemini-3-flash-preview | | medium | gemini-3-flash-preview | | high | gemini-3.1-pro-preview | | max | gemini-3.1-pro-preview | @@ -89,14 +89,14 @@ Session resume (latest): codex exec resume --last "QUESTION" --json Note: `codex exec` is the non-interactive/headless mode. There is no `-q` flag. The TUI mode is `codex` (no subcommand). 
-Models: o4-mini, o3
+Models: gpt-5.3-codex
 
 | Effort | Model | Reasoning |
 |--------|-------|-----------|
-| low | o4-mini | low |
-| medium | o4-mini | medium |
-| high | o3 | high |
-| max | o3 | high |
+| low | gpt-5.3-codex | low |
+| medium | gpt-5.3-codex | medium |
+| high | gpt-5.3-codex | high |
+| max | gpt-5.3-codex | high |
 
 **Parse output**: `JSON.parse(stdout).message` or raw text
 **Session ID**: Codex prints a resume hint at session end (e.g., `codex resume SESSION_ID`). Extract the session ID from stdout or from `JSON.parse(stdout).session_id` if available.
@@ -110,7 +110,7 @@ Session resume: opencode run "QUESTION" --format json --model "MODEL" --variant
 With thinking: add --thinking flag
 ```
 
-Models: 75+ via providers (format: provider/model). Top picks: claude-sonnet-4-6, claude-opus-4-6, gpt-5.2, o3, gemini-3.1-pro-preview, minimax-m2.1
+Models: 75+ via providers (format: provider/model). Top picks: claude-sonnet-4-6, claude-opus-4-6, gpt-5.3-codex, gemini-3.1-pro-preview, minimax-m2.1
 
 | Effort | Model | Variant |
 |--------|-------|---------|
diff --git a/adapters/opencode/skills/debate/SKILL.md b/adapters/opencode/skills/debate/SKILL.md
index f8594b58..a40c9d71 100644
--- a/adapters/opencode/skills/debate/SKILL.md
+++ b/adapters/opencode/skills/debate/SKILL.md
@@ -277,10 +277,10 @@ Platform state directory:
 
 | Effort | Claude | Gemini | Codex | OpenCode | Copilot |
 |--------|--------|--------|-------|----------|---------|
-| low | claude-haiku-4-5 (1 turn) | gemini-2.5-flash | o4-mini (low) | default (low) | no control |
-| medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | o4-mini (medium) | default (medium) | no control |
-| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | o3 (high) | default (high) | no control |
-| max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | o3 (high) | default + --thinking | no control |
+| low | claude-haiku-4-5 (1 turn) | gemini-3-flash-preview | gpt-5.3-codex (low) | default (low) | no control |
+| medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | gpt-5.3-codex (medium) | default (medium) | no control |
+| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default (high) | no control |
+| max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default + --thinking | no control |
 
 ### Output Parsing
 
diff --git a/docs/consult-command-test-strategy.md b/docs/consult-command-test-strategy.md
index 97bc2053..4139c99f 100644
--- a/docs/consult-command-test-strategy.md
+++ b/docs/consult-command-test-strategy.md
@@ -171,8 +171,8 @@ describe('Model Selection', () => {
   describe('Gemini models', () => {
     it('should map effort levels correctly', () => {
-      expect(getGeminiModel('low')).toBe('gemini-2.5-flash');
-      expect(getGeminiModel('medium')).toBe('gemini-3-flash');
+      expect(getGeminiModel('low')).toBe('gemini-3-flash-preview');
+      expect(getGeminiModel('medium')).toBe('gemini-3-flash-preview');
       expect(getGeminiModel('high')).toBe('gemini-3.1-pro');
       expect(getGeminiModel('max')).toBe('gemini-3.1-pro');
     });
   });
diff --git a/plugins/consult/commands/consult.md b/plugins/consult/commands/consult.md
index 5d017ddb..5dcc86cd 100644
--- a/plugins/consult/commands/consult.md
+++ b/plugins/consult/commands/consult.md
@@ -175,7 +175,7 @@ AskUserQuestion:
     multiSelect: false
     options:
       - label: "gemini-3.1-pro" description: "Most capable, strong reasoning"
-      - label: "gemini-3-flash" description: "Fast, 78% SWE-bench"
+      - label: "gemini-3-flash-preview" description: "Fast, efficient coding"
       - label: "gemini-2.5-pro" description: "Previous gen pro model"
       - label: "gemini-2.5-flash" description: "Previous gen flash model"
 ```
diff --git a/plugins/consult/skills/consult/SKILL.md b/plugins/consult/skills/consult/SKILL.md
index 39cc0c03..3b8beb41 100644
--- a/plugins/consult/skills/consult/SKILL.md
+++ b/plugins/consult/skills/consult/SKILL.md
@@ -64,7 +64,7 @@ Models: gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3-pro-p
 
 | Effort | Model |
 |--------|-------|
-| low | gemini-2.5-flash |
+| low | gemini-3-flash-preview |
 | medium | gemini-3-flash-preview |
 | high | gemini-3.1-pro-preview |
 | max | gemini-3.1-pro-preview |
@@ -83,14 +83,14 @@ Session resume (latest): codex exec resume --last "QUESTION" --json
 
 Note: `codex exec` is the non-interactive/headless mode. There is no `-q` flag. The TUI mode is `codex` (no subcommand).
 
-Models: o4-mini, o3
+Models: gpt-5.3-codex
 
 | Effort | Model | Reasoning |
 |--------|-------|-----------|
-| low | o4-mini | low |
-| medium | o4-mini | medium |
-| high | o3 | high |
-| max | o3 | high |
+| low | gpt-5.3-codex | low |
+| medium | gpt-5.3-codex | medium |
+| high | gpt-5.3-codex | high |
+| max | gpt-5.3-codex | high |
 
 **Parse output**: `JSON.parse(stdout).message` or raw text
 **Session ID**: Codex prints a resume hint at session end (e.g., `codex resume SESSION_ID`). Extract the session ID from stdout or from `JSON.parse(stdout).session_id` if available.
@@ -104,7 +104,7 @@ Session resume: opencode run "QUESTION" --format json --model "MODEL" --variant
 With thinking: add --thinking flag
 ```
 
-Models: 75+ via providers (format: provider/model). Top picks: claude-sonnet-4-6, claude-opus-4-6, gpt-5.2, o3, gemini-3.1-pro-preview, minimax-m2.1
+Models: 75+ via providers (format: provider/model). Top picks: claude-sonnet-4-6, claude-opus-4-6, gpt-5.3-codex, gemini-3.1-pro-preview, minimax-m2.1
 
 | Effort | Model | Variant |
 |--------|-------|---------|
diff --git a/plugins/debate/commands/debate.md b/plugins/debate/commands/debate.md
index a8048f1b..cf7d120a 100644
--- a/plugins/debate/commands/debate.md
+++ b/plugins/debate/commands/debate.md
@@ -296,10 +296,10 @@ Read the consult skill file to get the exact patterns and replacements.
 
 | Effort | Claude | Gemini | Codex | OpenCode | Copilot |
 |--------|--------|--------|-------|----------|---------|
-| low | claude-haiku-4-5 (1 turn) | gemini-2.5-flash | o4-mini (low) | default (low) | no control |
-| medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | o4-mini (medium) | default (medium) | no control |
-| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | o3 (high) | default (high) | no control |
-| max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | o3 (high) | default + --thinking | no control |
+| low | claude-haiku-4-5 (1 turn) | gemini-3-flash-preview | gpt-5.3-codex (low) | default (low) | no control |
+| medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | gpt-5.3-codex (medium) | default (medium) | no control |
+| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default (high) | no control |
+| max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default + --thinking | no control |
 
 ### Output Parsing
 
diff --git a/plugins/debate/skills/debate/SKILL.md b/plugins/debate/skills/debate/SKILL.md
index 077a7735..1f595182 100644
--- a/plugins/debate/skills/debate/SKILL.md
+++ b/plugins/debate/skills/debate/SKILL.md
@@ -271,10 +271,10 @@ Platform state directory:
 
 | Effort | Claude | Gemini | Codex | OpenCode | Copilot |
 |--------|--------|--------|-------|----------|---------|
-| low | claude-haiku-4-5 (1 turn) | gemini-2.5-flash | o4-mini (low) | default (low) | no control |
-| medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | o4-mini (medium) | default (medium) | no control |
-| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | o3 (high) | default (high) | no control |
-| max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | o3 (high) | default + --thinking | no control |
+| low | claude-haiku-4-5 (1 turn) | gemini-3-flash-preview | gpt-5.3-codex (low) | default (low) | no control |
+| medium | claude-sonnet-4-6 (3 turns) | gemini-3-flash-preview | gpt-5.3-codex (medium) | default (medium) | no control |
+| high | claude-opus-4-6 (5 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default (high) | no control |
+| max | claude-opus-4-6 (10 turns) | gemini-3.1-pro-preview | gpt-5.3-codex (high) | default + --thinking | no control |
 
 ### Output Parsing
 
From 9b009adb2a35a5983b6291c25b07d037fc9a972c Mon Sep 17 00:00:00 2001
From: Avi Fenesh
Date: Fri, 20 Feb 2026 17:03:53 +0200
Subject: [PATCH 4/5] fix(consult,debate): use full gemini-3.1-pro-preview API
 name consistently

- Update picker labels and example invocations from 'gemini-3.1-pro' to
  'gemini-3.1-pro-preview' to match the effort table API model name
  (plugins/consult/commands, adapters/opencode/commands/consult,
  adapters/codex/skills/consult)
- Fix debate state-schema JSON examples in plugins/debate/skills and
  adapters/opencode/skills/debate to use 'gemini-3.1-pro-preview'
- Update docs/consult-command-test-strategy.md to use full preview name
- Strengthen test regression guard to cover both high and max rows
---
 __tests__/debate-command.test.js         |  6 +++---
 adapters/codex/skills/consult/SKILL.md   |  6 +++---
 adapters/opencode/commands/consult.md    |  6 +++---
 adapters/opencode/skills/debate/SKILL.md |  2 +-
 docs/consult-command-test-strategy.md    | 20 ++++++++++----------
 plugins/consult/commands/consult.md      |  6 +++---
 plugins/debate/skills/debate/SKILL.md    |  2 +-
 7 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/__tests__/debate-command.test.js b/__tests__/debate-command.test.js
index 4a3b5554..a47456c4 100644
--- a/__tests__/debate-command.test.js
+++ b/__tests__/debate-command.test.js
@@ -737,8 +737,8 @@ describe('consult skill opencode adapter sync (#232)', () => {
   test('consult skill uses gemini-3.1-pro-preview as high-effort Gemini default (#234)', () => {
     expect(consultSkillContent).toContain('gemini-3.1-pro-preview');
     expect(openCodeConsultSkillContent).toContain('gemini-3.1-pro-preview');
-    // Ensure old model is not used as a default (it may still appear in the models list)
-    expect(consultSkillContent).not.toMatch(/\|\s*high\s*\|\s*gemini-3-pro-preview/);
-    expect(openCodeConsultSkillContent).not.toMatch(/\|\s*high\s*\|\s*gemini-3-pro-preview/);
+    // Ensure old model is not used as high/max default (may still appear in the models list)
+    expect(consultSkillContent).not.toMatch(/\|\s*(?:high|max)\s*\|\s*gemini-3-pro-preview/);
+    expect(openCodeConsultSkillContent).not.toMatch(/\|\s*(?:high|max)\s*\|\s*gemini-3-pro-preview/);
   });
 });
diff --git a/adapters/codex/skills/consult/SKILL.md b/adapters/codex/skills/consult/SKILL.md
index a2c56010..e8b4073f 100644
--- a/adapters/codex/skills/consult/SKILL.md
+++ b/adapters/codex/skills/consult/SKILL.md
@@ -169,7 +169,7 @@ request_user_input:
   - header: "Model"
     question: "Which Gemini model?"
     options:
-      - label: "gemini-3.1-pro" description: "Most capable, strong reasoning"
+      - label: "gemini-3.1-pro-preview" description: "Most capable, strong reasoning"
       - label: "gemini-3-flash-preview" description: "Fast, efficient coding"
       - label: "gemini-2.5-pro" description: "Previous gen pro model"
       - label: "gemini-2.5-flash" description: "Previous gen flash model"
@@ -214,7 +214,7 @@ request_user_input:
       - label: "claude-sonnet-4-5" description: "Default Copilot model"
       - label: "claude-opus-4-6" description: "Most capable Claude model"
       - label: "gpt-5.3-codex" description: "OpenAI GPT-5.3 Codex"
-      - label: "gemini-3.1-pro" description: "Google Gemini 3.1 Pro"
+      - label: "gemini-3.1-pro-preview" description: "Google Gemini 3.1 Pro"
 ```
 
 Map the user's choice to the model string (strip " (Recommended)" suffix if present).
@@ -233,7 +233,7 @@ Invoke the `consult` skill directly using the Skill tool:
 Skill: consult
 Args: "[question]" --tool=[tool] --effort=[effort] --model=[model] [--context=[context]] [--continue=[session_id]]
 
-Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3.1-pro
+Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3.1-pro-preview
 ```
 
 The skill handles the full consultation lifecycle: model resolution, command building, context packaging, execution with 120s timeout, and returns a plain JSON result.
diff --git a/adapters/opencode/commands/consult.md b/adapters/opencode/commands/consult.md
index bb147f97..f6268bda 100644
--- a/adapters/opencode/commands/consult.md
+++ b/adapters/opencode/commands/consult.md
@@ -177,7 +177,7 @@ AskUserQuestion:
     question: "Which Gemini model?"
     multiSelect: false
     options:
-      - label: "gemini-3.1-pro" description: "Most capable, strong reasoning"
+      - label: "gemini-3.1-pro-preview" description: "Most capable, strong reasoning"
       - label: "gemini-3-flash-preview" description: "Fast, efficient coding"
       - label: "gemini-2.5-pro" description: "Previous gen pro model"
       - label: "gemini-2.5-flash" description: "Previous gen flash model"
@@ -222,7 +222,7 @@ AskUserQuestion:
      - label: "claude-sonnet-4-5" description: "Default Copilot model"
      - label: "claude-opus-4-6" description: "Most capable Claude model"
      - label: "gpt-5.3-codex" description: "OpenAI GPT-5.3 Codex"
-     - label: "gemini-3.1-pro" description: "Google Gemini 3.1 Pro"
+     - label: "gemini-3.1-pro-preview" description: "Google Gemini 3.1 Pro"
 ```
 
 Map the user's choice to the model string (strip " (Recommended)" suffix if present).
@@ -241,7 +241,7 @@ Invoke the `consult` skill directly using the Skill tool:
 Skill: consult
 Args: "[question]" --tool=[tool] --effort=[effort] --model=[model] [--context=[context]] [--continue=[session_id]]
 
-Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3.1-pro
+Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3.1-pro-preview
 ```
 
 The skill handles the full consultation lifecycle: model resolution, command building, context packaging, execution with 120s timeout, and returns a plain JSON result.
diff --git a/adapters/opencode/skills/debate/SKILL.md b/adapters/opencode/skills/debate/SKILL.md
index a40c9d71..95c7d11d 100644
--- a/adapters/opencode/skills/debate/SKILL.md
+++ b/adapters/opencode/skills/debate/SKILL.md
@@ -222,7 +222,7 @@ Save to `{AI_STATE_DIR}/debate/last-debate.json`:
   "id": "debate-{ISO timestamp}-{4 char random hex}",
   "topic": "original topic text",
   "proposer": {"tool": "claude", "model": "opus"},
-  "challenger": {"tool": "gemini", "model": "gemini-3.1-pro"},
+  "challenger": {"tool": "gemini", "model": "gemini-3.1-pro-preview"},
   "effort": "high",
   "rounds_completed": 2,
   "max_rounds": 2,
diff --git a/docs/consult-command-test-strategy.md b/docs/consult-command-test-strategy.md
index 4139c99f..afc75ed5 100644
--- a/docs/consult-command-test-strategy.md
+++ b/docs/consult-command-test-strategy.md
@@ -173,8 +173,8 @@ describe('Model Selection', () => {
     it('should map effort levels correctly', () => {
       expect(getGeminiModel('low')).toBe('gemini-3-flash-preview');
       expect(getGeminiModel('medium')).toBe('gemini-3-flash-preview');
-      expect(getGeminiModel('high')).toBe('gemini-3.1-pro');
-      expect(getGeminiModel('max')).toBe('gemini-3.1-pro');
+      expect(getGeminiModel('high')).toBe('gemini-3.1-pro-preview');
+      expect(getGeminiModel('max')).toBe('gemini-3.1-pro-preview');
     });
   });
@@ -244,7 +244,7 @@ describe('Session Management', () => {
   it('should include question in saved session', () => {
     const session = {
       tool: 'gemini',
-      model: 'gemini-3.1-pro',
+      model: 'gemini-3.1-pro-preview',
       effort: 'medium',
       session_id: 'xyz-789',
       timestamp: new Date().toISOString(),
@@ -458,7 +458,7 @@ describe('Session Continuation', () => {
   it('should restore tool from saved session', () => {
     const session = {
       tool: 'gemini',
-      model: 'gemini-3.1-pro',
+      model: 'gemini-3.1-pro-preview',
       effort: 'medium',
       session_id: 'session-456',
       timestamp: new Date().toISOString(),
@@ -672,18 +672,18 @@ describe('Command Building', () => {
 
   describe('Gemini Command', () => {
     it('should build basic command', () => {
-      const { command, flags } = buildGeminiCommand('question', 'gemini-3.1-pro');
+      const { command, flags } = buildGeminiCommand('question', 'gemini-3.1-pro-preview');
       expect(command).toBe('gemini');
       expect(flags).toContain('-p');
       expect(flags).toContain('"question"');
       expect(flags).toContain('--output-format');
       expect(flags).toContain('json');
       expect(flags).toContain('-m');
-      expect(flags).toContain('gemini-3.1-pro');
+      expect(flags).toContain('gemini-3.1-pro-preview');
     });
 
     it('should append session resume for continuation', () => {
-      const { flags } = buildGeminiCommand('question', 'gemini-3.1-pro', 'session-456', true);
+      const { flags } = buildGeminiCommand('question', 'gemini-3.1-pro-preview', 'session-456', true);
       expect(flags).toContain('--resume');
       expect(flags).toContain('session-456');
     });
@@ -939,7 +939,7 @@ describe('Full Consultation Flow', () => {
       jest.spyOn(fs, 'readFileSync').mockReturnValueOnce(JSON.stringify({
         tool: 'gemini',
         session_id: 'session-456',
-        model: 'gemini-3.1-pro',
+        model: 'gemini-3.1-pro-preview',
         effort: 'medium',
         timestamp: new Date().toISOString(),
         question: 'continue',
@@ -1139,7 +1139,7 @@ describe('Mocked Tool Outputs', () => {
   const mockGeminiOutput = `=== CONSULT_RESULT ===
 {
   "tool": "gemini",
-  "model": "gemini-3.1-pro",
+  "model": "gemini-3.1-pro-preview",
   "effort": "medium",
   "duration_ms": 23400,
   "response": "Based on my analysis, the approach seems sound but could benefit from error handling for edge cases.",
@@ -1175,7 +1175,7 @@ describe('Mocked Tool Outputs', () => {
     it('should parse structured output correctly', () => {
       const result = parseMockOutput(mockGeminiOutput, 'gemini');
       expect(result.tool).toBe('gemini');
-      expect(result.model).toBe('gemini-3.1-pro');
+      expect(result.model).toBe('gemini-3.1-pro-preview');
       expect(result.duration_ms).toBe(23400);
       expect(result.session_id).toBe('session-xyz-789');
     });
diff --git a/plugins/consult/commands/consult.md b/plugins/consult/commands/consult.md
index 5dcc86cd..9585b6e5 100644
--- a/plugins/consult/commands/consult.md
+++ b/plugins/consult/commands/consult.md
@@ -174,7 +174,7 @@ AskUserQuestion:
     question: "Which Gemini model?"
     multiSelect: false
     options:
-      - label: "gemini-3.1-pro" description: "Most capable, strong reasoning"
+      - label: "gemini-3.1-pro-preview" description: "Most capable, strong reasoning"
       - label: "gemini-3-flash-preview" description: "Fast, efficient coding"
       - label: "gemini-2.5-pro" description: "Previous gen pro model"
       - label: "gemini-2.5-flash" description: "Previous gen flash model"
@@ -219,7 +219,7 @@ AskUserQuestion:
      - label: "claude-sonnet-4-5" description: "Default Copilot model"
      - label: "claude-opus-4-6" description: "Most capable Claude model"
      - label: "gpt-5.3-codex" description: "OpenAI GPT-5.3 Codex"
-     - label: "gemini-3.1-pro" description: "Google Gemini 3.1 Pro"
+     - label: "gemini-3.1-pro-preview" description: "Google Gemini 3.1 Pro"
 ```
 
 Map the user's choice to the model string (strip " (Recommended)" suffix if present).
@@ -238,7 +238,7 @@ Invoke the `consult` skill directly using the Skill tool:
 Skill: consult
 Args: "[question]" --tool=[tool] --effort=[effort] --model=[model] [--context=[context]] [--continue=[session_id]]
 
-Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3.1-pro
+Example: "Is this the right approach?" --tool=gemini --effort=high --model=gemini-3.1-pro-preview
 ```
 
 The skill handles the full consultation lifecycle: model resolution, command building, context packaging, execution with 120s timeout, and returns a plain JSON result.
diff --git a/plugins/debate/skills/debate/SKILL.md b/plugins/debate/skills/debate/SKILL.md
index 1f595182..d6cf1fd8 100644
--- a/plugins/debate/skills/debate/SKILL.md
+++ b/plugins/debate/skills/debate/SKILL.md
@@ -216,7 +216,7 @@ Save to `{AI_STATE_DIR}/debate/last-debate.json`:
   "id": "debate-{ISO timestamp}-{4 char random hex}",
   "topic": "original topic text",
   "proposer": {"tool": "claude", "model": "opus"},
-  "challenger": {"tool": "gemini", "model": "gemini-3.1-pro"},
+  "challenger": {"tool": "gemini", "model": "gemini-3.1-pro-preview"},
   "effort": "high",
   "rounds_completed": 2,
   "max_rounds": 2,

From c71fde10bd88e1c3cd87ede9f80ba61729e53fbb Mon Sep 17 00:00:00 2001
From: Avi Fenesh
Date: Fri, 20 Feb 2026 17:08:03 +0200
Subject: [PATCH 5/5] docs: add CHANGELOG entry for gemini-3.1 model defaults
 update (#234)

---
 CHANGELOG.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 321d0cad..d55dd081 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -21,6 +21,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - **`/debate` External Tool Quick Reference** — Added a "External Tool Quick Reference" section to all copies of the debate skill (`plugins/debate/skills/debate/SKILL.md`, OpenCode and Codex adapters) with safe command patterns, effort-to-model mapping tables, and output parsing expressions. The section includes a canonical-source pointer to `plugins/consult/skills/consult/SKILL.md` so the debate orchestrator doesn't duplicate provider logic. Added pointer notes in `debate-orchestrator` agents. Fixes issue #232.
 
+- **`/consult` and `/debate` model defaults update** — Gemini high/max effort now uses `gemini-3.1-pro-preview`; Gemini low/medium uses `gemini-3-flash-preview`. Codex uses `gpt-5.3-codex` for all effort tiers. Updated across all platforms: Claude Code plugin, OpenCode adapter, and Codex adapter for both consult and debate skills and commands. Fixes issue #234.
+
 - **`/consult` model name updates** — Updated stale model names in the consult skill: Codex models are now `o4-mini` (low/medium) and `o3` (high/max); Gemini models include `gemini-3-flash-preview`, `gemini-3-pro-preview`, and `gemini-3.1-pro-preview`. Synced to OpenCode adapter consult skill. Fixes issue #232.
 
 - **`/next-task` Phase 12 ship invocation** — Phase 12 now invokes `ship:ship` via `await Skill({ name: "ship:ship", args: ... })` instead of `Task({ subagent_type: "ship:ship", ... })`. `ship:ship` is a skill, not an agent; the previous `Task()` call silently failed, leaving the workflow stuck after delivery validation with no PR created. The Codex adapter is updated in parity and regression tests are added. Fixes issue #230.