docs: refresh model guidance for codex max (#485)

factory-ben · web-flow · commit 452795ccdc67 · 2025-12-04T22:36:10.000Z
diff --git a/docs/cli/user-guides/choosing-your-model.mdx b/docs/cli/user-guides/choosing-your-model.mdx
@@ -4,20 +4,23 @@ description: Balance accuracy, speed, and cost by picking the right model and re
 keywords: ['model', 'models', 'llm', 'claude', 'sonnet', 'opus', 'haiku', 'gpt', 'openai', 'anthropic', 'choose model', 'switch model']
 ---
 
-Model quality evolves quickly, and we tune the CLI defaults as the ecosystem shifts. Use this guide as a snapshot of how the major options compare today, and expect to revisit it as we publish updates. This guide was last updated on Wednesday, October 23rd 2025.
+Model quality evolves quickly, and we tune the CLI defaults as the ecosystem shifts. Use this guide as a snapshot of how the major options compare today, and expect to revisit it as we publish updates. This guide was last updated on Thursday, December 4th 2025.
 
 ---
 
-## 1 · Current stack rank (October 2025)
+## 1 · Current stack rank (December 2025)
 
-| Rank | Model                    | Why we reach for it                                                                                                                              |
-| ---- | ------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- |
-| 1    | **Claude Sonnet 4.5**    | Recommended daily driver. Excellent balance of quality, speed, and cost for most development tasks. Current CLI default.                        |
-| 2    | **GPT-5 Codex**          | Fast iteration loops with strong coding performance. Great for implementation-heavy work at lower cost than Sonnet.                             |
-| 3    | **Claude Haiku 4.5**     | Fast and cost-effective for routine tasks, quick iterations, and high-volume automation. Best for speed-sensitive workflows.                    |
-| 4    | **Droid Core (GLM-4.6)** | Open-source model with 0.25× token multiplier. Lightning-fast and budget-friendly for automation, bulk edits, and air-gapped environments.     |
-| 5    | **GPT-5**                | Strong generalist from OpenAI. Choose when you prefer OpenAI ergonomics or need specific GPT features.                                          |
-| 6    | **Claude Opus 4.1**      | Highest capability for extremely complex work. Use when you need maximum reasoning power for critical architecture decisions or tough problems. |
+| Rank | Model                         | Why we reach for it                                                                                                                               |
+| ---- | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
+| 1    | **Claude Opus 4.5 (default)** | Best balance of quality and safety; current CLI default for both TUI and exec.                                                                   |
+| 2    | **GPT-5.1-Codex-Max**         | Fast coding loops with support up to **Extra High** reasoning; great for heavy implementation and debugging.                                    |
+| 3    | **Claude Sonnet 4.5**         | Strong daily driver with balanced cost/quality; great general-purpose choice when you don’t need Opus-level depth.                              |
+| 4    | **GPT-5.1-Codex**             | Quick iteration with solid code quality at lower cost; bump reasoning when you need more depth.                                                 |
+| 5    | **GPT-5.1**                   | Good generalist, especially when you want OpenAI ergonomics with flexible reasoning effort.                                                     |
+| 6    | **Claude Haiku 4.5**          | Fast, cost-efficient for routine tasks and high-volume automation.                                                                              |
+| 7    | **Gemini 3 Pro**              | Strong at mixed reasoning with Low/High settings; helpful for research-style flows with structured outputs.                                    |
+| 8    | **Claude Opus 4.1**           | Highest raw capability for extremely complex work; choose when you need maximum reasoning power despite higher cost.                           |
+| 9    | **Droid Core (GLM-4.6)**      | Open-source, 0.25× multiplier, great for bulk automation or air-gapped environments; note: no image support.                                   |
 
 <Note>
   We ship model updates regularly. When a new release overtakes the list above,
@@ -30,11 +33,11 @@ Model quality evolves quickly, and we tune the CLI defaults as the ecosystem shi
 
 | Scenario                                                         | Recommended model                                                                                                                          |
 | ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
-| **Deep planning, architecture reviews, ambiguous product specs** | Start with **Sonnet 4.5** for strong reasoning at practical cost. Use **GPT-5 Codex** for faster iteration or **Haiku 4.5** for lighter tasks. |
-| **Full-feature development, large refactors**                    | **Sonnet 4.5** is the recommended daily driver. Try **GPT-5 Codex** when you want faster loops or **Droid Core** for high-volume work.   |
-| **Repeatable edits, summarization, boilerplate generation**      | **Haiku 4.5** or **Droid Core** for speed and cost savings. **GPT-5** or **Sonnet 4.5** when you need higher quality.                     |
-| **CI/CD or automation loops**                                    | Favor **Haiku 4.5** or **Droid Core** for predictable throughput at low cost. Use **Sonnet 4.5** or **Codex** for complex automation.    |
-| **High-volume automation, frequent quick turns**                 | **Haiku 4.5** for speedy feedback loops. **Droid Core** when cost is critical or you need air-gapped deployment.                          |
+| **Deep planning, architecture reviews, ambiguous product specs** | Start with **Opus 4.5 (default)** for depth and safety. Use **Sonnet 4.5** when you want balanced cost/quality, or **Codex/Codex-Max** for faster iteration with reasoning. |
+| **Full-feature development, large refactors**                    | **Opus 4.5** for default depth and safety. **GPT-5.1-Codex-Max** when you need speed plus **Extra High** reasoning; **Sonnet 4.5** for balanced loops. |
+| **Repeatable edits, summarization, boilerplate generation**      | **Haiku 4.5** or **Droid Core** for speed and cost. **GPT-5.1 / GPT-5.1-Codex** when you need higher quality or structured outputs. |
+| **CI/CD or automation loops**                                    | Favor **Haiku 4.5** or **Droid Core** for predictable, low-cost throughput. Use **Codex** or **Codex-Max** when automation needs stronger reasoning. |
+| **High-volume automation, frequent quick turns**                 | **Haiku 4.5** for speedy feedback. **Droid Core** when cost is critical or you need air-gapped deployment. |
 
 <Tip>
   **Claude Opus 4.1** remains available for extremely complex architecture decisions or critical work where you need maximum reasoning capability. Most tasks don't require Opus-level power—start with Sonnet 4.5 and escalate only if needed.
@@ -47,17 +50,22 @@ Tip: you can swap models mid-session with `/model` or by toggling in the setting
 ## 3 · Switching models mid-session
 
 - Use `/model` (or **Shift+Tab → Settings → Model**) to swap without losing your chat history.
-- If you change providers (e.g. Anthropc to OpenAI), the CLI converts the session transcript between Anthropic and OpenAI formats. The translation is lossy—provider-specific metadata is dropped—but we have not seen accuracy regressions in practice.
+- If you change providers (e.g. Anthropic to OpenAI), the CLI converts the session transcript between Anthropic and OpenAI formats. The translation is lossy—provider-specific metadata is dropped—but we have not seen accuracy regressions in practice.
 - For the best context continuity, switch models at natural milestones: after a commit, once a PR lands, or when you abandon a failed approach and reset the plan.
 - If you flip back and forth rapidly, expect the assistant to spend a turn re-grounding itself; consider summarizing recent progress when you switch.
 
 ---
 
 ## 4 · Reasoning effort settings
 
-- Anthropic models (Opus/Sonnet/Haiku) show modest gains between Low and High.
-- GPT models respond much more to higher reasoning effort—bumping **GPT-5** or **GPT-5 Codex** to **High** can materially improve planning and debugging.
-- Reasoning effort increases latency and cost, so start Low for simple work and escalate when you need more depth.
+- **Opus / Sonnet / Haiku**: Off / Low / Medium / High (default: Off)
+- **GPT-5.1**: None / Low / Medium / High (default: None)
+- **GPT-5.1-Codex**: Low / Medium / High (default: Medium)
+- **GPT-5.1-Codex-Max**: Low / Medium / High / **Extra High** (default: Medium)
+- **Gemini 3 Pro**: Low / High (default: High)
+- **Droid Core (GLM-4.6)**: None only (default: None; no image support)
+
+Reasoning effort increases latency and cost—start low for simple work and escalate as needed. **Extra High** is only available on GPT-5.1-Codex-Max.
 
 <Tip>
   Change reasoning effort from `/model` → **Reasoning effort**, or via the
diff --git a/docs/pricing.mdx b/docs/pricing.mdx
@@ -29,6 +29,7 @@ Different models have different multipliers applied to calculate Standard Token
 | Claude Haiku 4.5         | `claude-haiku-4-5-20251001`  | 0.4×       |
 | GPT-5.1                  | `gpt-5.1`                    | 0.5×       |
 | GPT-5.1-Codex            | `gpt-5.1-codex`              | 0.5×       |
+| GPT-5.1-Codex-Max        | `gpt-5.1-codex-max`          | 0.5×       |
 | Gemini 3 Pro             | `gemini-3-pro-preview`       | 0.8×       |
 | Claude Sonnet 4.5        | `claude-sonnet-4-5-20250929` | 1.2×       |
 | Claude Opus 4.5          | `claude-opus-4-5-20251101`   | 1.2×       |
diff --git a/docs/reference/cli-reference.mdx b/docs/reference/cli-reference.mdx
@@ -97,16 +97,17 @@ droid exec --auto high "Run tests, commit, and push changes"
 
 ## Available models
 
-| Model ID                      | Name                    | Reasoning support | Default reasoning |
-| :---------------------------- | :---------------------- | :---------------- | :---------------- |
-| `claude-opus-4-5-20251101`    | Claude Opus 4.5 (default) | Yes             | off               |
-| `gpt-5.1-codex`               | GPT-5.1 Codex           | Yes               | medium            |
-| `gpt-5.1`                     | GPT-5.1                 | Yes               | none              |
-| `claude-sonnet-4-5-20250929`  | Claude Sonnet 4.5       | Yes               | off               |
-| `claude-opus-4-1-20250805`    | Claude Opus 4.1         | Yes               | off               |
-| `claude-haiku-4-5-20251001`   | Claude Haiku 4.5        | Yes               | off               |
-| `gemini-3-pro-preview`        | Gemini 3 Pro            | Yes               | high              |
-| `glm-4.6`                     | Droid Core (GLM-4.6)    | No                | none              |
+| Model ID                      | Name                         | Reasoning support                 | Default reasoning |
+| :---------------------------- | :--------------------------- | :-------------------------------- | :---------------- |
+| `claude-opus-4-5-20251101`    | Claude Opus 4.5 (default)    | Yes (Off/Low/Medium/High)         | off               |
+| `gpt-5.1-codex-max`           | GPT-5.1-Codex-Max            | Yes (Low/Medium/High/Extra High)  | medium            |
+| `gpt-5.1-codex`               | GPT-5.1-Codex                | Yes (Low/Medium/High)             | medium            |
+| `gpt-5.1`                     | GPT-5.1                      | Yes (None/Low/Medium/High)        | none              |
+| `claude-sonnet-4-5-20250929`  | Claude Sonnet 4.5            | Yes (Off/Low/Medium/High)         | off               |
+| `claude-opus-4-1-20250805`    | Claude Opus 4.1              | Yes (Off/Low/Medium/High)         | off               |
+| `claude-haiku-4-5-20251001`   | Claude Haiku 4.5             | Yes (Off/Low/Medium/High)         | off               |
+| `gemini-3-pro-preview`        | Gemini 3 Pro                 | Yes (Low/High)                    | high              |
+| `glm-4.6`                     | Droid Core (GLM-4.6)         | None only                         | none              |
 
 Custom models configured via [BYOK](/cli/configuration/byok) use the format: `custom:<alias>`
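
The reasoning levels and defaults in the updated "Available models" table can be read as a small lookup. Below is a minimal sketch of that lookup in Python, assuming a caller wants to validate a requested reasoning level before passing it on. The names `REASONING_LEVELS` and `resolve_reasoning` are hypothetical and are not part of the CLI; only the model IDs, levels, and defaults come from the table above.

```python
# Hypothetical helper, not part of the droid CLI.
# Data mirrors the "Available models" table in this diff:
# model ID -> (allowed reasoning levels, default level).
CLAUDE_LEVELS = ("off", "low", "medium", "high")

REASONING_LEVELS = {
    "claude-opus-4-5-20251101": (CLAUDE_LEVELS, "off"),
    "gpt-5.1-codex-max": (("low", "medium", "high", "extra high"), "medium"),
    "gpt-5.1-codex": (("low", "medium", "high"), "medium"),
    "gpt-5.1": (("none", "low", "medium", "high"), "none"),
    "claude-sonnet-4-5-20250929": (CLAUDE_LEVELS, "off"),
    "claude-opus-4-1-20250805": (CLAUDE_LEVELS, "off"),
    "claude-haiku-4-5-20251001": (CLAUDE_LEVELS, "off"),
    "gemini-3-pro-preview": (("low", "high"), "high"),
    "glm-4.6": (("none",), "none"),
}


def resolve_reasoning(model_id, requested=None):
    """Return the reasoning level to use for model_id.

    Falls back to the table's default when no level is requested, and
    raises ValueError for unknown models or unsupported levels.
    """
    if model_id not in REASONING_LEVELS:
        raise ValueError(f"unknown model: {model_id}")
    allowed, default = REASONING_LEVELS[model_id]
    if requested is None:
        return default
    level = requested.strip().lower()
    if level not in allowed:
        raise ValueError(
            f"{model_id} does not support reasoning level {requested!r}; "
            f"allowed: {', '.join(allowed)}"
        )
    return level
```

With this sketch, `resolve_reasoning("gpt-5.1-codex-max", "Extra High")` is accepted, while the same level on `gpt-5.1-codex` raises `ValueError`, matching the note that Extra High is only available on GPT-5.1-Codex-Max.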