This document describes how Auto Code runs agent sessions across Claude and non-Claude providers after the runtime engine split. It is the current compatibility contract for provider support.
Auto Code remains Claude-first for full autonomous coding. The Claude Agent SDK is still the only runtime that provides the complete agent surface Auto Code depends on: tools, MCP servers, shell execution, filesystem edits, security hooks, and session lifecycle behavior.
Other providers can still be useful, but they run through limited runtime modes:
- analysis_only for phases that only need text reasoning.
- generic_edit for an experimental provider-neutral JSON action loop that can read files, write files, apply patches, run validated single commands, inspect git state, and run bounded read-only runtime subagents when the caller wires a subagent session factory.
- patch_proposal for providers that return a structured unified diff proposal that Auto Code validates and applies locally.
The runtime engine fails fast when the selected provider cannot satisfy the capabilities required by the current phase.
| Runtime mode | Purpose | Required capabilities | Typical providers |
|---|---|---|---|
| full_autonomous | Full planner/coder/QA workflow with tools and filesystem access | Tools, MCP, shell, filesystem edits, workspace access, structured output | Claude Agent SDK, Codex CLI, Claude Code-compatible runners |
| generic_edit | Experimental local action loop for coder subtasks | Text completion, structured JSON actions, local file/patch/shell tools | OpenAI, Google, LiteLLM, OpenRouter, ZhipuAI, Ollama |
| patch_proposal | Model proposes a patch; Auto Code validates and applies it locally | Text completion, structured output, local patch application | OpenAI, Google, LiteLLM, OpenRouter, ZhipuAI, Ollama |
| analysis_only | Text-only analysis without edits or tools | Text completion or streaming | OpenAI, Google, LiteLLM, OpenRouter, ZhipuAI, Ollama |
The same compatibility table is available from the CLI:
```bash
python run.py --runtime-modes
python run.py --runtime-modes --json
```

| Provider | Full autonomous coding | Generic edit | Analysis-only | Patch proposal | Notes |
|---|---|---|---|---|---|
| claude | Yes | Not needed | Yes | Not needed | Uses the Claude Agent SDK path and keeps existing behavior. |
| openai | No | Experimental | Limited | Limited | Direct OpenAI SDK sessions use native tool calls when available, with JSON fallback. |
| google | No | Experimental | Limited | Limited | Gemini can use local actions with Gemini-compatible tool schemas; MCP parity is not implemented. |
| litellm | No | Experimental | Limited | Limited | Gateway provider; native tools depend on routed model/gateway support. |
| openrouter | No | Experimental | Limited | Limited | OpenAI-compatible gateway with native tools plus JSON fallback. |
| zhipuai | No | Experimental | Limited | Limited | Direct ZhipuAI/Z.AI chat path is OpenAI-like and limited. Z.AI's Claude-compatible endpoint is tracked as a separate Claude Code runner profile. |
| ollama | No | Experimental | Limited | Limited | Local models can attempt generic edit without remote code sharing. |
Limited means the provider is allowed only when the selected runtime mode does
not require missing capabilities. It does not imply the provider has been
validated for every agent phase or every model.
zhipuai in this table means Auto Code's direct ZhipuAI/Z.AI provider adapter.
Z.AI also exposes a Claude/Anthropic-compatible endpoint for Claude Code-style
clients. Auto Code tracks that separately as the zai_claude_code CLI runner
profile because the full autonomous behavior comes from the Claude Code runtime
surface, not from the direct ZhipuAI chat-completions adapter.
No runtime override is required for the default full autonomous path:
```bash
AI_ENGINE_PROVIDER=claude
CLAUDE_MODEL=claude-sonnet-4-5-20250929
```

Use per-agent provider routing when Claude should remain available for phases that need full autonomy, while coder subtasks use a limited provider:

```bash
AI_ENGINE_PROVIDER=claude
AGENT_PROVIDER_CODER=openai
AGENT_MODEL_CODER=gpt-4o
AGENT_RUNTIME_MODE_CODER=patch_proposal
OPENAI_API_KEY=sk-...
```

Use the same per-agent routing shape for the experimental local action loop:

```bash
AI_ENGINE_PROVIDER=claude
AGENT_PROVIDER_CODER=openai
AGENT_MODEL_CODER=gpt-4o
AGENT_RUNTIME_MODE_CODER=generic_edit
OPENAI_API_KEY=sk-...
```

The CLI can override the runtime mode for a run:

```bash
python run.py --spec 001 --runtime-mode patch_proposal
```

Provider selection can also be supplied on the command line:

```bash
python run.py --spec 001 --provider openai --runtime-mode analysis_only
```

For the experimental generic local action loop, use:

```bash
AGENT_PROVIDER_CODER=openai AGENT_RUNTIME_MODE_CODER=generic_edit python run.py --spec 001
```

For a non-mutating analysis pass that does not enter the coding loop, use:

```bash
python run.py --spec 001 --provider openai --analyze
python run.py --spec 001 --provider openai --analyze --analysis-prompt "Review implementation risks"
```

The analysis output is saved under artifacts/analysis_only_analysis_*.md.
To verify a configured provider before using it in a spec, run an opt-in smoke check:
```bash
python run.py --provider openai --provider-smoke
python run.py --provider openai --model gpt-4o --provider-smoke --json
```

The smoke check sends a short text-only request through the provider abstraction
and reports whether the configured key/model can return a response. Its JSON
output includes runtime_diagnostics to make the boundary explicit: provider
smoke validates text completion only, not generic_edit, MCP, subagents, or
full_autonomous coding behavior.
For a deeper provider-readiness signal, select a runtime smoke scope:
```bash
python run.py --provider openai --provider-smoke --provider-smoke-runtime generic_edit --json
python run.py --provider openai --provider-smoke --provider-smoke-runtime mini_pipeline --json
python run.py --provider openai --provider-smoke --provider-smoke-runtime provider_e2e --json
```

generic_edit validates one local tool loop and reports native tool support,
JSON fallback, normalized tool results, resume policy, and transaction batch
diagnostics. mini_pipeline runs a temporary planner/coder/test/reviewer task
and now adds a recovery exercise: it creates a recoverable partial edit, checks
the resume preflight, resumes from the generated checkpoint, and requires the
recovery result to resolve cleanly before reporting mini_pipeline_ready.
provider_e2e runs the direct-provider e2e suite: it executes the generic_edit
smoke, the mini_pipeline smoke, the transaction_batch_probe, and
provider-specific negative fixtures for unsupported tools and gateway/model
limitations against the configured provider. The suite also derives live
task-family calibration from those real child runs, then merges everything into
one payload containing provider_e2e_suite, provider_e2e_negative_fixtures,
provider_e2e_live_task_families, and provider_reliability.
The JSON diagnostics also include provider_reliability: a direct-provider
coverage matrix for the full-autonomy e2e cases. It marks observed cases such
as text completion, generic edit tool loop, native tool calls, tool results,
recovery loop, and transaction batches. In provider_e2e, the unsupported-tool
and gateway/model cases are covered by provider adapter/gateway fixtures for
OpenAI, Google/Gemini, OpenRouter, LiteLLM, ZhipuAI, and Ollama; they prove the
configured direct provider surfaces the right fallback reason at the adapter
boundary. The transaction-batch case requires an observed batch contract, at
least one batch, begin_batch / commit_batch lifecycle actions, and a
committed lifecycle status. Live external fault-injection against real provider
accounts remains optional because it depends on credentials, model availability,
and gateway behavior outside the repository.
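The transaction-batch condition above can be read as a single predicate. The following is a minimal Python sketch over a hypothetical, simplified diagnostics dict: the batch_contract_observed, batches, actions, and status keys are illustrative names, not the real provider_reliability schema, and the sketch reads "a committed lifecycle status" as at least one batch that both began and committed.

```python
from typing import Any


def transaction_batch_case_passed(diag: dict[str, Any]) -> bool:
    """Check the transaction-batch reliability case on a simplified payload.

    All keys are hypothetical stand-ins for the real provider_reliability data.
    """
    # The batch contract itself must have been observed.
    if not diag.get("batch_contract_observed", False):
        return False
    # any() over an empty list is False, so "at least one batch" falls out here.
    return any(
        {"begin_batch", "commit_batch"} <= set(batch.get("actions", []))
        and batch.get("status") == "committed"
        for batch in diag.get("batches", [])
    )
```

Under this reading, a run whose only batch ended in abort_batch does not satisfy the case.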
The same smoke output now carries provider_autonomous_readiness: an aggregate
scorecard with status, recommendation, recommendation_reasons, blockers,
warnings, evidence, requirements, missing_requirements, and
next_actions. It combines provider e2e results, reliability coverage,
persisted provider-run history, live fault probe evidence, and live task-family
evidence so direct API providers can be promoted only when the data shows they
are stable enough. A direct provider needs at least three stable recent provider
e2e runs, a matching consecutive pass streak, live fault probe coverage for
every required negative case, and live task-family coverage for
single_file_edit, multi_step_edit, recovery_resume, and
transaction_batching before the scorecard can become
full_autonomous_candidate. The latest provider-smoke evidence must also be
fresh; stale timestamps add a
provider_history_stale warning and keep the policy gate blocked until the
provider e2e suite is rerun. The
structured reasons payload exposes stable ids such as history_missing,
history_insufficient_runs, live_fault_probe_missing, and
latest_provider_smoke_failed so UI and policy surfaces can explain the chosen
recommendation without reverse-engineering blockers. The structured requirements
payload exposes the minimum stable-run threshold, observed recent window,
observed consecutive-pass streak, last_run_at, the maximum accepted evidence
age, history freshness, required/covered/missing live fault cases, and booleans
for history and live-fault completion, plus required/covered/missing live task
families and live task-family completion so backend automation and UI surfaces
do not need to parse free-form warning strings. Text CLI output and the
Electron provider diagnostics UI render those requirements as a compact
operator summary next to the stable missing requirement ids.
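The promotion thresholds in the paragraph above combine into a list of blockers. This is a minimal sketch, assuming a hypothetical ProviderHistory summary; history_missing, history_insufficient_runs, history_stale, live_fault_probe_missing, and latest_provider_smoke_failed are ids named in this document, while consecutive_pass_streak_short, live_task_family_missing, and the seven-day max_age default are placeholders chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

REQUIRED_TASK_FAMILIES = frozenset(
    {"single_file_edit", "multi_step_edit", "recovery_resume", "transaction_batching"}
)


@dataclass
class ProviderHistory:
    # Hypothetical, simplified view of persisted provider-smoke history.
    recent_stable_runs: int
    consecutive_passes: int
    last_run_at: datetime | None
    latest_passed: bool
    covered_fault_cases: frozenset[str]
    covered_task_families: frozenset[str]


def readiness_blockers(
    h: ProviderHistory,
    required_fault_cases: frozenset[str],
    min_stable_runs: int = 3,
    max_age: timedelta = timedelta(days=7),  # assumption: the real limit may differ
    now: datetime | None = None,
) -> list[str]:
    """Return blocker ids; an empty list means full_autonomous_candidate."""
    now = now or datetime.now(timezone.utc)
    if h.last_run_at is None:
        return ["history_missing"]  # nothing else can be judged without history
    blockers: list[str] = []
    if not h.latest_passed:
        blockers.append("latest_provider_smoke_failed")
    if h.recent_stable_runs < min_stable_runs:
        blockers.append("history_insufficient_runs")
    if h.consecutive_passes < min_stable_runs:
        blockers.append("consecutive_pass_streak_short")  # hypothetical id
    if now - h.last_run_at > max_age:
        blockers.append("history_stale")
    if not required_fault_cases <= h.covered_fault_cases:
        blockers.append("live_fault_probe_missing")
    if not REQUIRED_TASK_FAMILIES <= h.covered_task_families:
        blockers.append("live_task_family_missing")  # hypothetical id
    return blockers
```

Apart from the missing-history case, the function collects every blocker instead of stopping at the first, so an operator surface can list all missing requirements at once.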
Provider e2e smoke output also carries provider_autonomous_promotion_gate;
the persisted history stores the latest promotion status plus required, passed,
and missing reliability cases and e2e run modes. Runtime policy and capability
matrices consume that persisted evidence, so a direct provider with an otherwise
green readiness score remains blocked from full_autonomous_ready until the
case-level promotion gate is clean.
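The case-level promotion gate sits on top of the readiness scorecard: a green aggregate is necessary but not sufficient. In this sketch, REQUIRED_E2E_RUN_MODES uses the five run modes this document lists for the gate; the function name, argument shapes, and returned keys are hypothetical.

```python
from __future__ import annotations

# Run modes the promotion gate requires, as listed in this document.
REQUIRED_E2E_RUN_MODES = (
    "generic_edit",
    "mini_pipeline",
    "transaction_batch_probe",
    "unsupported_tools_probe",
    "gateway_model_probe",
)


def promotion_gate(
    readiness_status: str,
    required_cases: set[str],
    passed_cases: set[str],
    passed_run_modes: set[str],
) -> dict[str, object]:
    """Combine aggregate readiness with case-level evidence (hypothetical shape)."""
    missing_cases = sorted(required_cases - passed_cases)
    missing_modes = [m for m in REQUIRED_E2E_RUN_MODES if m not in passed_run_modes]
    ready = (
        readiness_status == "full_autonomous_candidate"
        and not missing_cases
        and not missing_modes
    )
    return {
        "ready": ready,
        "missing_reliability_cases": missing_cases,
        "missing_e2e_run_modes": missing_modes,
    }
```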
Direct API providers can now opt into the first full-coder autonomous runtime
core after the evidence gates are clean. Set
AUTO_CODE_DIRECT_API_FULL_AUTONOMOUS=true only after a provider has fresh
provider e2e history, stable recent pass evidence, full live-fault coverage,
full live task-family coverage, and a passed promotion gate. When those
conditions hold, coder and QA fixer phases may use the
direct_api_autonomous adapter. That adapter is backed by the Generic Edit
engine, but it advertises the full-coder runtime surface only after the local
activation gate passes.
The activation is deliberately narrower than Claude/Codex full autonomy:
- Planner phases still require an existing full runtime or CLI runner.
- Direct API activation remains disabled unless AUTO_CODE_DIRECT_API_FULL_AUTONOMOUS=true is set.
- Missing, corrupt, stale, or incomplete provider-smoke history blocks the adapter before execution.
- Mutating subagents still require the separate transactional merge gate.
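The activation rules above can be sketched as a single check that runs before the adapter starts. Everything here is an assumption for illustration: the phase names, the JSON keys fresh and promotion_gate_status (which collapse the real stable-run and coverage evidence into two fields), and the returned reason strings other than history_missing and history_stale. The sketch also checks only the explicit environment variable and ignores the AUTO_CODE_AUTONOMY level mapping.

```python
from __future__ import annotations

import json
import os
from collections.abc import Mapping

# Phases that may use direct_api_autonomous once the gates pass.
FULL_CODER_PHASES = frozenset({"coder", "qa_fixer"})


def direct_api_autonomous_allowed(
    phase: str,
    history_text: str | None,
    env: Mapping[str, str] | None = None,
) -> tuple[bool, str]:
    """Decide whether the direct_api_autonomous adapter may start (illustrative)."""
    env = os.environ if env is None else env
    if phase not in FULL_CODER_PHASES:
        # Planner and other phases still need a full runtime or CLI runner.
        return False, "phase_requires_full_runtime"
    if env.get("AUTO_CODE_DIRECT_API_FULL_AUTONOMOUS", "").strip().lower() != "true":
        return False, "direct_api_full_autonomous_disabled"
    if not history_text:
        return False, "history_missing"
    try:
        history = json.loads(history_text)
    except json.JSONDecodeError:
        return False, "history_corrupt"
    if not isinstance(history, dict):
        return False, "history_corrupt"
    if not history.get("fresh", False):
        return False, "history_stale"
    if history.get("promotion_gate_status") != "ready":
        # Incomplete evidence leaves the case-level gate unready.
        return False, "promotion_gate_blocked"
    return True, "activated"
```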
AUTO_CODE_AUTONOMY is the single top-level knob (see
ADR-006). It collapses
AUTO_CODE_RUNTIME_MODE, AUTO_CODE_RUNTIME_FALLBACK, and
AUTO_CODE_DIRECT_API_FULL_AUTONOMOUS into four discrete intents:
| Level | Intent |
|---|---|
| off | Analysis only, never writes the workspace. |
| claude (default) | Claude / Codex CLI full autonomy; direct API providers refused with a capability error. |
| safe | + direct API providers can be promoted to coder full autonomy when the AutonomyPolicy gate passes. |
| bold | + skip the AutonomyPolicy gate; for benchmarks and CI evidence seeding. |
AUTO_CODE_AUTONOMY_PRESET=strict|standard|lax selects threshold
presets for the AutonomyPolicy gate. Explicit low-level env vars (the
existing matrix below) keep working and take precedence over the level
mapping; they are advanced configuration and normally not needed.

The --runtime-modes --json output includes an "autonomy" block reporting the
resolved level, preset, and any explicit overrides.
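A minimal sketch of how a level plus preset might resolve into concrete settings, with explicit low-level variables taking precedence. Only two parts come from this document: the four level names with their intents, and the AUTO_CODE_EXTERNAL_MCP_CLIENT default that this section later describes safe and bold enabling. The AutonomySettings fields, the "standard" default preset, and the refused/gated/ungated labels are hypothetical, and this is not the real resolve_autonomy_settings.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass

# level -> (writes_workspace, direct_api_promotion, external_mcp_client_enabled)
# Only safe/bold enabling the external MCP client is stated in the docs;
# the other columns paraphrase the level table and are illustrative.
_LEVELS = {
    "off": (False, "refused", False),
    "claude": (True, "refused", False),
    "safe": (True, "gated", True),
    "bold": (True, "ungated", True),
}
_PRESETS = frozenset({"strict", "standard", "lax"})


@dataclass(frozen=True)
class AutonomySettings:
    level: str
    preset: str
    writes_workspace: bool
    direct_api_promotion: str
    external_mcp_client_enabled: bool
    overrides: tuple[str, ...]


def resolve_autonomy(env: Mapping[str, str]) -> AutonomySettings:
    level = env.get("AUTO_CODE_AUTONOMY", "claude").strip().lower()
    if level not in _LEVELS:
        raise ValueError(f"unknown AUTO_CODE_AUTONOMY level: {level!r}")
    # Assumption: "standard" is the default preset.
    preset = env.get("AUTO_CODE_AUTONOMY_PRESET", "standard").strip().lower()
    if preset not in _PRESETS:
        raise ValueError(f"unknown AUTO_CODE_AUTONOMY_PRESET: {preset!r}")
    writes, promotion, mcp = _LEVELS[level]
    overrides: list[str] = []
    # Explicit low-level settings win over the level mapping.
    raw_mcp = env.get("AUTO_CODE_EXTERNAL_MCP_CLIENT")
    if raw_mcp is not None:
        mcp = raw_mcp.strip().lower() == "true"
        overrides.append("AUTO_CODE_EXTERNAL_MCP_CLIENT")
    return AutonomySettings(level, preset, writes, promotion, mcp, tuple(overrides))
```

Recording which variables overrode the level keeps the reported "autonomy" view honest about where each decision came from.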
RuntimeCapabilities describes what a runtime physically supports.
RuntimePolicy describes evidence-based promotions layered on top. The
two are intentionally separate so the runtime never claims a capability
it cannot back, and operators can see which decisions came from the
capability layer and which came from a policy gate.
The direct_api_autonomous adapter now advertises
RuntimeCapabilities.promoted_edit() (identical to generic_edit() —
no fake native_tool_loop=True) plus a RuntimePolicy carrying
promoted_to_full_autonomous=True. The shared
capabilities.supports(requirements, policy=...) helper grants
native_tool_loop as satisfied when the policy promotes the runtime, so
the full_coder requirement is met through evidence rather than through
a capability claim. The legacy
RuntimeCapabilities.direct_api_autonomous() constructor still works
but raises a DeprecationWarning and returns the honest promoted-edit
shape.
RuntimePolicy.mcp_execution_enabled=True additionally grants the
mcp capability. The direct_api_autonomous adapter sets this flag
whenever resolve_autonomy_settings(...).external_mcp_client_enabled
is true, which AUTO_CODE_AUTONOMY=safe (and bold) flip on by
default. Explicit AUTO_CODE_EXTERNAL_MCP_CLIENT=true/false still
wins for operators who want fine control.
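The split between physical capabilities and policy grants can be illustrated with two small dataclasses. Capabilities and Policy here are simplified, hypothetical stand-ins for RuntimeCapabilities and RuntimePolicy, and the FULL_CODER requirement set is invented for the example; the point is that policy adds grants at check time without ever setting native_tool_loop=True on the capability object.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Capabilities:
    """Simplified stand-in for RuntimeCapabilities (field names hypothetical)."""
    text_completion: bool = True
    structured_output: bool = True
    local_tools: bool = True
    native_tool_loop: bool = False
    mcp: bool = False
    subagents: bool = False
    sandbox: bool = False


@dataclass(frozen=True)
class Policy:
    """Simplified stand-in for RuntimePolicy."""
    promoted_to_full_autonomous: bool = False
    mcp_execution_enabled: bool = False


# Hypothetical requirement set for the full_coder phase.
FULL_CODER = frozenset({"text_completion", "structured_output", "local_tools", "native_tool_loop"})


def supports(caps: Capabilities, requirements: frozenset[str], policy: Policy | None = None) -> bool:
    policy = policy or Policy()
    # Start from what the runtime physically supports...
    granted = {name for name, value in asdict(caps).items() if value}
    # ...then layer evidence-based policy grants on top, without mutating caps.
    if policy.promoted_to_full_autonomous:
        granted.add("native_tool_loop")
    if policy.mcp_execution_enabled:
        granted.add("mcp")
    return requirements <= granted
```

Subagents and sandbox have no policy path in this sketch, matching the statement in this document that promotion does not grant them.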
The provider-neutral MCP bridge (agents/runtime/mcp_bridge.py) has
been execution-capable for every registered external server (Graphiti,
Linear, Electron, Puppeteer, Context7, custom) for a while — it was
gated behind the env var. With the safe/bold level mapping a
direct API session can now actually drive tools/list plus
tools/call through the bridge against any of those servers, not only
Context7. Per-server smoke runs through mcp_execution_smoke(server, project_dir) (see agents/runtime/mcp_execution_smoke.py) which
selects the first non-mutating tool the adapter exposes and validates
the full pipeline end-to-end, returning a structured payload that
includes the normalized result, the failure stage (tools_list vs
tools_call), and the failure kind classification used by the rest of
the diagnostics surface.
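The per-server smoke flow described above (list tools, pick the first non-mutating one, call it, and report the failing stage) can be sketched against a hypothetical client protocol. The McpClient interface, the is_mutating predicate, and the timeout/transport/protocol/tool_error kind labels are assumptions; the real mcp_execution_smoke(server, project_dir) has its own classification and result normalization.

```python
from __future__ import annotations

from typing import Any, Callable, Protocol


class McpClient(Protocol):
    """Hypothetical minimal client surface; not the bridge's real interface."""

    def list_tools(self) -> list[dict[str, Any]]: ...

    def call_tool(self, name: str, arguments: dict[str, Any]) -> dict[str, Any]: ...


def classify_failure(exc: BaseException) -> str:
    # TimeoutError is a subclass of OSError, so check it first.
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, OSError):
        return "transport"
    return "protocol"


def execution_smoke(client: McpClient, is_mutating: Callable[[dict[str, Any]], bool]) -> dict[str, Any]:
    try:
        tools = client.list_tools()
    except Exception as exc:  # any listing failure is reported at tools_list
        return {"ok": False, "stage": "tools_list", "kind": classify_failure(exc)}
    tool = next((t for t in tools if not is_mutating(t)), None)
    if tool is None:
        return {"ok": False, "stage": "tools_list", "kind": "no_read_only_tool"}
    try:
        # Empty arguments: this only exercises tools that need no input.
        result = client.call_tool(tool["name"], {})
    except Exception as exc:
        return {"ok": False, "stage": "tools_call", "kind": classify_failure(exc), "tool": tool["name"]}
    if result.get("is_error"):
        return {"ok": False, "stage": "tools_call", "kind": "tool_error", "tool": tool["name"], "result": result}
    return {"ok": True, "stage": None, "tool": tool["name"], "result": result}
```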
Promotion is still narrow: subagents and sandbox are not granted
and still require Phase 1.2 and Phase 1.3 capability work in
docs/roadmap/non-claude-provider-autonomy.md.
qa_reviewer and qa_fixer are now resolved through the runtime layer the
same way planner and coder are. qa/loop.py reads
AGENT_PROVIDER_QA_REVIEWER / AGENT_PROVIDER_QA_FIXER and
AGENT_RUNTIME_MODE_QA_REVIEWER / AGENT_RUNTIME_MODE_QA_FIXER before
constructing a session and calls resolve_runtime_mode_with_fallback so the
runtime decision is persisted as an artifact under
spec_dir/artifacts/runtime_fallback_qa_*.json.
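Per-agent routing keys follow a shared AGENT_PROVIDER_/AGENT_MODEL_/AGENT_RUNTIME_MODE_ prefix plus the upper-cased agent name. A minimal sketch of resolving them, assuming (for illustration only) that a missing per-agent provider falls back to AI_ENGINE_PROVIDER and then to claude, and that a missing per-agent runtime mode falls back to AUTO_CODE_RUNTIME_MODE; resolve_agent_route is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Mapping


def resolve_agent_route(agent: str, env: Mapping[str, str]) -> dict[str, str | None]:
    """Resolve provider/model/runtime mode for one agent from env vars.

    Key names follow this document; the fallback order and the "claude"
    default are assumptions for illustration.
    """
    suffix = agent.upper()  # e.g. "qa_fixer" -> "QA_FIXER"
    provider = env.get(f"AGENT_PROVIDER_{suffix}") or env.get("AI_ENGINE_PROVIDER") or "claude"
    model = env.get(f"AGENT_MODEL_{suffix}")
    runtime_mode = env.get(f"AGENT_RUNTIME_MODE_{suffix}") or env.get("AUTO_CODE_RUNTIME_MODE")
    return {"agent": agent, "provider": provider, "model": model, "runtime_mode": runtime_mode}
```

With the coder routing example earlier in this document, the coder resolves to openai, gpt-4o, and patch_proposal, while qa_reviewer stays on claude.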
Execution still requires the Claude Agent SDK surface (multi-turn tool loop,
Electron MCP for E2E, recovery hooks). A non-Claude provider or a
non-full_autonomous runtime now fails fast with a clear capability error
referencing this roadmap, instead of silently falling back to Claude. The
fail-fast contract makes AUTO_CODE_AUTONOMY_<PROVIDER>_ALLOWED_PHASES
overrides legible: operators can opt a provider into qa_fixing once the
underlying capability work (Phase 1.1 MCP execution and Phase 1.4 native
tool loop in docs/roadmap/non-claude-provider-autonomy.md) lands.
Current plan status:
| Area | Status | Current boundary |
|---|---|---|
| Runtime foundation | Done | Provider/runtime modes, compatibility, fail-fast, fallback, and CLI runner routing exist. |
| Provider reliability | Mostly done | Provider e2e, negative fixtures, run history, live task-family evidence, and promotion gates are wired. |
| MCP Bridge v1 | Mostly done | External MCP bridge, permissions metadata, health, schemas, and audit artifacts exist; custom lifecycle hardening continues separately. |
| Generic Edit v2 core | Mostly done | Transactions, batches, recovery checkpoints, resume preflight, repair/rollback metadata, and rich artifacts are wired. |
| Direct API autonomous runtime core | Done in this layer | direct_api_autonomous can run coder full-coder requirements when env and history gates pass; QA phases are resolved through the runtime layer but execution is still Claude-only pending Phase 1 capability work. |
| Subagent Orchestrator v2 | Partial | Read-only child contexts exist; mutating subagents remain blocked behind transaction-boundary and merge-protocol gates. |
| CLI full runtime class | Partial | Codex CLI and generic CLI profiles exist; additional runners need deeper runner-specific contracts. |
| Frontend control plane | Partial | Runtime diagnostics consume the matrices; richer artifact viewers and inline incompatibility warnings remain. |
| Policy/evals | Partial | Policy/eval matrices and comparative history exist; broader provider eval suites still need expansion. |
Use global non-Claude provider overrides carefully. A full build may still enter
planner, QA, or tool-dependent phases that require full_autonomous; those
phases will fail fast with a capability error instead of attempting an unsafe
fallback.
Runtime fallback is opt-in:
```bash
AI_ENGINE_PROVIDER=openai
AUTO_CODE_RUNTIME_MODE=full_autonomous
AUTO_CODE_RUNTIME_FALLBACK=true
OPENAI_API_KEY=sk-...
```

This does not make a direct provider a full autonomous runtime. Auto Code
resolves the provider/runtime capability set and degrades to the first
compatible limited mode, usually generic_edit, instead of falling back into an
impossible direct-provider full-autonomous session.
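The fail-fast and opt-in fallback behavior can be sketched with capability sets paraphrased from the runtime-mode table. The capability names, the CapabilityError class, and the rule of only degrading toward less capable modes are assumptions for illustration; the real resolver is resolve_runtime_mode_with_fallback and may differ.

```python
from __future__ import annotations

# Required capabilities per runtime mode, paraphrasing the runtime-mode table.
REQUIRED: dict[str, frozenset[str]] = {
    "full_autonomous": frozenset(
        {"tools", "mcp", "shell", "filesystem_edits", "workspace_access", "structured_output"}
    ),
    "generic_edit": frozenset({"text_completion", "structured_json_actions", "local_tools"}),
    "patch_proposal": frozenset({"text_completion", "structured_output", "local_patch_apply"}),
    "analysis_only": frozenset({"text_completion"}),
}
# Assumed order: most capable first, so degradation picks the richest compatible mode.
MODE_ORDER = ("full_autonomous", "generic_edit", "patch_proposal", "analysis_only")


class CapabilityError(RuntimeError):
    pass


def resolve_runtime_mode(requested: str, provider_caps: set[str], fallback_enabled: bool) -> str:
    missing = REQUIRED[requested] - provider_caps
    if not missing:
        return requested
    if not fallback_enabled:
        # Default contract: fail fast instead of guessing.
        raise CapabilityError(f"{requested} needs missing capabilities: {sorted(missing)}")
    # Opt-in fallback: first compatible mode that is less capable than requested.
    for mode in MODE_ORDER[MODE_ORDER.index(requested) + 1:]:
        if REQUIRED[mode] <= provider_caps:
            return mode
    raise CapabilityError(f"no compatible runtime mode below {requested}")
```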
Runtime fallback artifacts also include runner_candidates. This is a
capability snapshot for CLI-runner routing: it records runner candidates for the
requested runtime mode, the selected runtime mode, and any compatible degraded
modes. It does not automatically switch a direct provider session to a different
CLI runner; that remains a separate runner-router decision.
The --runtime-modes command also exposes a runtime_fallback_matrix payload.
It shows the fail-fast selected mode, opt-in fallback selected mode, compatible
degraded modes, and runner candidates for each provider/runtime pair.
The same payload now includes two policy/eval contracts:
- runtime_policy_matrix states which runtime each provider may use for planner, coder, QA reviewer, and QA fixer phases, whether fallback is allowed, and which CLI full-runtime candidates are required when a direct provider cannot satisfy a full-autonomous phase. For direct API providers, it also carries the autonomous-readiness policy gate, autonomous-promotion gate, and recommendation derived from provider e2e history.
- runtime_eval_matrix lists the smoke/eval cases that must stay green before a runtime/provider path can be treated as full autonomous: provider e2e, generic-edit recovery, MCP bridge contracts, subagent orchestrator artifacts, and CLI full-runtime artifacts.
The Electron provider settings screen consumes the same JSON payload through
the provider:runtime:diagnostics IPC channel. Its runtime control plane panel
shows the selected provider/runtime pair's MCP bridge status, required MCP
action, bridged servers, native-required servers, fallback-selected runtime,
runner candidates, and subagent support strategy. Treat this UI as a live view
of the backend compatibility contract rather than a separate frontend-only
matrix.
CLI runner routing is also opt-in:
```bash
AI_ENGINE_PROVIDER=openai
AUTO_CODE_RUNTIME_MODE=full_autonomous
AUTO_CODE_CLI_RUNNER_ROUTER=true
CODEX_HOME=/path/to/codex-profile
```

When enabled, Auto Code may route a direct non-Claude full_autonomous request
to a wired CLI runner instead of failing fast or degrading to a limited runtime.
The first wired route is codex_cli, and it is selected only when the Codex CLI
provider is available. Limited runtime modes such as generic_edit,
patch_proposal, and analysis_only stay on the configured direct provider.
Applied runner routes are persisted as runtime_runner_route_*.json artifacts.
Codex CLI runs capture JSONL events and a normalized
codex_cli_timeline.json artifact with bounded stdout/stderr budgets; if a CLI
exceeds the capture limit, Auto Code terminates it and records
output_truncated in the Codex CLI result artifact.
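The routing rule can be sketched as a pure function. The function name and the codex_available flag are hypothetical (how Auto Code detects Codex CLI availability is not described here); only the opt-in variable, the codex_cli route, and the limited-mode exclusion come from the text above.

```python
from __future__ import annotations

from collections.abc import Mapping

LIMITED_MODES = frozenset({"generic_edit", "patch_proposal", "analysis_only"})


def route_runner(
    provider: str,
    runtime_mode: str,
    env: Mapping[str, str],
    codex_available: bool,
) -> str | None:
    """Return a CLI runner name, or None to keep the configured provider path.

    None does not mean success: the caller still fails fast or degrades as usual.
    """
    if env.get("AUTO_CODE_CLI_RUNNER_ROUTER", "").strip().lower() != "true":
        return None  # routing is opt-in
    if provider == "claude" or runtime_mode in LIMITED_MODES:
        return None  # Claude keeps its SDK path; limited modes stay on the direct provider
    if runtime_mode == "full_autonomous" and codex_available:
        return "codex_cli"  # the first wired route
    return None
```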
The shared CLI runner core now also exists independently of Codex CLI. It can
wrap a configured runner command, pass prompts through stdin or configured
prompt args, cancel the active process, parse Codex/OpenAI-style JSONL events,
and persist runner-specific <runner>_events.jsonl,
<runner>_timeline.json, <runner>_stdout.txt, and <runner>_result.json
artifacts. cli_runner_contract_matrix therefore distinguishes runners with a
configurable generic core (generic_core_configurable and
generic_jsonl_core) from runners that are fully wired. Runner-specific resume
and command specialization are still tracked separately, so generic core
availability does not yet make every planned CLI a production route.
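A generic runner core of this kind can be sketched with the standard library alone. This sketch, which is not Auto Code's implementation, passes the prompt on stdin, enforces a single byte budget on merged stdout/stderr, kills the process when the budget is exceeded, and treats every line that decodes to a JSON object as an event. Real Codex/OpenAI event normalization, separate stderr budgets, cancellation from another thread, and artifact persistence are left out; run_cli_runner and RunnerResult are hypothetical names.

```python
from __future__ import annotations

import json
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunnerResult:
    returncode: int
    events: list[dict]          # lines that decoded to JSON objects
    non_json_lines: list[str]   # everything else, kept for a timeline
    output_truncated: bool


def run_cli_runner(cmd: list[str], prompt: str, max_bytes: int = 1_000_000) -> RunnerResult:
    """Run a CLI runner with a bounded output budget (illustrative only)."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # simplification: one merged budget
        text=True,
    )
    assert proc.stdin is not None and proc.stdout is not None
    try:
        proc.stdin.write(prompt)  # prompt via stdin; fine for small prompts
        proc.stdin.close()
    except BrokenPipeError:
        pass  # the runner exited without reading its prompt
    used, truncated = 0, False
    events: list[dict] = []
    other: list[str] = []
    for line in proc.stdout:
        used += len(line.encode("utf-8"))
        if used > max_bytes:
            truncated = True
            proc.kill()  # stop a runner that exceeds its capture limit
            break
        text = line.strip()
        if not text:
            continue
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            other.append(text)
            continue
        (events if isinstance(payload, dict) else other).append(payload if isinstance(payload, dict) else text)
    proc.stdout.close()
    return RunnerResult(proc.wait(), events, other, truncated)


if __name__ == "__main__":
    demo = [sys.executable, "-c", "import json; print(json.dumps({'type': 'done'}))"]
    print(run_cli_runner(demo, ""))
```

The budget is checked per line, so a single enormous line without newlines is read in full before the check fires; a production core would read fixed-size chunks instead.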
The --runtime-modes --json payload also includes
runtime_subagent_matrix. It reports whether each provider/runtime pair has
native subagents, can use Auto Code's orchestrated child-session fallback, or is
blocked for subagent work. Orchestrated subagents currently use isolated child
contexts with a read-only merge policy, bounded attempts, and per-child
artifacts; this is useful for parallel exploration but is not Claude SDK Task
tool parity.
Keep this table current when advancing direct-provider autonomy. It separates what is already implemented from what still blocks OpenAI, Gemini, OpenRouter, LiteLLM, ZhipuAI, and Ollama from being treated as full autonomous coding providers.
Last updated: 2026-05-22.
| Area | Current status | Done | Remaining |
|---|---|---|---|
| Runtime foundation | Done | Runtime modes, capability checks, fail-fast behavior, runtime fallback diagnostics, Codex CLI as the first wired non-Claude full autonomous CLI path. | Keep compatibility metadata in sync as new CLI runners become wired. |
| Generic autonomous runtime for API providers | Partial | generic_edit supports JSON and native tool-call loops, local file/patch/shell actions, transaction summaries, MCP bridge calls, bounded read-only subagents, native-tool JSON fallback, and provider smoke diagnostics. Provider e2e history now also emits provider_autonomous_promotion_gate, a per-case promotion contract for required reliability cases and required e2e run modes. | Prove direct providers across real models/gateways with repeated live e2e promotion-gate passes before marking any direct API provider full autonomous. |
| Generic Edit v2 core | Partial, strong core | Transaction groups, explicit begin_batch / commit_batch / abort_batch, batch-linked recovery outcomes, per-batch recovery policy, staged mutation metadata, isolated staged workspace materialization/restoration, commit-time staged postimage apply, batch boundary guards, pre-execution staged isolation guards for opaque open-batch mutations, pre-commit staged baseline drift guards, staged guard status/drift-path timeline events, staged workspace materialized/restored/batch-id recent events, mutation snapshots, committed snapshot ids, commit operation ids, rollback/repair actions, resumable session state, recovery checkpoints, drift guards, corrupt/missing/incomplete artifact preflight blockers, corrupt/invalid mutation-snapshot artifact health reasons, isolated staged snapshot integrity blockers, recovery-plan artifact health checks, artifact manifest transaction batches, manifest/checkpoint/event-count consistency checks, manifest recovery-timeline/resume-policy drift guards, trace/session/manifest counter drift checks, unified resume artifact consistency across trace, checkpoint, session state, manifest, mutation snapshots, and transaction batch state, and rich runtime events are implemented. | Harden non-happy-path recovery further for richer UI-driven repair/rollback workflows and broader staged-overlay edge cases. |
| Provider reliability | Strong partial | generic_edit, mini_pipeline, transaction_batch_probe, and provider_e2e validations are wired. Provider e2e diagnostics include negative fixtures, live fault probes, automatic live task-family calibration derived from real child runs, reliability, live-fault/task-family history evidence, recent-run history timelines, granular e2e and reliability case pass-rate metrics, live-fault case coverage metrics, live task-family coverage metrics, actual per-run cost accounting when token usage is observed, fixed-token per-run estimates when usage telemetry is missing, fixed benchmark cost estimates when actual cost is unavailable, quality/stability/safety/cost trend deltas from recent runs, freshness gating for stale provider-smoke evidence, the provider_autonomous_readiness scorecard, structured recommendation reasons, settings UI surfaces, and a runtime policy gate that blocks direct-provider coder/fixer autonomy until the evidence is strong enough. The gate now requires enough stable history, fresh latest evidence, full live-fault coverage, and full live task-family coverage, not just a green latest run. See Provider reliability implementation details. | Use real live-account fault probe data to calibrate provider-specific recommendations and broaden trend signals across more provider-specific task families. |
| MCP Bridge v1 | Strong partial | Local MCP bridge status, Context7 external execution, server health, bridge plans, unavailable-tool observations, readiness metadata for Graphiti, Linear, Electron, Puppeteer, and custom stdio/http servers, and mcp_bridge_permission_matrix for local/external/custom permission gates are represented. The runtime enforces RuntimeMcpToolPolicy before execution, writes audit artifacts, classifies mutating tools, normalizes MCP tool results into text, content, structured_content, and is_error, classifies live tools/list and bridged tools/call lifecycle failures by stage/kind, and exposes whether strict AUTO_CODE_MCP_ALLOWED_PERMISSIONS allowlists are configured. | Generalize live execution coverage across all registered external servers, normalize arbitrary live schemas continuously, and keep hardening external session reuse plus per-server execution smoke. |
| Subagent Orchestrator v2 | Partial | Orchestrated read-only child sessions have isolated prompt envelopes, explicit child context ids per attempt, bounded retries, cancellation, per-child artifacts, attempt history, read-only merge plans, and runtime_subagent_mutation_policy now exposes the gates blocking mutating children until transactional merge is ready. | Add transactional boundaries for mutating child sessions, conflict-aware merge protocol, parent-approved apply/abort, child artifact viewer polish, then move the mutation policy from blocked to enabled. |
| CLI runtimes as full runtime class | Partial, stronger core | Codex CLI is wired through a full-autonomous route with event/result artifacts and runner routing diagnostics. CLI profile discovery exists for additional runners, cli_runner_contract_matrix tracks run, cancel, resume, artifacts, event parser, and cost/account metadata for every candidate, and the generic CLI core now supplies configurable run/cancel/artifact/event parsing for planned runners. | Add runner-specific command builders, resume semantics, and live smoke/e2e coverage for Aider, OpenCode, Goose, Gemini CLI, Qwen Code, and other viable CLIs so they can move from generic-core partial to ready. |
| Frontend runtime control plane | Strong partial | Provider settings show runtime diagnostics, provider smoke/e2e status, live task-family diagnostics, granular provider e2e/reliability/live-fault/live-task history coverage, autonomous-readiness requirements including stable runs, consecutive passes, freshness, live-fault coverage, and live task-family coverage, the autonomous promotion gate with missing reliability/e2e blockers, MCP status and permission gates, Generic Edit recovery/batch evidence, policy/eval rows with comparative cost estimates, CLI runner status, and mutating-subagent gates. See Frontend control plane implementation details. | Add richer history charts, deeper provider controls, and deeper artifact drilldowns into one operator-grade surface. |
| Policy and evals | Strong partial | Runtime recommendations, compatibility diagnostics, runtime_policy_matrix for planner/coder/QA phase selection, direct-provider autonomous readiness gates, runtime_eval_matrix for provider e2e, generic-edit recovery, MCP bridge contracts, subagent orchestrator artifacts, CLI full-runtime artifacts, runtime_eval_history from persisted provider smoke evidence, and runtime_comparative_eval_matrix for Claude/Codex/OpenAI/Gemini/Ollama quality/cost/safety comparison are implemented. Comparative rows now prefer the stricter provider e2e/live-task coverage score for quality, use the stricter reliability/live-fault coverage score for safety, expose score source ids, include stability score, latest pricing model, recorded actual cost from provider smoke token usage when available, use fixed-token cost estimates as fallback, and surface quality/stability/safety/cost trend ids plus deltas for the UI and text control plane. | Calibrate quality/safety scoring against broader live eval tasks and add richer trend visualizations. |
The practical rule remains: a direct API provider is not full_autonomous until
it can reliably run the whole tool loop, preserve transactional recovery state,
execute the required MCP/subagent surfaces, and pass the provider reliability
suite without relying on Claude SDK semantics. The --runtime-modes payload
turns the provider smoke history into an autonomous-readiness policy gate; if
that gate is blocked, direct-provider coder and QA fixer policies stay limited
even when generic_edit can still run.
generic_editvalidates the live tool loop, native tools, JSON fallback, and provider limitation handling.generic_editreports recovery status, resume policy, transaction batches, boundary guards, staged drift guards, and commit operations.mini_pipelineruns the planner/coder/test/reviewer flow with checkpoint preflight and resume recovery.mini_pipelinereportsrecovery_loop_status.transaction_batch_probeisolates the transaction-batch contract.provider_e2eruns the direct-provider e2e suite.provider_e2eadds transaction-batch coverage plus provider-specific unsupported-tool and gateway/model negative fixtures.provider_e2eautomatically derives live task-family calibration forsingle_file_edit,multi_step_edit,recovery_resume, andtransaction_batchingfrom its realgeneric_edit,mini_pipeline, andtransaction_batch_probechild runs.- The provider e2e suite covers OpenAI, Google/Gemini, OpenRouter, LiteLLM, ZhipuAI, and Ollama.
- Provider e2e merges child results into
provider_e2e_suite. - Provider e2e records negative fixtures in
provider_e2e_negative_fixtures. - Provider e2e reports aggregate readiness through
provider_reliability. - The
provider_autonomous_readinessscorecard reports recommendation, structured recommendation reasons, blockers, warnings, structured requirements, missing requirement ids, evidence, and next actions derived from e2e, reliability, history, live fault probe data, and live task-family data. - Compact evidence is persisted in
.auto-Codex/provider-smoke-history.json. - Recent-run trends, window counts, and pass/fail streaks are reported from provider smoke history.
- Quality and stability percentages come from total and recent pass rates, granular e2e case pass-rate, live task-family coverage, reliability case pass-rate, and live-fault case coverage percentages from the required provider negative cases.
- Cost history records actual per-run usage when smoke diagnostics include token usage, falls back to fixed-token estimates when usage telemetry is missing, aggregates total/latest tokens and cost per provider, and surfaces those totals in non-JSON CLI output.
- Provider history computes recent-run quality, stability, safety, and cost trend ids plus deltas so operator surfaces can distinguish improving, degrading, stable, and insufficient-data evidence.
- Provider autonomous readiness requires fresh evidence from the latest provider smoke run; stale `last_run_at` timestamps surface `provider_history_stale`, `history_stale`, and the `fresh_provider_history` requirement, with `rerun_provider_e2e` as the next action.
- Provider autonomous promotion emits `provider_autonomous_promotion_gate` beside the readiness scorecard. The gate requires the readiness scorecard to reach `full_autonomous_candidate`, every required reliability case to have case-level `passed` evidence, and every required provider e2e run mode (`generic_edit`, `mini_pipeline`, `transaction_batch_probe`, `unsupported_tools_probe`, and `gateway_model_probe`) to pass. This keeps aggregate counters from promoting a provider without explicit tool-loop, tool-result, recovery, transaction-batch, unsupported-tool, and gateway/model-limit evidence.
- Provider smoke history persists the latest promotion gate status together with required, passed, and missing reliability/e2e evidence. Runtime diagnostics read those fields back into `runtime_policy_matrix` and `runtime_capability_matrix`; `full_autonomous_ready` is true for direct API providers only when the persisted promotion gate is ready, not merely when aggregate readiness reaches `full_autonomous_candidate`.
- Degrading quality, stability, or safety trends feed the autonomous readiness gate as evidence-stability warnings, keeping direct-provider coder and fixer policy limited until the trend recovers.
- Comparative eval rows prefer the granular provider e2e case pass rate for quality, use the stricter reliability/live-fault coverage score for safety, expose score source ids, and carry trend ids/deltas. For cost, they prefer recorded actual provider cost and otherwise fall back to an estimate for the latest observed `last_model`, using a fixed 10k-input / 2k-output token benchmark and backend pricing metadata.
- A compact `recent_runs` timeline captures the latest accumulated runs, including status, runtime mode, model, reliability, provider e2e, and live-fault probe status, plus live task-family status when available.
- Autonomous readiness requires at least three stable recent runs, a matching consecutive pass streak, complete live fault coverage across the required provider negative cases, and complete live task-family coverage across the required edit/recovery/batch task families.
- Runtime diagnostics read the same persisted history and attach `autonomous_policy_gate`, readiness status, recommendation, recommendation reasons, blockers, warnings, structured requirements, missing requirement ids, evidence, and next actions to the runtime policy and capability matrices.
- Direct-provider coder and QA fixer policies are downgraded to the readiness recommendation while the autonomous gate is blocked.
- Live fault probes are enabled with `AUTO_CODE_PROVIDER_E2E_LIVE_FAULT_PROBES=true` or `=1`.
- A truthy opt-in must be paired with captured provider error fixture env vars.
- Fixture env vars cover unsupported-tool and gateway/model cases.
- Provider history persists live fault probe status, enabled/pass counts, and covered live-fault cases.
- Repeated real-account e2e runs can influence readiness recommendations over time.
- Live task-family calibration is automatic for `provider_e2e`: it maps the real child-run outcomes to the required task families.
  - `generic_edit` covers `single_file_edit`.
  - `mini_pipeline` covers `multi_step_edit` and, when its recovery loop passes, `recovery_resume`.
  - `transaction_batch_probe` covers `transaction_batching`.
- `AUTO_CODE_PROVIDER_E2E_LIVE_TASKS=true` or `=1` remains available as a manual fixture override for externally captured task-family status.
- When the manual fixture override is enabled, it must be paired with status env vars for every required task family, either provider-specific or generic.
- Provider-specific status env vars use this shape: `AUTO_CODE_PROVIDER_E2E_LIVE_<PROVIDER>_<TASK>_STATUS`.
- Generic fallback status env vars use this shape: `AUTO_CODE_PROVIDER_E2E_LIVE_<TASK>_STATUS`.
- Required task suffixes are `SINGLE_FILE_EDIT`, `MULTI_STEP_EDIT`, `RECOVERY_RESUME`, and `TRANSACTION_BATCHING`.
- Status value `passed` covers the task family. Any other configured value is treated as a failed family, and a missing status while the opt-in is enabled produces a configuration-blocked diagnostic.
- Provider history persists the latest live task-family status, enabled/pass counts, covered families, failed families, and live task-family coverage percentage.
- The direct-provider autonomous-readiness gate requires live task-family coverage before it can become `full_autonomous_candidate`.
- The settings UI surfaces suite status and negative fixtures.
- The settings UI surfaces live fault probes and provider reliability.
- The settings UI surfaces live task-family calibration and provider history coverage for edit/recovery/batch families.
- The settings UI surfaces provider autonomous readiness recommendations, recommendation reasons, blockers, warnings, missing requirements, compact structured requirement evidence, evidence, and next actions.
- The settings UI surfaces autonomous policy gate, recommendation text, and recommendation reasons in runtime governance diagnostics, including the same compact structured requirement evidence in policy/capability rows.
- The settings UI surfaces autonomous promotion status and the missing reliability/e2e evidence inside runtime policy and capability rows, matching the backend gate that blocks direct-provider `full_autonomous_ready`.
- The CLI `--runtime-modes` text tables also include autonomy requirement summaries beside the policy/capability recommendations.
- The settings UI surfaces comparative eval quality, stability, safety, pricing model, score sources, recorded actual cost when available, and estimated benchmark cost fallback alongside provider quality/cost/safety status.
- The settings UI surfaces transaction-batch evidence.
- The settings UI surfaces history evidence, provider trend rows, granular e2e/reliability case pass-rate coverage, and recent-run timelines.
- Runtime diagnostics show provider smoke results.
- Runtime diagnostics show the tool-loop contract.
- Runtime diagnostics show transaction batches and committed snapshots.
- Runtime diagnostics show commit operations and resume policy.
- MCP diagnostics show bridge status.
- MCP diagnostics show permission gates.
- Generic Edit diagnostics show artifact transaction batches.
- Generic Edit diagnostics show staged batch mutation, path, and lifecycle counts.
- Generic Edit diagnostics show batch lifecycle rows.
- Generic Edit diagnostics show boundary blockers and required next actions.
- Generic Edit diagnostics show resolution strategies.
- Generic Edit diagnostics show staged workspace materialized/restored event metadata.
- Generic Edit diagnostics show recovery timeline and open-batch resume state.
- Provider diagnostics show e2e negative fixtures.
- Live fault probes are displayed beside the e2e suite evidence.
- Run history includes trend rows, granular e2e/reliability coverage, and recent-run timelines.
- The autonomous promotion gate is displayed with the missing reliability cases, missing e2e run modes, and readiness requirement blockers that still prevent a direct API provider from being treated as full autonomous.
- Runtime governance diagnostics show policy/eval rows.
- Capability readiness rows include blockers and warnings.
- The direct-provider autonomous policy gate and readiness recommendation are visible in the governance panel.
- Structured missing readiness requirements appear next to runtime eval history.
- Comparative eval rows include quality/safety score sources.
- CLI runner contract status and mutating-subagent gates complete the operator view.
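The promotion gate described in this list can be sketched as a pure function. This is a minimal sketch under stated assumptions, not Auto Code's implementation: `evaluate_promotion_gate`, the `PromotionGate` fields, the blocker ids, and the shape of the status dictionaries are hypothetical. Only the five required e2e run modes and the `full_autonomous_candidate` / `passed` values come from this document.

```python
from dataclasses import dataclass, field

# Required e2e run modes, as listed for provider_autonomous_promotion_gate.
REQUIRED_E2E_RUN_MODES = (
    "generic_edit",
    "mini_pipeline",
    "transaction_batch_probe",
    "unsupported_tools_probe",
    "gateway_model_probe",
)


@dataclass
class PromotionGate:
    """Hypothetical gate result; field names are illustrative, not Auto Code's schema."""

    ready: bool
    missing_reliability_cases: list[str] = field(default_factory=list)
    missing_e2e_run_modes: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)


def evaluate_promotion_gate(
    readiness_recommendation: str,
    required_reliability_cases: list[str],
    reliability_case_status: dict[str, str],
    e2e_run_mode_status: dict[str, str],
) -> PromotionGate:
    # Every required case and run mode needs explicit case-level "passed" evidence;
    # aggregate pass counts are deliberately not consulted here.
    missing_cases = [
        case for case in required_reliability_cases
        if reliability_case_status.get(case) != "passed"
    ]
    missing_modes = [
        mode for mode in REQUIRED_E2E_RUN_MODES
        if e2e_run_mode_status.get(mode) != "passed"
    ]
    blockers = []  # hypothetical blocker ids
    if readiness_recommendation != "full_autonomous_candidate":
        blockers.append("readiness_below_full_autonomous_candidate")
    if missing_cases:
        blockers.append("missing_reliability_evidence")
    if missing_modes:
        blockers.append("missing_e2e_evidence")
    return PromotionGate(not blockers, missing_cases, missing_modes, blockers)
```

In this sketch a direct API provider would be reported as `full_autonomous_ready` only when the persisted gate's `ready` flag is true, so a high aggregate pass rate with one failing `gateway_model_probe` still blocks promotion.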
generic_edit mode asks the model to return one JSON object per iteration. Auto
Code executes the requested local actions, sends observations back to the model,
and repeats until the model returns finish or the runtime reaches its
iteration limit.
When the provider session exposes native tool calls, generic_edit uses those
tool calls by default and falls back to the JSON action loop only when the first
native tool-call request is rejected before any local action runs.
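A minimal sketch of the loop just described, assuming a hypothetical session object with `native_turn(history)` (returns a list of action dicts or raises) and `json_turn(history)` (returns the model's JSON text). `NativeToolsRejected`, `run_generic_edit_loop`, and every result key except `native_tool_fallbacks` are illustrative names, not Auto Code's API. Matching the local-action rules, the sketch also stops a batch at the first failed action.

```python
import json
from typing import Any, Callable


class NativeToolsRejected(Exception):
    """Hypothetical error raised when a gateway or model rejects native tool calls."""


def run_generic_edit_loop(
    session: Any,
    execute: Callable[[dict], dict],
    max_iterations: int = 20,
) -> dict:
    history: list[dict] = []
    native_tool_fallbacks: list[str] = []
    use_native = True
    native_requests = 0

    for _ in range(max_iterations):
        actions = None
        if use_native:
            native_requests += 1
            try:
                actions = session.native_turn(history)
            except NativeToolsRejected as exc:
                # Only a rejected first request falls back; no local action has run yet.
                if native_requests > 1:
                    raise
                native_tool_fallbacks.append(str(exc))
                use_native = False
        if actions is None:
            actions = json.loads(session.json_turn(history))["actions"]

        for action in actions:
            if action["tool"] == "finish":
                return {
                    "status": "finished",
                    "summary": action.get("summary", ""),
                    "native_tool_fallbacks": native_tool_fallbacks,
                }
            observation = execute(action)
            history.append({"action": action, "observation": observation})
            if not observation.get("ok", False):
                break  # later actions in this batch must not run

    return {"status": "iteration_limit", "native_tool_fallbacks": native_tool_fallbacks}
```

The sketch omits the transaction guards (for example, rejecting `finish` while a partial failure is unresolved); those checks would sit in front of the `finish` branch.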
Supported actions:
```json
{
  "thought": "short planning note",
  "actions": [
    { "tool": "stat_path", "path": "src/app.py" },
    { "tool": "list_files", "path": "src", "recursive": false, "max_entries": 100 },
    { "tool": "search_text", "query": "function_name", "path": "src", "recursive": true, "max_matches": 25 },
    { "tool": "read_file", "path": "relative/path.py", "max_chars": 12000 },
    { "tool": "read_file_range", "path": "relative/path.py", "start_line": 40, "max_lines": 120 },
    { "tool": "read_many_files", "paths": ["relative/path.py", "relative/other.py"], "max_chars_per_file": 8000 },
    { "tool": "begin_batch", "batch_id": "batch-1", "description": "Update focused files as one recovery unit." },
    { "tool": "write_file", "path": "relative/path.py", "content": "..." },
    { "tool": "replace_text", "path": "relative/path.py", "old": "old_name", "new": "new_name", "count": 1 },
    { "tool": "delete_file", "path": "relative/path.py" },
    { "tool": "move_file", "source": "relative/path.py", "destination": "relative/new-name.py", "overwrite": false },
    { "tool": "apply_patch", "patch": "unified diff" },
    { "tool": "run_command", "command": "pytest tests/test_file.py -q", "timeout": 60 },
    { "tool": "commit_batch", "batch_id": "batch-1", "summary": "Focused edits applied and ready for verification." },
    { "tool": "abort_batch", "batch_id": "batch-1", "reason": "The staged mutation is no longer needed." },
    { "tool": "rollback_transaction", "transaction_id": "json_actions-1", "snapshot_ids": ["mutation-1"] },
    { "tool": "repair_mutation", "transaction_id": "json_actions-1", "paths": ["relative/path.py"], "note": "Inspected and repaired the partial mutation." },
    { "tool": "git_status", "path": ".", "include_untracked": true },
    { "tool": "git_diff", "path": "relative/path.py", "max_chars": 8000 },
    { "tool": "run_subagents", "tasks": [{ "id": "inspect-api", "role": "explorer", "prompt": "Inspect the API layer and report relevant files." }] },
    { "tool": "finish", "summary": "what changed", "tests": ["commands run"], "risks": [] }
  ]
}
```

The action contract is defined once in
apps/backend/agents/runtime/local_actions.py as the local action manifest.
The JSON-loop prompt is rendered from that manifest, and future provider-native
function-calling adapters should use the same schemas instead of duplicating
tool definitions.
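To illustrate rendering one manifest into both surfaces, here is a sketch with a hypothetical two-entry manifest. The entry fields, `to_function_tools`, and `render_json_loop_prompt` are assumptions for illustration, not the contents of `local_actions.py`; the `{"type": "function", "function": {...}}` wrapper is the OpenAI Chat Completions tool format.

```python
# Hypothetical manifest entries; the real manifest is defined in local_actions.py.
LOCAL_ACTION_MANIFEST = [
    {
        "name": "read_file",
        "description": "Read a workspace-relative file, truncated to max_chars.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "max_chars": {"type": "integer"},
            },
            "required": ["path"],
        },
    },
    {
        "name": "finish",
        "description": "End the run with a summary, commands run, and risks.",
        "parameters": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "tests": {"type": "array", "items": {"type": "string"}},
                "risks": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["summary"],
        },
    },
]


def to_function_tools(manifest: list[dict]) -> list[dict]:
    """Wrap each manifest entry in the OpenAI-style function tool envelope."""
    return [
        {
            "type": "function",
            "function": {
                "name": entry["name"],
                "description": entry["description"],
                "parameters": entry["parameters"],
            },
        }
        for entry in manifest
    ]


def render_json_loop_prompt(manifest: list[dict]) -> str:
    """Render the same entries as plain-text instructions for the JSON action loop."""
    lines = ['Reply with one JSON object: {"thought": "...", "actions": [...]}.', "Tools:"]
    for entry in manifest:
        required = set(entry["parameters"].get("required", []))
        # Optional parameters are marked with a trailing "?".
        params = ", ".join(
            name if name in required else f"{name}?"
            for name in entry["parameters"]["properties"]
        )
        lines.append(f"- {entry['name']}({params}): {entry['description']}")
    return "\n".join(lines)
```

Because both outputs read the same dictionaries, a new action or parameter only has to be declared once.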
OpenAI-compatible sessions expose a native tool-call bridge that can send these
schemas as function tools and append tool results back to provider history.
Direct OpenAI, Ollama, OpenRouter, LiteLLM, and ZhipuAI sessions use that
bridge when the routed model/gateway supports tools; if the first native
tool-call request is rejected, generic_edit falls back to the JSON action
loop and records native_tool_fallbacks in the result and summary artifacts so
the gateway/model limitation is visible.
The native tool-call parser accepts the common gateway shapes Auto Code sees in
practice: OpenAI Chat Completions tool_calls, OpenAI Responses output
function-call blocks, Anthropic-style tool_use content blocks, Bedrock-style
toolUse blocks, Gemini functionCall parts, gateway choices[].message
envelopes, nested response/message content parts, and streaming delta
fragments with chunked JSON arguments.
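A simplified sketch of that normalization, covering a subset of the listed shapes. It assumes Gemini input is the candidate `content` object (with a `parts` list) and omits nested `response`/`message` envelopes; `normalize_tool_calls` and `accumulate_stream` are hypothetical helper names, not Auto Code's parser. Each call is reduced to `{"name": ..., "arguments": {...}}`.

```python
import json
from typing import Any


def _args(raw: Any) -> dict:
    # OpenAI-style shapes send arguments as a JSON string; others send a dict.
    if isinstance(raw, str):
        return json.loads(raw) if raw.strip() else {}
    return dict(raw or {})


def normalize_tool_calls(payload: dict) -> list[dict]:
    """Return [{"name": ..., "arguments": {...}}] from several common response shapes."""
    if "choices" in payload:  # gateway envelope: choices[].message
        calls = []
        for choice in payload["choices"]:
            calls.extend(normalize_tool_calls(choice.get("message", {})))
        return calls
    calls = []
    for call in payload.get("tool_calls") or []:  # OpenAI Chat Completions
        fn = call.get("function", {})
        calls.append({"name": fn.get("name"), "arguments": _args(fn.get("arguments"))})
    for item in payload.get("output") or []:  # OpenAI Responses output items
        if item.get("type") == "function_call":
            calls.append({"name": item.get("name"), "arguments": _args(item.get("arguments"))})
    blocks = payload.get("content")
    if isinstance(blocks, list):
        for block in blocks:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use":  # Anthropic-style content block
                calls.append({"name": block.get("name"), "arguments": _args(block.get("input"))})
            elif "toolUse" in block:  # Bedrock-style content block
                use = block["toolUse"]
                calls.append({"name": use.get("name"), "arguments": _args(use.get("input"))})
    for part in payload.get("parts") or []:  # Gemini content parts
        if "functionCall" in part:
            fc = part["functionCall"]
            calls.append({"name": fc.get("name"), "arguments": _args(fc.get("args"))})
    return calls


def accumulate_stream(deltas: list[dict]) -> list[dict]:
    """Join streamed tool-call fragments by index, then parse the complete JSON arguments."""
    pending: dict[int, dict] = {}
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            slot = pending.setdefault(frag.get("index", 0), {"name": "", "arguments": ""})
            fn = frag.get("function", {})
            slot["name"] += fn.get("name") or ""
            slot["arguments"] += fn.get("arguments") or ""
    return [{"name": s["name"], "arguments": _args(s["arguments"])} for _, s in sorted(pending.items())]
```

Streaming fragments are only parsed after accumulation, because a single chunk such as `{"pa` is not valid JSON on its own.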
Z.AI through Claude Code is intentionally not routed through this generic edit
contract. That path uses an Anthropic-compatible endpoint with a Claude
Code-compatible CLI runtime and is represented by the zai_claude_code runner
profile.
Auto Code validates and executes these actions locally:
- file paths use the same workspace-relative sensitive-path checks as `patch_proposal`;
- path metadata inspection is bounded and never reads file contents;
- file listings are bounded, skip sensitive/heavy directories, and return only path metadata;
- text search is literal, bounded, skips sensitive/heavy directories, and returns compact path/line excerpts;
- single-file and multi-file reads are bounded and use the same sensitive-path checks;
- patches use `git apply --check --whitespace=nowarn` before applying;
- commands pass the existing security allowlist/validator layer;
- commands run without a shell and do not support pipes, redirection, or command chaining;
- git status and diff inspection use fixed git subcommands, workspace-relative path scoping, and bounded output instead of arbitrary shell commands;
- runtime subagents run as bounded read-only child sessions for analysis, exploration, review, or comparison work; they do not receive Claude SDK Task tool parity or independent mutating runtime privileges, and each child attempt has an isolated context envelope, explicit child context id, bounded retry metadata, per-child result artifact, and a timeout/cancellation guard;
- JSON and native tool-call batches stop after the first failed local action, so later actions in the same batch do not run against a partially failed transaction;
- traces, summaries, safe action timelines, and per-action observations are saved as `generic_edit_trace.json`, `generic_edit_result.json`, `generic_edit_timeline.json`, `generic_edit_events.jsonl`, `generic_edit_observations.jsonl`, `generic_edit_artifact_manifest.json`, and `generic_edit_summary.md`;
- transaction summaries include tool sequences, affected/mutated paths, partial-failure ids, the last partial-failure path set, whether recovery was resolved by a later covered inspection/repair or workspace verification transaction, and any unresolved partial failures;
- transaction batches link batch ids to transaction ids, staged mutation ids, staged path metadata, mutation snapshots, committed mutation snapshot ids, commit operation ids, transaction groups, unresolved groups, per-batch recovery policies, boundary errors, and recovery outcomes;
- batch boundaries are guarded:
  - `commit_batch` is rejected while the batch still has unresolved recovery groups;
  - mutating actions after `commit_batch`/`abort_batch` in the same provider turn are rejected before any action in that turn mutates the workspace;
  - opaque workspace commands such as `run_command` are rejected while a batch is open because they cannot produce staged mutation snapshots;
  - open-batch file mutations are isolated by temporarily materializing staged postimages for the current action and restoring the real workspace baseline after the action;
  - `commit_batch` validates the staged baseline before applying active staged postimages and blocks `staged_batch_drift` or unverifiable staged state;
  - `finish` is rejected while a batch is still open;
  - open-batch state plus batch-boundary guard status, boundary reasons, staged drift status, staged materialize/restore events, and suggested recovery actions are preserved in the recovery checkpoint, artifact manifest timeline, provider smoke diagnostics, and the frontend runtime diagnostics rows;
- provider smoke reports the boundary preferred strategy, required action kinds, and resolution strategies from the manifest recovery timeline instead of only exposing the raw boundary reason, and includes staged guard statuses, drift paths for `staged_batch_drift` events, committed snapshot ids, and commit operation ids;
- interrupted or partial runs persist `generic_edit_session_state.json`, `generic_edit_recovery_checkpoint.json`, `generic_edit_mutation_snapshots.json`, and `generic_edit_transaction_groups.json`. The read-only resume preflight blocks:
  - corrupt checkpoints, corrupt manifests, and missing required artifacts;
  - mismatched session/checkpoint/manifest policies;
  - stale trace/session/manifest counters and manifest event-count drift against the persisted event stream;
  - missing checkpoint snapshot references and incomplete referenced mutation snapshots;
  - isolated staged snapshots whose workspace baseline was not restored or whose baseline preimages are missing;
  - trace mismatches and transaction batch drift between trace, checkpoint, session state, manifest, and mutation snapshots;
  - corrupt or non-canonical required event streams, recovery-plan artifacts, and transaction-group artifacts;
  - transaction-group count/unresolved-id drift against the checkpoint;
  - manifest recovery-timeline boundary blockers whose required actions are absent from the checkpoint resume policy;
  - workspace drift before a resumed run mutates files;
- `finish` is rejected when a previous partial-failure transaction remains unresolved, so limited runtimes cannot report success after a partially applied mutating batch.
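The "no shell" rule for `run_command` can be sketched as tokenizing with `shlex`, rejecting shell metacharacters, and passing an argv list to `subprocess.run`. This is a minimal sketch, not Auto Code's security layer: `ALLOWED_COMMANDS`, `CommandRejected`, `validate_command`, and the metacharacter set are hypothetical stand-ins for the existing allowlist/validator.

```python
import shlex
import subprocess

# Hypothetical allowlist; Auto Code's real security validator layer is more detailed.
ALLOWED_COMMANDS = {"pytest", "python", "ruff", "npm"}
# Characters that only a shell would interpret (pipes, redirection, chaining, expansion).
SHELL_METACHARACTERS = set("|;&<>`$")


class CommandRejected(ValueError):
    """Raised before execution when a command fails validation."""


def validate_command(command: str) -> list[str]:
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. an unterminated quote
        raise CommandRejected(f"unparseable command: {exc}") from exc
    if not argv:
        raise CommandRejected("empty command")
    for token in argv:
        # Conservative: also rejects quoted arguments that contain these characters.
        if SHELL_METACHARACTERS & set(token):
            raise CommandRejected(f"shell syntax is not supported: {token!r}")
    if argv[0] not in ALLOWED_COMMANDS:
        raise CommandRejected(f"command is not allowlisted: {argv[0]!r}")
    return argv


def run_command(command: str, cwd: str, timeout: int = 60) -> subprocess.CompletedProcess:
    argv = validate_command(command)
    # An argv list with the default shell=False means no shell ever parses the command.
    return subprocess.run(
        argv, cwd=cwd, timeout=timeout, capture_output=True, text=True, check=False
    )
```

Rejecting at validation time, rather than relying on `shell=False` alone, lets the runtime return a structured observation such as "pipes are not supported" instead of a confusing tool failure.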
This mode is intentionally not full autonomous parity. It exposes the local
action loop, provider-native tool calls when available, bounded runtime
subagents when wired by the caller, and Auto Code's local MCP bridge for
built-in tools. When `AUTO_CODE_EXTERNAL_MCP_CLIENT` is enabled (or the
operator sets `AUTO_CODE_AUTONOMY` to `safe` or `bold`, which enables it
automatically), the provider-neutral external MCP client executes
`tools/list` plus `tools/call` against every registered external server
(Context7, Graphiti, Linear, Electron, Puppeteer, and custom servers)
through the same bridge. Per-server connectivity is validated by
`agents.runtime.mcp_execution_smoke.mcp_execution_smoke(server, project_dir)`. This mode does not expose Claude SDK session
lifecycle behavior. MCP support artifacts include per-server statuses
such as local_bridge, external_bridge_required, native_required, and
unsupported, so non-Claude runs can explain exactly which requested MCP
servers are available, which are ready for the provider-neutral external MCP
client, and which remain native-runtime-only. MCP support artifacts also include
a bridge_plan with ready, partial, or blocked status, native-required
servers, external-bridge-required servers, local bridged servers, external
bridged servers, unsupported servers, and the next runtime action needed.
External MCP server statuses carry a redacted external_client health object
with transport, command/url hints, enablement flags, missing configuration,
whether the server is ready_to_connect, whether this layer supports execution,
and the executable tool names when enabled. Context7 uses that health contract
to expose mcp__context7__resolve-library-id and
mcp__context7__get-library-docs; other external servers still require
tool-execution wiring before parity.
Bridged local MCP tools also carry explicit permission/audit metadata in
tool_policies, and each bridged call appends a redacted
mcp_bridge_audit.jsonl event with the tool, permission, mutation flag, action,
status, and result summary. If a provider emits an unavailable MCP tool call such as
mcp__context7__resolve-library-id, generic_edit records a structured
observation with the server name, support strategy, runtime path, and server
status instead of collapsing the failure into a generic unknown-tool error.
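A sketch of appending one redacted `mcp_bridge_audit.jsonl` event with the fields named above (tool, permission, mutation flag, action, status, result summary). The regex redaction rule, the 500-character summary bound, the `timestamp` field, and `append_audit_event` itself are assumptions; Auto Code's actual redaction rules and event schema may differ.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical redaction rule: mask strings that look like API keys or bearer tokens.
_SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}|Bearer\s+\S+")


def redact(text: str) -> str:
    return _SECRET_PATTERN.sub("[REDACTED]", text)


def append_audit_event(
    audit_path: Path,
    tool: str,
    permission: str,
    mutating: bool,
    action: str,
    status: str,
    result_summary: str,
) -> dict:
    """Append one JSON object per line so the audit trail stays streamable."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "permission": permission,
        "mutating": mutating,
        "action": action,
        "status": status,
        # Redact first, then bound the stored summary length.
        "result_summary": redact(result_summary)[:500],
    }
    audit_path.parent.mkdir(parents=True, exist_ok=True)
    with audit_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")
    return event
```

Appending one line per call keeps the log valid even if a run is interrupted midway, which matters for the recovery and diagnostics surfaces that read it back.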
OpenAI-compatible tool-call parsing normalizes direct message objects, gateway
choices[].message envelopes, streaming delta envelopes, and content/part
blocks used by OpenAI, LiteLLM, OpenRouter, Gemini-like, and Anthropic-like
responses.
patch_proposal mode asks the model to return one JSON object with this shape:
```json
{
  "summary": "Short description of the proposed change",
  "files": [
    {
      "path": "relative/path.py",
      "operation": "modify",
      "patch": "unified diff for this file"
    }
  ],
  "tests": ["suggested verification command"],
  "risks": ["known limitation or follow-up"]
}
```

Auto Code validates and applies the proposal locally:
- rejects absolute paths and parent traversal;
- rejects sensitive paths such as `.git`, `.env`, `.env.local`, `.env.production`, `.mcp.json`, and `.claude`;
- runs `git apply --check --whitespace=nowarn` before applying;
- saves patch artifacts in the spec artifacts directory:
  - `patch_proposal.json` for the raw provider proposal;
  - `patch.diff` for the unified diff;
  - `patch_result.json` for structured status, file, test, and risk metadata;
  - `patch_summary.md` for a human-readable review summary;
- does not automatically run model-suggested test commands.
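The validation steps above can be sketched as follows. This is a minimal sketch under assumptions: treating any path component that matches a sensitive name as sensitive (so `config/.git/hooks` is rejected too) is a guess at the exact rule, and `validate_proposal_path`, `validate_proposal`, and `git_apply_checked` are hypothetical helpers. The `git apply --check --whitespace=nowarn` dry run mirrors the command named in this document; `-` tells `git apply` to read the patch from standard input.

```python
import json
import subprocess
from pathlib import PurePosixPath, PureWindowsPath

# Sensitive names from this document; matching any path component is an assumption.
SENSITIVE_NAMES = {".git", ".env", ".env.local", ".env.production", ".mcp.json", ".claude"}


def validate_proposal_path(raw: str) -> str:
    """Return a normalized workspace-relative path or raise ValueError."""
    if not raw.strip():
        raise ValueError("empty path")
    path = PurePosixPath(raw.replace("\\", "/"))  # "." components are collapsed
    if path.is_absolute() or PureWindowsPath(raw).drive:
        raise ValueError(f"absolute path rejected: {raw}")
    if ".." in path.parts:
        raise ValueError(f"parent traversal rejected: {raw}")
    if any(part in SENSITIVE_NAMES for part in path.parts):
        raise ValueError(f"sensitive path rejected: {raw}")
    return str(path)


def validate_proposal(text: str) -> dict:
    """Parse the provider JSON and validate every file entry before any git apply."""
    proposal = json.loads(text)
    if not isinstance(proposal.get("files"), list) or not proposal["files"]:
        raise ValueError("proposal must include a non-empty files list")
    for entry in proposal["files"]:
        entry["path"] = validate_proposal_path(entry["path"])
        if not entry.get("patch"):
            raise ValueError(f"missing patch for {entry['path']}")
    return proposal


def git_apply_checked(patch: str, workspace: str) -> None:
    """Dry-run the patch first; apply only if the check succeeds (raises on failure)."""
    for extra in (["--check"], []):
        subprocess.run(
            ["git", "apply", *extra, "--whitespace=nowarn", "-"],
            input=patch, text=True, cwd=workspace, capture_output=True, check=True,
        )
```

Model-suggested `tests` are only carried into the artifacts; nothing in this sketch executes them.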
Each runtime advertises capabilities such as tools, mcp, filesystem_edits,
shell_commands, structured_output, and workspace_access. Each agent phase
declares what it needs. If a provider/runtime pair cannot satisfy those
requirements, Auto Code reports a capability error before the session runs.
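The check is essentially a set difference computed before the session starts. A minimal sketch, assuming a hypothetical `Capability` enum (the `TEXT_COMPLETION` member is an added assumption; the others are the capability names listed above) and hypothetical capability sets for one runtime mode and two phases:

```python
from enum import Enum


class Capability(str, Enum):
    # Names from this document, plus a hypothetical TEXT_COMPLETION entry.
    TEXT_COMPLETION = "text_completion"
    TOOLS = "tools"
    MCP = "mcp"
    FILESYSTEM_EDITS = "filesystem_edits"
    SHELL_COMMANDS = "shell_commands"
    STRUCTURED_OUTPUT = "structured_output"
    WORKSPACE_ACCESS = "workspace_access"


class CapabilityError(RuntimeError):
    """Raised before a session starts when requirements cannot be met."""


def check_capabilities(
    provider: str,
    runtime_mode: str,
    advertised: set[Capability],
    required: set[Capability],
) -> None:
    missing = required - advertised
    if missing:
        names = ", ".join(sorted(cap.value for cap in missing))
        raise CapabilityError(f"{provider} in {runtime_mode} mode is missing: {names}")


# Hypothetical capability sets for one limited runtime mode and two phases.
ANALYSIS_ONLY_RUNTIME = {Capability.TEXT_COMPLETION}
PLANNING_PHASE = {Capability.TEXT_COMPLETION}
CODING_PHASE = {Capability.TOOLS, Capability.FILESYSTEM_EDITS, Capability.SHELL_COMMANDS}
```

Failing here, before any provider call, is what lets the runtime report a precise capability error instead of a half-finished session.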
Examples:
- A non-Claude provider in `full_autonomous` mode fails before tool-dependent coding starts.
- A non-Claude provider in `analysis_only` mode is allowed only for phases that need text completion. During coding, Auto Code saves the text output to `artifacts/analysis_only_*.md`, leaves the subtask pending, and stops instead of pretending implementation succeeded.
- A non-Claude provider in `patch_proposal` mode can modify files only through a validated unified diff.
- A non-Claude provider in `generic_edit` mode can modify files through Auto Code's local action loop.
  - Direct OpenAI, Ollama, OpenRouter, and LiteLLM sessions use provider-native tool calls when available; other sessions can use the JSON action loop.
  - The mode can use the local Auto Code MCP bridge and reports Context7 as an external bridged server with executable tool names when the provider-neutral external MCP client is explicitly enabled.
  - Remaining external servers report readiness through `external_client` health metadata, and `mcp_bridge_permission_matrix` reports bridge permission coverage, mutating permissions, audit artifacts, and strict allowlist status for local, external, and configured custom MCP servers.
  - MCP tool results are normalized into `normalized_result` in observations/audit data so UI and recovery layers can inspect text, typed content, structured payloads, and error state without reparsing raw provider output.
  - External MCP contract smoke also records `failure_stage` and `failure_kind` for live `tools/list` failures so custom server lifecycle errors can be diagnosed without parsing free-form messages.
  - Parallel read-only work can use `run_subagents` when the caller wires a `RuntimeSubagentOrchestrator` session factory.
  - Local action batches halt after the first failed action and persist transaction/recovery metadata before the next provider iteration.
  - Recovery is only marked resolved after a subsequent successful inspection or repair action, not by `finish` alone.
This runtime engine is an integration boundary, not a generic replacement for
the Claude Agent SDK. Claude keeps the full native SDK surface. Codex CLI is the
first wired non-Claude full-autonomous CLI runtime. Direct providers use
analysis_only, patch_proposal, or generic_edit; they do not receive
general external MCP tool-execution parity or mutable subagent parity through
the direct chat adapter. The only external MCP execution path in this layer is
the explicitly enabled Context7 stdio bridge.
CLI runner profiles for Claude Code, Z.AI via Claude Code, Gemini CLI, Aider, Cursor, CodeRabbit CLI, GitHub Copilot CLI, OpenCode, Goose, Amp, Qwen Code, DeepV Code, and a generic CLI pool are exposed for diagnostics and routing planning. Planned runners can use the shared generic CLI process/artifact core when a command is explicitly configured, but runtime routing only applies to wired runners. Profile visibility and generic-core availability do not imply that every listed CLI already has production command mapping, resume support, or provider-specific smoke coverage.
Non-Claude subagents are represented as orchestrated read-only child runtime sessions, not Claude SDK Task tool parity. Their artifacts include aggregate child-session summaries, per-child result artifacts, attempt histories, and a read-only merge plan so status dashboards can show complete/error/cancelled counts without re-parsing every child result.
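A sketch of the aggregate child-session summary described above, so a dashboard can read counts without opening each child artifact. `ChildResult`, `summarize_children`, the `total_attempts` key, and the merge-plan fields are hypothetical; only the complete/error/cancelled counts and the read-only nature of the merge plan come from this document.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildResult:
    """Hypothetical per-child record; real artifacts carry more metadata."""

    child_id: str
    role: str
    status: str  # assumed values: "complete", "error", "cancelled"
    attempts: int
    result_artifact: str


def summarize_children(results: list[ChildResult]) -> dict:
    counts = Counter(result.status for result in results)  # missing keys count as 0
    return {
        "total": len(results),
        "complete": counts["complete"],
        "error": counts["error"],
        "cancelled": counts["cancelled"],
        "total_attempts": sum(result.attempts for result in results),
        # Read-only merge plan: lists findings to review; nothing is written to the workspace.
        "merge_plan": {
            "mode": "read_only",
            "include": [r.result_artifact for r in results if r.status == "complete"],
            "skipped": [r.child_id for r in results if r.status != "complete"],
        },
    }
```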
- `apps/backend/agents/runtime/` - runtime capabilities, requirements, session engine, and adapters.
- `apps/backend/agents/runtime/local_actions.py` - reusable local action executor used by generic edit's JSON and provider-native tool-call loops.
- `apps/backend/agents/runtime/mcp_bridge.py` - local MCP bridge status, external MCP client readiness, permission policy, and audit artifacts for direct-provider runtimes.
- `apps/backend/agents/runtime/cli_profiles.py` - CLI runner profile registry, executable detection, and selection diagnostics.
- `apps/backend/agents/runtime/runner_router.py` - opt-in routing from impossible direct-provider full-autonomous requests to wired CLI runners.
- `apps/backend/agents/runtime/artifacts.py` - shared analysis-only artifact persistence.
- `apps/backend/agents/runtime/compatibility.py` - user-facing provider/runtime compatibility metadata.
- `apps/backend/agents/runtime/subagents.py` - provider-neutral subagent support policy and child-session orchestrator.
- `apps/backend/agents/coder.py` - runtime selection for planning/coding phases.
- `apps/backend/cli/analysis_commands.py` - non-mutating provider analysis CLI.
- `apps/backend/cli/runtime_commands.py` - runtime compatibility CLI.
- `apps/backend/agents/planner.py` - runtime selection for follow-up planning.
- `apps/backend/core/providers/` - provider adapters and provider factory.
- `apps/backend/cli/provider_smoke_commands.py` - opt-in live provider smoke checks.
- `tests/test_agent_runtime.py` - runtime capability and patch proposal tests.