This document proposes how Flow should evolve from "helpful CLI + local skills" into a Codex-first control plane where the user stays inside Codex and Flow handles routing, memory, execution, and learning behind the scenes.
Target state:
- the user speaks natural intent in Codex
- Flow resolves references, routes workflows, fetches secure context, and runs the right tool/task
- Codex sees only the smallest useful context for the current turn
- repeated phrasing becomes reusable system knowledge without turning every repo preamble into a wall of rules
Example desired behavior:
- `document it` resolves to the docs write flow
- a pasted Linear URL is unrolled before planning
- `continue the last deploy investigation` finds the right session/worktree
- the user does not need to remember `forge doc`, `forge linear inspect`, or repo-specific wrappers
Current Flow has strong building blocks but they are still separate:
- task skills are generated and reloaded for Codex
- sessions are stored and recoverable
- env storage is becoming secure enough for org use
- router telemetry already exists
- repo-specific systems like Forge can mine aliases and inject lean workflow rules
But the user still pays too much cognitive cost:
- wrappers like `L` and repo-specific launchers carry logic outside Flow
- repo preambles grow whenever a new shortcut is taught
- skill learning is mostly manual
- URL/reference unrolling is repo-specific instead of generic
- Codex app-server connections are process-per-query in some paths
The result is "good pieces, weak control plane".
Design principles:
- Flow is the control plane; repo tools remain domain executors.
- Skills stay thin; runtime resolution carries the real behavior.
- Reference unrolling is deterministic first, model-assisted only if needed.
- Learning produces suggestions, not prompt bloat.
- No default context should be paid for behavior that is not active.
Existing foundations to build on:
- task-synced Codex skill metadata in src/skills.rs and src/skills.rs
- Codex skill cache reload in src/skills.rs
- configurable Codex wrapper transport in src/commit.rs
- multi-provider session recovery and copy flows in src/ai.rs
- router telemetry hooks in src/rl_signals.rs
- current Codex session resolver direction in codex-openai-session-resolver.md
These are enough to start. The missing work is unification.
Flow should target current upstream Codex directly.
That means:
- prefer wrapper transport + config over patching Codex
- use stable upstream surfaces like normal user skill roots, `skills/list`, and `thread/*`
- treat newer upstream features such as `skills/list perCwdExtraUserRoots` and in-process app-server clients as accelerators, not prerequisites
- keep repo-specific behavior in Flow or repo executors, not in a private Codex fork
Add a Flow-managed warm control layer, either as an extension of ai-taskd, a
focused jd, or a lighter in-process broker where that is enough for the
current upstream Codex client surface.
Responsibilities:
- maintain repo-scoped Codex app-server sessions
- cache recent threads, active skills, and repo metadata
- expose fast local RPC for lookup, runtime-skill injection, and doctor output
- resolve references before they reach Codex as plain text
- own the "what extra context is actually needed for this turn?" decision
This should absorb behavior that currently lives in wrappers like L.
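
To illustrate the warm, repo-scoped session idea, here is a minimal Rust sketch of a pool that creates one handle per repo and reuses it on later lookups. `AppServerHandle`, `WarmPool`, and `get_or_spawn` are hypothetical names, and the handle is only a stand-in: this document does not specify the real app-server client type.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Stand-in for a live app-server connection (hypothetical; the real
/// client type is not defined in this proposal).
struct AppServerHandle {
    repo: PathBuf,
    spawn_id: u32,
}

/// Keeps at most one handle per repo path so lookups stay warm.
struct WarmPool {
    handles: HashMap<PathBuf, AppServerHandle>,
    spawned: u32, // how many handles were actually created
}

impl WarmPool {
    fn new() -> Self {
        WarmPool { handles: HashMap::new(), spawned: 0 }
    }

    /// Return the existing handle for `repo`, creating one only on first use.
    fn get_or_spawn(&mut self, repo: &Path) -> &AppServerHandle {
        let spawned = &mut self.spawned;
        self.handles.entry(repo.to_path_buf()).or_insert_with(|| {
            // A real daemon would start or attach to an app-server here.
            *spawned += 1;
            AppServerHandle { repo: repo.to_path_buf(), spawn_id: *spawned }
        })
    }
}
```

Calling `get_or_spawn` twice with the same repo path returns the same handle, which is the property that removes per-query process startup in this sketch.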
Promote Forge-style phrase aliasing into Flow as a generic feature.
Each intent has:
- canonical name
- phrase aliases
- optional repo/path scope
- resolver/action target
- confidence policy
- evidence counters for suggested future aliases
Examples:
- `doc-it`
- `linear-reference`
- `session-recover`
- `review-intent-comment`
Intent matching must stay deterministic and cheap.
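
One way to keep matching deterministic and cheap is exact lookup on a normalized phrase. The Rust sketch below assumes that approach; the `Intent` struct, `normalize`, `build_index`, and `match_intent` are hypothetical names, and the normalization rules (lowercase, punctuation treated as spaces, whitespace collapsed) are an assumption rather than a decided policy.

```rust
use std::collections::HashMap;

/// One configured intent. Field names follow the intent model above;
/// the struct itself is hypothetical.
struct Intent {
    name: &'static str,
    phrases: &'static [&'static str],
    resolver: &'static str,
}

/// Lowercase, turn punctuation into spaces, and collapse whitespace,
/// so "Document it!" and "document   it" compare equal.
fn normalize(text: &str) -> String {
    text.to_lowercase()
        .chars()
        .map(|c| if c.is_alphanumeric() || c.is_whitespace() { c } else { ' ' })
        .collect::<String>()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Build a phrase -> intent-position index once, up front.
fn build_index(intents: &[Intent]) -> HashMap<String, usize> {
    let mut index = HashMap::new();
    for (i, intent) in intents.iter().enumerate() {
        for phrase in intent.phrases {
            // If two intents claim the same phrase, the first one wins.
            index.entry(normalize(phrase)).or_insert(i);
        }
    }
    index
}

/// Exact match on the normalized input: no fuzzy scoring, no model call.
fn match_intent<'a>(
    intents: &'a [Intent],
    index: &HashMap<String, usize>,
    input: &str,
) -> Option<&'a Intent> {
    index.get(&normalize(input)).map(|&i| &intents[i])
}
```

Because the lookup is a single hash probe, an input that is not a known phrase simply returns `None` and adds no context to the turn.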
Flow should ship a generic resolver layer for pasted references:
- Linear issue URLs
- Linear project URLs
- GitHub PR / issue URLs
- repo file paths
- commit SHAs
- saved Flow session names or IDs
Resolvers return structured payloads, not prose. Repo-local executors like Forge can register resolver commands for domain-specific expansion.
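
As a rough sketch of "structured payloads, not prose", the Rust snippet below classifies a pasted string into a typed reference using plain string checks. The `Reference` enum and `classify` function are hypothetical, and the URL shapes (`https://linear.app/<workspace>/issue/<id>/...` and `https://github.com/<owner>/<repo>/pull/<n>`) are assumptions about the common forms, not a complete grammar.

```rust
/// Structured result of classifying a pasted reference (hypothetical type).
#[derive(Debug, PartialEq)]
enum Reference {
    LinearIssue { workspace: String, issue_id: String },
    GitHubPull { owner: String, repo: String, number: u64 },
    CommitSha(String),
    Unknown,
}

/// Deterministic classification: prefix and shape checks only.
fn classify(input: &str) -> Reference {
    let text = input.trim();

    if let Some(rest) = text.strip_prefix("https://linear.app/") {
        // Assumed shape: <workspace>/issue/<ISSUE-ID>[/<slug>]
        let parts: Vec<&str> = rest.split('/').collect();
        if parts.len() >= 3 && parts[1] == "issue" && !parts[0].is_empty() && !parts[2].is_empty() {
            return Reference::LinearIssue {
                workspace: parts[0].to_string(),
                issue_id: parts[2].to_string(),
            };
        }
    }

    if let Some(rest) = text.strip_prefix("https://github.com/") {
        // Assumed shape: <owner>/<repo>/pull/<number>
        let parts: Vec<&str> = rest.split('/').collect();
        if parts.len() >= 4 && parts[2] == "pull" {
            if let Ok(number) = parts[3].parse::<u64>() {
                return Reference::GitHubPull {
                    owner: parts[0].to_string(),
                    repo: parts[1].to_string(),
                    number,
                };
            }
        }
    }

    // 7 to 40 hex characters looks like a commit SHA. Hex-only words can be
    // false positives, so a real resolver would confirm against the repo.
    let is_hex = text.chars().all(|c| c.is_ascii_hexdigit());
    if is_hex && (7..=40).contains(&text.len()) {
        return Reference::CommitSha(text.to_lowercase());
    }

    Reference::Unknown
}
```

A repo-local executor could then be invoked only for the variants it registered for, with the typed fields passed as arguments instead of the raw text.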
Split Codex knowledge into two layers:
- baseline skills: always available, minimal repo guidance
- runtime skills: ephemeral, injected only when a matched intent or resolver requires them
Examples:
- user says `document it`
  - inject tiny docs-routing runtime skill
- user pastes a Linear URL
  - inject tiny linear-unrolled runtime context
- user asks to recover recent work
  - inject session-recovery runtime context only for that request
Runtime skills should expire automatically and be bounded by a strict budget.
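
Automatic expiry could be as simple as a time-to-live per activated skill. The following Rust sketch assumes that model; `RuntimeSkills`, `activate`, and `sweep` are hypothetical names, and the TTL values are left to the caller. The clock is passed in explicitly so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

/// An activated runtime skill and the moment it stops being valid.
struct RuntimeSkill {
    name: String,
    expires_at: Instant,
}

/// The set of currently active runtime skills (hypothetical type).
#[derive(Default)]
struct RuntimeSkills {
    active: Vec<RuntimeSkill>,
}

impl RuntimeSkills {
    /// Activate a skill for `ttl`. Re-activating replaces the old entry,
    /// so the expiry is reset rather than duplicated.
    fn activate(&mut self, name: &str, ttl: Duration, now: Instant) {
        self.active.retain(|s| s.name != name);
        self.active.push(RuntimeSkill { name: name.to_string(), expires_at: now + ttl });
    }

    /// Drop every skill whose expiry time has passed.
    fn sweep(&mut self, now: Instant) {
        self.active.retain(|s| s.expires_at > now);
    }

    /// Names of the skills that are still held, in activation order.
    fn names(&self) -> Vec<&str> {
        self.active.iter().map(|s| s.name.as_str()).collect()
    }
}
```

Running `sweep` before each turn means a skill injected for one request silently disappears unless a later matched intent activates it again.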
Use router telemetry plus transcript mining to propose:
- new aliases
- new reference patterns
- candidate runtime skills
- stale skills that should be removed
Important:
- do not auto-install every observed phrase
- require evidence thresholds
- prefer suggested changes that collapse multiple variants into one canonical intent
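
The evidence-threshold rule can be sketched as a counter per (intent, phrase) pair that only yields a suggestion once it clears both a hit count and a distinct-session count. Everything here is an assumption for illustration: the `AliasMiner` type, the two thresholds, and the choice to count distinct sessions are not specified elsewhere in this document.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Accumulated evidence for one observed phrase.
#[derive(Default)]
struct Evidence {
    hits: u32,
    sessions: BTreeSet<String>,
}

/// Collects observations and proposes aliases; it never installs them.
struct AliasMiner {
    min_hits: u32,       // hypothetical threshold
    min_sessions: usize, // hypothetical threshold
    evidence: BTreeMap<(String, String), Evidence>, // keyed by (intent, phrase)
}

impl AliasMiner {
    fn new(min_hits: u32, min_sessions: usize) -> Self {
        AliasMiner { min_hits, min_sessions, evidence: BTreeMap::new() }
    }

    /// Record that `phrase` led to `intent` in the given session.
    fn observe(&mut self, intent: &str, phrase: &str, session_id: &str) {
        let key = (intent.to_string(), phrase.to_lowercase());
        let entry = self.evidence.entry(key).or_default();
        entry.hits += 1;
        entry.sessions.insert(session_id.to_string());
    }

    /// Pairs that cleared both thresholds, in sorted (deterministic) order.
    fn suggestions(&self) -> Vec<(&str, &str)> {
        self.evidence
            .iter()
            .filter(|(_, e)| e.hits >= self.min_hits && e.sessions.len() >= self.min_sessions)
            .map(|((intent, phrase), _)| (intent.as_str(), phrase.as_str()))
            .collect()
    }
}
```

Requiring more than one session is one way to avoid promoting a phrase that was repeated several times in a single conversation.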
Add a small command family around the new control plane:
f codex open [query]
f codex resolve "<text-or-url>" [--json]
f codex runtime
f codex runtime show
f codex runtime clear
f codex teach suggest
f codex teach accept <intent-or-suggestion-id>
f codex teach reject <intent-or-suggestion-id>
f codex doctor
f codex daemon start|stop|status
Intended behavior:
- `f codex open` replaces personal wrappers like `L`
- `f codex resolve` shows what Flow would unroll or route before Codex sees it
- `f codex runtime show` explains which runtime skills/context are active
- `f codex teach suggest` presents evidence-backed alias/intent suggestions
- `f codex doctor` exposes repo path, active app-server connection, runtime budget, skill count, and recent resolver hits
Proposed flow.toml additions:
[codex]
control_plane = "daemon"
warm_app_server = true
runtime_skill_budget_chars = 1200
auto_resolve_references = true
auto_learn = "suggest-only"
[codex.session]
open_command = "codex"
prefer_last_active = true
repo_scoped_lookup = true
[[codex.intent]]
name = "doc-it"
phrases = ["doc it", "document it", "write this down", "save this in docs"]
resolver = "docs.route_write"
scope = ["repo", "personal"]
[[codex.intent]]
name = "session-recover"
phrases = ["what was i doing", "recover recent context", "continue the work"]
resolver = "session.recover"
[[codex.reference_resolver]]
name = "linear"
match = ["https://linear.app/*/issue/*", "https://linear.app/*/project/*"]
command = "forge linear inspect {{ref}} --json"
inject_as = "linear"
[[codex.reference_resolver]]
name = "docs"
match = ["doc it", "document it"]
command = "forge doc route --title {{title}} --json"
inject_as = "docs"
Also add a personal/global config file for user-specific phrase preferences:
~/.config/flow/codex-intents.toml
Use this for personal language variants that should not live in repo config.
jd should own:
- app-server lifecycle
- repo session caches
- runtime skill activation/deactivation
- resolver execution
- secure env lookups for active workflows
- bounded prompt-context assembly
- suggestion generation from telemetry/history
- compatibility with existing `f skills reload` and `f ai codex ...` flows
It should not:
- replace repo-specific executors like Forge
- run opaque model-based routing in the hot path
- inject large transcript summaries into every turn
The runtime layer needs hard limits:
- baseline repo guidance stays small
- runtime additions must fit a bounded char/token budget
- each resolved intent/reference should justify its own inclusion
- unused runtime skills expire quickly
Budget policy should prefer:
- structured resolver output
- one tiny runtime skill
- one short recovery summary
- nothing else
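
The preference order above can be read as a priority-ordered fill against a character budget. The Rust sketch below assumes that reading: candidates are sorted by kind, at most one runtime skill and one recovery summary are admitted, and anything that would exceed the budget is skipped. `Kind`, `Piece`, and `assemble` are hypothetical names, and counting characters (rather than tokens) simply follows the `runtime_skill_budget_chars` setting.

```rust
/// Declaration order doubles as priority order via the derived `Ord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    ResolverOutput,
    RuntimeSkill,
    RecoverySummary,
}

/// One candidate chunk of context for the current turn.
#[derive(Debug, Clone, PartialEq)]
struct Piece {
    kind: Kind,
    text: String,
}

/// Pick pieces in priority order without exceeding `budget_chars`.
fn assemble(mut candidates: Vec<Piece>, budget_chars: usize) -> Vec<Piece> {
    // Stable sort: pieces of the same kind keep their original order.
    candidates.sort_by_key(|p| p.kind);

    let mut used = 0usize;
    let mut chosen: Vec<Piece> = Vec::new();
    for piece in candidates {
        // At most one runtime skill and one recovery summary per turn.
        let single = piece.kind != Kind::ResolverOutput;
        if single && chosen.iter().any(|c| c.kind == piece.kind) {
            continue;
        }
        let len = piece.text.chars().count();
        if used + len > budget_chars {
            continue; // does not fit, so it is dropped rather than truncated
        }
        used += len;
        chosen.push(piece);
    }
    chosen
}
```

Dropping an oversized piece instead of truncating it is a design choice in this sketch; it keeps every injected chunk complete and makes the cost of each inclusion easy to report in a `doctor`-style view.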
Inputs to the learning loop:
- router telemetry
- accepted/overridden task choices
- resolver hits
- successful tool invocations
- session transcript mining
Outputs:
- proposed alias additions
- proposed resolver registrations
- dead-skill cleanup suggestions
- better default repo baselines
Approval model:
- repo suggestions require explicit accept
- personal suggestions can default to personal scope
- org/shared suggestions should stay gated
Forge should remain the Prom executor for Prom-specific workflows.
Flow should absorb the generic pieces Forge proved useful:
- intent aliasing
- reference unrolling
- thin runtime teaching
- lean docs workflow activation
That means:
- Prom keeps `forge linear inspect`, `forge doc`, and similar domain commands
- Flow becomes the generic router that decides when to call them
Rollout work:
- move `L`-style session open/recover behavior into `f codex open`
- make repo-scoped Codex session resolution first-class
- expose a `doctor` view for current skill/runtime state
- add `jd` with persistent app-server connection per repo
- keep recent thread cache and skills cache warm
- remove process-per-query overhead for session lookup/reload paths
- add config-backed intent aliases
- add generic reference resolver interface
- ship built-ins for session recovery, docs routing, and Linear URLs
- inject temporary runtime skills/context instead of growing repo preambles
- enforce runtime budget caps
- surface active runtime state in `f codex runtime show`
- mine telemetry + sessions for candidate aliases and resolver patterns
- generate suggestions only after evidence thresholds
- add accept/reject workflow
- reuse the same intent/resolver plane for Claude and Cursor transcript-backed workflows where useful
- keep Codex as the first-class interactive target
The highest-value first slice is:
- `f codex open`
- `jd` with warm repo-scoped app-server
- `f codex resolve`
- config-backed intents
- built-in resolvers for:
  - docs intents
  - Linear URLs
  - session recovery prompts
- `f codex runtime show`
Why this first:
- it removes the most command-memory burden immediately
- it uses Flow’s existing app-server + skills + session foundations
- it keeps the prompt surface thin
- it gives a concrete place to move personal wrapper logic
Metrics to track:
- p50 `f codex open` latency
- number of user prompts that required remembering a repo command
- average runtime-context bytes injected per turn
- resolver hit rate
- accepted suggestion rate
- count of active baseline skills versus runtime skills
Non-goals:
- full semantic agent routing in the hot path
- unbounded transcript mining into prompt context
- replacing repo executors with Flow clones
- auto-learning every phrase without evidence or approval
The target system is not "more AGENTS text" and not "more commands for the user to remember".
It is:
- thin baseline repo guidance
- a warm Flow Codex control daemon
- deterministic intent/reference resolution
- ephemeral runtime skills
- evidence-backed learning with approval
That is how Flow becomes truly Codex-first while keeping context cost low.