Version: 0.1.0 (OKR Baseline) · Updated: 2025-11-16T16:44:47Z · Status: ✅ OFFICIAL - Production Architecture Specification · Authors: ReCodeAgent Team
| Property | Value |
|---|---|
| Document Type | Architecture Design Specification |
| OKR Version | 0.1.0 (MVP Baseline) |
| Target Release | 2025-11-22 (5-Day DEV Cooking Cycle) |
| Based On | ReCode Paper (arXiv:2510.23564v2) + Codex CLI Integration Analysis |
| Supersedes | .artifacts/PROJECT_INIT_20251116/IMPLEMENTATION_ROADMAP.md |
| Related Docs | dev-spec/roadmap/ROADMAP_V2_2025111602.md |
| Scope | Full Production: Real ALFWorld IPC, Complete JSONL (13+ events), Integrated Executor |
ReCodeAgent is a production implementation of the ReCode research paradigm (arXiv:2510.23564v2) that achieves universal granularity control through recursive code generation. This architecture specification defines a high-performance Rust Core + Codex CLI integrated system that:
- Unifies Plans and Actions: Represents both abstract planning (placeholder functions) and concrete execution (primitive actions) in a single code representation
- Enables Dynamic Granularity Control: LLM policy adaptively decides when to plan abstractly vs. commit to specific actions
- Achieves 10-100x Performance: Rust-based orchestration with zero-cost abstractions, <1ms DFS traversal, <50ms AST parsing
- Maintains Production Quality: Deterministic execution, comprehensive JSONL event capture, type-safe state management
Traditional LLM-based agents suffer from fixed granularity:
- ReAct agents: Locked at fine-grained step-by-step execution, no strategic foresight
- Planner-based agents: Rigid plan-execute separation, cannot adapt dynamically
ReCode solves this by treating plans as high-level placeholder functions that recursively refine into finer-grained components until reaching executable primitive actions. This creates an infinite decision space where the agent dynamically controls its reasoning granularity.
┌─────────────────────────────────────────────────────────┐
│ Rust Orchestrator Engine (Week 1-4) │
│ - Persistent Codex thread management │
│ - DFS tree traversal with state machine │
│ - AST-aware code parsing (tree-sitter) │
│ - 10-100x performance vs Python prototype │
└─────────────────────────────────────────────────────────┘
↓ JSONL Events
┌─────────────────────────────────────────────────────────┐
│ Codex CLI Executor (codex exec --json) │
│ - LLM policy for placeholder expansion │
│ - Sandbox: workspace-write, approval: never │
│ - Event stream: command_execution, file_change, etc. │
└─────────────────────────────────────────────────────────┘
↓ Primitive Actions
┌─────────────────────────────────────────────────────────┐
│ Environment Adapters (Week 2) │
│ - ALFWorld (household tasks) │
│ - ScienceWorld (lab experiments) │
│ - CodeEnv (real codebase operations) [Future] │
└─────────────────────────────────────────────────────────┘
We model LLM-based agent interaction as a simplified decision process:
$$\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{O}, T, R \rangle$$

Where:
- $\mathcal{S}$: State space
- $\mathcal{A}$: Primitive action space (executable operations like `run('go to cabinet 1')`)
- $\mathcal{O}$: Observation space
- $T: \mathcal{S} \times \mathcal{A} \rightarrow \mathcal{S}$: Transition function
- $R: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$: Reward function
Beyond primitive actions, we introduce the plan space $\mathcal{P}$:
- Contains high-level intentions requiring decomposition
- Example: `prepare_breakfast()`, `find_and_take(obj, locations)`
- Cannot execute directly; must refine into $\mathcal{A}$ or intermediate $\mathcal{P}$ elements

The decision space is the union of both: $\mathcal{D} = \mathcal{A} \cup \mathcal{P}$
Decisions form a natural hierarchy from coarse to fine:
TASK-SPECIFICATION (coarsest)
↓
solve("prepare breakfast", observation)
↓
prepare_breakfast() → get_ingredients() + cook_meal()
↓
get_ingredients() → open_refrigerator() + take_eggs()
↓
open_refrigerator() → run('go to refrigerator 1') + run('open refrigerator 1')
↓
PRIMITIVE ACTIONS (finest)
ReCode represents this hierarchy as Python code:
- Plans: Undefined placeholder functions (e.g., `obj_ID = find_and_take(obj, locations)`)
- Actions: Environment-specific primitives (e.g., `obs = run('go to cabinet 1')`)
Algorithm 1: ReCode Core Loop
Input: Task T, Policy π (LLM), Environment E, Current Node c
Procedure ReCode(T, π, E, c):
if c is None:
o_0 ← Reset(E) # Initialize environment
c ← Text2Code(T, o_0) # Root: solve(instruction, observation)
end if
code_block ← π(c) # LLM expands current placeholder
for each code_unit u in code_block:
if IsPrimitive(u): # Executable action
Execute(u, E) # Run in environment
else: # Placeholder function
ReCode(T, π, E, u) # Recursive expansion
end if
end for
end procedure

Key Properties:
- Unified Representation: Plans and actions both expressed as Python function calls
- Dynamic Granularity: Policy decides when to stop planning and commit to actions
- Context Propagation: Unified variable namespace persists across recursion levels
- Deterministic Execution: Primitive actions execute in environment, placeholders trigger expansion
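The control flow of Algorithm 1 can be sketched in dependency-free Rust, with a mock policy standing in for the LLM. `CodeUnit`, `MockPolicy`, and the expansion table below are illustrative names for this sketch, not part of the codebase:

```rust
use std::collections::HashMap;

/// A code unit is either an executable primitive or a placeholder to expand.
#[derive(Debug, Clone, PartialEq)]
enum CodeUnit {
    Primitive(String),   // e.g. run('go to cabinet 1')
    Placeholder(String), // e.g. get_ingredients()
}

/// Mock policy: maps a placeholder name to the code block that expands it.
struct MockPolicy {
    expansions: HashMap<String, Vec<CodeUnit>>,
}

impl MockPolicy {
    fn expand(&self, name: &str) -> Vec<CodeUnit> {
        self.expansions.get(name).cloned().unwrap_or_default()
    }
}

/// Core loop: execute primitives, recursively expand placeholders (DFS order).
fn recode(policy: &MockPolicy, unit: &CodeUnit, executed: &mut Vec<String>) {
    match unit {
        CodeUnit::Primitive(action) => executed.push(action.clone()),
        CodeUnit::Placeholder(name) => {
            for child in policy.expand(name) {
                recode(policy, &child, executed);
            }
        }
    }
}

fn main() {
    let mut expansions = HashMap::new();
    expansions.insert(
        "prepare_breakfast".to_string(),
        vec![
            CodeUnit::Placeholder("get_ingredients".to_string()),
            CodeUnit::Primitive("run('cook eggs')".to_string()),
        ],
    );
    expansions.insert(
        "get_ingredients".to_string(),
        vec![
            CodeUnit::Primitive("run('go to refrigerator 1')".to_string()),
            CodeUnit::Primitive("run('take eggs')".to_string()),
        ],
    );
    let policy = MockPolicy { expansions };
    let mut executed = Vec::new();
    recode(
        &policy,
        &CodeUnit::Placeholder("prepare_breakfast".to_string()),
        &mut executed,
    );
    println!("{:?}", executed);
}
```

The primitive actions come out in depth-first order, which is exactly the coarse-to-fine hierarchy shown above flattened into an execution trace.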
Codex CLI supports two execution patterns:
1. Non-Interactive Mode (`codex exec`)
   - Single-turn execution: prompt → LLM response → exit
   - Default: `--sandbox read-only`, no file edits or network
   - With `--full-auto`: `--sandbox workspace-write`, `--ask-for-approval never`
   - Limitation: No conversation context between invocations

2. Session Resume Mode (`codex exec resume <THREAD_ID>`)
   - Preserves conversation context from previous turn
   - Maintains thread state in `~/.codex/sessions/<thread_id>/`
   - Critical for ReCode: Enables variable/action history across recursive expansions
Codex outputs structured events via --json flag:
| Event Type | Data | ReCode Usage |
|---|---|---|
| `thread.started` | `thread_id` | Capture for resume capability |
| `turn.started` | - | Track turn boundaries |
| `item.completed` (`agent_message`) | `text` | Extract `<think>` and `<execute>` blocks |
| `item.completed` (`command_execution`) | `command`, `stdout`, `stderr`, `exit_code` | Populate `ExecutionContext.actions`, detect NeedExpansion |
| `item.completed` (`file_change`) | `path`, `diff` | Track file edits (future CodeEnv) |
| `item.completed` (`reasoning`) | `text` | Observability/debugging |
| `turn.completed` | `usage` (tokens) | Cost tracking |
| `turn.failed` | `error` | Error handling |
Architecture Requirement: Parse ALL 8 event types (not just `agent_message`)
For ReCode agent autonomy:
codex exec --json \
--sandbox workspace-write \ # Allow file edits in current directory
--ask-for-approval never \ # Auto-approve all operations
"Expand placeholder: find_and_take(obj, locations)"

Security Considerations:
- `workspace-write`: Sandboxed to current Git repository + `/tmp`
- Network access: Disabled by default in workspace-write mode
- Trust boundary: Mark repository as trusted after security review
┌──────────────────────────────────────────────────────────────────┐
│ CLI Layer (src/main.rs) │
│ $ recode solve "implement binary tree traversal" │
│ $ recode tree show <task-id> │
│ $ recode exec resume <thread-id> "fix the bug" │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ Orchestrator Layer (src/orchestrator/) │
│ ┌────────────────────────────────────────────────────┐ │
│ │ OrchestratorEngine │ │
│ │ - run() → DFS tree traversal │ │
│ │ - expand_node() → call Codex for placeholder │ │
│ │ - process_node() → execute or detect expansion │ │
│ └────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ Tree Management (src/tree/) │
│ ┌────────────────────┐ ┌──────────────────────┐ │
│ │ CodeTree │ │ ExecutionContext │ │
│ │ - DFS iterator │ │ - variables: Map │ │
│ │ - add_child() │ │ - actions: Vec │ │
│ │ - get_node() │ │ - observations: Vec │ │
│ └────────────────────┘ └──────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ Codex Integration (src/codex/) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ CodexThreadManager │ │
│ │ - start_thread(prompt) → thread_id │ │
│ │ - resume_thread(prompt) → reuse thread_id │ │
│ │ - build_command() → codex exec --json --sandbox ... │ │
│ │ - execute_and_parse() → spawn + parse JSONL │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ CodexEventBus │ │
│ │ - ingest(jsonl_line) → parse event │ │
│ │ - route to ExecutionContext updaters │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ Execution Layer (src/execution/) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ PythonExecutor │ │
│ │ - execute(code, context) → spawn python3 -c │ │
│ │ - build_script() → inject run() + variables │ │
│ │ - parse_success() → extract [RECODE_VARS] │ │
│ │ - parse_failure() → detect [RECODE_NEED_EXPANSION] │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ EnvironmentAdapter (trait) │ │
│ │ - run(action: &str) → Result<String> │ │
│ │ - reset() → Result<String> │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ Analysis Layer (src/analysis/) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ AstCodeSplitter │ │
│ │ - split(code) → Vec<String> (AST top-level nodes) │ │
│ │ - find_placeholders(code) → Vec<String> │ │
│ │ - tree-sitter Python parser │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ Infrastructure (Codex CLI, tokio, tree-sitter) │
└──────────────────────────────────────────────────────────────────┘
Represents a single node in the decision tree.
use serde::{Deserialize, Serialize};
use uuid::Uuid;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum NodeState {
Pending, // Waiting to be processed
NeedsExpansion, // Detected as placeholder, awaiting LLM expansion
Expanded, // LLM expanded, child nodes created
Executing, // Primitive action being executed
Completed, // Successfully completed
Failed, // Execution failed
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CodeNode {
pub id: Uuid,
pub parent_id: Option<Uuid>,
pub children: Vec<Uuid>,
pub state: NodeState,
pub code: String,
pub thought: Option<String>, // LLM reasoning from <think> block
pub result: Option<String>, // Execution output
pub error: Option<String>,
pub depth: usize,
pub created_at: i64,
pub updated_at: i64,
}
impl CodeNode {
pub fn new_root(code: String, depth: usize) -> Self {
Self {
id: Uuid::new_v4(),
parent_id: None,
children: Vec::new(),
state: NodeState::Pending,
code,
thought: None,
result: None,
error: None,
depth,
created_at: chrono::Utc::now().timestamp(),
updated_at: chrono::Utc::now().timestamp(),
}
}
pub fn new_child(parent_id: Uuid, code: String, depth: usize) -> Self {
Self {
id: Uuid::new_v4(),
parent_id: Some(parent_id),
children: Vec::new(),
state: NodeState::Pending,
code,
thought: None,
result: None,
error: None,
depth,
created_at: chrono::Utc::now().timestamp(),
updated_at: chrono::Utc::now().timestamp(),
}
}
}

State Machine:
Pending → NeedsExpansion → Expanded → Pending (for children)
Pending → Executing → Completed
Pending → Executing → Failed
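The transitions above can be encoded as an explicit validity check. This is a std-only sketch (the production `NodeState` lives in `src/tree/node.rs`); note that `Expanded` nodes do not themselves return to `Pending` — they spawn fresh `Pending` children:

```rust
/// Node lifecycle states, mirroring the NodeState enum above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeState {
    Pending,
    NeedsExpansion,
    Expanded,
    Executing,
    Completed,
    Failed,
}

/// Returns true iff the transition is allowed by the documented state machine.
/// "Expanded → Pending (for children)" creates new nodes rather than
/// transitioning the expanded node itself, so it is not listed here.
fn is_valid_transition(from: NodeState, to: NodeState) -> bool {
    use NodeState::*;
    matches!(
        (from, to),
        (Pending, NeedsExpansion)
            | (Pending, Executing)
            | (NeedsExpansion, Expanded)
            | (Executing, Completed)
            | (Executing, Failed)
    )
}

fn main() {
    assert!(is_valid_transition(NodeState::Pending, NodeState::Executing));
    assert!(!is_valid_transition(NodeState::Completed, NodeState::Pending));
    println!("transition table ok");
}
```

Rejecting invalid transitions at the type/function boundary keeps the DFS loop deterministic: a node can never silently jump from `Completed` back into the work queue.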
Maintains the unified variable namespace across recursion levels.
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct ExecutionContext {
pub variables: HashMap<String, serde_json::Value>,
pub actions: Vec<String>,
pub observations: Vec<String>,
}
impl ExecutionContext {
pub fn new() -> Self {
Self::default()
}
pub fn set_variable(&mut self, key: String, value: serde_json::Value) {
if !Self::is_reserved(&key) {
self.variables.insert(key, value);
}
}
pub fn merge_variables(&mut self, other: HashMap<String, serde_json::Value>) {
for (k, v) in other {
self.set_variable(k, v);
}
}
pub fn add_action(&mut self, action: &str) {
self.actions.push(action.to_string());
}
pub fn add_observation(&mut self, obs: &str) {
self.observations.push(obs.to_string());
}
pub fn formatted_variables(&self) -> String {
if self.variables.is_empty() {
return "(No Variables)".to_string();
}
self.variables
.iter()
.map(|(k, v)| {
let type_str = Self::infer_type(v);
let value_str = serde_json::to_string(v).unwrap_or_default();
let display_value = if value_str.chars().count() > 100 {
// Truncate on a char boundary; byte slicing could panic on multi-byte UTF-8
format!("{}...", value_str.chars().take(97).collect::<String>())
} else {
value_str
};
format!("- {} ({}): {}", k, type_str, display_value)
})
.collect::<Vec<_>>()
.join("\n")
}
fn infer_type(value: &serde_json::Value) -> &'static str {
match value {
serde_json::Value::String(_) => "str",
serde_json::Value::Number(_) => "number",
serde_json::Value::Bool(_) => "bool",
serde_json::Value::Array(_) => "list",
serde_json::Value::Object(_) => "dict",
serde_json::Value::Null => "NoneType",
}
}
fn is_reserved(key: &str) -> bool {
key.starts_with('_') || matches!(key, "run" | "re" | "json" | "sys" | "os")
}
}

Context Propagation:
- Parent nodes establish variables (e.g., `obj = 'apple'`)
- Child nodes inherit parent context via serialization into the LLM prompt
- Child execution updates the shared namespace (new variables, actions, observations)
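The merge semantics can be illustrated with a simplified context that uses `String` values in place of `serde_json::Value`; the names mirror `ExecutionContext` above, but this is an illustrative sketch:

```rust
use std::collections::HashMap;

/// Simplified ExecutionContext: String values stand in for serde_json::Value.
#[derive(Default)]
struct ExecutionContext {
    variables: HashMap<String, String>,
}

impl ExecutionContext {
    /// Reserved names (the injected run() helper and imported modules) and
    /// private names are never written into the shared namespace.
    fn is_reserved(key: &str) -> bool {
        key.starts_with('_') || matches!(key, "run" | "re" | "json" | "sys" | "os")
    }

    /// Merge variables exported by a child execution into the parent context.
    fn merge_variables(&mut self, exported: HashMap<String, String>) {
        for (k, v) in exported {
            if !Self::is_reserved(&k) {
                self.variables.insert(k, v);
            }
        }
    }
}

fn main() {
    let mut ctx = ExecutionContext::default();
    // Parent establishes a binding; a child expansion later exports more.
    ctx.merge_variables(HashMap::from([("obj".to_string(), "\"apple\"".to_string())]));
    ctx.merge_variables(HashMap::from([
        ("obj_ID".to_string(), "\"apple 1\"".to_string()),
        ("run".to_string(), "<function>".to_string()), // filtered out
    ]));
    println!("{} variables tracked", ctx.variables.len());
}
```

The reserved-name filter is what lets the injected `run()` binding and stdlib modules coexist with user variables without ever being clobbered across recursion levels.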
Manages persistent Codex CLI threads for context preservation.
use anyhow::Result;
use serde::Deserialize;
use std::path::PathBuf;
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::Command;
#[derive(Debug, Clone, Copy)]
pub enum SandboxMode {
ReadOnly,
WorkspaceWrite,
DangerFullAccess,
}
impl SandboxMode {
fn as_arg(&self) -> &'static str {
match self {
Self::ReadOnly => "read-only",
Self::WorkspaceWrite => "workspace-write",
Self::DangerFullAccess => "danger-full-access",
}
}
}
#[derive(Debug, Clone, Copy)]
pub enum ApprovalMode {
Never,
OnRequest,
OnFailure,
Untrusted,
}
impl ApprovalMode {
fn as_arg(&self) -> &'static str {
match self {
Self::Never => "never",
Self::OnRequest => "on-request",
Self::OnFailure => "on-failure",
Self::Untrusted => "untrusted",
}
}
}
pub struct CodexThreadManager {
thread_id: Option<String>,
base_dir: PathBuf,
sandbox_mode: SandboxMode,
approval_mode: ApprovalMode,
}
impl CodexThreadManager {
pub fn new(base_dir: PathBuf) -> Self {
Self {
thread_id: None,
base_dir,
sandbox_mode: SandboxMode::WorkspaceWrite,
approval_mode: ApprovalMode::Never,
}
}
pub fn with_sandbox(mut self, mode: SandboxMode) -> Self {
self.sandbox_mode = mode;
self
}
pub fn with_approval(mut self, mode: ApprovalMode) -> Self {
self.approval_mode = mode;
self
}
pub fn has_active_thread(&self) -> bool {
self.thread_id.is_some()
}
pub fn get_thread_id(&self) -> Option<&String> {
self.thread_id.as_ref()
}
/// Start a new thread (first turn)
pub async fn start_thread(&mut self, prompt: &str) -> Result<CodexTurn> {
let mut cmd = self.build_command(prompt, false)?;
let turn = self.execute_and_parse(&mut cmd).await?;
if let Some(id) = &turn.thread_id {
self.thread_id = Some(id.clone());
tracing::info!("Started Codex thread: {}", id);
}
Ok(turn)
}
/// Resume existing thread (subsequent turns)
pub async fn resume_thread(&mut self, prompt: &str) -> Result<CodexTurn> {
if self.thread_id.is_none() {
anyhow::bail!("Cannot resume: no active thread");
}
let mut cmd = self.build_command(prompt, true)?;
self.execute_and_parse(&mut cmd).await
}
fn build_command(&self, prompt: &str, resume: bool) -> Result<Command> {
let mut cmd = Command::new("codex");
cmd.arg("exec")
.arg("--json")
.arg(format!("--sandbox={}", self.sandbox_mode.as_arg()))
.arg(format!("--ask-for-approval={}", self.approval_mode.as_arg()))
.current_dir(&self.base_dir);
if resume {
cmd.arg("resume");
cmd.arg(self.thread_id.as_ref().unwrap());
}
cmd.arg(prompt);
Ok(cmd)
}
async fn execute_and_parse(&self, cmd: &mut Command) -> Result<CodexTurn> {
let mut child = cmd.stdout(std::process::Stdio::piped()).spawn()?;
let stdout = child.stdout.take().unwrap();
let reader = BufReader::new(stdout);
let mut lines = reader.lines();
let mut turn = CodexTurn::default();
while let Some(line) = lines.next_line().await? {
if let Ok(event) = serde_json::from_str::<RawCodexEvent>(&line) {
turn.ingest_event(event);
}
}
child.wait().await?;
Ok(turn)
}
}
#[derive(Debug, Deserialize)]
#[serde(tag = "type")]
enum RawCodexEvent {
#[serde(rename = "thread.started")]
ThreadStarted { thread_id: String },
#[serde(rename = "item.completed")]
ItemCompleted { item: RawThreadItem },
#[serde(rename = "turn.completed")]
TurnCompleted { usage: TokenUsage },
#[serde(rename = "turn.failed")]
TurnFailed { error: String },
#[serde(rename = "error")]
Error { message: String },
#[serde(other)]
Other,
}
#[derive(Debug, Deserialize)]
#[serde(tag = "type")]
enum RawThreadItem {
#[serde(rename = "agent_message")]
AgentMessage { text: String },
#[serde(rename = "command_execution")]
CommandExecution {
command: String,
aggregated_output: String,
exit_code: i32,
},
#[serde(rename = "file_change")]
FileChange {
path: String,
#[serde(default)]
diff: String,
},
#[serde(rename = "reasoning")]
Reasoning { text: String },
#[serde(other)]
Other,
}
#[derive(Debug, Default)]
pub struct CodexTurn {
pub thread_id: Option<String>,
pub agent_messages: Vec<String>,
pub command_executions: Vec<CommandExecution>,
pub file_changes: Vec<FileChange>,
pub reasoning_traces: Vec<String>,
pub token_usage: Option<TokenUsage>,
pub error: Option<String>,
}
impl CodexTurn {
fn ingest_event(&mut self, event: RawCodexEvent) {
match event {
RawCodexEvent::ThreadStarted { thread_id } => {
self.thread_id = Some(thread_id);
}
RawCodexEvent::ItemCompleted { item } => match item {
RawThreadItem::AgentMessage { text } => {
self.agent_messages.push(text);
}
RawThreadItem::CommandExecution {
command,
aggregated_output,
exit_code,
} => {
self.command_executions.push(CommandExecution {
command,
stdout: aggregated_output,
exit_code,
});
}
RawThreadItem::FileChange { path, diff } => {
self.file_changes.push(FileChange { path, diff });
}
RawThreadItem::Reasoning { text } => {
self.reasoning_traces.push(text);
}
RawThreadItem::Other => {}
},
RawCodexEvent::TurnCompleted { usage } => {
self.token_usage = Some(usage);
}
RawCodexEvent::TurnFailed { error } => {
self.error = Some(error);
}
RawCodexEvent::Error { message } => {
self.error = Some(message);
}
RawCodexEvent::Other => {}
}
}
}
#[derive(Debug)]
pub struct CommandExecution {
pub command: String,
pub stdout: String,
pub exit_code: i32,
}
#[derive(Debug)]
pub struct FileChange {
pub path: String,
pub diff: String,
}
#[derive(Debug, Deserialize)]
pub struct TokenUsage {
pub input_tokens: u64,
pub output_tokens: u64,
}

Critical Design Decisions:
- Persistent Thread: `thread_id` stored in struct, reused via `resume`
- Comprehensive Event Parsing: ALL JSONL events captured (not just `agent_message`)
- Sandbox Configuration: Explicit `--sandbox workspace-write` allows file edits
- Auto-Approval: `--ask-for-approval never` for autonomous agent operation
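The start-vs-resume branching in `build_command` reduces to an argv layout like the following std-only sketch. Flag spellings follow the invocation shown earlier; `build_codex_args` is an illustrative helper, not the real API:

```rust
/// Build the argv for `codex exec`, optionally resuming an existing thread.
/// Mirrors CodexThreadManager::build_command above.
fn build_codex_args(prompt: &str, thread_id: Option<&str>) -> Vec<String> {
    let mut args = vec![
        "exec".to_string(),
        "--json".to_string(),
        "--sandbox=workspace-write".to_string(),
        "--ask-for-approval=never".to_string(),
    ];
    if let Some(id) = thread_id {
        // Resume keeps conversation context (variables, action history)
        // from the previous turn instead of starting a fresh thread.
        args.push("resume".to_string());
        args.push(id.to_string());
    }
    args.push(prompt.to_string());
    args
}

fn main() {
    let fresh = build_codex_args("Expand placeholder: find_and_take(obj, locations)", None);
    let resumed = build_codex_args("fix the bug", Some("thread-123"));
    println!("{:?}\n{:?}", fresh, resumed);
}
```

Keeping argv construction in one pure function makes the resume behavior trivially unit-testable without spawning the real CLI.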
Executes Python code blocks with real environment bindings.
use anyhow::Result;
use std::collections::HashMap;
use std::sync::Arc;
use tokio::process::Command;
use super::env_adapter::EnvironmentAdapter;
use crate::tree::ExecutionContext;
pub struct PythonExecutor {
env_adapter: Arc<dyn EnvironmentAdapter>,
}
impl PythonExecutor {
pub fn new(env: Arc<dyn EnvironmentAdapter>) -> Self {
Self { env_adapter: env }
}
pub async fn execute(
&self,
code: &str,
context: &ExecutionContext,
) -> Result<ExecutionResult> {
let script = self.build_script(code, context)?;
let output = Command::new("python3")
.arg("-c")
.arg(&script)
.output()
.await?;
if output.status.success() {
self.parse_success(&output.stderr)
} else {
self.parse_failure(&output.stderr, code)
}
}
fn build_script(&self, code: &str, context: &ExecutionContext) -> Result<String> {
let vars_setup = context
.variables
.iter()
.map(|(k, v)| format!("{} = {}", k, serde_json::to_string(v).unwrap()))
.collect::<Vec<_>>()
.join("\n");
let code_escaped = code.replace('\\', "\\\\").replace('"', "\\\"");
// Indent user code so it nests inside the `try:` block below
let code_indented = code
.lines()
.map(|line| format!("    {}", line))
.collect::<Vec<_>>()
.join("\n");
Ok(format!(
r#"
import re, json, sys
# Context variables
{vars_setup}
# run() implementation (calls Rust env adapter via marker)
def run(action: str) -> str:
    print(f"[RECODE_ACTION] {{action}}", file=sys.stderr)
    # TODO: IPC with Rust EnvironmentAdapter
    return f"Executed: {{action}}"
# User code
try:
{code}
    # Export variables
    _exported = {{k: v for k, v in locals().items() if not k.startswith('_') and k not in ['run', 're', 'json', 'sys']}}
    print('[RECODE_VARS]' + json.dumps(_exported), file=sys.stderr)
except NameError as e:
    match = re.search(r"name '(.+?)' is not defined", str(e))
    if match and f"{{match.group(1)}}(" in """{code_escaped}""":
        print(f"[RECODE_NEED_EXPANSION] {{match.group(1)}}", file=sys.stderr)
    raise
"#,
vars_setup = vars_setup,
code = code_indented
))
}
fn parse_success(&self, stderr: &[u8]) -> Result<ExecutionResult> {
let stderr_str = String::from_utf8_lossy(stderr);
let mut variables = HashMap::new();
if let Some(idx) = stderr_str.find("[RECODE_VARS]") {
let json_str = &stderr_str[idx + 13..];
if let Some(end) = json_str.find('\n') {
if let Ok(vars) = serde_json::from_str::<HashMap<String, serde_json::Value>>(&json_str[..end]) {
variables = vars;
}
}
}
let actions = stderr_str
.lines()
.filter_map(|line| {
if line.contains("[RECODE_ACTION]") {
Some(line.replace("[RECODE_ACTION]", "").trim().to_string())
} else {
None
}
})
.collect();
Ok(ExecutionResult {
success: true,
variables,
actions,
error: None,
})
}
fn parse_failure(&self, stderr: &[u8], code: &str) -> Result<ExecutionResult> {
let stderr_str = String::from_utf8_lossy(stderr);
if let Some(idx) = stderr_str.find("[RECODE_NEED_EXPANSION]") {
let func_name = stderr_str[idx + 23..]
.lines()
.next()
.unwrap_or("")
.trim();
return Ok(ExecutionResult {
success: false,
error: Some(format!("NeedExpansion: `{}` needs to be expanded.", func_name)),
variables: HashMap::new(),
actions: vec![],
});
}
Ok(ExecutionResult {
success: false,
error: Some(stderr_str.to_string()),
variables: HashMap::new(),
actions: vec![],
})
}
}
#[derive(Debug)]
pub struct ExecutionResult {
pub success: bool,
pub variables: HashMap<String, serde_json::Value>,
pub actions: Vec<String>,
pub error: Option<String>,
}

Critical Features:
- Local Execution: Runs Python via `tokio::process::Command` (not via Codex)
- Real run() Binding: Injected function calls `EnvironmentAdapter` (future: IPC)
- Variable Export: Parses `[RECODE_VARS]` from stderr
- NeedExpansion Detection: Catches `NameError` for undefined function calls
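The marker protocol can be exercised in isolation. This std-only sketch scans stderr lines with a simple prefix match; `MarkerScan` is an illustrative type, and the real parser additionally extracts the `[RECODE_VARS]` JSON payload:

```rust
/// Outcome of scanning executor stderr for ReCode markers.
#[derive(Debug, PartialEq)]
enum MarkerScan {
    NeedExpansion(String),           // undefined placeholder name
    Completed { actions: Vec<String> }, // primitive actions that ran
}

/// Scan stderr for [RECODE_NEED_EXPANSION] and [RECODE_ACTION] markers.
fn scan_markers(stderr: &str) -> MarkerScan {
    let mut actions = Vec::new();
    for line in stderr.lines() {
        if let Some(rest) = line.strip_prefix("[RECODE_NEED_EXPANSION]") {
            // An undefined function call means this node must be expanded.
            return MarkerScan::NeedExpansion(rest.trim().to_string());
        }
        if let Some(rest) = line.strip_prefix("[RECODE_ACTION]") {
            actions.push(rest.trim().to_string());
        }
    }
    MarkerScan::Completed { actions }
}

fn main() {
    let out = "[RECODE_ACTION] go to cabinet 1\n[RECODE_ACTION] open cabinet 1\n";
    println!("{:?}", scan_markers(out));
}
```

Routing everything through stderr keeps stdout free for the user code's own output, so the marker channel never collides with task results.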
Parses Python code into AST nodes for correct splitting.
use anyhow::Result;
use tree_sitter::Parser;
pub struct AstCodeSplitter {
parser: Parser,
}
impl AstCodeSplitter {
pub fn new() -> Result<Self> {
let mut parser = Parser::new();
parser
// Safe Language binding exported by the tree-sitter-python crate
.set_language(tree_sitter_python::language())
.map_err(|e| anyhow::anyhow!("Failed to set language: {}", e))?;
Ok(Self { parser })
}
pub fn split(&mut self, code: &str) -> Result<Vec<String>> {
let tree = self
.parser
.parse(code, None)
.ok_or_else(|| anyhow::anyhow!("Failed to parse code"))?;
let root = tree.root_node();
let mut blocks = Vec::new();
for i in 0..root.child_count() {
if let Some(child) = root.child(i) {
let block = &code[child.byte_range()];
blocks.push(block.to_string());
}
}
Ok(blocks)
}
pub fn find_placeholders(&mut self, code: &str) -> Result<Vec<String>> {
let tree = self
.parser
.parse(code, None)
.ok_or_else(|| anyhow::anyhow!("Failed to parse code"))?;
let mut placeholders = Vec::new();
let mut cursor = tree.walk();
self.visit_node(&mut cursor, code, &mut placeholders);
Ok(placeholders)
}
fn visit_node(
&self,
cursor: &mut tree_sitter::TreeCursor,
code: &str,
placeholders: &mut Vec<String>,
) {
let node = cursor.node();
if node.kind() == "call" {
if let Some(func_node) = node.child_by_field_name("function") {
let func_name = &code[func_node.byte_range()];
if !Self::is_builtin(func_name) && func_name != "run" {
placeholders.push(func_name.to_string());
}
}
}
if cursor.goto_first_child() {
loop {
self.visit_node(cursor, code, placeholders);
if !cursor.goto_next_sibling() {
break;
}
}
cursor.goto_parent();
}
}
fn is_builtin(name: &str) -> bool {
matches!(
name,
"print" | "len" | "range" | "enumerate" | "zip" | "list" | "dict" | "str" | "int"
)
}
}

Critical Features:
- AST-Based Splitting: Each top-level statement becomes a code block (not split on blank lines)
- Multi-Line Support: Correctly handles multi-line expressions, loops, conditionals
- Placeholder Detection: Identifies undefined function calls via AST traversal
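The placeholder-detection contract can be illustrated without tree-sitter by a naive token scan. Unlike the AST version, this std-only sketch mishandles string literals and comments; it is for intuition only:

```rust
/// Naive scan: collect identifiers immediately followed by '(' that are
/// neither Python builtins nor the injected run() primitive.
fn scan_placeholders(code: &str) -> Vec<String> {
    const KNOWN: &[&str] = &[
        "print", "len", "range", "enumerate", "zip", "list", "dict", "str", "int",
        "run", // environment primitive, never a placeholder
    ];
    let bytes = code.as_bytes();
    let mut names = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i].is_ascii_alphabetic() || bytes[i] == b'_' {
            let start = i;
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            let name = &code[start..i];
            // A name directly followed by '(' is a call; unknown calls are
            // candidate placeholders needing expansion.
            if i < bytes.len() && bytes[i] == b'(' && !KNOWN.contains(&name) {
                names.push(name.to_string());
            }
        } else {
            i += 1;
        }
    }
    names
}

fn main() {
    let code = "obs = run('go to cabinet 1')\nobj_ID = find_and_take(obj, locations)";
    println!("{:?}", scan_placeholders(code)); // ["find_and_take"]
}
```

The production splitter walks the tree-sitter AST instead, which is what makes it robust to calls inside strings, comments, and multi-line expressions.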
Objective: Persistent thread management + comprehensive event parsing + deterministic execution
- CodexThreadManager (`src/codex/thread_manager.rs`)
  - `start_thread(prompt)` → captures `thread_id`
  - `resume_thread(prompt)` → reuses `thread_id`
  - Parse all 8 JSONL event types
  - Test: Integration test showing `thread_id` persists across 3 turns
- ExecutionContext (`src/tree/context.rs`)
  - `add_action(command)` populated from `command_execution` events
  - `add_observation(stdout)` populated from command output
  - `formatted_variables()` serializes for LLM prompts
  - Test: Verify variables + actions + observations all tracked
- CodeNode & CodeTree (`src/tree/node.rs`, `src/tree/tree.rs`)
  - State machine: Pending → NeedsExpansion → Expanded → Completed/Failed
  - DFS iterator
  - Test: Build 3-level tree, verify DFS order
- PythonExecutor (`src/execution/python_executor.rs`)
  - `execute(code, context)` → spawn `python3 -c`
  - Inject real `run()` function
  - Parse `[RECODE_VARS]`, `[RECODE_ACTION]`, `[RECODE_NEED_EXPANSION]`
  - Test: Execute code with variables, verify export
- EnvironmentAdapter (`src/execution/env_adapter.rs`)
  - Trait: `run(action) → Result<String>`
  - Mock adapter for testing
  - Test: Mock adapter returns expected observations
- ALFWorld Adapter (`src/environments/alfworld.rs`)
  - Subprocess integration with Python ALFWorld
  - `run(action)` calls `env.step(action)`
  - Test: Run `go to cabinet 1`, verify observation
Acceptance Criteria:
- ✅ `thread_id` reused across 3 recursive calls (integration test)
- ✅ `command_execution` events populate `ExecutionContext.actions`
- ✅ ALFWorld task executes real commands
Objective: AST-based code splitting + complete DFS orchestration loop
- AstCodeSplitter (`src/analysis/code_splitter.rs`)
  - `split(code)` → Vec of top-level AST nodes
  - `find_placeholders(code)` → Vec of undefined functions
  - Test: Multi-line expressions, loops, conditionals
- PromptBuilder (`src/codex/prompt_builder.rs`)
  - `build_expansion_prompt(node, context)` → formatted string
  - Include few-shot examples from `.dev-docs/agents/recode/resources/`
  - Format variables as `- name (type): value`
  - Test: Verify prompt contains all required sections
- OrchestratorEngine (`src/orchestrator/engine.rs`)
  - `run()` → DFS traversal with state machine
  - `expand_node()` → call Codex + parse response + create children
  - `process_node()` → execute or detect expansion
  - Error handling with retry logic
  - Test: 5-level deep tree, verify all nodes visited
- End-to-End Integration Test
  - Run full ALFWorld task (pick_and_place_simple)
  - Verify tree grows to 3+ levels
  - Verify task succeeds with score >0
  - Benchmark: DFS traversal <10ms for 100-node tree
Acceptance Criteria:
- ✅ AST parsing: all test cases pass (multi-line, loops, conditionals)
- ✅ E2E test: solve 1 ALFWorld task with recursive expansion
- ✅ Performance: DFS <10ms (100 nodes)
Objective: Expose Rust core via RPC, TypeScript client library
- RpcServer (`src/api/rpc_server.rs`)
  - TCP listener on `localhost:9000`
  - Handle: `Solve`, `GetTree`, `ResumeTask`, `GetStatus`
  - Serialize `CodeTree` to JSON
  - Test: Call via curl, verify response
- TypeScript Client (`clients/typescript/`)
  - `RecodeClient.solve(task, context)`
  - `RecodeClient.getTree(taskId)`
  - Stream events via WebSocket
  - Test: TS integration test calls Rust backend
- CLI Tool (`src/bin/recode.rs`)
  - `recode solve "task description"`
  - `recode tree show <task-id>`
  - `recode exec --json "task"` (for CI)
  - Test: CLI completes 1 task end-to-end
Acceptance Criteria:
- ✅ RPC server handles concurrent requests
- ✅ TS client successfully calls Rust backend
- ✅ CLI completes ALFWorld task
Objective: Meet 10-100x speedup targets
- Benchmarks (`benches/`)
  - DFS traversal: 1000 nodes <1ms
  - AST parsing: 5000 lines <50ms
  - Memory: 10-layer tree <50MB
  - Tool: Criterion for statistical analysis
- Optimization
  - Arena allocator for `CodeNode` (reduce heap allocations)
  - LRU cache for LLM responses
  - Concurrent expansion of independent siblings (Tokio tasks)
  - Tool: Flamegraph analysis
Acceptance Criteria:
- ✅ All performance targets met (DFS <1ms, AST <50ms, Memory <50MB)
- ✅ Benchmarks tracked in CI
- Integration Tests (`tests/integration/`)
  - ALFWorld: pick_and_place, clean, heat_then_cool
  - ScienceWorld: boil_water, grow_plant
  - WebShop: search_product, add_to_cart
  - Error recovery: retry on failure
- CI Pipeline (`.github/workflows/ci.yml`)
  - Run `cargo test` + `cargo clippy` + `cargo bench`
  - Run integration tests in Docker
  - Publish binaries on release
Acceptance Criteria:
- ✅ All tests pass in CI
- ✅ Coverage >80%
- Release Preparation
  - Version bump to `0.1.0-alpha`
  - CHANGELOG.md
  - GitHub release with binaries
- Docker Image
  - `Dockerfile` with Rust + Python + Codex CLI
  - Publish to Docker Hub
- Documentation
  - README.md with quick start
  - Examples: simple_task, alfworld_task, custom_env
  - Architecture diagram
Acceptance Criteria:
- ✅ `cargo install recode-core` works
- ✅ Docker: `docker run recode/core solve "task"`
- ✅ README comprehensive, examples runnable
| Metric | Current (Python) | Target (Rust) | Measurement Method |
|---|---|---|---|
| DFS Traversal (1000 nodes) | ~100ms | <1ms | Criterion benchmark |
| AST Parsing (5000 lines) | ~500ms | <50ms | Criterion benchmark |
| Memory (10-layer tree) | ~500MB | <50MB | Valgrind / heaptrack |
| Thread Creation | N/A (new process each time) | Persistent (reuse) | Integration test |
| Event Capture Rate | 12.5% (1/8 types) | 100% (8/8 types) | Unit test |
| Primitive Action Execution | 0% (mock) | 100% (real) | E2E test |
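The DFS-traversal rows above assume an iterative preorder walk over the node table. A minimal std-only sketch (`Tree` and its adjacency map are illustrative, standing in for `CodeTree`):

```rust
use std::collections::HashMap;

/// Minimal node table: id → children, mirroring CodeNode's children list.
struct Tree {
    children: HashMap<u32, Vec<u32>>,
    root: u32,
}

impl Tree {
    /// Iterative preorder DFS with an explicit stack: no recursion depth
    /// limit, and the only per-traversal allocations are the two Vecs.
    fn dfs_order(&self) -> Vec<u32> {
        let mut order = Vec::new();
        let mut stack = vec![self.root];
        while let Some(id) = stack.pop() {
            order.push(id);
            if let Some(kids) = self.children.get(&id) {
                // Push in reverse so the first child is visited first.
                for &k in kids.iter().rev() {
                    stack.push(k);
                }
            }
        }
        order
    }
}

fn main() {
    // 1 → (2 → 4, 5), 3
    let children = HashMap::from([(1, vec![2, 3]), (2, vec![4, 5])]);
    let tree = Tree { children, root: 1 };
    println!("{:?}", tree.dfs_order()); // preorder: [1, 2, 4, 5, 3]
}
```

An explicit-stack walk like this is what makes sub-millisecond traversal of 1000 nodes plausible: it is a tight loop over integer ids with no recursion overhead.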
Default Settings:
- Sandbox: `workspace-write` (allows edits in Git repo + `/tmp`)
- Approval: `never` (auto-approve for agent autonomy)
- Network: Disabled in `workspace-write` mode
Trust Boundaries:
- Repository must be marked as trusted (security review required)
- All file edits confined to current Git repository
- Commands executed in sandboxed environment (macOS: Seatbelt, Linux: Landlock)
Graceful Degradation:
- LLM failures: Retry up to `max_retry` times (default: 5)
- Sandbox violations: Log error, mark node as Failed
- Recursion limit: Max depth 10 (prevents infinite loops)
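The retry policy can be sketched as a bounded loop. `retry_with_limit` is an illustrative helper (the production engine's async retry logic may differ):

```rust
/// Retry a fallible operation up to max_retry times, returning the first
/// success or the last observed error. Assumes max_retry >= 1.
fn retry_with_limit<T, E>(
    max_retry: usize,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut last_err = None;
    for _attempt in 0..max_retry {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => last_err = Some(e),
        }
    }
    // Safe because max_retry >= 1 means the loop recorded at least one error.
    Err(last_err.unwrap())
}

fn main() {
    let mut calls = 0;
    // Simulate an LLM call that fails twice, then succeeds.
    let result: Result<u32, &str> = retry_with_limit(5, || {
        calls += 1;
        if calls < 3 { Err("transient LLM failure") } else { Ok(42) }
    });
    println!("{:?} after {} calls", result, calls);
}
```

Bounding retries (together with the depth-10 recursion cap) guarantees the orchestrator terminates even when the LLM keeps producing unusable expansions.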
Observability:
- Structured logging via `tracing`
- All JSONL events captured for debugging
- Execution tree serialized for post-mortem analysis
Replace ALFWorld/ScienceWorld with real coding environment:
- File system operations (read, write, edit)
- Git operations (commit, branch, diff)
- Bash command execution
- Test runner integration
Extend to handle images, diagrams, UI screenshots:
- `codex exec --image <path>` for visual inputs
- Placeholder functions for image analysis tasks
Scale to large task trees:
- Parallel expansion of independent siblings
- Distributed state management (Redis/PostgreSQL)
- Load balancing across multiple Codex instances
- ReCode Paper: arXiv:2510.23564v2 - "ReCode: Unify Plan and Action for Universal Granularity Control"
- Cognitive Science: Prinz (1997), Koechlin & Summerfield (2003), Badre & D'Esposito (2009)
- Codex CLI Docs: `.knowledge/codex/docs/exec.md`, `sandbox.md`, `authentication.md`
- Codex SDK: `.knowledge/codex/sdk/typescript/README.md`
- Python Prototype: `.dev-docs/agents/recode/agent.py`, `utils.py`, `executor.py`
- Analysis: `ARCHITECTURE_GAPS_ANALYSIS.md` (comprehensive gap analysis)
- Roadmap: `dev-spec/roadmap/ROADMAP_V2_2025111602.md` (8-week plan)
- Summary: `EXEC_SUMMARY_OPTIMIZATIONS.md` (executive brief)
[package]
name = "recode-core"
version = "0.1.0"
edition = "2021"
rust-version = "1.75"
authors = ["ReCodeAgent Team"]
description = "Production implementation of ReCode recursive code generation paradigm"
repository = "https://github.com/your-org/recode-agent"
license = "MIT"
[dependencies]
# Async runtime
tokio = { version = "1.35", features = ["full"] }
# CLI & Config
clap = { version = "4.4", features = ["derive"] }
# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
# Error handling
anyhow = "1.0"
thiserror = "1.0"
# IDs & Time
uuid = { version = "1.6", features = ["v4", "serde"] }
chrono = { version = "0.4", features = ["serde"] }
# AST Parsing
tree-sitter = "0.20"
tree-sitter-python = "0.20"
# Logging
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
# Performance
lru = "0.12"
[dev-dependencies]
criterion = "0.5"
tempfile = "3.8"
[[bench]]
name = "tree_traversal"
harness = false
[[bench]]
name = "ast_parsing"
harness = false
[profile.release]
lto = true
codegen-units = 1
opt-level = 3

recode-core/
├── Cargo.toml
├── src/
│ ├── main.rs # CLI entrypoint
│ ├── lib.rs # Library exports
│ │
│ ├── orchestrator/ # Core orchestration
│ │ ├── mod.rs
│ │ ├── engine.rs # OrchestratorEngine
│ │ ├── scheduler.rs # Concurrent task scheduling
│ │ └── state.rs # NodeState state machine
│ │
│ ├── tree/ # CodeNode tree structures
│ │ ├── mod.rs
│ │ ├── node.rs # CodeNode definition
│ │ ├── tree.rs # CodeTree with DFS iterator
│ │ ├── context.rs # ExecutionContext
│ │ └── arena.rs # Arena allocator
│ │
│ ├── codex/ # Codex CLI integration
│ │ ├── mod.rs
│ │ ├── thread_manager.rs # CodexThreadManager
│ │ ├── event_bus.rs # JSONL event parser
│ │ ├── prompt_builder.rs # Expansion prompts
│ │ └── auth.rs # Authentication
│ │
│ ├── execution/ # Code execution
│ │ ├── mod.rs
│ │ ├── python_executor.rs # PythonExecutor
│ │ └── env_adapter.rs # EnvironmentAdapter trait
│ │
│ ├── analysis/ # Code analysis
│ │ ├── mod.rs
│ │ ├── code_splitter.rs # AstCodeSplitter
│ │ └── need_expansion.rs # NeedExpansion detector
│ │
│ ├── environments/ # Environment adapters
│ │ ├── mod.rs
│ │ ├── alfworld.rs # ALFWorld adapter
│ │ ├── sciworld.rs # ScienceWorld adapter
│ │ └── mock.rs # Mock environment
│ │
│ └── api/ # RPC server
│ ├── mod.rs
│ └── rpc_server.rs # JSON-RPC server
│
├── tests/
│ ├── integration/
│ │ ├── thread_persistence.rs
│ │ ├── alfworld_task.rs
│ │ └── ast_parsing.rs
│ └── fixtures/
│ └── prompts/
│
├── benches/
│ ├── tree_traversal.rs
│ └── ast_parsing.rs
│
└── examples/
├── simple_task.rs
├── alfworld_task.rs
└── custom_env.rs
Document Status: ✅ OFFICIAL v0.1.0 Approval: Lead Architect Implementation Start: 2025-11-18 (Week 1) Expected Completion: 2026-01-17 (Week 8)