ReCodeAgent Architecture Design Document

Version: 0.1.0 (OKR Baseline)
Updated Date (date -u '+%Y-%m-%dT%H:%M:%SZ'): 2025-11-16T16:44:47Z
Status: ✅ OFFICIAL - Production Architecture Specification
Authors: ReCodeAgent Team


DOCUMENT CONTROL

| Property | Value |
| --- | --- |
| Document Type | Architecture Design Specification |
| OKR Version | 0.1.0 (MVP Baseline) |
| Target Release | 2025-11-22 (5-Day DEV Cooking Cycle) |
| Based On | ReCode Paper (arXiv:2510.23564v2) + Codex CLI Integration Analysis |
| Supersedes | .artifacts/PROJECT_INIT_20251116/IMPLEMENTATION_ROADMAP.md |
| Related Docs | dev-spec/roadmap/ROADMAP_V2_2025111602.md |
| Scope | Full Production: Real ALFWorld IPC, Complete JSONL (13+ events), Integrated Executor |

EXECUTIVE SUMMARY

ReCodeAgent is a production implementation of the ReCode research paradigm (arXiv:2510.23564v2) that achieves universal granularity control through recursive code generation. This architecture specification defines a high-performance Rust Core + Codex CLI integrated system that:

  1. Unifies Plans and Actions: Represents both abstract planning (placeholder functions) and concrete execution (primitive actions) in a single code representation
  2. Enables Dynamic Granularity Control: LLM policy adaptively decides when to plan abstractly vs. commit to specific actions
  3. Targets 10-100x Performance over the Python Prototype: Rust-based orchestration with zero-cost abstractions, <1ms DFS traversal, <50ms AST parsing
  4. Maintains Production Quality: Deterministic execution, comprehensive JSONL event capture, type-safe state management

Core Innovation

Traditional LLM-based agents suffer from fixed granularity:

  • ReAct agents: Locked at fine-grained step-by-step execution, no strategic foresight
  • Planner-based agents: Rigid plan-execute separation, cannot adapt dynamically

ReCode solves this by treating plans as high-level placeholder functions that recursively refine into finer-grained components until reaching executable primitive actions. This creates an infinite decision space where the agent dynamically controls its reasoning granularity.

Architecture Highlights

┌─────────────────────────────────────────────────────────┐
│  Rust Orchestrator Engine (Week 1-4)                    │
│  - Persistent Codex thread management                   │
│  - DFS tree traversal with state machine                │
│  - AST-aware code parsing (tree-sitter)                 │
│  - 10-100x performance vs Python prototype              │
└─────────────────────────────────────────────────────────┘
                        ↓ JSONL Events
┌─────────────────────────────────────────────────────────┐
│  Codex CLI Executor (codex exec --json)                 │
│  - LLM policy for placeholder expansion                 │
│  - Sandbox: workspace-write, approval: never            │
│  - Event stream: command_execution, file_change, etc.   │
└─────────────────────────────────────────────────────────┘
                        ↓ Primitive Actions
┌─────────────────────────────────────────────────────────┐
│  Environment Adapters (Week 2)                          │
│  - ALFWorld (household tasks)                           │
│  - ScienceWorld (lab experiments)                       │
│  - CodeEnv (real codebase operations) [Future]          │
└─────────────────────────────────────────────────────────┘

1. ARCHITECTURE FOUNDATIONS

1.1 ReCode Methodology Primer

1.1.1 Decision Space Formulation

We model LLM-based agent interaction as a simplified decision process:

$$\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{O}, T, R \rangle$$

Where:

  • $\mathcal{S}$: State space
  • $\mathcal{A}$: Primitive action space (executable operations like run('go to cabinet 1'))
  • $\mathcal{O}$: Observation space
  • $T: \mathcal{S} \times \mathcal{A} \rightarrow \mathcal{S}$: Transition function
  • $R: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$: Reward function

Beyond primitive actions, we introduce plan space $\mathcal{P}$:

  • Contains high-level intentions requiring decomposition
  • Example: prepare_breakfast(), find_and_take(obj, locations)
  • Cannot execute directly, must refine into $\mathcal{A}$ or intermediate $\mathcal{P}$ elements

Decision space: $\mathcal{D} = \mathcal{A} \cup \mathcal{P}$

1.1.2 Granularity Hierarchy

Decisions form a natural hierarchy from coarse to fine:

TASK-SPECIFICATION (coarsest)
  ↓
solve("prepare breakfast", observation)
  ↓
prepare_breakfast() → get_ingredients() + cook_meal()
  ↓
get_ingredients() → open_refrigerator() + take_eggs()
  ↓
open_refrigerator() → run('go to refrigerator 1') + run('open refrigerator 1')
  ↓
PRIMITIVE ACTIONS (finest)

ReCode represents this hierarchy as Python code:

  • Plans: Undefined placeholder functions (e.g., obj_ID = find_and_take(obj, locations))
  • Actions: Environment-specific primitives (e.g., obs = run('go to cabinet 1'))
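
The plan/action split can be sketched as a small classifier over generated lines. This is a minimal illustration, assuming (as in the examples above) that run(...) is the only primitive. The `Decision` enum and `classify` function are hypothetical names introduced here, not part of the ReCodeAgent codebase.

```rust
/// A decision is either a primitive action or a plan (placeholder call).
#[derive(Debug, PartialEq)]
enum Decision {
    /// Executable in the environment, e.g. run('go to cabinet 1').
    Primitive(String),
    /// Undefined placeholder function that still needs expansion.
    Plan(String),
}

/// Classify one line of generated code.
/// Assumption: for `name = call(...)` the right-hand side decides the kind,
/// and a call starting with `run(` is the only primitive.
fn classify(line: &str) -> Decision {
    let rhs = match line.split_once('=') {
        // Only treat `=` as an assignment when it appears before any call,
        // so keyword arguments such as f(x=1) are left untouched.
        Some((lhs, rhs)) if !lhs.contains('(') => rhs.trim(),
        _ => line.trim(),
    };
    if rhs.starts_with("run(") {
        Decision::Primitive(rhs.to_string())
    } else {
        Decision::Plan(rhs.to_string())
    }
}
```

With this sketch, `obs = run('go to cabinet 1')` is classified as a primitive and `obj_ID = find_and_take(obj, locations)` as a plan.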

1.1.3 Recursive Expansion Algorithm

Algorithm 1: ReCode Core Loop

Input: Task T, Policy π (LLM), Environment E, Current Node c
Procedure ReCode(T, π, E, c):
    if c is None:
        o_0 ← Reset(E)                  # Initialize environment
        c ← Text2Code(T, o_0)           # Root: solve(instruction, observation)
    end if

    code_block ← π(c)                   # LLM expands current placeholder

    for each code_unit u in code_block:
        if IsPrimitive(u):                # Executable action
            Execute(u, E)                 # Run in environment
        else:                             # Placeholder function
            ReCode(T, π, E, u)            # Recursive expansion
        end if
    end for
end procedure
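
The recursion in Algorithm 1 can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not the orchestrator itself: the LLM policy π is replaced by a fixed lookup table (`Policy`), executing a primitive just appends it to a log, and the expansion of take_eggs() into `run('take egg 1')` is a hypothetical value chosen for the demo.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the LLM policy π: a fixed table that maps a
/// placeholder call to the code units it expands into.
type Policy = HashMap<&'static str, Vec<&'static str>>;

/// Assumption: any unit that starts with `run(` is a primitive action.
fn is_primitive(unit: &str) -> bool {
    unit.starts_with("run(")
}

/// Depth-first expansion: primitives are "executed" (appended to `log`),
/// placeholders are expanded by the policy and processed recursively.
fn recode(unit: &str, policy: &Policy, log: &mut Vec<String>) -> Result<(), String> {
    if is_primitive(unit) {
        log.push(unit.to_string());
        return Ok(());
    }
    let children = policy
        .get(unit)
        .ok_or_else(|| format!("no expansion for `{unit}`"))?;
    for child in children {
        recode(child, policy, log)?;
    }
    Ok(())
}

/// Policy covering the get_ingredients() branch of the hierarchy above.
fn demo_policy() -> Policy {
    let mut policy = Policy::new();
    policy.insert("get_ingredients()", vec!["open_refrigerator()", "take_eggs()"]);
    policy.insert(
        "open_refrigerator()",
        vec!["run('go to refrigerator 1')", "run('open refrigerator 1')"],
    );
    // Hypothetical expansion; the text above does not specify take_eggs().
    policy.insert("take_eggs()", vec!["run('take egg 1')"]);
    policy
}
```

Calling `recode("get_ingredients()", &demo_policy(), &mut log)` visits the placeholders depth-first, so the log ends up holding the primitives in execution order.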

Key Properties:

  1. Unified Representation: Plans and actions both expressed as Python function calls
  2. Dynamic Granularity: Policy decides when to stop planning and commit to actions
  3. Context Propagation: Unified variable namespace persists across recursion levels
  4. Deterministic Execution: Primitive actions execute in environment, placeholders trigger expansion

1.2 Codex CLI Integration Model

1.2.1 Codex Execution Modes

Codex CLI supports two execution patterns:

1. Non-Interactive Mode (codex exec)

  • Single-turn execution: prompt → LLM response → exit
  • Default: --sandbox read-only, no file edits or network
  • With --full-auto: --sandbox workspace-write, --ask-for-approval never
  • Limitation: No conversation context between invocations

2. Session Resume Mode (codex exec resume <THREAD_ID>)

  • Preserves conversation context from previous turn
  • Maintains thread state in ~/.codex/sessions/<thread_id>/
  • Critical for ReCode: Enables variable/action history across recursive expansions

1.2.2 JSONL Event Stream Protocol

Codex outputs structured events via --json flag:

| Event Type | Data | ReCode Usage |
| --- | --- | --- |
| thread.started | thread_id | Capture for resume capability |
| turn.started | - | Track turn boundaries |
| item.completed (agent_message) | text | Extract <think> and <execute> blocks |
| item.completed (command_execution) | command, stdout, stderr, exit_code | Populate ExecutionContext.actions, detect NeedExpansion |
| item.completed (file_change) | path, diff | Track file edits (future CodeEnv) |
| item.completed (reasoning) | text | Observability/debugging |
| turn.completed | usage (tokens) | Cost tracking |
| turn.failed | error | Error handling |

Architecture Requirement: Parse ALL 8 event types (not just agent_message)
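
The routing implied by the event table can be sketched as a single match on the event type and, for item.completed, the item type. The `Route` enum and its variant names are hypothetical labels for the "ReCode Usage" column. JSON decoding is assumed to have happened already, so the function only receives the two type strings.

```rust
/// Where a decoded event ends up. Variant names are illustrative only.
#[derive(Debug, PartialEq)]
enum Route {
    CaptureThreadId,
    TurnBoundary,
    ExtractBlocks,
    UpdateActions,
    TrackFileEdit,
    Observability,
    CostTracking,
    ErrorHandling,
    Ignored,
}

/// Map an event `type` (plus the item `type` for item.completed) to its usage.
fn route(event_type: &str, item_type: Option<&str>) -> Route {
    match (event_type, item_type) {
        ("thread.started", _) => Route::CaptureThreadId,
        ("turn.started", _) => Route::TurnBoundary,
        ("item.completed", Some("agent_message")) => Route::ExtractBlocks,
        ("item.completed", Some("command_execution")) => Route::UpdateActions,
        ("item.completed", Some("file_change")) => Route::TrackFileEdit,
        ("item.completed", Some("reasoning")) => Route::Observability,
        ("turn.completed", _) => Route::CostTracking,
        ("turn.failed", _) => Route::ErrorHandling,
        // Unknown event or item types are ignored rather than treated as errors.
        _ => Route::Ignored,
    }
}
```

Keeping a catch-all arm means a new event type added by a later CLI version degrades to a no-op instead of aborting the turn.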

1.2.3 Sandbox & Approval Configuration

For ReCode agent autonomy:

# --sandbox workspace-write: allow file edits in the current directory
# --ask-for-approval never: auto-approve all operations
codex exec --json \
  --sandbox workspace-write \
  --ask-for-approval never \
  "Expand placeholder: find_and_take(obj, locations)"

Security Considerations:

  • workspace-write: Sandboxed to current Git repository + /tmp
  • Network access: Disabled by default in workspace-write mode
  • Trust boundary: Mark repository as trusted after security review

2. SYSTEM ARCHITECTURE

2.1 Architecture Overview

┌──────────────────────────────────────────────────────────────────┐
│                      CLI Layer (src/main.rs)                     │
│  $ recode solve "implement binary tree traversal"                │
│  $ recode tree show <task-id>                                    │
│  $ recode exec resume <thread-id> "fix the bug"                  │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│              Orchestrator Layer (src/orchestrator/)              │
│  ┌────────────────────────────────────────────────────┐          │
│  │  OrchestratorEngine                                │          │
│  │  - run() → DFS tree traversal                      │          │
│  │  - expand_node() → call Codex for placeholder      │          │
│  │  - process_node() → execute or detect expansion    │          │
│  └────────────────────────────────────────────────────┘          │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│               Tree Management (src/tree/)                        │
│  ┌────────────────────┐  ┌──────────────────────┐                │
│  │  CodeTree          │  │  ExecutionContext    │                │
│  │  - DFS iterator    │  │  - variables: Map    │                │
│  │  - add_child()     │  │  - actions: Vec      │                │
│  │  - get_node()      │  │  - observations: Vec │                │
│  └────────────────────┘  └──────────────────────┘                │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│           Codex Integration (src/codex/)                         │
│  ┌──────────────────────────────────────────────────────┐        │
│  │  CodexThreadManager                                  │        │
│  │  - start_thread(prompt) → thread_id                  │        │
│  │  - resume_thread(prompt) → reuse thread_id           │        │
│  │  - build_command() → codex exec --json --sandbox ... │        │
│  │  - execute_and_parse() → spawn + parse JSONL         │        │
│  └──────────────────────────────────────────────────────┘        │
│  ┌──────────────────────────────────────────────────────┐        │
│  │  CodexEventBus                                       │        │
│  │  - ingest(jsonl_line) → parse event                  │        │
│  │  - route to ExecutionContext updaters                │        │
│  └──────────────────────────────────────────────────────┘        │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│            Execution Layer (src/execution/)                      │
│  ┌──────────────────────────────────────────────────────┐        │
│  │  PythonExecutor                                      │        │
│  │  - execute(code, context) → spawn python3 -c         │        │
│  │  - build_script() → inject run() + variables         │        │
│  │  - parse_success() → extract [RECODE_VARS]           │        │
│  │  - parse_failure() → detect [RECODE_NEED_EXPANSION]  │        │
│  └──────────────────────────────────────────────────────┘        │
│  ┌──────────────────────────────────────────────────────┐        │
│  │  EnvironmentAdapter (trait)                          │        │
│  │  - run(action: &str) → Result<String>                │        │
│  │  - reset() → Result<String>                          │        │
│  └──────────────────────────────────────────────────────┘        │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│          Analysis Layer (src/analysis/)                          │
│  ┌──────────────────────────────────────────────────────┐        │
│  │  AstCodeSplitter                                     │        │
│  │  - split(code) → Vec<String> (AST top-level nodes)   │        │
│  │  - find_placeholders(code) → Vec<String>             │        │
│  │  - tree-sitter Python parser                         │        │
│  └──────────────────────────────────────────────────────┘        │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│        Infrastructure (Codex CLI, tokio, tree-sitter)            │
└──────────────────────────────────────────────────────────────────┘

2.2 Core Components Specification

2.2.1 CodeNode (src/tree/node.rs)

Represents a single node in the decision tree.

use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum NodeState {
    Pending,           // Waiting to be processed
    NeedsExpansion,    // Detected as placeholder, awaiting LLM expansion
    Expanded,          // LLM expanded, child nodes created
    Executing,         // Primitive action being executed
    Completed,         // Successfully completed
    Failed,            // Execution failed
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CodeNode {
    pub id: Uuid,
    pub parent_id: Option<Uuid>,
    pub children: Vec<Uuid>,
    pub state: NodeState,
    pub code: String,
    pub thought: Option<String>,        // LLM reasoning from <think> block
    pub result: Option<String>,         // Execution output
    pub error: Option<String>,
    pub depth: usize,
    pub created_at: i64,
    pub updated_at: i64,
}

impl CodeNode {
    pub fn new_root(code: String, depth: usize) -> Self {
        Self {
            id: Uuid::new_v4(),
            parent_id: None,
            children: Vec::new(),
            state: NodeState::Pending,
            code,
            thought: None,
            result: None,
            error: None,
            depth,
            created_at: chrono::Utc::now().timestamp(),
            updated_at: chrono::Utc::now().timestamp(),
        }
    }

    pub fn new_child(parent_id: Uuid, code: String, depth: usize) -> Self {
        Self {
            id: Uuid::new_v4(),
            parent_id: Some(parent_id),
            children: Vec::new(),
            state: NodeState::Pending,
            code,
            thought: None,
            result: None,
            error: None,
            depth,
            created_at: chrono::Utc::now().timestamp(),
            updated_at: chrono::Utc::now().timestamp(),
        }
    }
}

State Machine:

Pending → NeedsExpansion → Expanded → Pending (for children)
Pending → Executing → Completed
Pending → Executing → Failed
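
The transitions listed above can be checked with a small guard function. This sketch redeclares `NodeState` without the serde derives so that it stands alone. `can_transition` is a hypothetical helper, and it reads "Expanded → Pending (for children)" as "children are created in Pending", not as a state change of the expanded node itself.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeState {
    Pending,
    NeedsExpansion,
    Expanded,
    Executing,
    Completed,
    Failed,
}

/// Returns true when `from -> to` is one of the transitions in the state machine.
fn can_transition(from: NodeState, to: NodeState) -> bool {
    use NodeState::*;
    matches!(
        (from, to),
        (Pending, NeedsExpansion)
            | (NeedsExpansion, Expanded)
            | (Pending, Executing)
            | (Executing, Completed)
            | (Executing, Failed)
    )
}
```

A guard like this could be called before every state update so that an illegal jump, such as Completed back to Pending, is caught early.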

2.2.2 ExecutionContext (src/tree/context.rs)

Maintains the unified variable namespace across recursion levels.

use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct ExecutionContext {
    pub variables: HashMap<String, serde_json::Value>,
    pub actions: Vec<String>,
    pub observations: Vec<String>,
}

impl ExecutionContext {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn set_variable(&mut self, key: String, value: serde_json::Value) {
        if !Self::is_reserved(&key) {
            self.variables.insert(key, value);
        }
    }

    pub fn merge_variables(&mut self, other: HashMap<String, serde_json::Value>) {
        for (k, v) in other {
            self.set_variable(k, v);
        }
    }

    pub fn add_action(&mut self, action: &str) {
        self.actions.push(action.to_string());
    }

    pub fn add_observation(&mut self, obs: &str) {
        self.observations.push(obs.to_string());
    }

    pub fn formatted_variables(&self) -> String {
        if self.variables.is_empty() {
            return "(No Variables)".to_string();
        }

        self.variables
            .iter()
            .map(|(k, v)| {
                let type_str = Self::infer_type(v);
                let value_str = serde_json::to_string(v).unwrap_or_default();
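                let display_value = if value_str.chars().count() > 100 {
                    // Truncate on character boundaries; byte slicing can panic on multi-byte UTF-8.
                    let head: String = value_str.chars().take(97).collect();
                    format!("{}...", head)
                } else {
                    value_str
                };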
                format!("- {} ({}): {}", k, type_str, display_value)
            })
            .collect::<Vec<_>>()
            .join("\n")
    }

    fn infer_type(value: &serde_json::Value) -> &'static str {
        match value {
            serde_json::Value::String(_) => "str",
            serde_json::Value::Number(_) => "number",
            serde_json::Value::Bool(_) => "bool",
            serde_json::Value::Array(_) => "list",
            serde_json::Value::Object(_) => "dict",
            serde_json::Value::Null => "NoneType",
        }
    }

    fn is_reserved(key: &str) -> bool {
        key.starts_with('_') || matches!(key, "run" | "re" | "json" | "sys" | "os")
    }
}

Context Propagation:

  • Parent nodes establish variables (e.g., obj = 'apple')
  • Child nodes inherit parent context via serialization into LLM prompt
  • Child execution updates shared namespace (new variables, actions, observations)
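
The propagation rules can be illustrated with a simplified context that stores plain strings instead of serde_json::Value. The `Context` type, the `propagate` helper, and the sample values (`apple 1`, `_tmp`) are hypothetical. Only the reserved-name rule is taken from the listing above.

```rust
use std::collections::HashMap;

/// Simplified context: values are plain strings instead of serde_json::Value.
#[derive(Debug, Default, Clone)]
struct Context {
    variables: HashMap<String, String>,
}

impl Context {
    /// Same reserved-name rule as ExecutionContext::is_reserved.
    fn is_reserved(key: &str) -> bool {
        key.starts_with('_') || matches!(key, "run" | "re" | "json" | "sys" | "os")
    }

    /// Merge variables exported by a child execution into the shared namespace.
    fn merge(&mut self, exported: HashMap<String, String>) {
        for (k, v) in exported {
            if !Self::is_reserved(&k) {
                self.variables.insert(k, v);
            }
        }
    }
}

/// A parent sets `obj`; a child then exports `obj_ID` plus a reserved name
/// that must be dropped by the merge.
fn propagate() -> Context {
    let mut ctx = Context::default();
    ctx.variables.insert("obj".to_string(), "apple".to_string());

    let mut child = HashMap::new();
    child.insert("obj_ID".to_string(), "apple 1".to_string());
    child.insert("_tmp".to_string(), "scratch".to_string());
    ctx.merge(child);
    ctx
}
```

After `propagate()`, the namespace holds both the parent's `obj` and the child's `obj_ID`, while `_tmp` is filtered out.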

2.2.3 CodexThreadManager (src/codex/thread_manager.rs)

Manages persistent Codex CLI threads for context preservation.

use anyhow::Result;
use serde::Deserialize;
use std::path::PathBuf;
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::Command;

#[derive(Debug, Clone, Copy)]
pub enum SandboxMode {
    ReadOnly,
    WorkspaceWrite,
    DangerFullAccess,
}

impl SandboxMode {
    fn as_arg(&self) -> &'static str {
        match self {
            Self::ReadOnly => "read-only",
            Self::WorkspaceWrite => "workspace-write",
            Self::DangerFullAccess => "danger-full-access",
        }
    }
}

#[derive(Debug, Clone, Copy)]
pub enum ApprovalMode {
    Never,
    OnRequest,
    OnFailure,
    Untrusted,
}

impl ApprovalMode {
    fn as_arg(&self) -> &'static str {
        match self {
            Self::Never => "never",
            Self::OnRequest => "on-request",
            Self::OnFailure => "on-failure",
            Self::Untrusted => "untrusted",
        }
    }
}

pub struct CodexThreadManager {
    thread_id: Option<String>,
    base_dir: PathBuf,
    sandbox_mode: SandboxMode,
    approval_mode: ApprovalMode,
}

impl CodexThreadManager {
    pub fn new(base_dir: PathBuf) -> Self {
        Self {
            thread_id: None,
            base_dir,
            sandbox_mode: SandboxMode::WorkspaceWrite,
            approval_mode: ApprovalMode::Never,
        }
    }

    pub fn with_sandbox(mut self, mode: SandboxMode) -> Self {
        self.sandbox_mode = mode;
        self
    }

    pub fn with_approval(mut self, mode: ApprovalMode) -> Self {
        self.approval_mode = mode;
        self
    }

    pub fn has_active_thread(&self) -> bool {
        self.thread_id.is_some()
    }

    pub fn get_thread_id(&self) -> Option<&String> {
        self.thread_id.as_ref()
    }

    /// Start a new thread (first turn)
    pub async fn start_thread(&mut self, prompt: &str) -> Result<CodexTurn> {
        let mut cmd = self.build_command(prompt, false)?;
        let turn = self.execute_and_parse(&mut cmd).await?;

        if let Some(id) = &turn.thread_id {
            self.thread_id = Some(id.clone());
            tracing::info!("Started Codex thread: {}", id);
        }

        Ok(turn)
    }

    /// Resume existing thread (subsequent turns)
    pub async fn resume_thread(&mut self, prompt: &str) -> Result<CodexTurn> {
        if self.thread_id.is_none() {
            anyhow::bail!("Cannot resume: no active thread");
        }

        let mut cmd = self.build_command(prompt, true)?;
        self.execute_and_parse(&mut cmd).await
    }

    fn build_command(&self, prompt: &str, resume: bool) -> Result<Command> {
        let mut cmd = Command::new("codex");
        cmd.arg("exec")
            .arg("--json")
            .arg(format!("--sandbox={}", self.sandbox_mode.as_arg()))
            .arg(format!("--ask-for-approval={}", self.approval_mode.as_arg()))
            .current_dir(&self.base_dir);

        if resume {
            cmd.arg("resume");
            cmd.arg(self.thread_id.as_ref().unwrap());
        }

        cmd.arg(prompt);

        Ok(cmd)
    }

    async fn execute_and_parse(&self, cmd: &mut Command) -> Result<CodexTurn> {
        let mut child = cmd.stdout(std::process::Stdio::piped()).spawn()?;

        let stdout = child.stdout.take().unwrap();
        let reader = BufReader::new(stdout);
        let mut lines = reader.lines();

        let mut turn = CodexTurn::default();

        while let Some(line) = lines.next_line().await? {
            if let Ok(event) = serde_json::from_str::<RawCodexEvent>(&line) {
                turn.ingest_event(event);
            }
        }

        child.wait().await?;
        Ok(turn)
    }
}

#[derive(Debug, Deserialize)]
#[serde(tag = "type")]
enum RawCodexEvent {
    #[serde(rename = "thread.started")]
    ThreadStarted { thread_id: String },

    #[serde(rename = "item.completed")]
    ItemCompleted { item: RawThreadItem },

    #[serde(rename = "turn.completed")]
    TurnCompleted { usage: TokenUsage },

    #[serde(rename = "turn.failed")]
    TurnFailed { error: String },

    #[serde(rename = "error")]
    Error { message: String },

    #[serde(other)]
    Other,
}

#[derive(Debug, Deserialize)]
#[serde(tag = "type")]
enum RawThreadItem {
    #[serde(rename = "agent_message")]
    AgentMessage { text: String },

    #[serde(rename = "command_execution")]
    CommandExecution {
        command: String,
        aggregated_output: String,
        exit_code: i32,
    },

    #[serde(rename = "file_change")]
    FileChange {
        path: String,
        #[serde(default)]
        diff: String,
    },

    #[serde(rename = "reasoning")]
    Reasoning { text: String },

    #[serde(other)]
    Other,
}

#[derive(Debug, Default)]
pub struct CodexTurn {
    pub thread_id: Option<String>,
    pub agent_messages: Vec<String>,
    pub command_executions: Vec<CommandExecution>,
    pub file_changes: Vec<FileChange>,
    pub reasoning_traces: Vec<String>,
    pub token_usage: Option<TokenUsage>,
    pub error: Option<String>,
}

impl CodexTurn {
    fn ingest_event(&mut self, event: RawCodexEvent) {
        match event {
            RawCodexEvent::ThreadStarted { thread_id } => {
                self.thread_id = Some(thread_id);
            }
            RawCodexEvent::ItemCompleted { item } => match item {
                RawThreadItem::AgentMessage { text } => {
                    self.agent_messages.push(text);
                }
                RawThreadItem::CommandExecution {
                    command,
                    aggregated_output,
                    exit_code,
                } => {
                    self.command_executions.push(CommandExecution {
                        command,
                        stdout: aggregated_output,
                        exit_code,
                    });
                }
                RawThreadItem::FileChange { path, diff } => {
                    self.file_changes.push(FileChange { path, diff });
                }
                RawThreadItem::Reasoning { text } => {
                    self.reasoning_traces.push(text);
                }
                RawThreadItem::Other => {}
            },
            RawCodexEvent::TurnCompleted { usage } => {
                self.token_usage = Some(usage);
            }
            RawCodexEvent::TurnFailed { error } => {
                self.error = Some(error);
            }
            RawCodexEvent::Error { message } => {
                self.error = Some(message);
            }
            RawCodexEvent::Other => {}
        }
    }
}

#[derive(Debug)]
pub struct CommandExecution {
    pub command: String,
    pub stdout: String,
    pub exit_code: i32,
}

#[derive(Debug)]
pub struct FileChange {
    pub path: String,
    pub diff: String,
}

#[derive(Debug, Deserialize)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

Critical Design Decisions:

  1. Persistent Thread: thread_id stored in struct, reused via resume
  2. Comprehensive Event Parsing: ALL JSONL events captured (not just agent_message)
  3. Sandbox Configuration: Explicit --sandbox workspace-write allows file edits
  4. Auto-Approval: --ask-for-approval never for autonomous agent operation
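
How the first and subsequent turns differ can be seen from the argument list alone. The sketch below mirrors the flag set and ordering of build_command above; they are copied from this document and not verified against the Codex CLI. The function name `codex_args` and the sample thread ID used in the usage note are hypothetical.

```rust
/// Build the argument list passed to the `codex` binary.
/// `thread_id` is None on the first turn and Some(id) on every later turn.
fn codex_args(thread_id: Option<&str>, prompt: &str) -> Vec<String> {
    let mut args = vec![
        "exec".to_string(),
        "--json".to_string(),
        "--sandbox=workspace-write".to_string(),
        "--ask-for-approval=never".to_string(),
    ];
    // Later turns insert `resume <thread_id>` before the prompt.
    if let Some(id) = thread_id {
        args.push("resume".to_string());
        args.push(id.to_string());
    }
    args.push(prompt.to_string());
    args
}
```

For example, `codex_args(Some("thread-123"), "expand")` yields the same four leading arguments as the first turn, followed by `resume`, `thread-123`, and the prompt.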

2.2.4 PythonExecutor (src/execution/python_executor.rs)

Executes Python code blocks with real environment bindings.

use anyhow::Result;
use std::collections::HashMap;
use std::sync::Arc;
use tokio::process::Command;

use super::env_adapter::EnvironmentAdapter;
use crate::tree::ExecutionContext;

pub struct PythonExecutor {
    env_adapter: Arc<dyn EnvironmentAdapter>,
}

impl PythonExecutor {
    pub fn new(env: Arc<dyn EnvironmentAdapter>) -> Self {
        Self { env_adapter: env }
    }

    pub async fn execute(
        &self,
        code: &str,
        context: &ExecutionContext,
    ) -> Result<ExecutionResult> {
        let script = self.build_script(code, context)?;

        let output = Command::new("python3")
            .arg("-c")
            .arg(&script)
            .output()
            .await?;

        if output.status.success() {
            self.parse_success(&output.stderr)
        } else {
            self.parse_failure(&output.stderr, code)
        }
    }

    fn build_script(&self, code: &str, context: &ExecutionContext) -> Result<String> {
        // Decode each variable with json.loads so JSON literals (null, true, false)
        // become valid Python values (None, True, False).
        let vars_setup = context
            .variables
            .iter()
            .map(|(k, v)| -> Result<String> {
                let literal = serde_json::to_string(&serde_json::to_string(v)?)?;
                Ok(format!("{k} = json.loads({literal})"))
            })
            .collect::<Result<Vec<_>>>()?
            .join("\n");

        // serde_json emits a double-quoted, escaped string that Python also accepts
        // as a string literal, so no manual quote escaping is needed.
        let code_literal = serde_json::to_string(code)?;

        // Indent every line so multi-line code stays inside the `try:` block.
        let code_indented = code
            .lines()
            .map(|line| format!("    {line}"))
            .collect::<Vec<_>>()
            .join("\n");

        Ok(format!(
            r#"
import re, json, sys

# Context variables
{vars_setup}

# run() implementation (calls Rust env adapter via marker)
def run(action: str) -> str:
    print(f"[RECODE_ACTION] {{action}}", file=sys.stderr)
    # TODO: IPC with Rust EnvironmentAdapter
    return f"Executed: {{action}}"

# User code
try:
{code_indented}

    # Export variables
    _exported = {{k: v for k, v in locals().items() if not k.startswith('_') and k not in ['run', 're', 'json', 'sys']}}
    print('[RECODE_VARS]' + json.dumps(_exported), file=sys.stderr)
except NameError as e:
    match = re.search(r"name '(.+?)' is not defined", str(e))
    if match and f"{{match.group(1)}}(" in {code_literal}:
        print(f"[RECODE_NEED_EXPANSION] {{match.group(1)}}", file=sys.stderr)
    raise
            "#
        ))
    }

    fn parse_success(&self, stderr: &[u8]) -> Result<ExecutionResult> {
        let stderr_str = String::from_utf8_lossy(stderr);

        let mut variables = HashMap::new();
        if let Some(idx) = stderr_str.find("[RECODE_VARS]") {
            let json_str = &stderr_str[idx + 13..];
            if let Some(end) = json_str.find('\n') {
                if let Ok(vars) = serde_json::from_str::<HashMap<String, serde_json::Value>>(&json_str[..end]) {
                    variables = vars;
                }
            }
        }

        let actions = stderr_str
            .lines()
            .filter_map(|line| {
                if line.contains("[RECODE_ACTION]") {
                    Some(line.replace("[RECODE_ACTION]", "").trim().to_string())
                } else {
                    None
                }
            })
            .collect();

        Ok(ExecutionResult {
            success: true,
            variables,
            actions,
            error: None,
        })
    }

    fn parse_failure(&self, stderr: &[u8], code: &str) -> Result<ExecutionResult> {
        let stderr_str = String::from_utf8_lossy(stderr);

        if let Some(idx) = stderr_str.find("[RECODE_NEED_EXPANSION]") {
            let func_name = stderr_str[idx + 23..]
                .lines()
                .next()
                .unwrap_or("")
                .trim();

            return Ok(ExecutionResult {
                success: false,
                error: Some(format!("NeedExpansion: `{}` needs to be expanded.", func_name)),
                variables: HashMap::new(),
                actions: vec![],
            });
        }

        Ok(ExecutionResult {
            success: false,
            error: Some(stderr_str.to_string()),
            variables: HashMap::new(),
            actions: vec![],
        })
    }
}

#[derive(Debug)]
pub struct ExecutionResult {
    pub success: bool,
    pub variables: HashMap<String, serde_json::Value>,
    pub actions: Vec<String>,
    pub error: Option<String>,
}

Critical Features:

  1. Local Execution: Runs Python via tokio::process::Command (not via Codex)
  2. run() Binding: The injected run() currently emits a [RECODE_ACTION] marker and returns a stub observation; IPC with the EnvironmentAdapter is still a TODO
  3. Variable Export: Parses [RECODE_VARS] from stderr
  4. NeedExpansion Detection: Catches NameError for undefined function calls
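
The three stderr markers can be collected in a single pass over the output. This is a simplified sketch, not the parse_success/parse_failure pair above: it assumes each marker starts its own line, keeps the [RECODE_VARS] payload as raw JSON text instead of decoding it, and uses the hypothetical names `Markers` and `scan_markers`.

```rust
/// Outcome of scanning the stderr of one script run.
#[derive(Debug, Default, PartialEq)]
struct Markers {
    /// Payloads of [RECODE_ACTION] lines, in emission order.
    actions: Vec<String>,
    /// Raw JSON text following [RECODE_VARS], if present.
    vars_json: Option<String>,
    /// Function name following [RECODE_NEED_EXPANSION], if present.
    need_expansion: Option<String>,
}

/// Scan stderr line by line and collect marker payloads.
fn scan_markers(stderr: &str) -> Markers {
    let mut out = Markers::default();
    for line in stderr.lines() {
        if let Some(rest) = line.strip_prefix("[RECODE_ACTION]") {
            out.actions.push(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("[RECODE_VARS]") {
            out.vars_json = Some(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("[RECODE_NEED_EXPANSION]") {
            out.need_expansion = Some(rest.trim().to_string());
        }
    }
    out
}
```

Matching on a line prefix rather than searching the whole buffer avoids false hits when an observation string happens to contain a marker in the middle of a line.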

2.2.5 AstCodeSplitter (src/analysis/code_splitter.rs)

Parses Python code into AST nodes for correct splitting.

use anyhow::Result;
use tree_sitter::{Language, Parser, Node};

extern "C" {
    fn tree_sitter_python() -> Language;
}

pub struct AstCodeSplitter {
    parser: Parser,
}

impl AstCodeSplitter {
    pub fn new() -> Result<Self> {
        let mut parser = Parser::new();
        parser
            .set_language(unsafe { tree_sitter_python() })
            .map_err(|e| anyhow::anyhow!("Failed to set language: {}", e))?;
        Ok(Self { parser })
    }

    pub fn split(&mut self, code: &str) -> Result<Vec<String>> {
        let tree = self
            .parser
            .parse(code, None)
            .ok_or_else(|| anyhow::anyhow!("Failed to parse code"))?;

        let root = tree.root_node();
        let mut blocks = Vec::new();

        for i in 0..root.child_count() {
            if let Some(child) = root.child(i) {
                let block = &code[child.byte_range()];
                blocks.push(block.to_string());
            }
        }

        Ok(blocks)
    }

    pub fn find_placeholders(&mut self, code: &str) -> Result<Vec<String>> {
        let tree = self
            .parser
            .parse(code, None)
            .ok_or_else(|| anyhow::anyhow!("Failed to parse code"))?;
        let mut placeholders = Vec::new();

        let mut cursor = tree.walk();
        self.visit_node(&mut cursor, code, &mut placeholders);

        Ok(placeholders)
    }

    fn visit_node(
        &self,
        cursor: &mut tree_sitter::TreeCursor,
        code: &str,
        placeholders: &mut Vec<String>,
    ) {
        let node = cursor.node();

        if node.kind() == "call" {
            if let Some(func_node) = node.child_by_field_name("function") {
                let func_name = &code[func_node.byte_range()];
                if !Self::is_builtin(func_name) && func_name != "run" {
                    placeholders.push(func_name.to_string());
                }
            }
        }

        if cursor.goto_first_child() {
            loop {
                self.visit_node(cursor, code, placeholders);
                if !cursor.goto_next_sibling() {
                    break;
                }
            }
            cursor.goto_parent();
        }
    }

    fn is_builtin(name: &str) -> bool {
        matches!(
            name,
            "print" | "len" | "range" | "enumerate" | "zip" | "list" | "dict" | "str" | "int"
        )
    }
}

Critical Features:

  1. AST-Based Splitting: Each top-level statement becomes a code block (not blank lines)
  2. Multi-Line Support: Correctly handles multi-line expressions, loops, conditionals
  3. Placeholder Detection: Identifies undefined function calls via AST traversal

3. IMPLEMENTATION PHASES

3.1 Phase 1: Foundations (Week 1-2)

Objective: Persistent thread management + comprehensive event parsing + deterministic execution

Week 1 Deliverables

  1. CodexThreadManager (src/codex/thread_manager.rs)

    • start_thread(prompt) → captures thread_id
    • resume_thread(prompt) → reuses thread_id
    • Parse all 8 JSONL event types
    • Test: Integration test showing thread_id persists across 3 turns
  2. ExecutionContext (src/tree/context.rs)

    • add_action(command) populated from command_execution events
    • add_observation(stdout) populated from command output
    • formatted_variables() serializes for LLM prompts
    • Test: Verify variables + actions + observations all tracked
  3. CodeNode & CodeTree (src/tree/node.rs, src/tree/tree.rs)

    • State machine: Pending → NeedsExpansion → Expanded → Completed/Failed
    • DFS iterator
    • Test: Build 3-level tree, verify DFS order
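
The CodeNode state machine and DFS iterator could be sketched as follows. This is an illustration under assumptions, not the planned `src/tree` code: nodes are stored in a `Vec` and referenced by index, and the extra transitions (`Pending → Completed` for primitives that run directly, and any non-terminal state to `Failed`) are guesses beyond the documented chain.

```rust
/// Lifecycle states named in the Week 1 deliverables.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeState {
    Pending,
    NeedsExpansion,
    Expanded,
    Completed,
    Failed,
}

impl NodeState {
    /// True when moving from `self` to `next` is allowed.
    /// Pending -> Completed and the direct moves to Failed are assumptions.
    pub fn can_transition_to(self, next: NodeState) -> bool {
        use NodeState::*;
        matches!(
            (self, next),
            (Pending, NeedsExpansion)
                | (Pending, Completed)
                | (Pending, Failed)
                | (NeedsExpansion, Expanded)
                | (NeedsExpansion, Failed)
                | (Expanded, Completed)
                | (Expanded, Failed)
        )
    }
}

pub struct CodeNode {
    pub code: String,
    pub state: NodeState,
    pub children: Vec<usize>, // indices into CodeTree::nodes
}

#[derive(Default)]
pub struct CodeTree {
    nodes: Vec<CodeNode>,
}

impl CodeTree {
    /// Appends a node, links it under `parent` if given, and returns its index.
    pub fn add_node(&mut self, parent: Option<usize>, code: &str) -> usize {
        let id = self.nodes.len();
        self.nodes.push(CodeNode {
            code: code.to_string(),
            state: NodeState::Pending,
            children: Vec::new(),
        });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        id
    }

    /// Pre-order depth-first traversal starting at `root`.
    pub fn dfs(&self, root: usize) -> DfsIter<'_> {
        let stack = if root < self.nodes.len() { vec![root] } else { Vec::new() };
        DfsIter { tree: self, stack }
    }
}

pub struct DfsIter<'a> {
    tree: &'a CodeTree,
    stack: Vec<usize>,
}

impl<'a> Iterator for DfsIter<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let id = self.stack.pop()?;
        // Push children in reverse so the leftmost child is visited first.
        self.stack
            .extend(self.tree.nodes[id].children.iter().rev().copied());
        Some(id)
    }
}
```

Index-based storage avoids `Rc<RefCell<...>>` bookkeeping and keeps nodes contiguous in memory.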

Week 2 Deliverables

  1. PythonExecutor (src/execution/python_executor.rs)

    • execute(code, context) → spawn python3 -c
    • Inject real run() function
    • Parse [RECODE_VARS], [RECODE_ACTION], [RECODE_NEED_EXPANSION]
    • Test: Execute code with variables, verify export
  2. EnvironmentAdapter (src/execution/env_adapter.rs)

    • Trait: run(action) → Result<String>
    • Mock adapter for testing
    • Test: Mock adapter returns expected observations
  3. ALFWorld Adapter (src/environments/alfworld.rs)

    • Subprocess integration with Python ALFWorld
    • run(action) calls env.step(action)
    • Test: Run go to cabinet 1, verify observation
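
One possible shape for the EnvironmentAdapter trait and its mock is shown below. It is a sketch under assumptions: the error type is a plain `String` to stay dependency-free (the spec's `Result<String>` suggests `anyhow`), and `MockAdapter`, its `history` field, and the canned observation text are hypothetical.

```rust
use std::collections::HashMap;

/// Executes one primitive action and returns the environment's observation.
pub trait EnvironmentAdapter {
    fn run(&mut self, action: &str) -> Result<String, String>;
}

/// Test double that replays canned observations and records every call.
pub struct MockAdapter {
    responses: HashMap<String, String>,
    pub history: Vec<String>,
}

impl MockAdapter {
    /// Builds a mock from (action, observation) pairs.
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let responses = pairs
            .iter()
            .map(|(action, obs)| (action.to_string(), obs.to_string()))
            .collect();
        MockAdapter { responses, history: Vec::new() }
    }
}

impl EnvironmentAdapter for MockAdapter {
    fn run(&mut self, action: &str) -> Result<String, String> {
        self.history.push(action.to_string());
        self.responses
            .get(action)
            .cloned()
            .ok_or_else(|| format!("unknown action: {action}"))
    }
}
```

Because the executor only depends on the trait, the ALFWorld subprocess adapter and the mock can be swapped without touching orchestration code.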

Acceptance Criteria:

  • ✅ thread_id reused across 3 recursive calls (integration test)
  • ✅ command_execution events populate ExecutionContext.actions
  • ✅ ALFWorld task executes real commands
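
The context tracking that these criteria check might look like the following sketch. Field layout and the helper names `set_variable`, `action_count`, and `last_observation` are assumptions; only `add_action`, `add_observation`, and `formatted_variables` come from the deliverables list.

```rust
use std::collections::BTreeMap;

/// Accumulates variables, actions, and observations for one task.
#[derive(Debug, Default)]
pub struct ExecutionContext {
    // name -> (type name, rendered value); BTreeMap keeps output order stable.
    variables: BTreeMap<String, (String, String)>,
    actions: Vec<String>,
    observations: Vec<String>,
}

impl ExecutionContext {
    /// Hypothetical setter; the spec does not name one.
    pub fn set_variable(&mut self, name: &str, type_name: &str, value: &str) {
        self.variables
            .insert(name.to_string(), (type_name.to_string(), value.to_string()));
    }

    /// Records the command carried by a command_execution event.
    pub fn add_action(&mut self, command: &str) {
        self.actions.push(command.to_string());
    }

    /// Records command output, dropping trailing newlines.
    pub fn add_observation(&mut self, stdout: &str) {
        self.observations.push(stdout.trim_end().to_string());
    }

    /// Renders variables as `- name (type): value`, one per line.
    pub fn formatted_variables(&self) -> String {
        self.variables
            .iter()
            .map(|(name, (ty, value))| format!("- {name} ({ty}): {value}"))
            .collect::<Vec<_>>()
            .join("\n")
    }

    pub fn action_count(&self) -> usize {
        self.actions.len()
    }

    pub fn last_observation(&self) -> Option<&str> {
        self.observations.last().map(String::as_str)
    }
}
```

Sorted keys make the serialized prompt deterministic across runs, which fits the deterministic-execution goal.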

3.2 Phase 2: AST & Orchestration (Week 3-4)

Objective: AST-based code splitting + complete DFS orchestration loop

Week 3 Deliverables

  1. AstCodeSplitter (src/analysis/code_splitter.rs)

    • split(code) → Vec of top-level AST nodes
    • find_placeholders(code) → Vec of undefined functions
    • Test: Multi-line expressions, loops, conditionals
  2. PromptBuilder (src/codex/prompt_builder.rs)

    • build_expansion_prompt(node, context) → formatted string
    • Include few-shot examples from .dev-docs/agents/recode/resources/
    • Format variables as - name (type): value
    • Test: Verify prompt contains all required sections
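
A rough sketch of prompt assembly and the "all required sections" check follows. The section headings, the `(none)` filler, and the `missing_sections` helper are hypothetical; the spec only fixes the `- name (type): value` variable format, which this function expects to receive already rendered.

```rust
/// Headings the sketch treats as mandatory (assumed, not from the spec).
pub const REQUIRED_SECTIONS: [&str; 3] =
    ["### Examples", "### Variables", "### Code to expand"];

/// Assembles an expansion prompt from few-shot examples, pre-formatted
/// variables, and the placeholder code to refine.
pub fn build_expansion_prompt(
    code: &str,
    formatted_variables: &str,
    few_shot: &[&str],
) -> String {
    let mut prompt = String::from("### Examples\n");
    for example in few_shot {
        prompt.push_str(example.trim());
        prompt.push_str("\n\n");
    }
    prompt.push_str("### Variables\n");
    let vars = formatted_variables.trim();
    prompt.push_str(if vars.is_empty() { "(none)" } else { vars });
    prompt.push_str("\n\n### Code to expand\n");
    prompt.push_str(code.trim());
    prompt.push('\n');
    prompt
}

/// Returns the required headings that `prompt` lacks, in declaration order.
pub fn missing_sections(prompt: &str) -> Vec<&'static str> {
    REQUIRED_SECTIONS
        .iter()
        .copied()
        .filter(|section| !prompt.contains(*section))
        .collect()
}
```

A unit test can then assert that `missing_sections` is empty for every prompt the builder emits.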

Week 4 Deliverables

  1. OrchestratorEngine (src/orchestrator/engine.rs)

    • run() → DFS traversal with state machine
    • expand_node() → call Codex + parse response + create children
    • process_node() → execute or detect expansion
    • Error handling with retry logic
    • Test: 5-level deep tree, verify all nodes visited
  2. End-to-End Integration Test

    • Run full ALFWorld task (pick_and_place_simple)
    • Verify tree grows to 3+ levels
    • Verify task succeeds with score >0
    • Benchmark: DFS traversal <10ms for 100-node tree
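
The interplay of `run()`, `process_node()`, and `expand_node()` can be approximated with an explicit stack. This is a simplified sketch, not the planned engine: `run_dfs` is a hypothetical name, the LLM call and the environment are stand-in closures, and a block is treated as primitive when it starts with `run(`.

```rust
/// Depth-first orchestration over code blocks.
/// `expand` stands in for the Codex call that refines a placeholder into
/// child blocks; `execute` stands in for the Python executor.
/// Returns the number of blocks visited.
pub fn run_dfs<E, X>(
    root: &str,
    max_depth: usize,
    mut expand: E,
    mut execute: X,
) -> Result<usize, String>
where
    E: FnMut(&str) -> Vec<String>,
    X: FnMut(&str),
{
    let mut stack = vec![(root.to_string(), 0usize)];
    let mut visited = 0;

    while let Some((block, depth)) = stack.pop() {
        visited += 1;
        // Assumption: primitive actions are direct run(...) calls.
        if block.trim_start().starts_with("run(") {
            execute(&block);
            continue;
        }
        if depth >= max_depth {
            return Err(format!("depth limit {max_depth} reached at `{block}`"));
        }
        // Push children in reverse so they execute in source order.
        for child in expand(&block).into_iter().rev() {
            stack.push((child, depth + 1));
        }
    }
    Ok(visited)
}
```

Children are expanded lazily, only when the traversal reaches them, so each expansion can see the observations produced by earlier siblings.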

Acceptance Criteria:

  • ✅ AST parsing: all test cases pass (multi-line, loops, conditionals)
  • ✅ E2E test: solve 1 ALFWorld task with recursive expansion
  • ✅ Performance: DFS <10ms (100 nodes)

3.3 Phase 3: API & Client (Week 5)

Objective: Expose Rust core via RPC, TypeScript client library

Week 5 Deliverables

  1. RpcServer (src/api/rpc_server.rs)

    • TCP listener on localhost:9000
    • Handle: Solve, GetTree, ResumeTask, GetStatus
    • Serialize CodeTree to JSON
    • Test: Call via curl, verify response
  2. TypeScript Client (clients/typescript/)

    • RecodeClient.solve(task, context)
    • RecodeClient.getTree(taskId)
    • Stream events via WebSocket
    • Test: TS integration test calls Rust backend
  3. CLI Tool (src/bin/recode.rs)

    • recode solve "task description"
    • recode tree show <task-id>
    • recode exec --json "task" (for CI)
    • Test: CLI completes 1 task end-to-end

Acceptance Criteria:

  • ✅ RPC server handles concurrent requests
  • ✅ TS client successfully calls Rust backend
  • ✅ CLI completes ALFWorld task

3.4 Phase 4: Performance (Week 6)

Objective: Meet 10-100x speedup targets

Week 6 Deliverables

  1. Benchmarks (benches/)

    • DFS traversal: 1000 nodes <1ms
    • AST parsing: 5000 lines <50ms
    • Memory: 10-layer tree <50MB
    • Tool: Criterion for statistical analysis
  2. Optimization

    • Arena allocator for CodeNode (reduce heap allocations)
    • LRU cache for LLM responses
    • Concurrent expansion of independent siblings (Tokio tasks)
    • Tool: Flamegraph analysis
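
For the LLM response cache, a std-only stand-in illustrates the intended eviction behaviour. It is not the `lru` crate listed in Appendix A: `ResponseCache` is a hypothetical type whose recency update is O(n), and keying by the full prompt string is an assumption.

```rust
use std::collections::{HashMap, VecDeque};

/// Least-recently-used cache mapping prompts to LLM responses.
pub struct ResponseCache {
    capacity: usize,
    entries: HashMap<String, String>,
    order: VecDeque<String>, // front = least recently used
}

impl ResponseCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        ResponseCache {
            capacity,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Moves `prompt` to the most-recently-used position.
    fn touch(&mut self, prompt: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == prompt) {
            self.order.remove(pos);
        }
        self.order.push_back(prompt.to_string());
    }

    /// Returns the cached response and marks the entry as recently used.
    pub fn get(&mut self, prompt: &str) -> Option<String> {
        let hit = self.entries.get(prompt).cloned()?;
        self.touch(prompt);
        Some(hit)
    }

    /// Inserts or updates an entry, evicting the oldest one when full.
    pub fn put(&mut self, prompt: &str, response: &str) {
        if !self.entries.contains_key(prompt) && self.entries.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(prompt.to_string(), response.to_string());
        self.touch(prompt);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Caching only pays off when identical prompts recur, for example when a failed subtree is re-expanded with unchanged context.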

Acceptance Criteria:

  • ✅ All performance targets met (DFS <1ms, AST <50ms, Memory <50MB)
  • ✅ Benchmarks tracked in CI

3.5 Phase 5: Testing & Release (Week 7-8)

Week 7 Deliverables

  1. Integration Tests (tests/integration/)

    • ALFWorld: pick_and_place, clean, heat_then_cool
    • ScienceWorld: boil_water, grow_plant
    • WebShop: search_product, add_to_cart
    • Error recovery: retry on failure
  2. CI Pipeline (.github/workflows/ci.yml)

    • Run cargo test + cargo clippy + cargo bench
    • Run integration tests in Docker
    • Publish binaries on release

Acceptance Criteria:

  • ✅ All tests pass in CI
  • ✅ Coverage >80%

Week 8 Deliverables

  1. Release Preparation

    • Version bump to 0.1.0-alpha
    • CHANGELOG.md
    • GitHub release with binaries
  2. Docker Image

    • Dockerfile with Rust + Python + Codex CLI
    • Publish to Docker Hub
  3. Documentation

    • README.md with quick start
    • Examples: simple_task, alfworld_task, custom_env
    • Architecture diagram

Acceptance Criteria:

  • ✅ cargo install recode-core works
  • ✅ Docker: docker run recode/core solve "task"
  • ✅ README comprehensive, examples runnable

4. PERFORMANCE TARGETS

| Metric | Current (Python) | Target (Rust) | Measurement Method |
| --- | --- | --- | --- |
| DFS Traversal (1000 nodes) | ~100ms | <1ms | Criterion benchmark |
| AST Parsing (5000 lines) | ~500ms | <50ms | Criterion benchmark |
| Memory (10-layer tree) | ~500MB | <50MB | Valgrind / heaptrack |
| Thread Creation | N/A (new process each time) | Persistent (reuse) | Integration test |
| Event Capture Rate | 12.5% (1/8 types) | 100% (8/8 types) | Unit test |
| Primitive Action Execution | 0% (mock) | 100% (real) | E2E test |

5. SECURITY & SAFETY

5.1 Sandbox Configuration

Default Settings:

  • Sandbox: workspace-write (allows edits in Git repo + /tmp)
  • Approval: never (auto-approve for agent autonomy)
  • Network: Disabled in workspace-write mode

Trust Boundaries:

  1. Repository must be marked as trusted (security review required)
  2. All file edits confined to current Git repository
  3. Commands executed in sandboxed environment (macOS: Seatbelt, Linux: Landlock)

5.2 Error Handling

Graceful Degradation:

  • LLM failures: Retry up to max_retry times (default: 5)
  • Sandbox violations: Log error, mark node as Failed
  • Recursion limit: Max depth 10 (prevents unbounded recursive expansion)
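
The retry and depth rules above could combine as in this sketch. It rests on assumptions: `max_retry` is read as the total number of attempts, depth 10 itself is still allowed, and `expand_with_retry` and `ExpandError` are hypothetical names.

```rust
pub const MAX_DEPTH: usize = 10;
pub const DEFAULT_MAX_RETRY: usize = 5;

#[derive(Debug, PartialEq, Eq)]
pub enum ExpandError {
    /// The node sits deeper than MAX_DEPTH; it should be marked Failed.
    DepthExceeded { depth: usize },
    /// Every attempt failed; carries the last error message.
    RetriesExhausted { attempts: usize, last_error: String },
}

/// Calls `call_llm` until it succeeds or `max_retry` attempts are used.
/// The closure receives the 1-based attempt number.
pub fn expand_with_retry<F>(
    depth: usize,
    max_retry: usize,
    mut call_llm: F,
) -> Result<String, ExpandError>
where
    F: FnMut(usize) -> Result<String, String>,
{
    // Refuse to expand past the recursion limit before spending an LLM call.
    if depth > MAX_DEPTH {
        return Err(ExpandError::DepthExceeded { depth });
    }
    let mut last_error = String::new();
    for attempt in 1..=max_retry {
        match call_llm(attempt) {
            Ok(response) => return Ok(response),
            Err(err) => last_error = err,
        }
    }
    Err(ExpandError::RetriesExhausted { attempts: max_retry, last_error })
}
```

Returning a typed error lets the orchestrator decide whether to mark only the node as Failed or abort the whole task.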

Observability:

  • Structured logging via tracing
  • All JSONL events captured for debugging
  • Execution tree serialized for post-mortem analysis

6. FUTURE EXTENSIONS

6.1 CodeEnv (Real Codebase Operations)

Replace ALFWorld/ScienceWorld with real coding environment:

  • File system operations (read, write, edit)
  • Git operations (commit, branch, diff)
  • Bash command execution
  • Test runner integration

6.2 Multi-Modal Support

Extend to handle images, diagrams, UI screenshots:

  • codex exec --image <path> for visual inputs
  • Placeholder functions for image analysis tasks

6.3 Distributed Execution

Scale to large task trees:

  • Parallel expansion of independent siblings
  • Distributed state management (Redis/PostgreSQL)
  • Load balancing across multiple Codex instances
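
Parallel sibling expansion can be prototyped on a single machine with scoped OS threads before moving to Tokio tasks or a distributed setup. The sketch below assumes the siblings really are independent (no shared variables or ordering constraints), and `expand_siblings` is a hypothetical helper.

```rust
use std::thread;

/// Expands each placeholder on its own scoped thread and returns the
/// results in the same order as the input slice.
pub fn expand_siblings<F>(placeholders: &[String], expand: F) -> Vec<Result<String, String>>
where
    F: Fn(&str) -> Result<String, String> + Sync,
{
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = placeholders
            .iter()
            .map(|name| {
                let expand = &expand;
                scope.spawn(move || expand(name.as_str()))
            })
            .collect();

        // Join in input order; a panicking worker becomes an Err entry.
        handles
            .into_iter()
            .map(|handle| {
                handle
                    .join()
                    .unwrap_or_else(|_| Err("expansion thread panicked".to_string()))
            })
            .collect()
    })
}
```

Results keep the input order, so the caller can attach each expansion to the matching child node even though the work finishes out of order.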

7. REFERENCES

7.1 Academic

  • ReCode Paper: arXiv:2510.23564v2 - "ReCode: Unify Plan and Action for Universal Granularity Control"
  • Cognitive Science: Prinz (1997), Koechlin & Summerfield (2003), Badre & D'Esposito (2009)

7.2 Technical

  • Codex CLI Docs: .knowledge/codex/docs/exec.md, sandbox.md, authentication.md
  • Codex SDK: .knowledge/codex/sdk/typescript/README.md
  • Python Prototype: .dev-docs/agents/recode/agent.py, utils.py, executor.py

7.3 Project Documents

  • Analysis: ARCHITECTURE_GAPS_ANALYSIS.md (comprehensive gap analysis)
  • Roadmap: dev-spec/roadmap/ROADMAP_V2_2025111602.md (8-week plan)
  • Summary: EXEC_SUMMARY_OPTIMIZATIONS.md (executive brief)

APPENDIX A: CARGO.TOML

[package]
name = "recode-core"
version = "0.1.0"
edition = "2021"
rust-version = "1.75"
authors = ["ReCodeAgent Team"]
description = "Production implementation of ReCode recursive code generation paradigm"
repository = "https://github.com/your-org/recode-agent"
license = "MIT"

[dependencies]
# Async runtime
tokio = { version = "1.35", features = ["full"] }

# CLI & Config
clap = { version = "4.4", features = ["derive"] }

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

# Error handling
anyhow = "1.0"
thiserror = "1.0"

# IDs & Time
uuid = { version = "1.6", features = ["v4", "serde"] }
chrono = { version = "0.4", features = ["serde"] }

# AST Parsing
tree-sitter = "0.20"
tree-sitter-python = "0.20"

# Logging
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

# Performance
lru = "0.12"

[dev-dependencies]
criterion = "0.5"
tempfile = "3.8"

[[bench]]
name = "tree_traversal"
harness = false

[[bench]]
name = "ast_parsing"
harness = false

[profile.release]
lto = true
codegen-units = 1
opt-level = 3

APPENDIX B: PROJECT STRUCTURE

recode-core/
├── Cargo.toml
├── src/
│   ├── main.rs                      # CLI entrypoint
│   ├── lib.rs                       # Library exports
│   │
│   ├── orchestrator/                # Core orchestration
│   │   ├── mod.rs
│   │   ├── engine.rs                # OrchestratorEngine
│   │   ├── scheduler.rs             # Concurrent task scheduling
│   │   └── state.rs                 # NodeState state machine
│   │
│   ├── tree/                        # CodeNode tree structures
│   │   ├── mod.rs
│   │   ├── node.rs                  # CodeNode definition
│   │   ├── tree.rs                  # CodeTree with DFS iterator
│   │   ├── context.rs               # ExecutionContext
│   │   └── arena.rs                 # Arena allocator
│   │
│   ├── codex/                       # Codex CLI integration
│   │   ├── mod.rs
│   │   ├── thread_manager.rs        # CodexThreadManager
│   │   ├── event_bus.rs             # JSONL event parser
│   │   ├── prompt_builder.rs        # Expansion prompts
│   │   └── auth.rs                  # Authentication
│   │
│   ├── execution/                   # Code execution
│   │   ├── mod.rs
│   │   ├── python_executor.rs       # PythonExecutor
│   │   └── env_adapter.rs           # EnvironmentAdapter trait
│   │
│   ├── analysis/                    # Code analysis
│   │   ├── mod.rs
│   │   ├── code_splitter.rs         # AstCodeSplitter
│   │   └── need_expansion.rs        # NeedExpansion detector
│   │
│   ├── environments/                # Environment adapters
│   │   ├── mod.rs
│   │   ├── alfworld.rs              # ALFWorld adapter
│   │   ├── sciworld.rs              # ScienceWorld adapter
│   │   └── mock.rs                  # Mock environment
│   │
│   └── api/                         # RPC server
│       ├── mod.rs
│       └── rpc_server.rs            # JSON-RPC server
│
├── tests/
│   ├── integration/
│   │   ├── thread_persistence.rs
│   │   ├── alfworld_task.rs
│   │   └── ast_parsing.rs
│   └── fixtures/
│       └── prompts/
│
├── benches/
│   ├── tree_traversal.rs
│   └── ast_parsing.rs
│
└── examples/
    ├── simple_task.rs
    ├── alfworld_task.rs
    └── custom_env.rs

Document Status: ✅ OFFICIAL v0.1.0
Approval: Lead Architect
Implementation Start: 2025-11-18 (Week 1)
Expected Completion: 2026-01-17 (Week 8)