ReCodeAgent Architecture Design Document

Version: 0.1.0 (OKR Baseline)
Updated (UTC, date -u '+%Y-%m-%dT%H:%M:%SZ'): 2025-11-16T16:44:47Z
Status: ✅ OFFICIAL - Production Architecture Specification
Authors: ReCodeAgent Team

DOCUMENT CONTROL

Property	Value
Document Type	Architecture Design Specification
OKR Version	0.1.0 (MVP Baseline)
Target Release	2025-11-22 (5-Day DEV Cooking Cycle)
Based On	ReCode Paper (arXiv:2510.23564v2) + Codex CLI Integration Analysis
Supersedes	.artifacts/PROJECT_INIT_20251116/IMPLEMENTATION_ROADMAP.md
Related Docs	dev-spec/roadmap/ROADMAP_V2_2025111602.md
Scope	Full Production: Real ALFWorld IPC, Complete JSONL (13+ events), Integrated Executor

EXECUTIVE SUMMARY

ReCodeAgent is a production implementation of the ReCode research paradigm (arXiv:2510.23564v2) that achieves universal granularity control through recursive code generation. This architecture specification defines a high-performance Rust Core + Codex CLI integrated system that:

Unifies Plans and Actions: Represents both abstract planning (placeholder functions) and concrete execution (primitive actions) in a single code representation
Enables Dynamic Granularity Control: LLM policy adaptively decides when to plan abstractly vs. commit to specific actions
Targets 10-100x Performance: Rust-based orchestration with zero-cost abstractions, aiming for a 10-100x speedup over the Python prototype with <1ms DFS traversal and <50ms AST parsing
Maintains Production Quality: Deterministic execution, comprehensive JSONL event capture, type-safe state management

Core Innovation

Traditional LLM-based agents suffer from fixed granularity:

ReAct agents: Locked at fine-grained step-by-step execution, no strategic foresight
Planner-based agents: Rigid plan-execute separation, cannot adapt dynamically

ReCode solves this by treating plans as high-level placeholder functions that recursively refine into finer-grained components until reaching executable primitive actions. This creates an infinite decision space where the agent dynamically controls its reasoning granularity.

Architecture Highlights

┌─────────────────────────────────────────────────────────┐
│  Rust Orchestrator Engine (Week 1-4)                    │
│  - Persistent Codex thread management                   │
│  - DFS tree traversal with state machine                │
│  - AST-aware code parsing (tree-sitter)                 │
│  - 10-100x performance vs Python prototype              │
└─────────────────────────────────────────────────────────┘
                        ↓ JSONL Events
┌─────────────────────────────────────────────────────────┐
│  Codex CLI Executor (codex exec --json)                 │
│  - LLM policy for placeholder expansion                 │
│  - Sandbox: workspace-write, approval: never            │
│  - Event stream: command_execution, file_change, etc.   │
└─────────────────────────────────────────────────────────┘
                        ↓ Primitive Actions
┌─────────────────────────────────────────────────────────┐
│  Environment Adapters (Week 2)                          │
│  - ALFWorld (household tasks)                           │
│  - ScienceWorld (lab experiments)                       │
│  - CodeEnv (real codebase operations) [Future]          │
└─────────────────────────────────────────────────────────┘

1. ARCHITECTURE FOUNDATIONS

1.1 ReCode Methodology Primer

1.1.1 Decision Space Formulation

We model LLM-based agent interaction as a simplified decision process:

$$\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{O}, T, R \rangle$$

Where:

$\mathcal{S}$: State space
$\mathcal{A}$: Primitive action space (executable operations like run('go to cabinet 1'))
$\mathcal{O}$: Observation space
$T: \mathcal{S} \times \mathcal{A} \rightarrow \mathcal{S}$: Transition function
$R: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$: Reward function

Beyond primitive actions, we introduce plan space $\mathcal{P}$:

Contains high-level intentions requiring decomposition
Example: prepare_breakfast(), find_and_take(obj, locations)
Cannot execute directly, must refine into $\mathcal{A}$ or intermediate $\mathcal{P}$ elements

Decision space: $\mathcal{D} = \mathcal{A} \cup \mathcal{P}$

1.1.2 Granularity Hierarchy

Decisions form a natural hierarchy from coarse to fine:

TASK-SPECIFICATION (coarsest)
  ↓
solve("prepare breakfast", observation)
  ↓
prepare_breakfast() → get_ingredients() + cook_meal()
  ↓
get_ingredients() → open_refrigerator() + take_eggs()
  ↓
open_refrigerator() → run('go to refrigerator 1') + run('open refrigerator 1')
  ↓
PRIMITIVE ACTIONS (finest)

ReCode represents this hierarchy as Python code:

Plans: Undefined placeholder functions (e.g., obj_ID = find_and_take(obj, locations))
Actions: Environment-specific primitives (e.g., obs = run('go to cabinet 1'))

1.1.3 Recursive Expansion Algorithm

Algorithm 1: ReCode Core Loop

Input: Task T, Policy π (LLM), Environment E, Current Node c
Procedure ReCode(T, π, E, c):
    if c is None:
        o_0 ← Reset(E)                    # Initialize environment
        c ← Text2Code(T, o_0)             # Root: solve(instruction, observation)
    end if

    code_block ← π(c)                     # LLM expands current placeholder

    for each code_unit u in code_block:
        if IsPrimitive(u):                # Executable action
            Execute(u, E)                 # Run in environment
        else:                             # Placeholder function
            ReCode(T, π, E, u)            # Recursive expansion
        end if
    end for
end procedure

Key Properties:

Unified Representation: Plans and actions both expressed as Python function calls
Dynamic Granularity: Policy decides when to stop planning and commit to actions
Context Propagation: Unified variable namespace persists across recursion levels
Deterministic Execution: Primitive actions execute in environment, placeholders trigger expansion
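
Algorithm 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation: the policy is a hypothetical lookup table standing in for the LLM, FakeEnv is a made-up environment, code units are plain strings rather than parsed AST nodes, and the max_depth guard is an addition of this sketch.

```python
from typing import Callable, Dict, List


class FakeEnv:
    """Hypothetical environment that records every primitive action."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def run(self, action: str) -> str:
        self.log.append(action)
        return f"Executed: {action}"


def is_primitive(unit: str) -> bool:
    # In this sketch a primitive is any call of the form run('...').
    return unit.startswith("run('") and unit.endswith("')")


def recode(node: str, policy: Callable[[str], List[str]], env: FakeEnv,
           depth: int = 0, max_depth: int = 10) -> None:
    """Expand `node` with the policy and execute primitives depth-first."""
    if depth > max_depth:
        raise RecursionError(f"max depth exceeded while expanding {node!r}")
    for unit in policy(node):
        if is_primitive(unit):
            # Strip the run('...') wrapper and execute the action text.
            env.run(unit[len("run('"):-len("')")])
        else:
            recode(unit, policy, env, depth + 1, max_depth)


# Stand-in for the LLM policy: a fixed expansion table (assumption).
EXPANSIONS: Dict[str, List[str]] = {
    "prepare_breakfast()": ["get_ingredients()", "cook_meal()"],
    "get_ingredients()": ["open_refrigerator()", "take_eggs()"],
    "open_refrigerator()": ["run('go to refrigerator 1')",
                            "run('open refrigerator 1')"],
    "take_eggs()": ["run('take egg 1 from refrigerator 1')"],
    "cook_meal()": ["run('go to stoveburner 1')"],
}

env = FakeEnv()
recode("prepare_breakfast()", EXPANSIONS.__getitem__, env)
print(env.log)
```

Primitive actions reach the environment in source order even though they sit at different depths, which is the depth-first behavior the orchestrator has to preserve.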

1.2 Codex CLI Integration Model

1.2.1 Codex Execution Modes

Codex CLI supports two execution patterns:

1. Non-Interactive Mode (codex exec)

Single-turn execution: prompt → LLM response → exit
Default: --sandbox read-only, no file edits or network
With --full-auto: --sandbox workspace-write, --ask-for-approval never
Limitation: No conversation context between invocations

2. Session Resume Mode (codex exec resume <THREAD_ID>)

Preserves conversation context from previous turn
Maintains thread state in ~/.codex/sessions/<thread_id>/
Critical for ReCode: Enables variable/action history across recursive expansions

1.2.2 JSONL Event Stream Protocol

Codex outputs structured events via --json flag:

Event Type	Data	ReCode Usage
`thread.started`	`thread_id`	Capture for resume capability
`turn.started`	-	Track turn boundaries
`item.completed` (agent_message)	`text`	Extract `<think>` and `<execute>` blocks
`item.completed` (command_execution)	`command`, `aggregated_output`, `exit_code`	Populate `ExecutionContext.actions`, detect `NeedExpansion`
`item.completed` (file_change)	`path`, `diff`	Track file edits (future CodeEnv)
`item.completed` (reasoning)	`text`	Observability/debugging
`turn.completed`	`usage` (tokens)	Cost tracking
`turn.failed`	`error`	Error handling

Architecture Requirement: Parse ALL 8 event types (not just agent_message)
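
As a rough illustration of how such a stream can be consumed, the sketch below folds JSONL lines into a per-turn summary. The sample lines are hand-written for this example (not captured from a real Codex run), and TurnSummary and ingest are hypothetical names; only the event and field names come from the table above.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnSummary:
    thread_id: Optional[str] = None
    agent_messages: List[str] = field(default_factory=list)
    commands: List[str] = field(default_factory=list)
    error: Optional[str] = None
    skipped: int = 0  # lines that were not JSON objects


def ingest(lines: List[str]) -> TurnSummary:
    """Fold a sequence of JSONL lines into one TurnSummary."""
    turn = TurnSummary()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            turn.skipped += 1
            continue
        if not isinstance(event, dict):
            turn.skipped += 1
            continue
        kind = event.get("type")
        if kind == "thread.started":
            turn.thread_id = event.get("thread_id")
        elif kind == "item.completed":
            item = event.get("item", {})
            if item.get("type") == "agent_message":
                turn.agent_messages.append(item.get("text", ""))
            elif item.get("type") == "command_execution":
                turn.commands.append(item.get("command", ""))
        elif kind == "turn.failed":
            turn.error = str(event.get("error"))
        # Other event types (turn.started, turn.completed, ...) are ignored here.
    return turn


# Hand-written sample input (assumption, for illustration only).
sample = [
    '{"type": "thread.started", "thread_id": "t-123"}',
    '{"type": "turn.started"}',
    '{"type": "item.completed", "item": {"type": "agent_message", "text": "<think>plan</think>"}}',
    '{"type": "item.completed", "item": {"type": "command_execution", "command": "ls", "exit_code": 0}}',
    'not json',
    '{"type": "turn.completed", "usage": {"input_tokens": 10, "output_tokens": 5}}',
]
turn = ingest(sample)
print(turn)
```

Counting skipped lines instead of dropping them silently makes protocol drift visible during integration testing.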

1.2.3 Sandbox & Approval Configuration

For ReCode agent autonomy:

# --sandbox workspace-write: allow file edits in the current directory
# --ask-for-approval never:  auto-approve all operations
codex exec --json \
  --sandbox workspace-write \
  --ask-for-approval never \
  "Expand placeholder: find_and_take(obj, locations)"

Security Considerations:

workspace-write: Sandboxed to current Git repository + /tmp
Network access: Disabled by default in workspace-write mode
Trust boundary: Mark repository as trusted after security review
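
When the command is assembled programmatically, building an argument list and validating the mode strings up front avoids shell-quoting problems in the prompt. A minimal sketch, assuming the flag names and mode values exactly as they appear in this document; build_codex_argv is a hypothetical helper, and whether a given Codex CLI version accepts these flags in this position is not verified here.

```python
from typing import List

# Mode values as listed in this document (not checked against the CLI).
SANDBOX_MODES = ("read-only", "workspace-write", "danger-full-access")
APPROVAL_MODES = ("never", "on-request", "on-failure", "untrusted")


def build_codex_argv(prompt: str,
                     sandbox: str = "workspace-write",
                     approval: str = "never") -> List[str]:
    """Return the argv list for one non-interactive Codex turn."""
    if sandbox not in SANDBOX_MODES:
        raise ValueError(f"unknown sandbox mode: {sandbox!r}")
    if approval not in APPROVAL_MODES:
        raise ValueError(f"unknown approval mode: {approval!r}")
    # A list (not a shell string) keeps quotes and parentheses in the
    # prompt from being interpreted by a shell.
    return [
        "codex", "exec", "--json",
        "--sandbox", sandbox,
        "--ask-for-approval", approval,
        prompt,
    ]


argv = build_codex_argv("Expand placeholder: find_and_take(obj, locations)")
print(argv)
```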

2. SYSTEM ARCHITECTURE

2.1 Architecture Overview

┌──────────────────────────────────────────────────────────────────┐
│                      CLI Layer (src/main.rs)                     │
│  $ recode solve "implement binary tree traversal"                │
│  $ recode tree show <task-id>                                    │
│  $ recode exec resume <thread-id> "fix the bug"                  │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│              Orchestrator Layer (src/orchestrator/)              │
│  ┌────────────────────────────────────────────────────┐          │
│  │  OrchestratorEngine                                │          │
│  │  - run() → DFS tree traversal                      │          │
│  │  - expand_node() → call Codex for placeholder      │          │
│  │  - process_node() → execute or detect expansion    │          │
│  └────────────────────────────────────────────────────┘          │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│               Tree Management (src/tree/)                        │
│  ┌────────────────────┐  ┌──────────────────────┐                │
│  │  CodeTree          │  │  ExecutionContext    │                │
│  │  - DFS iterator    │  │  - variables: Map    │                │
│  │  - add_child()     │  │  - actions: Vec      │                │
│  │  - get_node()      │  │  - observations: Vec │                │
│  └────────────────────┘  └──────────────────────┘                │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│           Codex Integration (src/codex/)                         │
│  ┌──────────────────────────────────────────────────────┐        │
│  │  CodexThreadManager                                  │        │
│  │  - start_thread(prompt) → thread_id                  │        │
│  │  - resume_thread(prompt) → reuse thread_id           │        │
│  │  - build_command() → codex exec --json --sandbox ... │        │
│  │  - execute_and_parse() → spawn + parse JSONL         │        │
│  └──────────────────────────────────────────────────────┘        │
│  ┌──────────────────────────────────────────────────────┐        │
│  │  CodexEventBus                                       │        │
│  │  - ingest(jsonl_line) → parse event                  │        │
│  │  - route to ExecutionContext updaters                │        │
│  └──────────────────────────────────────────────────────┘        │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│            Execution Layer (src/execution/)                      │
│  ┌──────────────────────────────────────────────────────┐        │
│  │  PythonExecutor                                      │        │
│  │  - execute(code, context) → spawn python3 -c         │        │
│  │  - build_script() → inject run() + variables         │        │
│  │  - parse_success() → extract [RECODE_VARS]           │        │
│  │  - parse_failure() → detect [RECODE_NEED_EXPANSION]  │        │
│  └──────────────────────────────────────────────────────┘        │
│  ┌──────────────────────────────────────────────────────┐        │
│  │  EnvironmentAdapter (trait)                          │        │
│  │  - run(action: &str) → Result<String>                │        │
│  │  - reset() → Result<String>                          │        │
│  └──────────────────────────────────────────────────────┘        │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│          Analysis Layer (src/analysis/)                          │
│  ┌──────────────────────────────────────────────────────┐        │
│  │  AstCodeSplitter                                     │        │
│  │  - split(code) → Vec<String> (AST top-level nodes)   │        │
│  │  - find_placeholders(code) → Vec<String>             │        │
│  │  - tree-sitter Python parser                         │        │
│  └──────────────────────────────────────────────────────┘        │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│        Infrastructure (Codex CLI, tokio, tree-sitter)            │
└──────────────────────────────────────────────────────────────────┘

2.2 Core Components Specification

2.2.1 CodeNode (src/tree/node.rs)

Represents a single node in the decision tree.

use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum NodeState {
    Pending,           // Waiting to be processed
    NeedsExpansion,    // Detected as placeholder, awaiting LLM expansion
    Expanded,          // LLM expanded, child nodes created
    Executing,         // Primitive action being executed
    Completed,         // Successfully completed
    Failed,            // Execution failed
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CodeNode {
    pub id: Uuid,
    pub parent_id: Option<Uuid>,
    pub children: Vec<Uuid>,
    pub state: NodeState,
    pub code: String,
    pub thought: Option<String>,        // LLM reasoning from <think> block
    pub result: Option<String>,         // Execution output
    pub error: Option<String>,
    pub depth: usize,
    pub created_at: i64,
    pub updated_at: i64,
}

impl CodeNode {
    pub fn new_root(code: String, depth: usize) -> Self {
        Self {
            id: Uuid::new_v4(),
            parent_id: None,
            children: Vec::new(),
            state: NodeState::Pending,
            code,
            thought: None,
            result: None,
            error: None,
            depth,
            created_at: chrono::Utc::now().timestamp(),
            updated_at: chrono::Utc::now().timestamp(),
        }
    }

    pub fn new_child(parent_id: Uuid, code: String, depth: usize) -> Self {
        Self {
            id: Uuid::new_v4(),
            parent_id: Some(parent_id),
            children: Vec::new(),
            state: NodeState::Pending,
            code,
            thought: None,
            result: None,
            error: None,
            depth,
            created_at: chrono::Utc::now().timestamp(),
            updated_at: chrono::Utc::now().timestamp(),
        }
    }
}

State Machine:

Pending → NeedsExpansion → Expanded → Pending (for children)
Pending → Executing → Completed
Pending → Executing → Failed
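
The transitions above can be enforced with a small lookup table. The sketch below is one possible encoding, not the project's code: it treats Expanded, Completed, and Failed as terminal for the node itself (children of an Expanded node simply start in Pending), which is an interpretation of the list above.

```python
from enum import Enum


class NodeState(Enum):
    PENDING = "Pending"
    NEEDS_EXPANSION = "NeedsExpansion"
    EXPANDED = "Expanded"
    EXECUTING = "Executing"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Allowed transitions, taken from the three paths listed above.
ALLOWED = {
    NodeState.PENDING: {NodeState.NEEDS_EXPANSION, NodeState.EXECUTING},
    NodeState.NEEDS_EXPANSION: {NodeState.EXPANDED},
    NodeState.EXECUTING: {NodeState.COMPLETED, NodeState.FAILED},
    NodeState.EXPANDED: set(),   # assumption: terminal for the node itself
    NodeState.COMPLETED: set(),
    NodeState.FAILED: set(),
}


def transition(current: NodeState, target: NodeState) -> NodeState:
    """Return `target` if the move is legal, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = NodeState.PENDING
state = transition(state, NodeState.EXECUTING)
state = transition(state, NodeState.COMPLETED)
print(state.value)
```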

2.2.2 ExecutionContext (src/tree/context.rs)

Maintains the unified variable namespace across recursion levels.

use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct ExecutionContext {
    pub variables: HashMap<String, serde_json::Value>,
    pub actions: Vec<String>,
    pub observations: Vec<String>,
}

impl ExecutionContext {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn set_variable(&mut self, key: String, value: serde_json::Value) {
        if !Self::is_reserved(&key) {
            self.variables.insert(key, value);
        }
    }

    pub fn merge_variables(&mut self, other: HashMap<String, serde_json::Value>) {
        for (k, v) in other {
            self.set_variable(k, v);
        }
    }

    pub fn add_action(&mut self, action: &str) {
        self.actions.push(action.to_string());
    }

    pub fn add_observation(&mut self, obs: &str) {
        self.observations.push(obs.to_string());
    }

    pub fn formatted_variables(&self) -> String {
        if self.variables.is_empty() {
            return "(No Variables)".to_string();
        }

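        // Sort by name so the prompt text is stable across runs
        // (HashMap iteration order is unspecified).
        let mut entries: Vec<_> = self.variables.iter().collect();
        entries.sort_by(|a, b| a.0.cmp(b.0));

        entries
            .into_iter()
            .map(|(k, v)| {
                let type_str = Self::infer_type(v);
                let value_str = serde_json::to_string(v).unwrap_or_default();
                // Truncate by characters, not bytes: slicing a &str in the
                // middle of a multi-byte character would panic.
                let display_value = if value_str.chars().count() > 100 {
                    let head: String = value_str.chars().take(97).collect();
                    format!("{}...", head)
                } else {
                    value_str
                };
                format!("- {} ({}): {}", k, type_str, display_value)
            })
            .collect::<Vec<_>>()
            .join("\n")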
    }

    fn infer_type(value: &serde_json::Value) -> &'static str {
        match value {
            serde_json::Value::String(_) => "str",
            serde_json::Value::Number(_) => "number",
            serde_json::Value::Bool(_) => "bool",
            serde_json::Value::Array(_) => "list",
            serde_json::Value::Object(_) => "dict",
            serde_json::Value::Null => "NoneType",
        }
    }

    fn is_reserved(key: &str) -> bool {
        key.starts_with('_') || matches!(key, "run" | "re" | "json" | "sys" | "os")
    }
}

Context Propagation:

Parent nodes establish variables (e.g., obj = 'apple')
Child nodes inherit parent context via serialization into LLM prompt
Child execution updates shared namespace (new variables, actions, observations)
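
The propagation rules can be illustrated with plain dictionaries. This is a sketch only: merge_variables and format_variables are hypothetical Python counterparts of the Rust methods above, and sorting the names is a choice of this example to keep the prompt text stable.

```python
import json
from typing import Any, Dict

RESERVED = {"run", "re", "json", "sys", "os"}


def is_reserved(name: str) -> bool:
    return name.startswith("_") or name in RESERVED


def type_name(value: Any) -> str:
    if value is None:
        return "NoneType"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "str"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "dict"
    return type(value).__name__


def merge_variables(context: Dict[str, Any], new_vars: Dict[str, Any]) -> None:
    """Copy non-reserved names into the shared namespace."""
    for name, value in new_vars.items():
        if not is_reserved(name):
            context[name] = value


def format_variables(context: Dict[str, Any], limit: int = 100) -> str:
    if not context:
        return "(No Variables)"
    lines = []
    for name in sorted(context):
        text = json.dumps(context[name])
        if len(text) > limit:
            text = text[:limit - 3] + "..."
        lines.append(f"- {name} ({type_name(context[name])}): {text}")
    return "\n".join(lines)


shared: Dict[str, Any] = {}
merge_variables(shared, {"obj": "apple", "_tmp": 1, "run": "shadowed"})  # parent
merge_variables(shared, {"obj_ID": "apple 1", "found": True})            # child
print(format_variables(shared))
```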

2.2.3 CodexThreadManager (src/codex/thread_manager.rs)

Manages persistent Codex CLI threads for context preservation.

use anyhow::Result;
use serde::Deserialize;
use std::path::PathBuf;
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::Command;

#[derive(Debug, Clone, Copy)]
pub enum SandboxMode {
    ReadOnly,
    WorkspaceWrite,
    DangerFullAccess,
}

impl SandboxMode {
    fn as_arg(&self) -> &'static str {
        match self {
            Self::ReadOnly => "read-only",
            Self::WorkspaceWrite => "workspace-write",
            Self::DangerFullAccess => "danger-full-access",
        }
    }
}

#[derive(Debug, Clone, Copy)]
pub enum ApprovalMode {
    Never,
    OnRequest,
    OnFailure,
    Untrusted,
}

impl ApprovalMode {
    fn as_arg(&self) -> &'static str {
        match self {
            Self::Never => "never",
            Self::OnRequest => "on-request",
            Self::OnFailure => "on-failure",
            Self::Untrusted => "untrusted",
        }
    }
}

pub struct CodexThreadManager {
    thread_id: Option<String>,
    base_dir: PathBuf,
    sandbox_mode: SandboxMode,
    approval_mode: ApprovalMode,
}

impl CodexThreadManager {
    pub fn new(base_dir: PathBuf) -> Self {
        Self {
            thread_id: None,
            base_dir,
            sandbox_mode: SandboxMode::WorkspaceWrite,
            approval_mode: ApprovalMode::Never,
        }
    }

    pub fn with_sandbox(mut self, mode: SandboxMode) -> Self {
        self.sandbox_mode = mode;
        self
    }

    pub fn with_approval(mut self, mode: ApprovalMode) -> Self {
        self.approval_mode = mode;
        self
    }

    pub fn has_active_thread(&self) -> bool {
        self.thread_id.is_some()
    }

    pub fn get_thread_id(&self) -> Option<&String> {
        self.thread_id.as_ref()
    }

    /// Start a new thread (first turn)
    pub async fn start_thread(&mut self, prompt: &str) -> Result<CodexTurn> {
        let mut cmd = self.build_command(prompt, false)?;
        let turn = self.execute_and_parse(&mut cmd).await?;

        if let Some(id) = &turn.thread_id {
            self.thread_id = Some(id.clone());
            tracing::info!("Started Codex thread: {}", id);
        }

        Ok(turn)
    }

    /// Resume existing thread (subsequent turns)
    pub async fn resume_thread(&mut self, prompt: &str) -> Result<CodexTurn> {
        if self.thread_id.is_none() {
            anyhow::bail!("Cannot resume: no active thread");
        }

        let mut cmd = self.build_command(prompt, true)?;
        self.execute_and_parse(&mut cmd).await
    }

    fn build_command(&self, prompt: &str, resume: bool) -> Result<Command> {
        let mut cmd = Command::new("codex");
        cmd.arg("exec")
            .arg("--json")
            .arg(format!("--sandbox={}", self.sandbox_mode.as_arg()))
            .arg(format!("--ask-for-approval={}", self.approval_mode.as_arg()))
            .current_dir(&self.base_dir);

        if resume {
            let thread_id = self
                .thread_id
                .as_deref()
                .ok_or_else(|| anyhow::anyhow!("Cannot resume: no active thread"))?;
            cmd.arg("resume").arg(thread_id);
        }

        cmd.arg(prompt);

        Ok(cmd)
    }

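    async fn execute_and_parse(&self, cmd: &mut Command) -> Result<CodexTurn> {
        let mut child = cmd.stdout(std::process::Stdio::piped()).spawn()?;

        let stdout = child
            .stdout
            .take()
            .ok_or_else(|| anyhow::anyhow!("Codex child process has no stdout pipe"))?;
        let mut lines = BufReader::new(stdout).lines();

        let mut turn = CodexTurn::default();

        while let Some(line) = lines.next_line().await? {
            // Skip lines that are not valid events, but log them so that
            // protocol drift does not go unnoticed.
            match serde_json::from_str::<RawCodexEvent>(&line) {
                Ok(event) => turn.ingest_event(event),
                Err(e) => tracing::warn!("Skipping unparseable Codex event: {}", e),
            }
        }

        // Surface a non-zero exit status if no event already reported an error.
        let status = child.wait().await?;
        if !status.success() && turn.error.is_none() {
            turn.error = Some(format!("codex exited with {}", status));
        }
        Ok(turn)
    }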
}

#[derive(Debug, Deserialize)]
#[serde(tag = "type")]
enum RawCodexEvent {
    #[serde(rename = "thread.started")]
    ThreadStarted { thread_id: String },

    #[serde(rename = "item.completed")]
    ItemCompleted { item: RawThreadItem },

    #[serde(rename = "turn.completed")]
    TurnCompleted { usage: TokenUsage },

    #[serde(rename = "turn.failed")]
    TurnFailed { error: String },

    #[serde(rename = "error")]
    Error { message: String },

    #[serde(other)]
    Other,
}

#[derive(Debug, Deserialize)]
#[serde(tag = "type")]
enum RawThreadItem {
    #[serde(rename = "agent_message")]
    AgentMessage { text: String },

    #[serde(rename = "command_execution")]
    CommandExecution {
        command: String,
        aggregated_output: String,
        exit_code: i32,
    },

    #[serde(rename = "file_change")]
    FileChange {
        path: String,
        #[serde(default)]
        diff: String,
    },

    #[serde(rename = "reasoning")]
    Reasoning { text: String },

    #[serde(other)]
    Other,
}

#[derive(Debug, Default)]
pub struct CodexTurn {
    pub thread_id: Option<String>,
    pub agent_messages: Vec<String>,
    pub command_executions: Vec<CommandExecution>,
    pub file_changes: Vec<FileChange>,
    pub reasoning_traces: Vec<String>,
    pub token_usage: Option<TokenUsage>,
    pub error: Option<String>,
}

impl CodexTurn {
    fn ingest_event(&mut self, event: RawCodexEvent) {
        match event {
            RawCodexEvent::ThreadStarted { thread_id } => {
                self.thread_id = Some(thread_id);
            }
            RawCodexEvent::ItemCompleted { item } => match item {
                RawThreadItem::AgentMessage { text } => {
                    self.agent_messages.push(text);
                }
                RawThreadItem::CommandExecution {
                    command,
                    aggregated_output,
                    exit_code,
                } => {
                    self.command_executions.push(CommandExecution {
                        command,
                        stdout: aggregated_output,
                        exit_code,
                    });
                }
                RawThreadItem::FileChange { path, diff } => {
                    self.file_changes.push(FileChange { path, diff });
                }
                RawThreadItem::Reasoning { text } => {
                    self.reasoning_traces.push(text);
                }
                RawThreadItem::Other => {}
            },
            RawCodexEvent::TurnCompleted { usage } => {
                self.token_usage = Some(usage);
            }
            RawCodexEvent::TurnFailed { error } => {
                self.error = Some(error);
            }
            RawCodexEvent::Error { message } => {
                self.error = Some(message);
            }
            RawCodexEvent::Other => {}
        }
    }
}

#[derive(Debug)]
pub struct CommandExecution {
    pub command: String,
    pub stdout: String,
    pub exit_code: i32,
}

#[derive(Debug)]
pub struct FileChange {
    pub path: String,
    pub diff: String,
}

#[derive(Debug, Deserialize)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

Critical Design Decisions:

Persistent Thread: thread_id stored in struct, reused via resume
Comprehensive Event Parsing: ALL JSONL events captured (not just agent_message)
Sandbox Configuration: Explicit --sandbox workspace-write allows file edits
Auto-Approval: --ask-for-approval never for autonomous agent operation
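
The persistent-thread decision can be shown in isolation with an injected runner, which also makes it testable without the CLI. This is a sketch under assumptions: ThreadManager, fake_runner, and the thread ID "t-42" are made up for the example, and the argument order mirrors build_command() above.

```python
from typing import Callable, Dict, List, Optional

# A runner takes an argv list and returns the parsed JSONL events.
Runner = Callable[[List[str]], List[Dict]]


class ThreadManager:
    """Remembers the first thread_id and resumes it on later turns."""

    def __init__(self, runner: Runner) -> None:
        self._runner = runner
        self.thread_id: Optional[str] = None

    def send(self, prompt: str) -> List[Dict]:
        argv = ["codex", "exec", "--json"]
        if self.thread_id is not None:
            argv += ["resume", self.thread_id]
        argv.append(prompt)
        events = self._runner(argv)
        for event in events:
            if event.get("type") == "thread.started" and self.thread_id is None:
                self.thread_id = event.get("thread_id")
        return events


calls: List[List[str]] = []


def fake_runner(argv: List[str]) -> List[Dict]:
    # Test double: a fresh turn announces a thread, a resumed turn does not.
    calls.append(argv)
    if "resume" in argv:
        return [{"type": "turn.completed"}]
    return [{"type": "thread.started", "thread_id": "t-42"}]


manager = ThreadManager(fake_runner)
for prompt in ["expand solve()", "expand find_and_take()", "expand put_in()"]:
    manager.send(prompt)
print(calls)
```

This mirrors the Week 1 integration test: the first turn starts a thread, and the next two reuse its ID.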

2.2.4 PythonExecutor (src/execution/python_executor.rs)

Executes Python code blocks with real environment bindings.

use anyhow::Result;
use std::collections::HashMap;
use std::sync::Arc;
use tokio::process::Command;

use super::env_adapter::EnvironmentAdapter;
use crate::tree::ExecutionContext;

pub struct PythonExecutor {
    env_adapter: Arc<dyn EnvironmentAdapter>,
}

impl PythonExecutor {
    pub fn new(env: Arc<dyn EnvironmentAdapter>) -> Self {
        Self { env_adapter: env }
    }

    pub async fn execute(
        &self,
        code: &str,
        context: &ExecutionContext,
    ) -> Result<ExecutionResult> {
        let script = self.build_script(code, context)?;

        let output = Command::new("python3")
            .arg("-c")
            .arg(&script)
            .output()
            .await?;

        if output.status.success() {
            self.parse_success(&output.stderr)
        } else {
            self.parse_failure(&output.stderr, code)
        }
    }

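    fn build_script(&self, code: &str, context: &ExecutionContext) -> Result<String> {
        // Restore each variable with json.loads(): raw JSON literals such as
        // true/false/null are not valid Python expressions. A JSON-encoded
        // string is also a valid Python string literal, so the JSON text is
        // encoded a second time to embed it in the script.
        let vars_setup = context
            .variables
            .iter()
            .map(|(k, v)| -> Result<String> {
                let json_text = serde_json::to_string(v)?;
                Ok(format!("{} = json.loads({})", k, serde_json::to_string(&json_text)?))
            })
            .collect::<Result<Vec<_>>>()?
            .join("\n");

        // Indent every line so multi-line code stays inside the try: block.
        // (Multi-line string literals in `code` pick up the extra indentation.)
        let code_indented = code
            .lines()
            .map(|line| format!("    {}", line))
            .collect::<Vec<_>>()
            .join("\n");

        // Python string literal holding the source, used for the placeholder check.
        let code_literal = serde_json::to_string(code)?;

        Ok(format!(
            r#"
import re, json, sys

# Context variables
{vars_setup}

# run() stub: reports the action to the Rust side via a stderr marker
def run(action: str) -> str:
    print(f"[RECODE_ACTION] {{action}}", file=sys.stderr)
    # TODO: IPC with Rust EnvironmentAdapter
    return f"Executed: {{action}}"

# User code
try:
{code_indented}

    # Export variables
    _exported = {{k: v for k, v in locals().items() if not k.startswith('_') and k not in ['run', 're', 'json', 'sys']}}
    print('[RECODE_VARS]' + json.dumps(_exported), file=sys.stderr)
except NameError as e:
    match = re.search(r"name '(.+?)' is not defined", str(e))
    if match and f"{{match.group(1)}}(" in {code_literal}:
        print(f"[RECODE_NEED_EXPANSION] {{match.group(1)}}", file=sys.stderr)
    raise
            "#,
            vars_setup = vars_setup,
            code_indented = code_indented,
            code_literal = code_literal
        ))
    }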

    fn parse_success(&self, stderr: &[u8]) -> Result<ExecutionResult> {
        let stderr_str = String::from_utf8_lossy(stderr);

        let mut variables = HashMap::new();
        if let Some(idx) = stderr_str.find("[RECODE_VARS]") {
            let json_str = &stderr_str[idx + 13..];
            if let Some(end) = json_str.find('\n') {
                if let Ok(vars) = serde_json::from_str::<HashMap<String, serde_json::Value>>(&json_str[..end]) {
                    variables = vars;
                }
            }
        }

        let actions = stderr_str
            .lines()
            .filter_map(|line| {
                if line.contains("[RECODE_ACTION]") {
                    Some(line.replace("[RECODE_ACTION]", "").trim().to_string())
                } else {
                    None
                }
            })
            .collect();

        Ok(ExecutionResult {
            success: true,
            variables,
            actions,
            error: None,
        })
    }

    fn parse_failure(&self, stderr: &[u8], code: &str) -> Result<ExecutionResult> {
        let stderr_str = String::from_utf8_lossy(stderr);

        if let Some(idx) = stderr_str.find("[RECODE_NEED_EXPANSION]") {
            let func_name = stderr_str[idx + 23..]
                .lines()
                .next()
                .unwrap_or("")
                .trim();

            return Ok(ExecutionResult {
                success: false,
                error: Some(format!("NeedExpansion: `{}` needs to be expanded.", func_name)),
                variables: HashMap::new(),
                actions: vec![],
            });
        }

        Ok(ExecutionResult {
            success: false,
            error: Some(stderr_str.to_string()),
            variables: HashMap::new(),
            actions: vec![],
        })
    }
}

#[derive(Debug)]
pub struct ExecutionResult {
    pub success: bool,
    pub variables: HashMap<String, serde_json::Value>,
    pub actions: Vec<String>,
    pub error: Option<String>,
}

Critical Features:

Local Execution: Runs Python via tokio::process::Command (not via Codex)
run() Binding: The injected run() is currently a stub that emits a [RECODE_ACTION] marker on stderr; routing it to the EnvironmentAdapter over IPC is still a TODO
Variable Export: Parses [RECODE_VARS] from stderr
NeedExpansion Detection: Catches NameError for undefined function calls
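
The NameError-based detection can be reproduced without a subprocess. The sketch below is a simplification, not the executor itself: it runs the block in-process with exec() instead of spawning python3 -c, and execute_block is a hypothetical helper name. The regular expression and the "name followed by an opening parenthesis" check are the same as in the generated script above.

```python
import re
from typing import Any, Dict, List, Optional, Tuple


def execute_block(code: str,
                  variables: Dict[str, Any]) -> Tuple[bool, Optional[str], List[str]]:
    """Run `code`; return (success, placeholder_name, actions_so_far)."""
    actions: List[str] = []

    def run(action: str) -> str:
        # Stub primitive: record the action instead of calling an environment.
        actions.append(action)
        return f"Executed: {action}"

    namespace = dict(variables, run=run)
    try:
        exec(code, namespace)
    except NameError as exc:
        match = re.search(r"name '(.+?)' is not defined", str(exc))
        # Treat it as a placeholder only if the missing name is called.
        if match and f"{match.group(1)}(" in code:
            return False, match.group(1), actions
        raise
    return True, None, actions


code = (
    "obs = run('go to cabinet 1')\n"
    "obj_ID = find_and_take(obj, locations)\n"
)
ok, placeholder, actions = execute_block(
    code, {"obj": "apple", "locations": ["cabinet 1"]}
)
print(ok, placeholder, actions)
```

Note that primitives preceding the placeholder have already executed when the NameError fires, so the orchestrator must not replay them after expansion.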

2.2.5 AstCodeSplitter (src/analysis/code_splitter.rs)

Parses Python code into AST nodes for correct splitting.

use anyhow::Result;
use tree_sitter::{Language, Parser, Node};

extern "C" {
    fn tree_sitter_python() -> Language;
}

pub struct AstCodeSplitter {
    parser: Parser,
}

impl AstCodeSplitter {
    pub fn new() -> Result<Self> {
        let mut parser = Parser::new();
        parser
            .set_language(unsafe { tree_sitter_python() })
            .map_err(|e| anyhow::anyhow!("Failed to set language: {}", e))?;
        Ok(Self { parser })
    }

    pub fn split(&mut self, code: &str) -> Result<Vec<String>> {
        let tree = self
            .parser
            .parse(code, None)
            .ok_or_else(|| anyhow::anyhow!("Failed to parse code"))?;

        let root = tree.root_node();
        let mut blocks = Vec::new();

        for i in 0..root.child_count() {
            if let Some(child) = root.child(i) {
                let block = &code[child.byte_range()];
                blocks.push(block.to_string());
            }
        }

        Ok(blocks)
    }

    pub fn find_placeholders(&mut self, code: &str) -> Result<Vec<String>> {
        let tree = self
            .parser
            .parse(code, None)
            .ok_or_else(|| anyhow::anyhow!("Failed to parse code"))?;
        let mut placeholders = Vec::new();

        let mut cursor = tree.walk();
        self.visit_node(&mut cursor, code, &mut placeholders);

        Ok(placeholders)
    }

    fn visit_node(
        &self,
        cursor: &mut tree_sitter::TreeCursor,
        code: &str,
        placeholders: &mut Vec<String>,
    ) {
        let node = cursor.node();

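        if node.kind() == "call" {
            if let Some(func_node) = node.child_by_field_name("function") {
                // Only bare names can be placeholders; attribute calls such as
                // obs.split() are methods on existing values.
                if func_node.kind() == "identifier" {
                    let func_name = &code[func_node.byte_range()];
                    if !Self::is_builtin(func_name) && func_name != "run" {
                        placeholders.push(func_name.to_string());
                    }
                }
            }
        }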

        if cursor.goto_first_child() {
            loop {
                self.visit_node(cursor, code, placeholders);
                if !cursor.goto_next_sibling() {
                    break;
                }
            }
            cursor.goto_parent();
        }
    }

    fn is_builtin(name: &str) -> bool {
        matches!(
            name,
            "print" | "len" | "range" | "enumerate" | "zip" | "list" | "dict" | "str" | "int"
        )
    }
}

Critical Features:

AST-Based Splitting: Each top-level statement becomes a code block (not blank lines)
Multi-Line Support: Correctly handles multi-line expressions, loops, conditionals
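Placeholder Detection: Flags candidate placeholders via AST traversal: call expressions whose callee is neither run nor in the built-in allowlist (the check is name-based, not a true undefined-symbol analysis)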
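
The same two operations can be prototyped with Python's standard ast module, which is handy for cross-checking the tree-sitter implementation on small inputs. This sketch swaps in ast for tree-sitter and is not the Rust code above; split_top_level and find_placeholders are hypothetical names, and using the builtins module as the allowlist is a choice of this example.

```python
import ast
import builtins
from typing import List, Tuple


def split_top_level(code: str) -> List[str]:
    """Return the source text of each top-level statement."""
    tree = ast.parse(code)
    return [ast.get_source_segment(code, stmt) for stmt in tree.body]


def find_placeholders(code: str, primitives: Tuple[str, ...] = ("run",)) -> List[str]:
    """Return sorted names of called functions that look undefined."""
    tree = ast.parse(code)
    # Functions defined inside the block itself are not placeholders.
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    found = set()
    for node in ast.walk(tree):
        # Only bare-name calls count; obj.method() is an ast.Attribute callee.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            name = node.func.id
            if name in primitives or name in defined or hasattr(builtins, name):
                continue
            found.add(name)
    return sorted(found)


code = (
    "obs = run('go to cabinet 1')\n"
    "for loc in locations:\n"
    "    if len(loc) > 0:\n"
    "        obj_ID = find_and_take(obj, loc)\n"
    "put_in(\n"
    "    obj_ID,\n"
    "    target,\n"
    ")\n"
)
blocks = split_top_level(code)
placeholders = find_placeholders(code)
print(len(blocks), placeholders)
```

The for loop and the multi-line put_in(...) call each come back as a single block, which blank-line splitting would not guarantee.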

3. IMPLEMENTATION PHASES

3.1 Phase 1: Foundations (Week 1-2)

Objective: Persistent thread management + comprehensive event parsing + deterministic execution

Week 1 Deliverables

CodexThreadManager (src/codex/thread_manager.rs)
- start_thread(prompt) → captures thread_id
- resume_thread(prompt) → reuses thread_id
- Parse all 8 JSONL event types
- Test: Integration test showing thread_id persists across 3 turns
ExecutionContext (src/tree/context.rs)
- add_action(command) populated from command_execution events
- add_observation(stdout) populated from command output
- formatted_variables() serializes for LLM prompts
- Test: Verify variables + actions + observations all tracked
CodeNode & CodeTree (src/tree/node.rs, src/tree/tree.rs)
- State machine: Pending → NeedsExpansion → Expanded → Completed/Failed
- DFS iterator
- Test: Build 3-level tree, verify DFS order
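
The CodeNode state machine and the DFS iterator from the Week 1 deliverables could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the transition table (in particular, allowing Failed from every non-terminal state), the index-based CodeTree layout, and the names can_transition_to, add_child, and dfs are assumptions made for the example.

```rust
/// Lifecycle states named in the Week 1 deliverables.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeState {
    Pending,
    NeedsExpansion,
    Expanded,
    Completed,
    Failed,
}

impl NodeState {
    /// Assumed transition table: Pending -> NeedsExpansion -> Expanded -> Completed,
    /// with Failed reachable from any non-terminal state.
    pub fn can_transition_to(self, next: NodeState) -> bool {
        use NodeState::*;
        matches!(
            (self, next),
            (Pending, NeedsExpansion)
                | (NeedsExpansion, Expanded)
                | (Expanded, Completed)
                | (Pending, Failed)
                | (NeedsExpansion, Failed)
                | (Expanded, Failed)
        )
    }
}

/// Hypothetical node layout: children are indices into the tree's node vector.
pub struct CodeNode {
    pub code: String,
    pub state: NodeState,
    pub children: Vec<usize>,
}

pub struct CodeTree {
    pub nodes: Vec<CodeNode>,
}

impl CodeTree {
    /// Creates a tree whose root (index 0) holds `root_code`.
    pub fn new(root_code: &str) -> Self {
        let root = CodeNode {
            code: root_code.to_string(),
            state: NodeState::Pending,
            children: Vec::new(),
        };
        CodeTree { nodes: vec![root] }
    }

    /// Appends a child under `parent` and returns the new node's index.
    pub fn add_child(&mut self, parent: usize, code: &str) -> usize {
        let id = self.nodes.len();
        self.nodes.push(CodeNode {
            code: code.to_string(),
            state: NodeState::Pending,
            children: Vec::new(),
        });
        self.nodes[parent].children.push(id);
        id
    }

    /// Pre-order depth-first iterator over node indices, starting at the root.
    pub fn dfs(&self) -> Dfs<'_> {
        Dfs { tree: self, stack: vec![0] }
    }
}

pub struct Dfs<'a> {
    tree: &'a CodeTree,
    stack: Vec<usize>,
}

impl<'a> Iterator for Dfs<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let id = self.stack.pop()?;
        // Push children in reverse so the leftmost child is visited first.
        self.stack
            .extend(self.tree.nodes[id].children.iter().rev().copied());
        Some(id)
    }
}
```

Storing children as indices keeps the iterator free of borrow-checker friction and lines up with the arena allocator planned for Week 6.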

Week 2 Deliverables

PythonExecutor (src/execution/python_executor.rs)
- execute(code, context) → spawn python3 -c
- Inject real run() function
- Parse [RECODE_VARS], [RECODE_ACTION], [RECODE_NEED_EXPANSION]
- Test: Execute code with variables, verify export
EnvironmentAdapter (src/execution/env_adapter.rs)
- Trait: run(action) → Result<String>
- Mock adapter for testing
- Test: Mock adapter returns expected observations
ALFWorld Adapter (src/environments/alfworld.rs)
- Subprocess integration with Python ALFWorld
- run(action) calls env.step(action)
- Test: Run go to cabinet 1, verify observation
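
As an illustration of the marker-parsing deliverable for PythonExecutor, the sketch below classifies each stdout line by its marker. The wire format is an assumption: this document names the three markers but not their payload layout, so the example treats each marker as a line prefix followed by a free-form payload. ExecutorEvent, parse_executor_line, and parse_executor_stdout are hypothetical names.

```rust
/// One classified line of executor stdout (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ExecutorEvent {
    /// Payload after `[RECODE_VARS]`, e.g. serialized variables.
    Vars(String),
    /// Payload after `[RECODE_ACTION]`, e.g. the primitive action text.
    Action(String),
    /// Payload after `[RECODE_NEED_EXPANSION]`, e.g. the placeholder name.
    NeedExpansion(String),
    /// Any line without a marker, kept as plain output.
    Output(String),
}

/// Classifies a single line, assuming markers appear only as line prefixes.
pub fn parse_executor_line(line: &str) -> ExecutorEvent {
    let line = line.trim_end();
    if let Some(rest) = line.strip_prefix("[RECODE_VARS]") {
        ExecutorEvent::Vars(rest.trim().to_string())
    } else if let Some(rest) = line.strip_prefix("[RECODE_ACTION]") {
        ExecutorEvent::Action(rest.trim().to_string())
    } else if let Some(rest) = line.strip_prefix("[RECODE_NEED_EXPANSION]") {
        ExecutorEvent::NeedExpansion(rest.trim().to_string())
    } else {
        ExecutorEvent::Output(line.to_string())
    }
}

/// Classifies every line of the captured stdout in order.
pub fn parse_executor_stdout(stdout: &str) -> Vec<ExecutorEvent> {
    stdout.lines().map(parse_executor_line).collect()
}
```

Keeping unmarked lines as Output events preserves ordinary print output, which can then be stored as observations.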

Acceptance Criteria:

✅ thread_id reused across 3 recursive calls (integration test)
✅ command_execution events populate ExecutionContext.actions
✅ ALFWorld task executes real commands
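
The EnvironmentAdapter trait and its mock from the Week 2 deliverables might look like the sketch below. The run(action) shape comes from the deliverable list; the EnvError type, the canned-response map, and the history field are assumptions added so the example compiles with the standard library alone (the real trait would more likely return anyhow::Result).

```rust
use std::collections::HashMap;
use std::fmt;

/// Stand-in error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct EnvError(pub String);

impl fmt::Display for EnvError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "environment error: {}", self.0)
    }
}

impl std::error::Error for EnvError {}

/// Executes one primitive action and returns the resulting observation.
pub trait EnvironmentAdapter {
    fn run(&mut self, action: &str) -> Result<String, EnvError>;
}

/// Mock adapter that replays canned observations and records every action.
pub struct MockAdapter {
    responses: HashMap<String, String>,
    pub history: Vec<String>,
}

impl MockAdapter {
    pub fn new() -> Self {
        MockAdapter { responses: HashMap::new(), history: Vec::new() }
    }

    /// Registers the observation to return for an exact action string.
    pub fn with_response(mut self, action: &str, observation: &str) -> Self {
        self.responses.insert(action.to_string(), observation.to_string());
        self
    }
}

impl EnvironmentAdapter for MockAdapter {
    fn run(&mut self, action: &str) -> Result<String, EnvError> {
        // Record the action even when no canned observation exists.
        self.history.push(action.to_string());
        self.responses
            .get(action)
            .cloned()
            .ok_or_else(|| EnvError(format!("no canned observation for action: {action}")))
    }
}
```

Because the orchestrator only sees the trait, the same tests can later run against the ALFWorld adapter by swapping the implementation.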

3.2 Phase 2: AST & Orchestration (Week 3-4)

Objective: AST-based code splitting + complete DFS orchestration loop

Week 3 Deliverables

AstCodeSplitter (src/analysis/code_splitter.rs)
- split(code) → Vec of top-level AST nodes
- find_placeholders(code) → Vec of undefined functions
- Test: Multi-line expressions, loops, conditionals
PromptBuilder (src/codex/prompt_builder.rs)
- build_expansion_prompt(node, context) → formatted string
- Include few-shot examples from .dev-docs/agents/recode/resources/
- Format variables as - name (type): value
- Test: Verify prompt contains all required sections
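
The variable format listed for PromptBuilder, - name (type): value, can be produced with a few lines of Rust. The Variable struct and format_variables function below are hypothetical; how values are actually stringified (and whether long values are truncated) is not specified in this document.

```rust
/// One variable from the execution context (hypothetical layout).
pub struct Variable {
    pub name: String,
    pub type_name: String,
    pub value: String,
}

/// Renders variables one per line as `- name (type): value`.
pub fn format_variables(vars: &[Variable]) -> String {
    vars.iter()
        .map(|v| format!("- {} ({}): {}", v.name, v.type_name, v.value))
        .collect::<Vec<_>>()
        .join("\n")
}
```

An empty slice yields an empty string, so the prompt builder can decide separately whether to omit the variables section.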

Week 4 Deliverables

OrchestratorEngine (src/orchestrator/engine.rs)
- run() → DFS traversal with state machine
- expand_node() → call Codex + parse response + create children
- process_node() → execute or detect expansion
- Error handling with retry logic
- Test: 5-level deep tree, verify all nodes visited
End-to-End Integration Test
- Run full ALFWorld task (pick_and_place_simple)
- Verify tree grows to 3+ levels
- Verify task succeeds with score >0
- Benchmark: DFS traversal <10ms for 100-node tree
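
The OrchestratorEngine loop from the Week 4 deliverables (process_node executes primitives, expand_node replaces a placeholder with its children) can be sketched with an explicit stack. This is a simplified illustration under stated assumptions: a block counts as primitive when it is a direct run(...) call, expansion and execution are passed in as closures instead of calling Codex or Python, and retries and the depth limit are left out.

```rust
/// Assumption for this sketch: a block is primitive when it is a direct `run(...)` call.
fn is_primitive(code: &str) -> bool {
    code.trim_start().starts_with("run(")
}

/// Depth-first orchestration: primitives are executed, placeholders are
/// expanded and their children processed in source order.
/// Returns the observations in execution order.
pub fn orchestrate<E, X>(root: &str, mut expand: E, mut execute: X) -> Vec<String>
where
    E: FnMut(&str) -> Vec<String>,
    X: FnMut(&str) -> String,
{
    let mut observations = Vec::new();
    let mut stack = vec![root.to_string()];

    while let Some(code) = stack.pop() {
        if is_primitive(&code) {
            observations.push(execute(&code));
        } else {
            // Reverse so the first child ends up on top of the stack.
            let children = expand(&code);
            stack.extend(children.into_iter().rev());
        }
    }
    observations
}
```

A production loop would additionally bound the expansion depth (section 5.2 sets the limit at 10), since an expander that keeps returning placeholders would otherwise never terminate.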

Acceptance Criteria:

✅ AST parsing: all test cases pass (multi-line, loops, conditionals)
✅ E2E test: solve 1 ALFWorld task with recursive expansion
✅ Performance: DFS <10ms (100 nodes)

3.3 Phase 3: API & Client (Week 5)

Objective: Expose Rust core via RPC, TypeScript client library

Week 5 Deliverables

RpcServer (src/api/rpc_server.rs)
- TCP listener on localhost:9000
- Handle: Solve, GetTree, ResumeTask, GetStatus
- Serialize CodeTree to JSON
- Test: Call via curl, verify response
TypeScript Client (clients/typescript/)
- RecodeClient.solve(task, context)
- RecodeClient.getTree(taskId)
- Stream events via WebSocket
- Test: TS integration test calls Rust backend
CLI Tool (src/bin/recode.rs)
- recode solve "task description"
- recode tree show <task-id>
- recode exec --json "task" (for CI)
- Test: CLI completes 1 task end-to-end
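
The four RpcServer operations listed above (Solve, GetTree, ResumeTask, GetStatus) can be modeled as a request enum with a single dispatch function. The sketch below covers only that dispatch shape. The payload fields, the RpcResponse variants, the TaskStore type, and the "running" status string are all assumptions, and transport and JSON serialization (serde in the real crate) are omitted.

```rust
use std::collections::HashMap;

/// Requests named in the Week 5 deliverables; the fields are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RpcRequest {
    Solve { task: String },
    GetTree { task_id: u64 },
    ResumeTask { task_id: u64 },
    GetStatus { task_id: u64 },
}

/// Hypothetical response shapes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RpcResponse {
    Accepted { task_id: u64 },
    Tree { root: String },
    Status { state: String },
    Error { message: String },
}

struct TaskRecord {
    description: String,
    status: String,
}

/// In-memory task registry standing in for the orchestrator.
#[derive(Default)]
pub struct TaskStore {
    next_id: u64,
    tasks: HashMap<u64, TaskRecord>,
}

impl TaskStore {
    /// Dispatches one request and returns its response.
    pub fn handle(&mut self, request: RpcRequest) -> RpcResponse {
        match request {
            RpcRequest::Solve { task } => {
                let task_id = self.next_id;
                self.next_id += 1;
                let record = TaskRecord { description: task, status: "running".to_string() };
                self.tasks.insert(task_id, record);
                RpcResponse::Accepted { task_id }
            }
            RpcRequest::GetTree { task_id } => match self.tasks.get(&task_id) {
                Some(record) => RpcResponse::Tree { root: record.description.clone() },
                None => Self::unknown(task_id),
            },
            RpcRequest::ResumeTask { task_id } => match self.tasks.get_mut(&task_id) {
                Some(record) => {
                    record.status = "running".to_string();
                    RpcResponse::Status { state: record.status.clone() }
                }
                None => Self::unknown(task_id),
            },
            RpcRequest::GetStatus { task_id } => match self.tasks.get(&task_id) {
                Some(record) => RpcResponse::Status { state: record.status.clone() },
                None => Self::unknown(task_id),
            },
        }
    }

    fn unknown(task_id: u64) -> RpcResponse {
        RpcResponse::Error { message: format!("unknown task id {task_id}") }
    }
}
```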

Acceptance Criteria:

✅ RPC server handles concurrent requests
✅ TS client successfully calls Rust backend
✅ CLI completes ALFWorld task

3.4 Phase 4: Performance (Week 6)

Objective: Meet 10-100x speedup targets

Week 6 Deliverables

Benchmarks (benches/)
- DFS traversal: 1000 nodes <1ms
- AST parsing: 5000 lines <50ms
- Memory: 10-layer tree <50MB
- Tool: Criterion for statistical analysis
Optimization
- Arena allocator for CodeNode (reduce heap allocations)
- LRU cache for LLM responses
- Concurrent expansion of independent siblings (Tokio tasks)
- Tool: Flamegraph analysis
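
To make the LRU cache item concrete, here is a small standard-library sketch of least-recently-used eviction keyed by prompt text. It is only an illustration of the policy: Appendix A lists the lru crate, which a real implementation would presumably use, and the ResponseCache type with its get and put methods is hypothetical. Lookups here cost O(n) in the number of cached prompts.

```rust
use std::collections::{HashMap, VecDeque};

/// Tiny LRU cache for LLM responses, keyed by the exact prompt string.
pub struct ResponseCache {
    capacity: usize,
    entries: HashMap<String, String>,
    /// Keys ordered from least recently used (front) to most recent (back).
    order: VecDeque<String>,
}

impl ResponseCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        ResponseCache { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Moves `key` to the most-recently-used position.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.to_string());
    }

    /// Returns the cached response and marks the prompt as recently used.
    pub fn get(&mut self, prompt: &str) -> Option<String> {
        let hit = self.entries.get(prompt).cloned()?;
        self.touch(prompt);
        Some(hit)
    }

    /// Stores a response, evicting the least recently used entry when full.
    pub fn put(&mut self, prompt: &str, response: &str) {
        if !self.entries.contains_key(prompt) && self.entries.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(prompt.to_string(), response.to_string());
        self.touch(prompt);
    }
}
```

Such a cache only hits when a prompt repeats byte for byte, so its benefit depends on how deterministic the prompt builder's output is.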

Acceptance Criteria:

✅ All performance targets met (DFS <1ms, AST <50ms, Memory <50MB)
✅ Benchmarks tracked in CI

3.5 Phase 5: Testing & Release (Week 7-8)

Week 7 Deliverables

Integration Tests (tests/integration/)
- ALFWorld: pick_and_place, clean, heat_then_cool
- ScienceWorld: boil_water, grow_plant
- WebShop: search_product, add_to_cart
- Error recovery: retry on failure
CI Pipeline (.github/workflows/ci.yml)
- Run cargo test + cargo clippy + cargo bench
- Run integration tests in Docker
- Publish binaries on release

Acceptance Criteria:

✅ All tests pass in CI
✅ Coverage >80%

Week 8 Deliverables

Release Preparation
- Version bump to 0.1.0-alpha
- CHANGELOG.md
- GitHub release with binaries
Docker Image
- Dockerfile with Rust + Python + Codex CLI
- Publish to Docker Hub
Documentation
- README.md with quick start
- Examples: simple_task, alfworld_task, custom_env
- Architecture diagram

Acceptance Criteria:

✅ cargo install recode-core works
✅ Docker: docker run recode/core solve "task"
✅ README comprehensive, examples runnable

4. PERFORMANCE TARGETS

Metric	Current (Python)	Target (Rust)	Measurement Method
DFS Traversal (1000 nodes)	~100ms	<1ms	Criterion benchmark
AST Parsing (5000 lines)	~500ms	<50ms	Criterion benchmark
Memory (10-layer tree)	~500MB	<50MB	Valgrind / heaptrack
Thread Creation	N/A (new process each time)	Persistent (reuse)	Integration test
Event Capture Rate	12.5% (1/8 types)	100% (8/8 types)	Unit test
Primitive Action Execution	0% (mock)	100% (real)	E2E test

5. SECURITY & SAFETY

5.1 Sandbox Configuration

Default Settings:

Sandbox: workspace-write (allows edits in Git repo + /tmp)
Approval: never (auto-approve for agent autonomy)
Network: Disabled in workspace-write mode

Trust Boundaries:

Repository must be marked as trusted (security review required)
All file edits confined to current Git repository
Commands executed in sandboxed environment (macOS: Seatbelt, Linux: Landlock)

5.2 Error Handling

Graceful Degradation:

LLM failures: Retry up to max_retry times (default: 5)
Sandbox violations: Log error, mark node as Failed
Recursion limit: Max depth 10 (prevent infinite loops)
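
The retry and recursion limits above can be sketched as two small helpers. The defaults (5 and 10) come from the list above. The interpretation is an assumption: max_retry is treated as the total number of attempts, and depth 10 itself is still allowed. The names with_retry, check_depth, and ExpansionError are hypothetical.

```rust
pub const DEFAULT_MAX_RETRY: u32 = 5;
pub const MAX_DEPTH: usize = 10;

/// Runs `op` until it succeeds or `max_retry` attempts have been made.
/// `op` receives the zero-based attempt index and always runs at least once.
pub fn with_retry<T, E, F>(max_retry: u32, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_retry => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum ExpansionError {
    DepthExceeded { depth: usize },
}

/// Rejects expansion below the maximum tree depth.
pub fn check_depth(depth: usize) -> Result<(), ExpansionError> {
    if depth > MAX_DEPTH {
        Err(ExpansionError::DepthExceeded { depth })
    } else {
        Ok(())
    }
}
```

When with_retry returns Err or check_depth rejects a node, the orchestrator would mark that node as Failed and continue with its siblings.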

Observability:

Structured logging via tracing
All JSONL events captured for debugging
Execution tree serialized for post-mortem analysis

6. FUTURE EXTENSIONS

6.1 CodeEnv (Real Codebase Operations)

Replace ALFWorld/ScienceWorld with real coding environment:

File system operations (read, write, edit)
Git operations (commit, branch, diff)
Bash command execution
Test runner integration

6.2 Multi-Modal Support

Extend to handle images, diagrams, UI screenshots:

codex exec --image <path> for visual inputs
Placeholder functions for image analysis tasks

6.3 Distributed Execution

Scale to large task trees:

Parallel expansion of independent siblings
Distributed state management (Redis/PostgreSQL)
Load balancing across multiple Codex instances

7. REFERENCES

7.1 Academic

ReCode Paper: arXiv:2510.23564v2 - "ReCode: Unify Plan and Action for Universal Granularity Control"
Cognitive Science: Prinz (1997), Koechlin & Summerfield (2003), Badre & D'Esposito (2009)

7.2 Technical

Codex CLI Docs: .knowledge/codex/docs/exec.md, sandbox.md, authentication.md
Codex SDK: .knowledge/codex/sdk/typescript/README.md
Python Prototype: .dev-docs/agents/recode/agent.py, utils.py, executor.py

7.3 Project Documents

Analysis: ARCHITECTURE_GAPS_ANALYSIS.md (comprehensive gap analysis)
Roadmap: dev-spec/roadmap/ROADMAP_V2_2025111602.md (8-week plan)
Summary: EXEC_SUMMARY_OPTIMIZATIONS.md (executive brief)

APPENDIX A: CARGO.TOML

[package]
name = "recode-core"
version = "0.1.0"
edition = "2021"
rust-version = "1.75"
authors = ["ReCodeAgent Team"]
description = "Production implementation of ReCode recursive code generation paradigm"
repository = "https://github.com/your-org/recode-agent"
license = "MIT"

[dependencies]
# Async runtime
tokio = { version = "1.35", features = ["full"] }

# CLI & Config
clap = { version = "4.4", features = ["derive"] }

# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

# Error handling
anyhow = "1.0"
thiserror = "1.0"

# IDs & Time
uuid = { version = "1.6", features = ["v4", "serde"] }
chrono = { version = "0.4", features = ["serde"] }

# AST Parsing
tree-sitter = "0.20"
tree-sitter-python = "0.20"

# Logging
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

# Performance
lru = "0.12"

[dev-dependencies]
criterion = "0.5"
tempfile = "3.8"

[[bench]]
name = "tree_traversal"
harness = false

[[bench]]
name = "ast_parsing"
harness = false

[profile.release]
lto = true
codegen-units = 1
opt-level = 3

APPENDIX B: PROJECT STRUCTURE

recode-core/
├── Cargo.toml
├── src/
│   ├── main.rs                      # CLI entrypoint
│   ├── lib.rs                       # Library exports
│   │
│   ├── orchestrator/                # Core orchestration
│   │   ├── mod.rs
│   │   ├── engine.rs                # OrchestratorEngine
│   │   ├── scheduler.rs             # Concurrent task scheduling
│   │   └── state.rs                 # NodeState state machine
│   │
│   ├── tree/                        # CodeNode tree structures
│   │   ├── mod.rs
│   │   ├── node.rs                  # CodeNode definition
│   │   ├── tree.rs                  # CodeTree with DFS iterator
│   │   ├── context.rs               # ExecutionContext
│   │   └── arena.rs                 # Arena allocator
│   │
│   ├── codex/                       # Codex CLI integration
│   │   ├── mod.rs
│   │   ├── thread_manager.rs        # CodexThreadManager
│   │   ├── event_bus.rs             # JSONL event parser
│   │   ├── prompt_builder.rs        # Expansion prompts
│   │   └── auth.rs                  # Authentication
│   │
│   ├── execution/                   # Code execution
│   │   ├── mod.rs
│   │   ├── python_executor.rs       # PythonExecutor
│   │   └── env_adapter.rs           # EnvironmentAdapter trait
│   │
│   ├── analysis/                    # Code analysis
│   │   ├── mod.rs
│   │   ├── code_splitter.rs         # AstCodeSplitter
│   │   └── need_expansion.rs        # NeedExpansion detector
│   │
│   ├── environments/                # Environment adapters
│   │   ├── mod.rs
│   │   ├── alfworld.rs              # ALFWorld adapter
│   │   ├── sciworld.rs              # ScienceWorld adapter
│   │   └── mock.rs                  # Mock environment
│   │
│   └── api/                         # RPC server
│       ├── mod.rs
│       └── rpc_server.rs            # JSON-RPC server
│
├── tests/
│   ├── integration/
│   │   ├── thread_persistence.rs
│   │   ├── alfworld_task.rs
│   │   └── ast_parsing.rs
│   └── fixtures/
│       └── prompts/
│
├── benches/
│   ├── tree_traversal.rs
│   └── ast_parsing.rs
│
└── examples/
    ├── simple_task.rs
    ├── alfworld_task.rs
    └── custom_env.rs

Document Status: ✅ OFFICIAL v0.1.0 Approval: Lead Architect Implementation Start: 2025-11-18 (Week 1) Expected Completion: 2026-01-17 (Week 8)
