
Skills vs Sub-Agents: Cross-Platform Analysis

How do modern AI coding assistants handle extensibility and task delegation? This document compares three platforms — Claude Code, OpenAI Codex CLI, and OpenClaw — through the lens of two fundamental patterns: skills (context injection) and sub-agents (context isolation).


1. The Two Patterns

At their core, AI coding assistants face a recurring architectural tension: should a new capability share the current conversation's context, or should it run in isolation with its own context window?

Skills (Context Injection): A skill loads additional instructions, templates, or prompts directly into the current conversation context. The model sees everything — the user's history, the skill's instructions, and all available tools — in a single context window. This is lightweight but adds to context pressure.

Sub-Agents (Context Isolation): A sub-agent spawns a separate execution context with its own conversation history. The parent provides a focused prompt, the sub-agent works independently, and returns a compressed result. This provides isolation but introduces coordination overhead.

Every platform implements some version of these two patterns, though they differ substantially in mechanics, naming, and emphasis.


2. Claude Code

Claude Code provides the clearest separation between these two patterns, making them explicit first-class tools.

Skills (Skill Tool)

Mechanism: The Skill tool loads a named skill definition (a prompt template, often backed by a SKILL.md file or a registered command) and injects its content directly into the current conversation turn. The model receives the skill's instructions inline and follows them within the same context window.

How it works:

  1. User or model invokes Skill(name="commit")
  2. The skill's prompt template is fetched and injected as a <command-name> block
  3. The model reads the injected instructions and executes them using all available tools
  4. Execution happens inline — no new process, no context boundary
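The four steps above can be sketched in a few lines of Python. This is a minimal illustration of the injection pattern, not Claude Code's actual internals — the `SKILLS` dict and `invoke_skill` function are hypothetical stand-ins for the real skill registry and Skill tool:

```python
# Hypothetical sketch of context injection: a skill is just a named prompt
# template appended to the live conversation. No new process, no boundary.
SKILLS = {
    "commit": "Write a conventional commit message for the staged changes.",
}

def invoke_skill(name: str, context: list[str]) -> list[str]:
    """Fetch the skill's template and inject it inline into the same context."""
    template = SKILLS[name]
    context.append(f"<command-name>{name}</command-name>\n{template}")
    return context  # the model keeps working in this same context window

history = ["user: please commit my work"]
history = invoke_skill("commit", history)
```

The key property is that `history` is mutated in place: the skill's instructions land next to everything the model has already seen, which is what gives skills full context access and what makes them add context pressure.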

Strengths:

  • Zero overhead: No process spawn, no context serialization. The skill activates instantly.
  • Full context access: The skill sees the entire conversation history, all prior tool results, and user preferences. This is critical for tasks like code review or commit message generation that depend on what just happened.
  • Immediate execution: Results appear in the same conversation turn with no handoff delay.
  • Composability: Skills can reference each other and build on prior context naturally.

Weaknesses:

  • Context pollution: Every skill injection consumes context window tokens, reducing space for the actual task.
  • No isolation: A poorly written skill can confuse the model or conflict with other instructions.
  • Sequential only: Skills execute in the current turn; you cannot run two skills in parallel.
  • No state reset: The model carries all accumulated context, which can lead to drift on long sessions.

Sub-Agents (Agent Tool)

Mechanism: The Agent tool spawns an isolated agent process with its own context window. The parent provides a natural-language prompt describing the task, the sub-agent works independently with access to the same tools (Read, Write, Edit, Grep, Glob, Bash), and returns a text summary when done.

How it works:

  1. Parent invokes Agent(prompt="Analyze the test coverage in src/") with an optional type (Explore, Plan, code-reviewer, etc.)
  2. A new agent process starts with a fresh context window containing only the provided prompt and system instructions
  3. The sub-agent calls tools, reads files, and builds its own understanding
  4. Upon completion, it returns a summary to the parent
  5. The parent receives only the summary — not the sub-agent's full conversation history
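The isolation boundary described above can be sketched as follows. This is an illustrative Python sketch, not the real Agent tool — `run_sub_agent` and `summarize` are invented names, and the summarization step stands in for model-generated compression:

```python
def summarize(text: str) -> str:
    """Stand-in for model-generated compression of the sub-agent's work."""
    return text[:200]

def run_sub_agent(prompt: str, agent_type: str = "Explore") -> str:
    """Spawn a fresh context holding only the prompt and role instructions,
    work independently, and return ONLY a compressed summary to the parent."""
    fresh_context = [
        f"system: you are a {agent_type} agent",
        f"task: {prompt}",
    ]
    # ...in a real system the sub-agent would call tools (Read, Grep, Bash)
    # and append results to fresh_context here...
    findings = f"Explored per task: {prompt!r}"
    return summarize(findings)  # fresh_context itself is never returned

report = run_sub_agent("Analyze the test coverage in src/")
```

Note that the parent sees `report` but never `fresh_context` — that one-way flow is exactly the "prompt in, summary out" contract, and it is why both the context-handoff cost and the result-compression loss exist.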

Strengths:

  • Context isolation: The sub-agent starts fresh, uncontaminated by the parent's accumulated context. This produces more focused results.
  • Parallelism: Multiple sub-agents can run concurrently (foreground + background), enabling parallel exploration or implementation.
  • Fresh context per task: Each sub-agent gets the full context window for its specific task, avoiding the "stuffed context" problem.
  • Resumability: Agents can be resumed by ID, enabling long-running workflows.
  • Specialization: Agent types (Explore, Plan, code-reviewer) carry role-specific system prompts that guide behavior.

Weaknesses:

  • Spawn overhead: Creating a new agent process and context takes time (seconds, not milliseconds).
  • Context handoff cost: The parent must describe the task well enough for the sub-agent to work without seeing the conversation history. Poor prompts lead to poor results.
  • Result compression: The sub-agent's full reasoning is compressed into a summary, losing nuance and detail.
  • Tool re-discovery: The sub-agent must re-read files the parent already read, duplicating work.

When to Use Which

| Scenario | Pattern | Rationale |
| --- | --- | --- |
| Generate a commit message | Skill | Needs full context of what just changed |
| Review a PR | Skill | Needs conversation context about the task |
| Explore an unfamiliar codebase | Sub-agent | Benefits from fresh context, isolation from current task |
| Implement a feature across many files | Sub-agent | Large task benefits from dedicated context window |
| Run parallel investigations | Sub-agent | Only agents support true parallelism |
| Quick template application (auth, scaffold) | Skill | Low overhead, needs project context |
| Deep analysis of a single module | Sub-agent | Dedicated focus without context noise |

3. Codex CLI

OpenAI's Codex CLI (open-sourced mid-2025) takes a different approach. Written primarily in Rust with a TypeScript CLI frontend, it emphasizes sandboxed execution and hierarchical configuration over explicit skill/agent separation.

Architecture Overview

Codex CLI is structured around a session-task model rather than an explicit skill/agent dichotomy:

  • Session: The top-level conversation container. Owns the task lifecycle, manages authentication, and holds conversation history.
  • Tasks: Asynchronous units of work (SessionTask trait) that drive conversation turns. Each task receives a SessionTaskContext (auth, models) and a TurnContext (conversation metadata).
  • Agents: Configurable roles defined in AgentsToml with support for threading, nesting depth limits, and per-role configuration files.

Skills in Codex

Codex implements skills through a dedicated skills/ module with these components:

  • Skill Loader: Loads skill definitions from SKILL.md files (similar to Claude Code's approach)
  • Skill Manager: Handles skill lifecycle and coordination
  • Skill Injection: A dependency injection system (build_skill_injections) that resolves skill dependencies per turn
  • Implicit Invocation: Skills can be triggered automatically based on context, not just explicit commands — a notable difference from Claude Code where skill invocation is always explicit

Skills in Codex are turn-aware (resolve_skill_dependencies_for_turn), meaning the system dynamically determines which skills are relevant for each conversation turn and injects their context accordingly. This is more automated than Claude Code's explicit Skill tool invocation.

Key characteristics:

  • Skills have metadata (SkillMetadata), policies (SkillPolicy), and environment variable dependencies
  • Both explicit mentions and implicit path-based indexing trigger skill injection
  • Skills are rendered into the conversation context via render_skills_section
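The turn-aware resolution described above might look roughly like this. Only the function names (`resolve_skills_for_turn` echoing `resolve_skill_dependencies_for_turn`, and `render_skills_section`) come from the source; the trigger/body schema and regex matching are assumptions made for illustration:

```python
import re

# Hypothetical skill index: each skill declares trigger patterns that can
# fire on an explicit mention or on a path appearing in the turn.
SKILL_INDEX = {
    "commit": {"triggers": [r"\bcommit\b"], "body": "How to write commits..."},
    "migrations": {"triggers": [r"\bmigrations?/"], "body": "DB migration rules..."},
}

def resolve_skills_for_turn(user_message: str) -> list[str]:
    """Pick the skills relevant to THIS turn, implicitly, from its content."""
    return [
        name for name, skill in SKILL_INDEX.items()
        if any(re.search(t, user_message) for t in skill["triggers"])
    ]

def render_skills_section(names: list[str]) -> str:
    """Render the active skills into a section injected into the context."""
    return "\n".join(SKILL_INDEX[n]["body"] for n in names)

active = resolve_skills_for_turn("edit db/migrations/0042_add_users.sql")
```

Here merely mentioning a `migrations/` path activates the migrations skill with no explicit invocation — the automation gain, and the predictability cost, of the implicit model.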

Agents in Codex

Codex supports hierarchical agent spawning with safety controls:

  • Thread Spawning: Agents can spawn child agents, controlled by max_threads and max_depth limits
  • Depth Guards: exceeds_thread_spawn_depth_limit and next_thread_spawn_depth prevent runaway recursion
  • Named Roles: Custom agent roles with human-facing descriptions, role-specific configs, and candidate nicknames
  • External Agent Config: Support for external agent definitions (external_agent_config.rs)

The agent model is hierarchical — root sessions sit at depth 0, and spawned agents increment the depth counter. This is structurally similar to Claude Code's sub-agent pattern but with explicit depth and thread limits baked into configuration.
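The depth-and-thread guard reduces to a simple predicate checked before every spawn. A minimal sketch, assuming illustrative limit values — in Codex the limits come from `AgentsToml` configuration, and `can_spawn` is an invented name standing in for the `exceeds_thread_spawn_depth_limit` / `next_thread_spawn_depth` checks:

```python
MAX_DEPTH = 3    # illustrative; Codex reads max_depth from agent config
MAX_THREADS = 8  # illustrative; Codex reads max_threads from agent config

def can_spawn(current_depth: int, active_threads: int) -> bool:
    """Guard against runaway recursion and thread explosion before spawning.
    A root session sits at depth 0; each child increments the depth."""
    next_depth = current_depth + 1
    return next_depth <= MAX_DEPTH and active_threads < MAX_THREADS
```

A root session (depth 0) may spawn; an agent already at `MAX_DEPTH` may not, however many threads are free — which is what prevents an agent from recursively delegating forever.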

Context Management

Codex takes a layered approach to context:

  • AGENTS.md Hierarchy: Loads and merges AGENTS.md files from three locations (global ~/.codex/, project root, current directory), truncating at 32 KiB per file
  • Memory System: Dedicated memories/ module with MemoriesConfig, memory_trace.rs, and persistent history (~/.codex/history.jsonl)
  • Tool Output Budget: Configurable tool_output_token_limit to manage context consumption from tool results
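The AGENTS.md merge step above is straightforward to sketch: read each layer in precedence order, truncate each file at 32 KiB, and concatenate. A minimal Python sketch under stated assumptions — `load_agents_md` is an invented name, and the exact merge and precedence semantics in Codex may differ:

```python
from pathlib import Path

LIMIT = 32 * 1024  # each AGENTS.md file is truncated at 32 KiB

def load_agents_md(paths: list[Path]) -> str:
    """Merge AGENTS.md layers (global ~/.codex/, project root, cwd).
    Later layers are appended, so more specific guidance comes last."""
    parts = []
    for p in paths:
        if p.is_file():
            parts.append(p.read_text(encoding="utf-8")[:LIMIT])
    return "\n\n".join(parts)
```

Missing layers are simply skipped, so a repo with no project-level AGENTS.md still picks up the global file.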

Sandboxing Model

A distinctive feature of Codex is its OS-level sandboxing with three autonomy tiers:

  • Suggest: Read-only, recommendations only
  • Auto-edit: Applies file patches without approval
  • Full-auto: Executes commands without approval

Sandboxing uses Apple Seatbelt on macOS and Docker with iptables on Linux, providing security guarantees that Claude Code handles through its permission system.
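The three tiers map naturally onto an approval policy: each tier widens the set of actions that proceed without asking. A hedged sketch — the tier names are from the source, but the `needs_approval` policy function and the `"patch"`/`"exec"` action labels are illustrative assumptions:

```python
from enum import Enum

class AutonomyTier(Enum):
    SUGGEST = "suggest"      # read-only: propose changes, never apply
    AUTO_EDIT = "auto-edit"  # apply file patches without approval
    FULL_AUTO = "full-auto"  # also execute commands without approval

def needs_approval(tier: AutonomyTier, action: str) -> bool:
    """Hypothetical policy: does this action require user approval?"""
    if tier is AutonomyTier.FULL_AUTO:
        return False                 # everything runs unattended
    if tier is AutonomyTier.AUTO_EDIT:
        return action == "exec"      # edits are free; commands still ask
    return True                      # SUGGEST: everything asks
```

The widening is monotonic: anything FULL_AUTO asks about, AUTO_EDIT also asks about, and so on down to SUGGEST.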

Mapping to Our Framework

| Concept | Claude Code Equivalent | Notes |
| --- | --- | --- |
| Skill injection | Skill tool | More automated (implicit invocation) |
| Agent threading | Agent tool | Hierarchical with depth limits |
| AGENTS.md hierarchy | CLAUDE.md | Similar layered config, different naming |
| Session tasks | N/A (implicit) | Codex makes task lifecycle explicit |
| Sandboxing tiers | Permission system | OS-level vs tool-level isolation |
| Memory system | Conversation history | Codex persists across sessions |

4. OpenClaw

OpenClaw is a local-first personal AI assistant built in TypeScript/Node.js. Its architecture differs fundamentally from both Claude Code and Codex by introducing a gateway-centric design with WebSocket-based coordination.

Architecture Overview

OpenClaw's core architectural primitive is the Gateway — a WebSocket control plane running at ws://127.0.0.1:18789 that acts as the central nervous system for all agent coordination:

  • Gateway: WebSocket server that routes messages between agents, manages sessions, and enforces isolation
  • Workspace Configuration: Defines which agents are available and how they're routed
  • Sessions: Isolated conversation containers with their own history and tool access

Skills: ClawHub Registry

OpenClaw's skill equivalent is the ClawHub — a managed skill registry for extensions:

  • Managed Extensions: Skills are packaged as self-contained extensions registered in ClawHub
  • Installation Model: Skills are installed into a workspace, similar to package managers
  • Runtime Loading: Extensions are loaded at runtime and made available to agents through the gateway
  • Standardized Interface: Each skill exposes a consistent API that the gateway can route to

Unlike Claude Code's inline prompt injection or Codex's turn-aware injection, OpenClaw skills are standalone services that communicate through the gateway. This provides stronger isolation at the cost of higher communication overhead.

Agents: Session-Based Coordination

OpenClaw's agent model is built around sessions with explicit inter-agent communication tools:

  • sessions_list: Discover active agent sessions
  • sessions_send: Send messages between agent sessions
  • sessions_history: Read another session's conversation history
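The three session tools can be sketched against a toy in-memory gateway. Only the tool names (`sessions_list`, `sessions_send`, `sessions_history`) come from the source; the `Gateway` class is a deliberately simplified stand-in for OpenClaw's WebSocket control plane:

```python
class Gateway:
    """Toy in-memory stand-in for the WebSocket gateway at ws://127.0.0.1:18789.
    Sessions are isolated by default; these tools are the explicit bridges."""

    def __init__(self) -> None:
        self.sessions: dict[str, list[str]] = {}

    def sessions_list(self) -> list[str]:
        """Discover active agent sessions."""
        return sorted(self.sessions)

    def sessions_send(self, target: str, message: str) -> None:
        """Send a message to another agent's session, via the gateway."""
        self.sessions.setdefault(target, []).append(message)

    def sessions_history(self, target: str) -> list[str]:
        """Opt-in context sharing: read another session's history."""
        return list(self.sessions.get(target, []))

gw = Gateway()
gw.sessions_send("researcher", "summarize the auth module")
```

Because history access goes through an explicit tool call rather than a shared context window, sharing is observable and opt-in — the middle ground between Claude Code's full sharing (skills) and full isolation (sub-agents).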

This is a fundamentally different model from Claude Code's sub-agents:

| Aspect | Claude Code Sub-Agent | OpenClaw Session Agent |
| --- | --- | --- |
| Creation | Parent spawns child with prompt | Gateway creates session from config |
| Communication | One-way (prompt in, summary out) | Bidirectional via session tools |
| Lifecycle | Tied to parent task | Independent, persists across interactions |
| Context sharing | None (isolated) | Explicit via sessions_history |
| Discovery | Parent knows child ID | Agents discover each other via sessions_list |

Gateway Routing as Orchestration

The gateway serves as an implicit orchestrator:

  • Multi-Agent Routing: Workspace configuration determines which agent handles which type of request
  • Session Isolation: Each agent session is isolated by default; sharing requires explicit tool use
  • Message Passing: The gateway routes messages based on workspace rules, not explicit agent-to-agent calls
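Rule-based routing of this kind reduces to a lookup the gateway performs on the caller's behalf. A minimal sketch — the routing-table shape and the `route` function are illustrative assumptions, not OpenClaw's actual workspace configuration syntax:

```python
# Hypothetical workspace routing rules: first matching keyword wins.
ROUTES: list[tuple[str, str]] = [
    ("review", "code-reviewer"),
    ("deploy", "ops-agent"),
]
DEFAULT_AGENT = "generalist"

def route(request: str) -> str:
    """The gateway picks the handling agent from workspace rules;
    the caller never names an agent explicitly."""
    for keyword, agent in ROUTES:
        if keyword in request:
            return agent
    return DEFAULT_AGENT
```

The point of the sketch is who decides: the requester sends a plain message, and infrastructure — not a parent agent — chooses the peer that handles it.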

This is closer to a microservices architecture than the parent-child model used by Claude Code and Codex. Agents are peers coordinated by infrastructure rather than a hierarchical spawn tree.

Mapping to Our Framework

| Concept | Claude Code Equivalent | Notes |
| --- | --- | --- |
| ClawHub skills | Skill tool | Service-based vs prompt injection |
| Session agents | Agent tool | Peer-based vs parent-child |
| Gateway routing | N/A | Infrastructure-level orchestration |
| sessions_send | Agent prompt | Bidirectional vs unidirectional |
| sessions_history | N/A | Explicit context sharing (opt-in) |
| Workspace config | CLAUDE.md | Routing rules vs instructions |

5. Comparison Matrix

| Dimension | Claude Code | Codex CLI | OpenClaw |
| --- | --- | --- | --- |
| Skill mechanism | Explicit Skill tool injects prompt into context | Turn-aware injection with implicit/explicit triggers | ClawHub registry; skills are standalone services |
| Agent mechanism | Agent tool spawns isolated child process | Hierarchical thread spawning with depth limits | Session-based peers coordinated via gateway |
| Context isolation | Binary: shared (skill) or fully isolated (agent) | Configurable depth; memory system bridges sessions | Session-isolated by default; opt-in sharing via tools |
| Parallelism | Foreground + background agents | Thread pool with max_threads config | Concurrent sessions via gateway routing |
| Composability | Skills compose naturally; agents compose via chaining | Skills auto-compose via dependency injection | Agents compose via message passing |
| Orchestration model | Parent-child (hierarchical) | Parent-child with depth limits (hierarchical) | Peer-to-peer via gateway (flat) |
| Persistence | Conversation-scoped (agents resumable by ID) | Cross-session memory and history | Session-persisted, gateway-managed |
| Sandboxing | Tool-level permissions | OS-level sandbox (Seatbelt/Docker) | Process-level session isolation |
| Configuration | CLAUDE.md + skill definitions | AGENTS.md hierarchy + AgentsToml + SkillsConfig | Workspace config + ClawHub registry |
| Developer experience | Simple: two explicit tools | Richer config surface; more automated | Service-oriented; higher setup cost |

6. Key Insights

Insight 1: The Isolation-Context Trade-off is Universal

All three platforms grapple with the same fundamental tension: more isolation means less context, and more context means less isolation. They solve it differently:

  • Claude Code makes it a binary choice — skill (full context) or agent (zero context)
  • Codex adds a middle ground with memory systems that bridge sessions and turn-aware skill injection that manages context automatically
  • OpenClaw defaults to isolation but provides explicit tools (sessions_history) for opt-in context sharing

Insight 2: Implicit vs Explicit Invocation

Claude Code requires explicit invocation of both skills and agents. Codex introduces implicit skill invocation — the system determines which skills are relevant per turn and injects them automatically. This reduces developer cognitive load but makes behavior less predictable.

OpenClaw's gateway routing is also implicit — workspace configuration determines which agent handles a request without the user explicitly choosing.

Insight 3: Hierarchical vs Flat Orchestration

Claude Code and Codex use hierarchical orchestration (parent spawns child, child returns result to parent). OpenClaw uses a flat, peer-based model where agents communicate laterally through a gateway.

Hierarchical models are simpler to reason about but create bottlenecks at the parent. Flat models enable more complex multi-agent workflows but are harder to debug and reason about.

Insight 4: The Skill Packaging Spectrum

Skills exist on a spectrum from lightweight to heavyweight:

  1. Prompt templates (Claude Code): Just text injected into context. Zero overhead, zero isolation.
  2. Dependency-managed modules (Codex): Skills with metadata, policies, and environment dependencies. Medium overhead, some lifecycle management.
  3. Standalone services (OpenClaw): Skills as independent services behind a gateway. High overhead, full isolation.

The right choice depends on the complexity of the skill and the need for isolation.

Insight 5: Context Management is the Real Differentiator

The platforms differ most in how they manage context across boundaries:

  • Claude Code: No cross-boundary context. Sub-agents start fresh; skills see everything.
  • Codex: Memory traces and persistent history create continuity across sessions. Skills are turn-aware.
  • OpenClaw: Explicit context-sharing tools create a middle ground where agents can selectively access each other's history.

7. When to Use Which Pattern

Decision Framework

Is the task small and does it need the current conversation context?
├── YES → Use a Skill (context injection)
│   Examples: commit messages, quick scaffolding, template application
│
└── NO → Does the task need isolation or parallelism?
    ├── YES → Use a Sub-Agent (context isolation)
    │   Examples: codebase exploration, parallel implementation, deep analysis
    │
    └── NO → Does the task need multi-agent coordination?
        ├── YES → Use Gateway/Session pattern (OpenClaw-style)
        │   Examples: complex workflows, multi-role collaboration
        │
        └── NO → Consider whether the task should be a skill with
                  implicit invocation (Codex-style) for seamless UX

Pattern Selection by Task Characteristics

| Task Characteristic | Recommended Pattern | Platform Example |
| --- | --- | --- |
| Needs full conversation history | Skill (injection) | Claude Code Skill |
| Needs dedicated focus | Sub-agent (isolation) | Claude Code Agent |
| Should activate automatically | Implicit skill | Codex turn-aware injection |
| Requires multi-agent collaboration | Session/gateway | OpenClaw sessions_send |
| Must persist across sessions | Memory-backed agent | Codex memory system |
| Needs OS-level security | Sandboxed execution | Codex Seatbelt/Docker |
| High-frequency, low-overhead | Prompt injection | Claude Code Skill |
| Complex dependency chain | Managed skill | Codex skill dependencies |

Platform Recommendations

Choose Claude Code's model when: You want simplicity and explicit control. The binary skill/agent split is easy to reason about and covers the vast majority of use cases. Best for individual developer workflows.

Choose Codex CLI's model when: You need richer automation. Implicit skill invocation and persistent memory reduce manual orchestration. Best for teams that want convention-over-configuration.

Choose OpenClaw's model when: You need multi-agent collaboration with selective context sharing. The gateway pattern scales to complex workflows. Best for building AI-assisted systems with multiple specialized agents.


Analysis based on publicly available source code and documentation as of March 2026.