TeaAgent Architecture

System Overview

TeaAgent is a thin, governance-first agent harness. It does not implement its own LLM framework; it connects to model providers through adapters and enforces safety boundaries around tool execution.

┌─────────────────────────────────────────────────────────────┐
│                          CLI / TUI                           │
├─────────────────────────────────────────────────────────────┤
│                      ModelDecisionEngine                     │
│  (prompt assembly → JSON parsing → structured decisions)    │
├──────────────────────┬──────────────────────────────────────┤
│     AgentRunner       │          ChatAgentConfig             │
│  (decision loop,      │    (high-level convenience wrapper   │
│   budget, approval,   │     around AgentRunner + LLM)       │
│   audit)              │                                      │
├──────────────────────┴───────────┬──────────────────────────┤
│           ToolRegistry            │      ApprovalPolicy       │
│    (register, dispatch, validate) │  (5 permission modes)     │
├──────────────────────────────────┴──────────────────────────┤
│                     Workspace Tools                          │
│  read_file · write_file · apply_patch · edit_at_hash         │
│  run_shell_inspect · run_shell_mutate · list_files           │
│  search_text · git_status                                    │
├─────────────────────────────────────────────────────────────┤
│              Multi-Agent Coordination Layer (Phase 4-5)       │
│  TaskCoordinator · AgentFactory · ToolPermissionManager       │
│  WorkflowEngine (polish mode, multi-step execution)          │
│  ContextBus (cross-sandbox Delta sharing)                     │
│  JITApprovalServer (remote SSE with timeout)                │
│  CentralizedApprovalQueue (aggregated subagent approvals)    │
├─────────────────────────────────────────────────────────────┤
│                      State Layer                             │
│  AuditLogger · RunStore · MemoryCatalog · UltraworkStore     │
├─────────────────────────────────────────────────────────────┤
│                  Infrastructure                              │
│  OAuth 2.1/DPoP · MCP HTTP/stdio · OTel · Graph RAG         │
│  Code Mode · LLM Conformance · Provability                   │
└─────────────────────────────────────────────────────────────┘

5-Loop Governance System (Completed Hardening)

The TeaAgent governance system is hardened through five loops. Each loop closes one operational or security boundary:

Loop 1: Tool Governance (CI Gate & Manifests)

ToolRegistry with security tier mapping: Tools now include security_tier annotations (Low, Medium, High, Critical) with automatic tier calculation based on annotations and capability manifests
Enhanced tool linting: Static validation checks for write-like keywords in read_only tool descriptions, capability manifest validation with tier mismatch warnings
AST-based fuzz checking: selftest.py includes static analysis to detect tools marked as read_only=True that contain write operations in their implementation
Capability manifest enforcement: Tools must declare capabilities (filesystem_write, network) with preflight warnings for undeclared capabilities

Loop 2: Coding Safety Loop (Plan Binding & Validation)

Strict plan-before-write enforcement: workspace-write mode now requires plan binding by default, and writes without a bound plan are blocked immediately (a user-approved design decision)
PlanContract file target validation: Plans include approved file target lists with allows_file_write() method to prevent un-declared file modifications
Validation profile integration: Fast, Standard, and Strict validation profiles wired to WorkflowEngine with automatic rollback on strict validation failure
JIT rollback integration: Strict validation failures trigger automatic rollback via UndoJournal
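
The file-target check can be pictured as a set-membership test. The sketch below is a hedged illustration only: the text names PlanContract and allows_file_write(), while the fields plan_id and approved_files and the normalization rule are assumptions made for this example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class PlanContract:
    """Hypothetical shape: an approved plan plus its approved file targets."""

    plan_id: str                    # assumed field
    approved_files: frozenset[str]  # assumed field: workspace-relative paths

    def allows_file_write(self, path: str) -> bool:
        # Drop "." components so "./src/a.py" and "src/a.py" compare equal.
        # ".." components are kept as-is, so such paths never match a target.
        normalized = str(PurePosixPath(path))
        return normalized in self.approved_files
```

A write to any path outside approved_files returns False, which is the point at which a plan gate would block the tool call.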

Loop 3: Audit / Replay Loop (Tiered Logging & Integrity)

Tiered audit levels: L0 (Metrics-only), L1 (Metadata), L2 (Redacted Payload), L3 (Full Local Trace) with configurable filtering
Audit chain integrity verification: SHA-256 hash chain validation for trace import to prevent tampering
Per-project encryption support: L3 audits support per-project encryption keys for metadata leakage prevention
TUI run trace surface: Interactive run store management for trace, export, and replay operations
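
To make the hash-chain idea concrete, here is a minimal sketch. It assumes that each record stores the previous record's hash and that the chain starts from an all-zero value; the record layout and the names GENESIS, chain_hash, build_chain, and verify_chain are illustrative, not TeaAgent's actual format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first record


def chain_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON keeps the hash stable regardless of key order.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(events: list[dict]) -> list[dict]:
    records, prev = [], GENESIS
    for event in events:
        digest = chain_hash(prev, event)
        records.append({"event": event, "prev_hash": prev, "hash": digest})
        prev = digest
    return records


def verify_chain(records: list[dict]) -> bool:
    prev = GENESIS
    for record in records:
        # Each record must point at its predecessor and hash correctly.
        if record["prev_hash"] != prev:
            return False
        if chain_hash(prev, record["event"]) != record["hash"]:
            return False
        prev = record["hash"]
    return True
```

Editing, removing, or reordering any record changes a hash that a later record depends on, so verification fails from that point onward.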

Loop 4: Memory / Failure Loop (Curation & Warning Injection)

Confidence-based blocking: Low-confidence failure cards never block execution automatically; they surface as warnings, subject to enforced confidence thresholds
Enhanced CLI curation suite: teaagent memory failures review/prune/invalidate commands with confidence filtering
Custom invalidation rules: Per-project automated invalidation rules (e.g., auto-pruning when target files change)
Memory hygiene enforcement: TTL expiration rules and manual correction capabilities for memory poisoning

Loop 5: Swarm & Tournament Sandbox Hardening

Approval lineage tracing: Subagents carry parent-run IDs and inherit permission mode constraints with structured tracking
Fail-fast approval logic: Tournament/parallel mode halts immediately if any subagent requires human permission (user-approved)
Git worktree sandbox enforcement: Tournament runs require git worktree isolation as hard pre-condition for zero-contamination guarantees
Security-aware tournament scoring: Weighted comparator schema (tests 40%, performance 15%, lint 10%, diff size 10%, architectural fit 15%, security 10%)
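
The weighted comparator can be read as a plain weighted sum. The following sketch assumes each metric has already been normalized to the range 0 to 1; the function name, the metric keys, and the treatment of missing metrics are hypothetical.

```python
# Weights copied from the comparator schema above; they sum to 1.0.
WEIGHTS = {
    "tests": 0.40,
    "performance": 0.15,
    "lint": 0.10,
    "diff_size": 0.10,
    "architectural_fit": 0.15,
    "security": 0.10,
}


def tournament_score(metrics: dict[str, float]) -> float:
    """Weighted sum of normalized metrics; missing metrics count as 0.0."""
    for name, value in metrics.items():
        if name not in WEIGHTS:
            raise KeyError(f"unknown metric: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    return sum(weight * metrics.get(name, 0.0) for name, weight in WEIGHTS.items())
```

Under these weights, a candidate that passes every test but scores zero elsewhere still reaches only 0.40, so no single dimension decides a tournament alone.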

Phase 4-5 Roadmap (Beta)

Core Phase 4 (consensus) and Phase 5 (sandbox routing/execution) modules are shipped with CLI, unit tests, and E2E acceptance. Optional hardening (async vote polling, WASM skill execution) is shipped; docker-smoke CI is advisory (see CONTRIBUTING.md). Remaining Beta work is native WASM modules and deeper tournament benchmarks. See backlog-priority.md.

Phase 4: Federated Swarm Consensus & Peer Attestations — Beta

ConsensusEngine, peer registry, voting mechanisms, and attestation trail
Swarm pre-approval gate when ConsensusConfig.enable_pre_approval matches task patterns
Async vote collection via ConsensusConfig.async_vote_collection + poll_until_resolved
CLI: teaagent consensus (teaagent/cli/_handlers/_consensus.py)
E2E: tests/acceptance/test_consensus_flow.py

Phase 5: Hardened Sandbox Virtualization — Beta

Docker resource limits via prepare_subagent_isolation (--cpus / --memory)
WASM runtime wrapper (teaagent/wasm_runtime.py) and skill routing (teaagent/skill_router.py)
Skill execution: teaagent/skill_executor.py, teaagent sandbox execute
isolation=auto on subagents with skill_path / skill_risk_level
Resource monitoring CLI (teaagent sandbox monitor)
E2E: tests/acceptance/test_sandbox_enhancement_flow.py

Phase 6: Skill writer, docker monitor, control plane — Beta

SkillWriter publish/review pipeline (teaagent/skill_writer.py)
Docker sandbox resource monitor + abort (teaagent/docker_sandbox.py)
Prompt tournament fitness scoring in SwarmManager (.teaagent/prompt_gene_pool.jsonl)
Control plane HTTP + dashboard (teaagent/control_plane_api.py, teaagent/html_dashboard/)
CLI: teaagent control-plane serve (workflow/focus/JIT SSE dashboard)

See backlog-priority.md for detailed task breakdowns and implementation status.

Component Layers

1. Decision Loop

AgentRunner is the core execution loop. It accepts a DecisionFn — any callable that takes a context dict and returns either a ToolRequest or FinalAnswer:

ToolRequest: name + arguments to dispatch through ToolRegistry.
FinalAnswer: content + metadata to return to the caller.

The loop enforces iteration limits, tool-call limits, and cost budgets on every iteration. Every decision and execution is recorded through AuditLogger.

ModelDecisionEngine implements the standard LLM path: it assembles a system prompt, appends tool metadata and memory, calls the LLM adapter, and parses the JSON response into Decision objects. ChatAgentConfig bundles all the configuration needed for a complete model-driven agent run.
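
The loop contract can be sketched in a few lines. This is a simplified illustration under stated assumptions: the dataclass fields follow the descriptions above, but run_loop, its dict-based result, and the plain tools mapping are stand-ins for AgentRunner, RunResult, and ToolRegistry, and approval, cost budgets, and audit logging are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Union


@dataclass
class ToolRequest:
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


@dataclass
class FinalAnswer:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


Decision = Union[ToolRequest, FinalAnswer]
DecisionFn = Callable[[dict[str, Any]], Decision]


def run_loop(decide: DecisionFn, tools: dict[str, Callable[..., Any]],
             max_iterations: int = 5) -> dict[str, Any]:
    """Hypothetical stand-in for AgentRunner.run(); enforces only the iteration limit."""
    context: dict[str, Any] = {"observations": []}
    tool_calls = 0
    for iteration in range(1, max_iterations + 1):
        decision = decide(context)
        if isinstance(decision, FinalAnswer):
            return {"status": "completed", "final_answer": decision.content,
                    "iterations": iteration, "tool_calls": tool_calls}
        # Dispatch the requested tool and feed the result back as an observation.
        result = tools[decision.name](**decision.arguments)
        tool_calls += 1
        context["observations"].append({"tool": decision.name, "result": result})
    return {"status": "iteration_limit", "final_answer": None,
            "iterations": max_iterations, "tool_calls": tool_calls}
```

Because DecisionFn is just a callable, a scripted function can drive the loop in tests without any LLM behind it.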

2. Tool Governance

All tools are registered through ToolRegistry with:

Property	Purpose
`name`	Unique identifier (no spaces)
`description`	Human-readable purpose for prompt injection
`input_schema`	JSON Schema for argument validation
`output_schema`	JSON Schema for result validation
`annotations`	`read_only`, `destructive`, `idempotent`

ApprovalPolicy sits between the decision loop and tool execution. It checks annotations against the active PermissionMode before any destructive tool runs:

Mode	Read	Write	Shell Mutate	Destructive Approval
`read-only`	Yes	No	No	Blocked
`workspace-write`	Yes	Yes	No	Blocked
`prompt`	Yes	Yes	Conditional	Human-in-the-loop
`allow`	Yes	Yes	Yes	Session-scoped
`danger-full-access`	Yes	Yes	Yes	None
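
The first three columns of the table can be expressed as a lookup. The sketch below is illustrative only: Verdict, MODE_TABLE, and check are hypothetical names, modeling "Conditional" as an ASK verdict is an assumption, and the Destructive Approval column is not modeled.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ASK = "ask"    # assumption: how the table's "Conditional" is modeled here
    DENY = "deny"


# Rows copied from the table above: (read, write, shell mutate).
MODE_TABLE = {
    "read-only":          (Verdict.ALLOW, Verdict.DENY,  Verdict.DENY),
    "workspace-write":    (Verdict.ALLOW, Verdict.ALLOW, Verdict.DENY),
    "prompt":             (Verdict.ALLOW, Verdict.ALLOW, Verdict.ASK),
    "allow":              (Verdict.ALLOW, Verdict.ALLOW, Verdict.ALLOW),
    "danger-full-access": (Verdict.ALLOW, Verdict.ALLOW, Verdict.ALLOW),
}
_COLUMNS = {"read": 0, "write": 1, "shell_mutate": 2}


def check(mode: str, action: str) -> Verdict:
    """Look up the verdict for an action under a permission mode."""
    return MODE_TABLE[mode][_COLUMNS[action]]
```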

3. Audit and Observability

AuditLogger is the universal event sink. Every AgentRunner iteration, tool call, approval decision, and final result produces an AuditEvent:

Events are appended to a per-run JSONL file with fcntl.LOCK_EX and fsync.
Sensitive keys (api_key, token, secret, …) and tool argument values (content, command, …) are redacted before persistence.
String-level patterns (Bearer tokens, sk-* keys, query-param secrets) are also redacted.
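
A rough sketch of the two redaction passes follows. The key set contains only the names listed above (the full list is elided in this document), and the regular expressions and the "[REDACTED]" marker are assumptions for illustration, not the project's actual patterns.

```python
import re

SENSITIVE_KEYS = {"api_key", "token", "secret"}  # partial; the full list is elided above
REDACTED = "[REDACTED]"

# Assumed patterns for the string-level pass.
_BEARER = re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*")
_SK_KEY = re.compile(r"\bsk-[A-Za-z0-9_-]+")
_QUERY_SECRET = re.compile(r"([?&](?:token|api_key|secret)=)[^&\s]+")


def redact_text(text: str) -> str:
    text = _BEARER.sub("Bearer " + REDACTED, text)
    text = _SK_KEY.sub(REDACTED, text)
    # Keep the parameter name, replace only its value.
    return _QUERY_SECRET.sub(lambda m: m.group(1) + REDACTED, text)


def redact(value):
    """Recursively redact sensitive keys, then scan remaining strings."""
    if isinstance(value, dict):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return redact_text(value)
    return value
```

Running both passes before persistence means a secret is caught whether it arrives under a well-known key or embedded in free text such as a header or URL.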

Sinks plug into AuditLogger.add_sink():

InMemoryMetricsSink collects counters and histogram samples.
OTelAuditSink converts events into OpenTelemetry spans.
OTelMetricsSink converts events into OTel counters/histograms.

RunStore manages per-run audit files and provides listing, inspection, task replay, and heartbeat tracking for resumable agent runs.

4. Workspace Isolation

Workspace tools operate within a configurable root directory. Tool calls pass through the following checks:

Path resolution — rejects ../, absolute paths, and symlink escapes.
Size enforcement — max_read_bytes, max_write_bytes, max_shell_output_bytes.
Shell classification — quote-aware scanning splits commands into inspect (safe: ls, cat, git status) and mutate (everything else).
Shell execution — inspect commands run with shell=False after allowlist argv validation; find -delete/-exec and git -c/--config are blocked.
Edit safety — apply_patch requires unique match; edit_at_hash uses CRC32 line anchors.
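
The path resolution step might look like the sketch below. It is a minimal illustration, assuming Python 3.9+ for Path.is_relative_to; PathEscapeError and resolve_in_workspace are hypothetical names rather than the project's API.

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Raised when a requested path would leave the workspace root."""


def resolve_in_workspace(root: Path, user_path: str) -> Path:
    candidate = Path(user_path)
    if candidate.is_absolute():
        raise PathEscapeError(f"absolute path rejected: {user_path}")
    if ".." in candidate.parts:
        raise PathEscapeError(f"parent traversal rejected: {user_path}")
    root_resolved = root.resolve()
    # resolve() follows symlinks, so a link that points outside the root
    # produces a path that fails the containment check below.
    resolved = (root_resolved / candidate).resolve()
    if not resolved.is_relative_to(root_resolved):
        raise PathEscapeError(f"symlink escape rejected: {user_path}")
    return resolved
```

Checking the fully resolved path, rather than the string the caller supplied, is what catches symlink escapes that pass the first two tests.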

5. Code Mode

Restricted Python execution with AST allow-list validation:

Backend	Isolation Level
Child process (default)	`RLIMIT_CPU`, wall-clock timeout, advisory `RLIMIT_AS`
Container	Docker/Podman: `--network none`, `--read-only`, `--cap-drop=ALL`, non-root, tmpfs, CPU/memory/PID limits, streaming output cap, image digest pinning, image allowlist

Code Mode allows only a fixed set of AST nodes and builtin functions — no imports, no attributes, no arbitrary calls.
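
An allow-list validator of this kind can be sketched with the standard ast module. The node set and builtin names below are illustrative assumptions; the actual allow-list is not given in this document.

```python
import ast

# Assumed allow-list for illustration; anything not listed is rejected.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Call, ast.List, ast.Tuple,
)
ALLOWED_BUILTINS = {"len", "min", "max", "sum", "sorted"}


def validate(source: str) -> list[str]:
    """Return a list of violations; an empty list means the code passes."""
    errors = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ALLOWED_NODES):
            errors.append(f"disallowed node: {type(node).__name__}")
        elif isinstance(node, ast.Call):
            # Only direct calls to allow-listed builtin names are accepted.
            func = node.func
            if not (isinstance(func, ast.Name) and func.id in ALLOWED_BUILTINS):
                errors.append("disallowed call")
    return errors
```

Because Import and Attribute are absent from the allow-list, `import os` and `x.y` are rejected without any special-case logic.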

6. Governance Hardening (Tranche B)

Plan-before-Write Enforcement:

workspace-write mode now enforces plan-by-default for safety
--require-plan flag blocks destructive tools without a bound plan artifact
--skip-plan-check provides explicit override for power users
Implemented in teaagent/governance/plan_gate.py with strict defaults

Automated Memory Invalidation:

Conservative default rules prevent memory corruption:
- file_signature_change: invalidate when files change
- test_refactor: warn when test files are modified
- dependency_version_change: warn on dependency updates
Per-project customization via .teaagent/config.json
CLI command: teaagent memory failures auto-invalidate
Implemented in teaagent/memory/failure_card.py with signature tracking

Centralized Approval Queue:

Aggregates destructive tool requests from multiple subagents
Supports batch approval/deny with full lineage tracking
Prevents approval fatigue in tournament/swarm modes
Timeout handling and request lifecycle management
Implemented in teaagent/subagents/_approval_queue.py

Governance Fuzz Tests:

Comprehensive adversarial test suite in tests/test_governance_fuzz.py
Validates plan-before-write enforcement, memory invalidation, and approval queue security
Tests conservative defaults and path filtering
Integrated into CI governance gate

7. OAuth 2.1 / DPoP

OAuth21AuthorizationServer and OAuth21ResourceServer implement the authorization code grant with PKCE (S256) and optional DPoP proof-of-possession:

Authorization codes are one-time (consume-and-delete semantics).
Access tokens are HS256 JWTs with kid for key rotation.
DPoP nonces are consumed on validation (no replay).
DPoP proof jti values have short-lived replay caches.
SQLiteOAuthStore provides durable client/authorization-code/nonce storage with PBKDF2-SHA256 client-secret hashing.
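
For reference, the S256 transform used by PKCE is defined in RFC 7636 as the unpadded base64url encoding of the SHA-256 digest of the code verifier. The helper names below are hypothetical; only the transform itself comes from the RFC.

```python
import base64
import hashlib
import hmac
import secrets


def make_code_verifier() -> str:
    # 32 random bytes encode to 43 URL-safe characters,
    # the minimum verifier length RFC 7636 allows.
    return secrets.token_urlsafe(32)


def s256_challenge(verifier: str) -> str:
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(s256_challenge(verifier), stored_challenge)
```

The client sends the challenge with the authorization request and the verifier with the token request; the server recomputes the challenge and compares it to the stored value.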

8. MCP Transport

Two transports share the same handle_mcp_request() dispatch:

stdio: Standard JSON-RPC over stdin/stdout.
Streamable HTTP: POST /mcp (JSON-RPC), GET /mcp (SSE keepalive), DELETE /mcp (session teardown), OPTIONS /mcp (CORS preflight).

The HTTP server enforces:

Bearer token or OAuth 2.1 authentication for non-loopback binds.
Origin allowlist for browser-initiated requests.
Mcp-Session-Id session tracking.
Body size limits with 413 for oversized payloads.

9. LLM Integration

teaagent.llm provides a unified adapter layer (LLMAdapter) across 13 registered providers in PROVIDER_CONFIGS: claude, gpt, gemini, openrouter, ollama, vllm, opencodezen, opencodezen-go, mistral, deepseek, grok, workers-ai, and aigateway. Each provider key has its own credential environment variable, with two exceptions: workers-ai and aigateway share CLOUDFLARE_API_TOKEN, and both OpenCodeZen adapters share OPENCODEZEN_API_KEY. Each adapter implements chat() returning an LLMResponse. Features include:

Configurable exponential-backoff retry (LLMRetryConfig).
Cost budget pre-flight.
Streaming via stream=True and on_chunk callbacks.
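
Exponential-backoff retry follows a common pattern, sketched below. The fields of the real LLMRetryConfig are not documented here, so RetryConfig, its defaults, and call_with_retry are hypothetical stand-ins.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryConfig:
    """Hypothetical stand-in for LLMRetryConfig; field names are assumed."""

    max_attempts: int = 4
    base_delay: float = 0.5
    multiplier: float = 2.0
    max_delay: float = 8.0

    def delay_for(self, attempt: int) -> float:
        # Delay after failed attempt number `attempt` (1-based), capped at max_delay.
        return min(self.base_delay * self.multiplier ** (attempt - 1), self.max_delay)


def call_with_retry(fn: Callable[[], T], config: RetryConfig = RetryConfig(),
                    sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(1, config.max_attempts + 1):
        try:
            return fn()
        except Exception:
            # A real adapter would retry only transient errors (timeouts, 429, 5xx).
            if attempt == config.max_attempts:
                raise
            sleep(config.delay_for(attempt))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing sleep as a parameter lets tests record the computed delays instead of actually waiting.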

10. External Federation Boundary (ANP Adapter)

TeaAgent treats ANP as an optional external federation surface through a bidirectional adapter boundary:

Inbound (ANP -> TeaAgent): ANPGovernedService normalizes network requests into AgentRunner tool execution and must still pass ToolRegistry, ApprovalPolicy, budget enforcement, and AuditLogger.
Outbound (TeaAgent -> ANP): selected tasks can be delegated to ANP peers through a typed client, then mapped back into internal result/audit models.

This keeps core runtime governance stable while enabling cross-organization agent interoperability. See ADR 0007 for scope and invariants.

Data Flow

User / CLI
  │
  ├─ task ───────────────────────────────────────► AgentRunner.run()
  │                                                  │
  │                                    ┌─────────────┴──────────────┐
  │                                    │  while iter < budget:       │
  │                                    │    decision = decide(ctx)   │
  │                                    │    if FinalAnswer → return  │
  │                                    │    if ToolRequest:          │
  │                                    │      policy.assert_allowed  │
  │                                    │      result = reg.execute   │
  │                                    │      ctx.observations.add   │
  │                                    │    audit.record(every step) │
  │                                    └────────────────────────────┘
  │
  └─ RunResult ◄──────────── final_answer, iterations, tool_calls, status

State Boundaries

Store	Medium	Locking	Purpose
`AuditLogger`	JSONL	`fcntl.LOCK_EX` + `fsync`	Per-run event log
`MemoryCatalog`	JSONL	`fcntl.LOCK_EX` + `fsync`	Workspace observations
`RunStore`	JSONL	`atomic_write_text` (lock + replace)	Run history and replay
`UltraworkStore`	JSONL	`atomic_write_text`	Worker lifecycle records
`SQLiteOAuthStore`	SQLite	WAL + `BEGIN IMMEDIATE`	OAuth clients/codes/nonces
`ContextBus`	SQLite	WAL; per-thread connections	Cross-agent Delta cards
`FederatedGraphSync`	JSON	none (single-writer file)	Graph sync state + exports

JSONL rows assume a single writer per workspace on a local or advisory-lock-safe filesystem. NFS multi-writer shared roots are unsupported — see ADR 0008.

All state is externalized to the filesystem. In-memory runner state is temporary only — every meaningful event persists to disk before the caller sees the result.
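
The lock-then-fsync append used by the JSONL stores can be sketched as follows. This assumes a POSIX system and flock-style advisory locking; the table above names only fcntl.LOCK_EX and fsync, so the exact locking call and the helper name append_jsonl are assumptions.

```python
import fcntl
import json
import os


def append_jsonl(path: str, record: dict) -> None:
    """Append one JSON line under an exclusive advisory lock, then fsync."""
    line = json.dumps(record, sort_keys=True) + "\n"
    with open(path, "a", encoding="utf-8") as handle:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
        try:
            handle.write(line)
            handle.flush()             # push Python's buffer to the OS
            os.fsync(handle.fileno())  # ask the OS to flush to stable storage
        finally:
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
```

Advisory locks only coordinate processes that also take the lock, which is one reason multi-writer network filesystems are out of scope.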

Extension Points

New tool: register through ToolRegistry with schemas and annotations.
New LLM provider: implement LLMAdapter.chat() returning LLMResponse.
New OAuth store: implement OAuthStore protocol (SQL, Redis, …).
New Code Mode backend: implement CodeModeBackend protocol.
New audit sink: call audit.add_sink(callback) with any AuditEvent → None.
New MCP transport: call handle_mcp_request(registry, payload).
New hook: register through HookRegistry with 8-event lifecycle.
New plugin: add to .teaagent/plugins/ with plugin.json.

Hook System (8-Event Lifecycle)

TeaAgent implements a Claude Code-compatible 8-event hook system:

Event	Trigger	Use Case
`SessionStart`	Before session begins	Initialize context, load configs
`UserPromptSubmit`	After user message	Log prompts, analyze intent
`PreToolUse`	Before tool execution	Permission checks, input validation
`PostToolUse`	After tool execution	Lint checks, test runs
`PreCompact`	Before context compression	Prepare for compaction
`Stop`	Before session stops	Save state, cleanup
`SubagentStop`	After subagent completes	Aggregate results
`SessionEnd`	After session ends	Finalize audit, memory flush

Built-in hooks:

permission_check_hook - Enforce Allow/Ask/Deny patterns
lint_check_hook - Run linter after file modifications
run_tests_hook - Run tests after code changes
mcp_tool_filter_hook - Filter MCP tools by allow/block lists

Three-Tier Memory System

Claude Code compatible memory hierarchy:

Tier	Location	Git-tracked	Use Case
Project	`.teaagent/memory.jsonl`	Yes	Team-shared context
Personal	`~/.config/teaagent/memory.jsonl`	No	User-specific notes
Auto-Memory	`.claude/MEMORY.md`	No	Persistent learnings

from teaagent.memory import MemoryHierarchy

mem = MemoryHierarchy(root="/path/to/project")
mem.project.add("Found a bug in auth module", tags=("bug", "auth"))
mem.personal.add("User prefers dark mode", tags=("preference",))

# Search across all tiers
results = mem.search_all("bug", limit=10)

Plugin System

Four extension points (Claude Code compatible):

1. Commands

Slash commands that add CLI functionality.

2. Agents

Custom subagents with specialized prompts and tool subsets.

3. Hooks

Lifecycle event handlers (see Hook System above).

4. MCP Servers

External tool integrations.

Discovery order (first match wins):

Project: <workspace>/.teaagent/plugins/
User: ~/.config/teaagent/plugins/
Built-in: teaagent/plugins/builtin/
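
First-match-wins discovery amounts to walking the search roots in order. The sketch below assumes each plugin lives in its own directory containing plugin.json; that layout and the name find_plugin are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def find_plugin(name: str, search_roots: list[Path]) -> Optional[Path]:
    """Return the first plugin directory that contains plugin.json, or None."""
    for root in search_roots:  # ordered: project, user, built-in
        manifest = root / name / "plugin.json"
        if manifest.is_file():
            return manifest.parent
    return None
```

With this ordering, a project-level plugin shadows a user-level or built-in plugin of the same name.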

Context Compaction

Automatic context compression at threshold levels (Claude Code traffic light):

Level	Token Usage	Behavior
Green	0-75%	Normal operation
Yellow	75-92%	User hints for session save
Red	92%+	Auto-triggered compaction

from teaagent.context import CompactionManager

manager = CompactionManager()
# `context` is assumed to be the current conversation context, built by the caller.
if manager.should_compact(token_count=180000):
    result = manager.check_and_compact(context, 180000)
    # Tokens saved: result.tokens_saved
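
The traffic-light thresholds reduce to a ratio check. In this sketch the function name is hypothetical and the boundary handling is an assumption: exactly 75% counts as yellow and exactly 92% as red, since the table lists both values in adjacent rows.

```python
def compaction_level(used_tokens: int, context_window: int) -> str:
    """Classify context usage as green, yellow, or red."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    ratio = used_tokens / context_window
    if ratio >= 0.92:
        return "red"      # auto-triggered compaction
    if ratio >= 0.75:
        return "yellow"   # hint the user to save the session
    return "green"        # normal operation
```

For example, 180,000 tokens in a 200,000-token window is 90% usage, which this sketch classifies as yellow.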

Plan Mode

Read-only exploration mode for safe codebase analysis:

from teaagent.plan_mode import PlanMode, PlanModeState

plan = PlanMode()
plan.enable("Analyzing unfamiliar codebase")

# Tools blocked in plan mode:
# - workspace_write_file
# - workspace_apply_patch
# - shell

plan.add_note("Found authentication module at line 42")
plan.disable()

ACP (Agent Client Protocol)

IDE integration for VS Code, Zed, and JetBrains:

CLI/TUI → ACPServer → JSON-RPC over stdio → IDE

Methods:

initialize - Handshake and capability negotiation
tools/list - List available tools
tools/call - Execute a tool
completion - Request agent completion
tools/cancel - Cancel running tool

Mainstream Framework Comparison

TeaAgent is a governance-first agent harness. This section documents how it compares to the four mainstream coding-agent frameworks surveyed in scripts/refresh_agent_readme_survey.md, last refreshed 2026-05-24.

Feature Coverage Matrix

Capability	TeaAgent	Claude Code	Codex	OpenCode
Terminal-first CLI/TUI	✅	✅ `claude` CLI	✅ `codex` TUI	✅ `opencode` TUI
Multi-provider LLM	✅ 13 providers	❌ Anthropic only	❌ OpenAI only	✅ 11+ providers
Permission modes	✅ 5 modes	✅ deny/ask/allow	✅ 4 sandbox policies	✅ 3 approval modes
Hook system (8 events)	✅ Claude Code compatible	✅ 8+ events	✅ 7+ events	—
MCP server / client	✅ stdio + HTTP	✅	✅ codex-mcp-server	✅ stdio + SSE
Skills / plugins	✅ 4 extension points	✅ Plugin marketplace	✅ Skills + Plugins	—
Sub-agent isolation	✅ 3 modes (shared/worktree/container)	✅ 2 modes (shared/worktree)	✅ Thread manager	—
Context compaction	✅ 75-92% traffic light	✅ 98% auto	✅ History compaction	✅ Auto-compact
Three-tier memory	✅ Project/Personal/Auto	✅ CLAUDE.md files	✅ 2-phase extraction	—
Audit / governance	✅ JSONL hash chain	—	—	—
Undo / rollback	✅ Undo journal	—	—	✅ File history
Read-before-write mtime guard	✅ since v0.1.0	—	✅ (via Edit tool)	✅ mtime check
Protected paths (.git/.teaagent)	✅ default deny rules	—	✅ .git/.codex/.agents	—
IDE integration	✅ ACP + VS Code ext	✅ VS Code ext	✅ App Server (Zed/VS Code)	—
Session resume	✅ RunStore JSONL	✅ rollout files	✅ rollout + fork	✅ SQLite
OAuth / security	✅ OAuth 2.1 + DPoP	—	✅ OAuth for MCP	—
Telemetry	✅ OTEL spans + metrics	✅ OTEL spans	✅ OTEL	—
Cloud / background tasks	✅ Ultrawork + BackgroundRun	✅ background sessions	✅ Cloud Tasks	—
Acceptance test coverage	✅ See acceptance.md (pytest-collected AT, P0/P1/P2 tiers)	—	✅ test suite	—
Declarative agent definitions	✅ YAML/JSON/Markdown	✅ .claude/agents/*.md	✅ config.toml	—

TeaAgent Differentiators

Governance-first: Every tool call, decision, and error is recorded in a tamper-evident JSONL audit log with hash-chain integrity. None of the other frameworks in the matrix above ships an equivalent audit trail.
Multi-protocol surface: MCP + ACP + A2A + ANP — more integration protocols than any single mainstream framework.
Cross-provider: 13 LLM providers vs. 1 per vendor framework.
Policy-as-code: Declarative deny rules in policy.yaml that cannot be bypassed, even in danger-full-access mode.
Built-in undo: Automatic pre-write snapshots with user-facing teaagent agent undo command.

Alignment Gaps (Addressed)

These gaps identified in the 2026-05-24 competitive analysis have been closed:

LSP integration → Code analysis tools (code_definition, code_references, code_diagnostics, code_symbols, code_tree_sitter_relations) registered when code_analysis_enabled: true.
Read-before-write mtime guard → workspace_write_file accepts expected_mtime and rejects overwrites on concurrent modification.
Protected paths → .git/* and .teaagent/* are blocked by default through built-in FilePolicy deny rules.
Declarative sub-agent definitions → YAML/JSON/Markdown frontmatter files in .teaagent/subagents/ with isolation, background, disallowed_tools, and effort fields.
