| Aspect | AIOS (AGI Research) | Agent Control Plane |
|---|---|---|
| Primary Focus | Efficiency (throughput, latency) | Safety (policy enforcement, audit) |
| Target Audience | Researchers, ML Engineers | Enterprise, Production Systems |
| Kernel Philosophy | Resource optimization | Security boundary |
| Failure Mode | Graceful degradation | Kernel panic on violation |
| Policy Enforcement | Optional/configurable | Mandatory, kernel-level |
| Paper Venue | COLM 2025 | ASPLOS 2026 (target) |

```
┌─────────────────────────────────────┐
│             AIOS Kernel             │
├─────────────────────────────────────┤
│  ┌─────────┐  ┌─────────────────┐   │
│  │Scheduler│  │ Context Manager │   │
│  └─────────┘  └─────────────────┘   │
│  ┌─────────┐  ┌─────────────────┐   │
│  │Memory   │  │  Tool Manager   │   │
│  │Manager  │  │                 │   │
│  └─────────┘  └─────────────────┘   │
│  ┌─────────────────────────────────┐│
│  │    Access Control (Optional)    ││
│  └─────────────────────────────────┘│
└─────────────────────────────────────┘
```

Focus: GPU utilization, FIFO/Round-Robin scheduling, context switching

```
┌─────────────────────────────────────┐
│        Kernel Space (Ring 0)        │
│  ┌─────────────────────────────────┐│
│  │    Policy Engine (Mandatory)    ││
│  └─────────────────────────────────┘│
│  ┌─────────┐  ┌─────────────────┐   │
│  │ Flight  │  │     Signal      │   │
│  │Recorder │  │   Dispatcher    │   │
│  └─────────┘  └─────────────────┘   │
│  ┌─────────┐  ┌─────────────────┐   │
│  │   VFS   │  │   IPC Router    │   │
│  │ Manager │  │                 │   │
│  └─────────┘  └─────────────────┘   │
├─────────────────────────────────────┤
│         User Space (Ring 3)         │
│  ┌─────────────────────────────────┐│
│  │   LLM Generation (Isolated)     ││
│  │   Tool Execution                ││
│  │   Agent Logic                   ││
│  └─────────────────────────────────┘│
└─────────────────────────────────────┘
```

Focus: Isolation, policy enforcement, audit trail, crash containment
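
The crash-containment idea can be sketched in a few lines. This is a toy model, not the project's kernel: `MiniKernel`, `run_agent`, and the three sample agents are hypothetical names, and `PolicyViolation` is redefined locally so the block runs on its own. It only uses in-process exception handling; real isolation would need a process or sandbox boundary.

```python
import asyncio


class PolicyViolation(Exception):
    """Hypothetical stand-in for the exception raised on a policy violation."""


class MiniKernel:
    """Toy kernel: runs each agent and contains whatever failure it produces."""

    def __init__(self):
        self.audit_log = []  # stand-in for the flight recorder

    async def run_agent(self, name, agent_fn):
        try:
            result = await agent_fn()
            self.audit_log.append((name, "exited", result))
        except PolicyViolation as exc:
            # Zero tolerance: the agent is terminated, the kernel keeps running.
            self.audit_log.append((name, "killed", str(exc)))
        except Exception as exc:
            # Any other user-space crash is contained and recorded.
            self.audit_log.append((name, "crashed", repr(exc)))

    async def run_all(self, agents):
        await asyncio.gather(*(self.run_agent(n, fn) for n, fn in agents.items()))
        return self.audit_log


async def good_agent():
    return "ok"


async def rogue_agent():
    raise PolicyViolation("transfer exceeds limit")


async def buggy_agent():
    raise ZeroDivisionError("agent bug")


log = asyncio.run(
    MiniKernel().run_all(
        {"good": good_agent, "rogue": rogue_agent, "buggy": buggy_agent}
    )
)
```

After the run, `log` holds one entry per agent: the rogue agent is marked `killed`, the buggy one `crashed`, and the well-behaved one still completes.
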
| Feature | AIOS | Agent Control Plane |
|---|---|---|
| Scheduling | FIFO, Round-Robin, Priority | Policy-based, Safety-first |
| Context Switching | Performance optimized | Checkpoint + Rollback |
| Memory Model | Short-term + Long-term | VFS with mount points |
| Signal Handling | None | POSIX-style (SIGSTOP, SIGKILL, etc.) |
| Policy Violation | Log and continue | Kernel panic (0% tolerance) |
| Crash Isolation | Same process | Kernel survives user crashes |
| IPC | Function calls | Typed pipes with policy check |
| Audit | Logging | Flight recorder (black box) |
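
To make the "flight recorder" row concrete, here is a minimal sketch of an append-only audit log. The hash chaining is an assumption added for illustration (the comparison above only says every syscall is recorded), and `FlightRecorder`, `record`, and `verify` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class FlightRecorder:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(body):
        # Canonical JSON so the same body always hashes to the same value.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, agent_id, syscall, args):
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {"agent": agent_id, "syscall": syscall, "args": args, "prev": prev}
        self._entries.append({**body, "hash": self._digest(body)})

    def verify(self):
        """Return True if no recorded entry has been altered or reordered."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: entry[k] for k in ("agent", "syscall", "args", "prev")}
            if entry["prev"] != prev or entry["hash"] != self._digest(body):
                return False
            prev = entry["hash"]
        return True
```

Under this assumption, editing any recorded argument after the fact makes `verify()` return `False`, which is the property an auditor would care about.
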
AIOS Approach:
"If an agent is slow, optimize it. If it fails, retry it."
Our Approach:
"If an agent violates policy, kill it immediately. No exceptions."

```python
# AIOS: Efficiency-first
async def transfer_money(agent, amount):
    # AIOS focuses on throughput
    result = await agent.execute(f"Transfer ${amount}")
    return result  # Hope nothing went wrong


# Agent Control Plane: Safety-first
async def transfer_money(kernel, agent_ctx, amount):
    # Policy check BEFORE execution
    allowed = await agent_ctx.check_policy("transfer", f"amount={amount}")
    if not allowed:
        # Kernel panic - cannot proceed
        raise PolicyViolation("Transfer exceeds limit")
    # Execute with full audit trail
    result = await agent_ctx.syscall(
        SyscallType.SYS_EXEC,
        tool="transfer",
        args={"amount": amount},
    )
    # Flight recorder has everything
    return result
```

| Concern | AIOS Answer | Our Answer |
|---|---|---|
| "What if agent goes rogue?" | "Monitor and intervene" | "Kernel panic, immediate termination" |
| "Can we audit all actions?" | "Logging available" | "Flight recorder - every syscall recorded" |
| "What about data exfiltration?" | "Access control optional" | "VFS mount points, policy per-path" |
| "Regulatory compliance?" | "Not primary focus" | "Built-in governance layer" |
| "Multi-tenant isolation?" | "Process-level" | "Kernel/User space separation" |
| Aspect | AIOS | Agent Control Plane |
|---|---|---|
| Novel Contribution | LLM Scheduling algorithms | Safety-first kernel design |
| ASPLOS Fit | Systems efficiency | OS abstractions for AI |
| eBPF Potential | Not explored | Network monitoring extension |
| Reproducibility | Benchmark suite | Differential auditing |
AIOS has no signal mechanism. Agents are black boxes.
Agent Control Plane implements POSIX-style signals:

```python
from enum import IntEnum


class AgentSignal(IntEnum):
    SIGSTOP = 1    # Pause for inspection (shadow mode)
    SIGCONT = 2    # Resume execution
    SIGINT = 3     # Graceful interrupt
    SIGKILL = 4    # Immediate termination (non-maskable)
    SIGTERM = 5    # Request graceful shutdown
    SIGPOLICY = 8  # Policy violation (triggers SIGKILL)
    SIGTRUST = 9   # Trust boundary crossed (triggers SIGKILL)
```

Why this matters:
- SIGSTOP enables "shadow mode" - pause and inspect without termination
- SIGKILL is non-maskable - agents CANNOT ignore it
- SIGPOLICY is automatic on violation - 0% tolerance guarantee
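
The three properties above can be sketched as a small dispatcher. This is an illustrative model only: `AgentProcess`, `dispatch`, and `NON_MASKABLE` are hypothetical names, the state strings are made up, and treating SIGPOLICY and SIGTRUST as non-maskable is an assumption that follows from their escalation to SIGKILL.

```python
from enum import IntEnum


class AgentSignal(IntEnum):
    SIGSTOP = 1
    SIGCONT = 2
    SIGINT = 3
    SIGKILL = 4
    SIGTERM = 5
    SIGPOLICY = 8
    SIGTRUST = 9


# Signals that agent code may never install a handler for (assumption).
NON_MASKABLE = {AgentSignal.SIGKILL, AgentSignal.SIGPOLICY, AgentSignal.SIGTRUST}


class AgentProcess:
    def __init__(self, name):
        self.name = name
        self.state = "running"
        self.handlers = {}

    def register_handler(self, sig, handler):
        if sig in NON_MASKABLE:
            raise PermissionError(f"{sig.name} cannot be handled by agent code")
        self.handlers[sig] = handler


def dispatch(proc, sig):
    """Deliver a signal and return the agent's resulting state."""
    if proc.state == "terminated":
        return proc.state
    if sig in (AgentSignal.SIGPOLICY, AgentSignal.SIGTRUST):
        sig = AgentSignal.SIGKILL  # violations escalate straight to SIGKILL
    if sig == AgentSignal.SIGKILL:
        proc.state = "terminated"
    elif sig == AgentSignal.SIGSTOP:
        proc.state = "stopped"  # shadow mode: paused, still inspectable
    elif sig == AgentSignal.SIGCONT:
        if proc.state == "stopped":
            proc.state = "running"
    elif sig in proc.handlers:
        proc.handlers[sig](proc)  # agent-defined graceful handling
    else:
        proc.state = "terminated"  # default action for SIGINT / SIGTERM
    return proc.state
```

With this model, an agent can intercept SIGTERM to clean up, but any attempt to register a handler for SIGKILL fails, and a SIGPOLICY always ends in `terminated`.
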

AIOS:

```
Agent
├── Short-term Memory (conversation buffer)
└── Long-term Memory (persistent storage)
```

Agent Control Plane:

```
/
├── mem/
│   ├── working/      # Ephemeral scratchpad
│   ├── episodic/     # Experience logs
│   ├── semantic/     # Facts (vector store mount)
│   └── procedural/   # Learned skills
├── state/
│   └── checkpoints/  # Snapshots for rollback
├── tools/            # Tool interfaces
├── policy/           # Read-only policy files
└── ipc/              # Inter-process communication
```

Why VFS?
- Uniform interface: Same API for memory, state, tools
- Backend agnostic: Mount Pinecone, Redis, or file system
- Policy per-path: `/policy` is read-only from user space
- POSIX familiar: Engineers know this model
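
A minimal sketch of the mount-point idea, assuming an in-memory backend. `AgentVFS`, `DictBackend`, and the `from_kernel` flag are hypothetical; in a real system the caller's privilege would come from the kernel's own context rather than from an argument the caller supplies.

```python
class DictBackend:
    """In-memory backend; a real mount might wrap Redis or a vector store."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data[key]

    def write(self, key, value):
        self._data[key] = value


class AgentVFS:
    def __init__(self):
        self._mounts = {}  # mount prefix -> (backend, read_only)

    def mount(self, prefix, backend, read_only=False):
        self._mounts[prefix.rstrip("/")] = (backend, read_only)

    def _resolve(self, path):
        # Longest-prefix match, so /mem/semantic can shadow /mem.
        matches = [p for p in self._mounts if path == p or path.startswith(p + "/")]
        if not matches:
            raise FileNotFoundError(path)
        prefix = max(matches, key=len)
        backend, read_only = self._mounts[prefix]
        return backend, read_only, path[len(prefix):].lstrip("/")

    def read(self, path):
        backend, _, key = self._resolve(path)
        return backend.read(key)

    def write(self, path, value, from_kernel=False):
        backend, read_only, key = self._resolve(path)
        if read_only and not from_kernel:
            raise PermissionError(f"{path} is read-only from user space")
        backend.write(key, value)


vfs = AgentVFS()
vfs.mount("/mem/working", DictBackend())
vfs.mount("/policy", DictBackend(), read_only=True)
vfs.write("/policy/limits", {"max_transfer": 100}, from_kernel=True)
vfs.write("/mem/working/notes", "draft")
```

The same `read` and `write` calls work for every mount, and swapping `DictBackend` for another object with the same two methods changes the storage without touching agent code.
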

```python
# AIOS - agents call each other directly
result = agent_b.process(agent_a.output)
```

```python
# Our approach - policy-enforced pipes
pipeline = (
    research_agent
    | PolicyCheckPipe(allowed_types=["ResearchResult"])
    | summary_agent
)
result = await pipeline.execute(query)
```

Why pipes?
- Type checking at the pipe boundary (not runtime exceptions deep inside a downstream agent)
- Policy enforcement at every hop
- Backpressure prevents cascade failures
- Full audit trail through flight recorder
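
One way the `|` composition and per-hop type check could work is sketched below. Only the name `PolicyCheckPipe` and its `allowed_types` argument come from the example above; `Stage`, `Pipeline`, `PipeTypeError`, and the two sample agents are hypothetical, and backpressure and audit logging are left out to keep the sketch short.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ResearchResult:
    text: str


class PipeTypeError(TypeError):
    """Raised when a value of a disallowed type reaches a policy pipe."""


class Stage:
    """Wraps an async callable so stages compose with the | operator."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        return Pipeline([self, other])

    async def run(self, value):
        return await self.fn(value)


class PolicyCheckPipe(Stage):
    """Passes a value through unchanged only if its type is on the allow list."""

    def __init__(self, allowed_types):
        self.allowed_types = set(allowed_types)

    async def run(self, value):
        type_name = type(value).__name__
        if type_name not in self.allowed_types:
            raise PipeTypeError(f"{type_name} is not allowed through this pipe")
        return value


class Pipeline:
    def __init__(self, stages):
        self.stages = list(stages)

    def __or__(self, other):
        return Pipeline(self.stages + [other])

    async def execute(self, value):
        for stage in self.stages:
            value = await stage.run(value)  # every hop is a checkpoint
        return value


async def research(query):
    return ResearchResult(text=f"findings about {query}")


async def summarize(result):
    return result.text.upper()


pipeline = (
    Stage(research)
    | PolicyCheckPipe(allowed_types=["ResearchResult"])
    | Stage(summarize)
)
summary = asyncio.run(pipeline.execute("agent safety"))
```

If the upstream stage returned a plain string instead of a `ResearchResult`, the pipe would raise `PipeTypeError` before `summarize` ever ran.
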

AIOS research focus:
- Novel scheduling algorithms for LLMs
- Context switching performance
- Throughput benchmarks

Our research focus:
- Novel OS abstractions for AI safety
- Kernel/User space separation for agent isolation
- POSIX-inspired primitives (signals, VFS, pipes)
- eBPF extension for network monitoring (future)

Key Differentiator: We are not competing on efficiency. We are defining the safety contract for enterprise AI agents.

```
┌─────────────────────────────────────────┐
│              Agent Process              │
├─────────────────────────────────────────┤
│  HTTP Request to api.openai.com         │
│         │                               │
│         ▼                               │
│  ┌─────────────────────────────────┐    │
│  │  eBPF Probe (Kernel Space)      │    │
│  │  - Monitor all network calls    │    │
│  │  - Block unauthorized endpoints │    │
│  │  - Log payload hashes           │    │
│  └─────────────────────────────────┘    │
│         │                               │
│         ▼                               │
│  Network Stack                          │
└─────────────────────────────────────────┘
```

Why eBPF?
- Monitoring happens OUTSIDE Python runtime
- Cannot be bypassed by agent code
- Sub-millisecond overhead
- ASPLOS loves eBPF papers
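
The decision the probe would make (allow or block, plus a payload hash for the log) can be modelled in plain Python. This is not eBPF and offers none of its bypass resistance: a real probe would be loaded into the OS kernel and would see sockets and addresses rather than URLs. `ALLOWED_HOSTS` and `inspect_request` are hypothetical names used only to illustrate the policy logic.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical allow list; the diagram above uses api.openai.com as its example.
ALLOWED_HOSTS = {"api.openai.com"}


def inspect_request(url, payload):
    """Return the verdict and audit fields for one outbound request."""
    host = urlsplit(url).hostname
    return {
        "host": host,
        "allowed": host in ALLOWED_HOSTS,
        # Only a hash is kept, so the log never stores the payload itself.
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
    }


ok = inspect_request("https://api.openai.com/v1/chat/completions", b'{"prompt": "hi"}')
blocked = inspect_request("https://exfil.example.net/upload", b"secret data")
```

Here `ok["allowed"]` is true and `blocked["allowed"]` is false; both records carry a SHA-256 digest that can be matched against the payload later without retaining it.
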
| Use Case | Recommended |
|---|---|
| Research experiments | AIOS |
| Production enterprise | Agent Control Plane |
| Throughput benchmarks | AIOS |
| Compliance-heavy industries | Agent Control Plane |
| Multi-agent chaos | AIOS (let them fight) |
| Multi-agent governance | Agent Control Plane |
AIOS and Agent Control Plane are not competing - they solve different problems.
- AIOS: "How do we run 1000 agents efficiently?"
- Agent Control Plane: "How do we run 10 agents without any of them going rogue?"
For enterprise adoption, the second question matters more.