
AEGIS Orchestrator

The core runtime and orchestrator for Project AEGIS - a secure, serverless runtime environment for autonomous AI agents.

Overview

The AEGIS Orchestrator is the control plane that manages the agent lifecycle, enforces security policies, and isolates agents at runtime using Docker containers (development) or Firecracker micro-VMs (production).

Architecture

┌─────────────────────────────────────────────┐
│          AEGIS Orchestrator (Rust)          │
│  • Scheduling  • Security  • State Mgmt     │
└─────────────────────────────────────────────┘
                    │
    ┌───────────────┴───────────────┐
    ▼                               ▼
┌─────────┐                   ┌───────────┐
│ Docker  │                   │Firecracker│
│ Runtime │                   │  Runtime  │
└─────────┘                   └───────────┘

Components

Core (core/)

Pure domain logic implementing:

  • Agent lifecycle management
  • Runtime trait abstraction
  • Security policy engine
  • Swarm coordination
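
As a rough illustration of what a runtime trait abstraction can look like, the sketch below has domain code depend on a trait while adapters such as Docker or Firecracker implement it. Every name here (`Runtime`, `InstanceId`, `RuntimeError`, `FakeRuntime`, `run_once`) is hypothetical and is not taken from the AEGIS codebase; the in-memory adapter only stands in for a real one.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a running agent instance.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct InstanceId(pub String);

/// Hypothetical error type for runtime operations.
#[derive(Debug, PartialEq, Eq)]
pub enum RuntimeError {
    NotFound(InstanceId),
}

/// Illustrative abstraction: the domain core talks only to this trait.
pub trait Runtime {
    fn name(&self) -> &'static str;
    fn start(&mut self, agent: &str) -> Result<InstanceId, RuntimeError>;
    fn stop(&mut self, id: &InstanceId) -> Result<(), RuntimeError>;
}

/// In-memory stand-in for an adapter such as Docker or Firecracker.
pub struct FakeRuntime {
    label: &'static str,
    next: u64,
    running: HashMap<InstanceId, String>,
}

impl FakeRuntime {
    pub fn new(label: &'static str) -> Self {
        Self { label, next: 0, running: HashMap::new() }
    }

    /// Number of instances currently tracked as running.
    pub fn running_count(&self) -> usize {
        self.running.len()
    }
}

impl Runtime for FakeRuntime {
    fn name(&self) -> &'static str {
        self.label
    }

    fn start(&mut self, agent: &str) -> Result<InstanceId, RuntimeError> {
        self.next += 1;
        let id = InstanceId(format!("{}-{}", self.label, self.next));
        self.running.insert(id.clone(), agent.to_string());
        Ok(id)
    }

    fn stop(&mut self, id: &InstanceId) -> Result<(), RuntimeError> {
        self.running
            .remove(id)
            .map(|_| ())
            .ok_or_else(|| RuntimeError::NotFound(id.clone()))
    }
}

/// Domain code depends on `dyn Runtime`, not on a concrete adapter.
pub fn run_once(rt: &mut dyn Runtime, agent: &str) -> Result<String, RuntimeError> {
    let id = rt.start(agent)?;
    rt.stop(&id)?;
    Ok(format!("{} ran on {}", agent, rt.name()))
}
```

With this shape, swapping the development runtime for the production one means passing a different `Runtime` implementation to the same domain function.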

API (api/)

HTTP/gRPC server built with Axum for:

  • Agent deployment
  • Task execution
  • Status monitoring
  • Management operations

Runtimes

  • Docker (runtime-docker/): Development runtime using containers
  • Firecracker (runtime-firecracker/): Production runtime with micro-VMs

CLI (cli/)

Command-line tool for local development and agent management:

# Daemon management
aegis daemon start                    # Start daemon
aegis daemon stop                     # Stop daemon
aegis daemon status                   # Check status

# Agent management
aegis agent deploy agent.yaml         # Deploy agent
aegis agent list                      # List agents
aegis agent logs <agent-name>         # Stream agent logs
aegis agent remove <agent-id>         # Remove agent

# Task execution
aegis task execute <agent-name>       # Execute task
aegis task list                       # List executions
aegis task logs <execution-id>        # View execution logs
aegis task cancel <execution-id>      # Cancel execution

See CLI Reference for complete documentation.

Edge Node (edge-node/)

Lightweight binary for hybrid cloud/on-prem deployments.

Quick Start

Prerequisites

  • Rust 1.75+
  • Docker 24.0+
  • Ollama (for local LLM) or OpenAI API key
  • (Production) Linux with KVM support

Build

# Build the CLI and orchestrator
cargo build -p aegis-orchestrator

# Or build in release mode
cargo build --release -p aegis-orchestrator

Configuration

Create or edit aegis-config.yaml:

apiVersion: 100monkeys.ai/v1
kind: NodeConfig

metadata:
  name: "my-aegis-node"

spec:
  node:
    id: "my-node-001"
    type: "edge"

  llm_providers:
    - name: "local"
      type: "ollama"
      endpoint: "http://localhost:11434"
      enabled: true
      models:
        - alias: "default"
          model: "phi3:mini"
          capabilities: ["code", "reasoning"]
          context_window: 4096
          cost_per_1k_tokens: 0.0

  llm_selection:
    strategy: "prefer-local"
    default_provider: "local"

  observability:
    logging:
      level: "info"

See Node Config Reference and aegis-config.yaml for a complete example.
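
The configuration above sets `strategy: "prefer-local"` without spelling out how a provider is chosen. One plausible reading, sketched here purely as an assumption, is: take the first enabled local provider, otherwise the enabled `default_provider`, otherwise any enabled provider. The `Provider` struct, `select_provider`, and the rule that only `ollama` counts as local are hypothetical and may differ from the orchestrator's real behaviour.

```rust
/// Hypothetical, trimmed-down view of one `llm_providers` entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Provider {
    pub name: String,
    pub kind: String, // the YAML `type` field, e.g. "ollama" or "openai"
    pub enabled: bool,
}

/// Assumption: "ollama" is the only provider type treated as local.
fn is_local(p: &Provider) -> bool {
    p.kind == "ollama"
}

/// Assumed "prefer-local" rule: local first, then the default, then anything enabled.
pub fn select_provider<'a>(
    providers: &'a [Provider],
    default_provider: &str,
) -> Option<&'a Provider> {
    let enabled: Vec<&'a Provider> = providers.iter().filter(|p| p.enabled).collect();
    enabled
        .iter()
        .copied()
        .find(|p| is_local(p))
        .or_else(|| enabled.iter().copied().find(|p| p.name == default_provider))
        .or_else(|| enabled.first().copied())
}
```

Disabled providers are never returned, so a node with no enabled provider yields `None` and the caller has to decide how to fail.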

Debugging and Logging

The orchestrator uses structured logging via the tracing crate. Log levels: trace, debug, info, warn, error.

Set Log Level:

# Via environment variable (recommended for development)
export RUST_LOG=debug
cargo run -p aegis-orchestrator -- daemon start

# Via CLI flag
cargo run -p aegis-orchestrator -- daemon start --log-level debug

# Via config file (aegis-config.yaml)
spec:
  observability:
    logging:
      level: "debug"  # trace, debug, info, warn, error

Bootstrap.py Debugging:

When running at debug level, the orchestrator automatically:

  • Logs all stdout from bootstrap.py (the Python script inside agent containers)
  • Logs all stderr from bootstrap.py as warnings
  • Enables verbose mode in bootstrap.py (via AEGIS_BOOTSTRAP_DEBUG=true environment variable)

This is useful for tracing LLM connectivity issues, prompt delivery, or agent execution failures.
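
The debug-level behaviour described above can be sketched as two small pure functions: one decides which environment variables the container receives, the other decides how a line of bootstrap.py output is logged. This is an illustration, not the orchestrator's code. `Level`, `Stream`, `parse_level`, `bootstrap_env`, and `forward_line` are hypothetical names, the stderr log prefix is invented, and two assumptions are baked in: `trace` enables the same behaviour as `debug`, and nothing is forwarded at less verbose levels.

```rust
/// Log levels in increasing order of severity, matching the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// Which output stream of bootstrap.py a line came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stream {
    Stdout,
    Stderr,
}

/// Parses a level name case-insensitively; unknown names yield `None`.
pub fn parse_level(s: &str) -> Option<Level> {
    match s.to_ascii_lowercase().as_str() {
        "trace" => Some(Level::Trace),
        "debug" => Some(Level::Debug),
        "info" => Some(Level::Info),
        "warn" => Some(Level::Warn),
        "error" => Some(Level::Error),
        _ => None,
    }
}

/// Extra environment for the agent container.
/// Assumption: `trace` turns on verbose bootstrap output just like `debug`.
pub fn bootstrap_env(level: Level) -> Vec<(&'static str, &'static str)> {
    if level <= Level::Debug {
        vec![("AEGIS_BOOTSTRAP_DEBUG", "true")]
    } else {
        Vec::new()
    }
}

/// Maps one bootstrap.py output line to a log record: stdout is logged at
/// debug level, stderr as a warning.
/// Assumption: above debug level nothing is forwarded at all.
pub fn forward_line(level: Level, stream: Stream, line: &str) -> Option<String> {
    if level > Level::Debug {
        return None;
    }
    Some(match stream {
        Stream::Stdout => format!("DEBUG Bootstrap output: {:?}", line),
        Stream::Stderr => format!("WARN Bootstrap stderr: {:?}", line),
    })
}
```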

Example Debug Output:

# Start with debug logging
RUST_LOG=debug cargo run -p aegis-orchestrator -- daemon start

# In another terminal, execute an agent
cargo run -p aegis-orchestrator -- task execute my-agent --input "test"

# You'll see in the orchestrator logs:
# DEBUG aegis_orchestrator_core::infrastructure::runtime: Starting bootstrap.py execution container_id="abc123"
# DEBUG aegis_orchestrator_core::infrastructure::runtime: Bootstrap output: "Attempting to connect to Orchestrator at http://host.docker.internal:8088..."
# DEBUG aegis_orchestrator_core::infrastructure::runtime: Bootstrap output: "[BOOTSTRAP DEBUG] Bootstrap starting - execution_id=xxx, iteration=1"
# DEBUG aegis_orchestrator_core::infrastructure::runtime: Bootstrap output: "[BOOTSTRAP DEBUG] Received prompt (1234 chars)"

Troubleshooting Bootstrap Issues:

If agents fail to execute or you see connection errors:

  1. Enable debug logging: RUST_LOG=debug
  2. Check bootstrap.py output in orchestrator logs
  3. Verify AEGIS_ORCHESTRATOR_URL is reachable from inside containers
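
For step 3, a plain TCP connect is often enough to tell a networking problem from an application problem. The sketch below uses only the Rust standard library; `is_reachable` is a hypothetical helper that takes a `host:port` string (for example the host and port from AEGIS_ORCHESTRATOR_URL) and does not speak HTTP. To be meaningful, the check has to run in the same network namespace as the agent container.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if a TCP connection to `host_port` (e.g. "host.docker.internal:8088")
/// succeeds within `timeout` on any resolved address.
/// Resolution failures, including a missing port, count as unreachable.
pub fn is_reachable(host_port: &str, timeout: Duration) -> bool {
    match host_port.to_socket_addrs() {
        Ok(addrs) => addrs
            .into_iter()
            .any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok()),
        Err(_) => false,
    }
}
```

A `false` result points to DNS, routing, or firewall configuration rather than to the agent or the LLM provider.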

Running Locally

# Start the daemon
target/debug/aegis daemon start

# Check daemon status
target/debug/aegis daemon status

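# Deploy demo agents (subshells keep the working directory unchanged;
# assumes aegis is on your PATH and aegis-examples is checked out next to this repository)
(cd ../aegis-examples && aegis agent deploy ./agents/echo/agent.yaml)
(cd ../aegis-examples && aegis agent deploy ./agents/greeter/agent.yaml)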

# List deployed agents
target/debug/aegis agent list

# Execute a task
target/debug/aegis task execute echo --input "Hello Daemon"

# View agent logs
target/debug/aegis agent logs echo

# Stop the daemon
target/debug/aegis daemon stop

For detailed instructions, see Getting Started Guide.

Development

Project Structure

aegis-orchestrator/
├── core/              # Domain logic (DDD)
├── api/               # HTTP/gRPC server
├── runtime-docker/    # Docker adapter
├── runtime-firecracker/ # Firecracker adapter
├── security/          # Policy enforcement
├── cli/               # CLI tool
├── edge-node/         # Edge node binary
└── tests/             # Integration tests

Architecture Principles

  • Domain-Driven Design: Clear bounded contexts
  • Hexagonal Architecture: Pure domain core with infrastructure adapters
  • Type Safety: Leverage Rust's type system
  • Security First: Default-deny policies

Running Tests

# Unit tests
cargo test --lib

# Integration tests
cargo test --test '*'

# Specific component
cargo test -p aegis-core

Configuration Reference

See examples/ for sample configurations.

Security

The orchestrator enforces:

  • Isolation: Kernel-level (Firecracker) or namespace-based (Docker)
  • Network Control: DNS/IP allow-listing
  • Resource Limits: CPU, memory, execution time
  • Audit Trail: Immutable logging

For details, see Security Model.
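
To make the default-deny idea behind network allow-listing concrete, here is a minimal sketch: an empty policy permits nothing, and a destination is allowed only if its hostname or IP address was added explicitly. `NetworkPolicy` and its methods are hypothetical names chosen for illustration; the orchestrator's actual policy engine is described in the Security Model documentation and may work differently.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Hypothetical allow-list policy. The default value is empty, so it denies everything.
#[derive(Debug, Default)]
pub struct NetworkPolicy {
    allowed_hosts: HashSet<String>,
    allowed_ips: HashSet<IpAddr>,
}

impl NetworkPolicy {
    /// Adds a hostname to the allow-list (stored lowercase, since DNS names are case-insensitive).
    pub fn allow_host(mut self, host: &str) -> Self {
        self.allowed_hosts.insert(host.to_ascii_lowercase());
        self
    }

    /// Adds a single IP address to the allow-list.
    pub fn allow_ip(mut self, ip: IpAddr) -> Self {
        self.allowed_ips.insert(ip);
        self
    }

    /// Default-deny check: anything not explicitly listed is rejected.
    /// Targets that parse as an IP address are matched against the IP list,
    /// everything else against the hostname list.
    pub fn permits(&self, target: &str) -> bool {
        match target.parse::<IpAddr>() {
            Ok(ip) => self.allowed_ips.contains(&ip),
            Err(_) => self.allowed_hosts.contains(&target.to_ascii_lowercase()),
        }
    }
}
```

Because the check is an exact match, an allowed hostname does not implicitly allow its subdomains or the addresses it resolves to.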

Performance

  • Cold Start: <125ms (Firecracker)
  • Throughput: 1,000+ agents/second (target)
  • Memory: ~128MB per Firecracker VM

Documentation

Full documentation is available at docs.100monkeys.ai.

| Section | Description |
|---|---|
| Getting Started | Install, configure, and run your first agent |
| Core Concepts | Agents, executions, workflows, swarms, security model |
| Writing Agents | Author and structure agent code |
| Deploying Agents | Deploy agents with the CLI or API |
| LLM Providers | Configure Ollama, OpenAI, and other LLM backends |
| Building Workflows | Chain agents into multi-step workflows |
| Building Swarms | Coordinate parallel agent swarms |
| Configuring Storage | Persistent storage backends for agents |
| Local Testing | Test agents locally before deploying |
| Architecture | Execution engine, SMCP, storage gateway, event bus |
| Security Model | Isolation, network control, secrets, audit trail |
| Deployment - Docker | Run the orchestrator with Docker |
| Deployment - Firecracker | Production micro-VM setup |
| Secrets Management | OpenBao integration via secret-store ACL (Keymaster Pattern) |
| IAM | Keycloak identity and access management |
| Configuration Reference | NodeConfig YAML reference (aegis-config.yaml) |
| Agent Manifest Reference | AgentManifest YAML field reference |
| Workflow Manifest Reference | WorkflowManifest YAML field reference |
| CLI Reference | Complete aegis CLI command reference |
| gRPC API | aegis.runtime.v1 service methods and message types |

License

AGPL-3.0. See LICENSE for details.

Built with Rust for security, performance, and reliability.
