
TylrDn/cugar-agent

 
 


CUGAR Agent (2025 Edition)


CUGAR Agent is a production-grade, modular agent stack that embraces 2025’s best practices for LangGraph/LangChain orchestration, LlamaIndex-powered RAG, CrewAI/AutoGen-style multi-agent patterns, and modern observability (Langfuse/OpenInference/Traceloop). The repository is optimized for rapid setup, reproducible demos, and safe extension into enterprise environments. Policy and change-management guardrails are maintained in AGENTS.md and must be reviewed before modifying agents or tools.

At a Glance

  • Composable agent graph: Planner → Tool/User executor → Memory+Observability hooks, wired for LangGraph.
  • RAG-ready: LlamaIndex loader/retriever scaffolding with pluggable vector stores (Chroma, Qdrant, Weaviate, Milvus).
  • Multi-agent: CrewAI/AutoGen-compatible patterns and coordination helpers.
  • Observability-first: Langfuse/OpenInference emitters, structured audit logs, profile-aware sandboxing.
  • Developer experience: Typer CLI, Makefile tasks, uv-based env management, Ruff/Black/isort + mypy, pytest+coverage, pre-commit.
  • Deployment: Dockerfile, GitHub Actions CI/CD, sample configs and .env.example for cloud/on-prem setups.

Recent updates (scaffolding)

  • Added Watsonx Granite provider stub with deterministic defaults and JSONL audit trail to simplify enterprise alignment.
  • Added Langflow component placeholders (planner, executor, guard, Granite LLM) to prep for flow export/import commands.
  • Added registry validation, sandbox profile starter, and documentation shells for security and guardrail mapping.

Architecture

                       ┌──────────────────────────┐
                       │        Controller        │
                       │ (policy + correlation ID)│
                       └────────────┬─────────────┘
                                    │
                           plan(goal, registry)
                                    │
┌──────────────┐          ┌─────────▼─────────┐          ┌────────────────────┐
│ Registry/CFG │──sandbox▶│    Planner        │──steps──▶│   Executor/Tools   │
│ (Hydra/Dyn)  │          │ (ReAct/Plan&Exec) │          │ (LCEL, MCP, HTTP)  │
└──────────────┘          └─────────┬─────────┘          └─────────┬──────────┘
                                    │                              │
                          traces + memory writes         Langfuse/OpenInference
                                    │                              │
                            ┌───────▼────────┐                   ┌─▼────────┐
                            │ Memory / RAG   │◀────context───────│  Clients │
                            │ (LlamaIndex)   │                   │ (CLI/API)│
                            └────────────────┘                   └──────────┘

For a role-by-role, mode-aware walkthrough of how the controller, planners, executors, and MCP tool packs fit together (plus configuration keys), see docs/agents/architecture.md. For an MCP + LangChain web stack overview that covers the FastAPI backend, Vue 3 frontend, streaming flows, and configuration surfaces, see docs/MCP_LANGCHAIN_OVERVIEW.md. A step-by-step stable local launch checklist (registry + sandbox + Langflow readiness) lives in docs/local_stable_launch.md.
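The control flow in the diagram can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: `Step`, `plan`, `execute`, and `run` are hypothetical names, the registry is a plain dict of callables, and the planner is a toy that matches tool names against the goal text.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; the real classes live in the repository.
Registry = Dict[str, Callable[[str], str]]


@dataclass
class Step:
    tool: str
    argument: str


def plan(goal: str, registry: Registry) -> List[Step]:
    """Toy planner: emit one step per registered tool whose name appears in the goal."""
    return [Step(tool=name, argument=goal) for name in registry if name in goal]


def execute(steps: List[Step], registry: Registry, trace_id: str) -> List[dict]:
    """Toy executor: run each step and tag its result with the correlation ID."""
    return [
        {"trace_id": trace_id, "tool": s.tool, "output": registry[s.tool](s.argument)}
        for s in steps
    ]


def run(goal: str, registry: Registry) -> List[dict]:
    """Controller: assign a correlation ID, ask the planner for steps, then execute them."""
    trace_id = uuid.uuid4().hex
    return execute(plan(goal, registry), registry, trace_id)
```

Calling `run("echo hello", {"echo": str.upper})` yields a single result dict whose `trace_id` ties the planner and executor records together, which is the role the correlation ID plays in the diagram.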

Documentation

📘 System Execution Narrative - Complete request → response flow for contributor onboarding (3 entry points: CLI/FastAPI/MCP, 8 execution phases with security boundaries, observability integration, debugging tips, testing guidance)

🔧 FastAPI Role Clarification - Defines FastAPI as transport layer only (HTTP/SSE, auth, budget enforcement) vs orchestration (planning, coordination, execution) to prevent mixing concerns

⚙️ Orchestrator Interface and Semantics - Formal specification for orchestrator API with lifecycle callbacks, failure taxonomy, retry semantics, execution context, routing authority, and implementation patterns

🏢 Enterprise Workflow Examples - Comprehensive end-to-end workflow examples for typical enterprise use cases (customer onboarding, incident response, data pipelines) with planning, error recovery, HITL gates, and external API automation

📊 Observability and Debugging Guide - Comprehensive instrumentation guide with structured logging, distributed tracing (OpenTelemetry/LangFuse/LangSmith), metrics collection, error introspection, replayable traces, dashboards, and troubleshooting playbooks

🧪 Test Coverage Map - Comprehensive test coverage aligned with architectural components showing what's tested (orchestrator 80%, routing 85%, failures 90%) and critical gaps (tools 30%, memory 20%, config 0%, observability 0%) with priorities for additional testing

👋 Developer Onboarding Guide - Step-by-step walkthrough for newcomers: environment setup (15 min), first agent interaction (10 min), create custom tool (20 min), build custom agent (30 min), wire components together (15 min) with full working examples (calculator tool, math tutor agent, tutoring workflow)

Quickstart

# 1) Install (Python >=3.10)
uv sync --all-extras --dev
uv run playwright install --with-deps chromium

# 2) Configure environment
cp .env.example .env
# set OPENAI_API_KEY / LANGFUSE_SECRET_KEY / etc. inside .env

# 3) Run demo agent locally
uv run cuga start demo

# 4) Try modular stack example
uv run python examples/run_langgraph_demo.py --goal "triage a support ticket"

Installation

  • Dependencies: uv (or pip), optional browsers for Playwright, optional vector DB service (Chroma/Weaviate/Qdrant/Milvus).
  • Development: uv sync --all-extras --dev installs dev + optional extras (memory, sandbox, groq, etc.).
  • Pre-commit: uv run pre-commit install then uv run pre-commit run --all-files.

Configuration

  • .env.example lists required variables for LLMs, tracing, and storage.
  • configs/ holds YAML/TOML profiles for agents, LangGraph graphs, memory backends, and observability.
  • registry.yaml and config/ house MCP/registry defaults; use scripts/verify_guardrails.py before shipping changes.

Guardrails & Change Management

  • Review AGENTS.md before altering planners, tools, or registry entries; it is the single source of truth for allowlists, sandbox expectations, budgets, and redaction.
  • Guardrail and registry changes are enforced by CI: scripts/verify_guardrails.py --base <branch> collects diffs and fails if README.md, PRODUCTION_READINESS.md, CHANGELOG.md, or todo1.md are not updated alongside guardrail changes or if ## vNext lacks a guardrail note.
  • Keep production checklists (PRODUCTION_READINESS.md) and security docs in sync with guardrail adjustments so downstream users understand the default policies and where to override them.
  • Developer checklist: ensure registry entries declare sandboxes + /workdir pinning for exec scopes, budget/observability env keys (AGENT_*, OTEL_*, LangFuse/LangSmith, Traceloop) are wired, docs/mcp/tiers.md is regenerated from docs/mcp/registry.yaml, and new/updated tests exercise planner ranking, import guardrails, and registry hot-swap determinism.
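The CI rule described above can be pictured as a simple check over the list of changed files. The sketch below is not the contents of scripts/verify_guardrails.py: `missing_doc_updates` and `GUARDRAIL_PREFIXES` are hypothetical, the set of paths treated as guardrail files is a placeholder, and it assumes every listed document must change alongside a guardrail change.

```python
from typing import Iterable, List

# Assumption: the real script decides which paths count as guardrail files.
# These prefixes are placeholders for illustration.
GUARDRAIL_PREFIXES = ("AGENTS.md", "registry.yaml")

# Documents named in the README as required companions to guardrail changes.
REQUIRED_DOCS = ("README.md", "PRODUCTION_READINESS.md", "CHANGELOG.md", "todo1.md")


def missing_doc_updates(changed_files: Iterable[str]) -> List[str]:
    """Return the required docs that were not updated alongside a guardrail change."""
    changed = set(changed_files)
    touches_guardrails = any(path.startswith(GUARDRAIL_PREFIXES) for path in changed)
    if not touches_guardrails:
        return []  # nothing to enforce when no guardrail file changed
    return [doc for doc in REQUIRED_DOCS if doc not in changed]
```

A CI job built on this idea would fail whenever the returned list is non-empty.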

Agent Types

  • Planner: ReAct or Plan-and-Execute; emits steps with policy-aware cost/latency hints.
  • Tool Executor: LCEL/LangChain tools, MCP adapters, HTTP/OpenAPI runners with sandboxed registry resolution.
  • RAG/Data Agent: LlamaIndex loader+retriever (docs in rag/), vector memory connectors in memory/.
  • Coordinator: CrewAI/AutoGen-like orchestrator for multi-agent hand-offs.
  • Observer: Langfuse/OpenInference emitters with correlation IDs and redaction hooks.

See AGENTS.md for role details and USAGE.md for end-to-end flows.

RAG Setup

  • Drop documents into rag/sources/ or configure a remote store.
  • Choose a backend in configs/memory.yaml (chroma|qdrant|weaviate|milvus|local).
  • Run uv run python scripts/load_corpus.py --source rag/sources --backend chroma.
  • Query via uv run python examples/rag_query.py --query "How do I add a new MCP tool?".

Memory & State

  • memory/ exposes VectorMemory (in-memory fallback), summarization hooks, and profile-scoped stores.
  • State keys are namespaced by profile to preserve sandbox isolation.
  • Persistence is opt-in; see configs/memory.yaml and TESTING.md for guidance.
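To make the profile-namespacing idea concrete, here is a minimal in-memory sketch. `ProfileScopedMemory` is a hypothetical class written for this illustration; it is not the `VectorMemory` API and it does no embedding or persistence.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ProfileScopedMemory:
    """Hypothetical store in which every key lives inside a profile namespace."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, str]] = defaultdict(dict)

    def write(self, profile: str, key: str, value: str) -> None:
        self._store[profile][key] = value

    def read(self, profile: str, key: str) -> Optional[str]:
        # .get avoids creating an empty namespace for unknown profiles.
        return self._store.get(profile, {}).get(key)

    def keys(self, profile: str) -> List[str]:
        return sorted(self._store.get(profile, {}))
```

Because reads always take a profile argument, a value written under one profile is invisible to another, which is the isolation property the bullets above describe.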

Observability

  • Langfuse client is wired via observability/langfuse.py with sampling + PII redaction hooks.
  • OpenInference/Traceloop emitters are optional and can be toggled per profile.
  • Structured audit logs live under logs/ when enabled; avoid committing artifacts.
  • Watsonx Granite calls validate credentials up front and append JSONL audit rows with timestamp, actor, parameters, and outcome for offline review.
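A JSONL audit trail of the kind described in the last bullet can be sketched as follows. The function name `append_audit_row` and the file location are assumptions made for this example; only the four fields (timestamp, actor, parameters, outcome) come from the description above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_audit_row(path: Path, actor: str, parameters: dict, outcome: str) -> dict:
    """Append one audit record as a single JSON line and return the record."""
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "parameters": parameters,
        "outcome": outcome,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier rows intact, so the file can be reviewed offline.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(row, sort_keys=True) + "\n")
    return row
```

Each call adds exactly one line, so the file can be read back with one `json.loads` per line.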

Observability preview

  • The FastAPI orchestrator exposes a Prometheus-compatible metrics endpoint at /metrics (default port 8000). This endpoint exports golden-signal metrics such as cuga_requests_total, cuga_success_rate, cuga_latency_ms{percentile="p50|p95|p99"}, cuga_tool_error_rate, cuga_budget_warnings_total, and cuga_budget_exceeded_total.
  • Configure OpenTelemetry (OTLP) or console exporters via environment variables. Common envs:
    • OTEL_EXPORTER_OTLP_ENDPOINT — OTLP HTTP/gRPC endpoint for traces/metrics (optional; when unset the console exporter is used).
    • OTEL_SERVICE_NAME — service name to appear in traces (default: cuga-orchestrator).
    • OTEL_TRACES_EXPORTER / OTEL_METRICS_EXPORTER — exporter type (otlp, logging, none).

Example: curl the metrics endpoint locally

# If running the orchestrator locally on port 8000
curl -sS http://localhost:8000/metrics | head -n 80

# Expected sample lines (Prometheus format):
# cuga_requests_total 42
# cuga_success_rate 0.95
# cuga_latency_ms{percentile="p50"} 150.0
# cuga_latency_ms{percentile="p95"} 450.0
# cuga_tool_error_rate 0.02
# cuga_budget_warnings_total 3
# cuga_budget_exceeded_total 0
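For scripted checks, the sample output above can be parsed with a few lines of Python. This is a simplified sketch, not a full Prometheus parser: `parse_metrics` is a hypothetical helper that skips comment lines and assumes each sample line has the form `name{labels} value` with no trailing timestamp.

```python
import re
from typing import Dict

# Metric name, optional {label="value"} block, then a numeric value.
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?P<labels>\{[^}]*\})?\s+(?P<value>\S+)$'
)


def parse_metrics(text: str) -> Dict[str, float]:
    """Map 'name' or 'name{labels}' to its float value, ignoring comments and blanks."""
    metrics: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match:
            key = match.group("name") + (match.group("labels") or "")
            metrics[key] = float(match.group("value"))
    return metrics
```

Feeding it the body returned by the curl command gives a dict that a smoke test can assert against, for example checking that `cuga_budget_exceeded_total` is zero.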

Multi-Agent & Coordination

  • agents/ outlines planner/worker/tool-user patterns and how to register them with CrewAI/AutoGen.
  • examples/multi_agent_dispatch.py demonstrates round-robin delegation with shared vector context.
  • Hand-offs carry correlation IDs and redacted summaries, not raw prompts.
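Round-robin delegation with a shared correlation ID can be sketched as below. This is not the code in examples/multi_agent_dispatch.py: `dispatch_round_robin` is a hypothetical function, workers are plain callables, and the value each worker returns stands in for a redacted summary.

```python
import itertools
import uuid
from typing import Callable, Dict, List

Worker = Callable[[str], str]  # takes a task, returns a summary of its result


def dispatch_round_robin(tasks: List[str], workers: Dict[str, Worker]) -> List[dict]:
    """Assign tasks to workers in rotation; every hand-off shares one correlation ID."""
    if not workers:
        raise ValueError("at least one worker is required")
    trace_id = uuid.uuid4().hex
    rotation = itertools.cycle(workers.items())
    handoffs = []
    for task in tasks:
        name, worker = next(rotation)
        # Only the worker's summary is recorded, not the raw task text.
        handoffs.append({"trace_id": trace_id, "worker": name, "summary": worker(task)})
    return handoffs
```

With two workers and three tasks, the first worker receives the first and third task and the second worker receives the second.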

Testing & Quality Gates

  • Run make lint test typecheck locally.
  • Pytest with coverage is configured (see TESTING.md).
  • CI (GitHub Actions) runs lint, type-check, tests, and guardrail verification on pushes/PRs.

Security Model

CUGAR Agent enforces security-first design with deny-by-default policies per AGENTS.md:

Core Security Principles

  1. Allowlist-First Tool Selection: Only explicitly allowed tools from cuga.modular.tools.* can execute
  2. Deny-by-Default Network: Network egress restricted to domain allowlist; localhost/private networks blocked by default
  3. Sandbox Isolation: All tool execution in isolated sandboxes (py/node slim|full, orchestrator profiles) with read-only mounts
  4. Budget Enforcement: Cost ceilings (default: 100 units/task) with warn or block policies
  5. Human-in-the-Loop Approval: High-risk operations (DELETE, FINANCIAL) require explicit approval before execution

Security Architecture

Request → Budget Guard → Tool Allowlist → Parameter Validation → Network Policy → Sandbox Execution
            ↓               ↓                    ↓                      ↓               ↓
       (ceiling=100)   (cuga.modular   (type/range/pattern)    (domain allowlist) (read-only)
                        .tools.* only)                          (no localhost)

Approval Flow (HITL):

  • Low-risk (READ): Auto-approved, logged
  • Medium-risk (WRITE): Auto-approved with audit trail
  • High-risk (DELETE, FINANCIAL): Requires human approval (5min timeout, reject on timeout)
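The three tiers above reduce to a small decision function. In this sketch, `Risk`, `HIGH_RISK`, and `is_approved` are hypothetical names, the human decision is modeled as a callback that returns `None` when the approval window expires, and logging and audit writes are omitted.

```python
from enum import Enum
from typing import Callable, Optional


class Risk(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"
    FINANCIAL = "financial"


HIGH_RISK = {Risk.DELETE, Risk.FINANCIAL}


def is_approved(risk: Risk, ask_human: Callable[[], Optional[bool]]) -> bool:
    """Auto-approve READ/WRITE; high-risk actions need an explicit human 'yes'.

    ask_human returns True or False, or None if the approval timed out.
    """
    if risk not in HIGH_RISK:
        return True
    # Anything other than an explicit True (including a timeout) is a rejection.
    return ask_human() is True
```

Treating a timeout the same as a rejection keeps the default on the safe side, matching the "reject on timeout" rule.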

Budget Policy:

  • AGENT_BUDGET_CEILING=100 (default): Max cost units per task
  • AGENT_BUDGET_POLICY=warn|block: Warn and continue, or block execution
  • AGENT_ESCALATION_MAX=2: Max approval escalations before admin approval required

See SECURITY.md for complete security controls and docs/security/GOVERNANCE.md for governance architecture.

Security & Safe Execution

The controls below implement the deny-by-default policies in AGENTS.md § 4 (Sandbox Expectations).

MCP & OpenAPI Governance

  • Policy Gates: HITL approval points for WRITE/DELETE/FINANCIAL actions (Slack send, file delete, stock orders)
  • Per-Tenant Capability Maps: 8 organizational roles (including marketing, trading, engineering, and support) with tool allowlists/denylists
  • Runtime Health Checks: Tool discovery ping, schema drift detection, cache TTLs to prevent huge cold-start lists
  • Layered Access Control: Tool registration → Tenant map → Tool-level restrictions → Rate limits

See docs/security/GOVERNANCE.md for complete governance architecture, configuration files, and integration patterns.

Eval/Exec Elimination

  • No eval/exec: All eval() and exec() calls eliminated from production code paths
  • AST-based expression evaluation: Use safe_eval_expression() from cuga.backend.tools_env.code_sandbox.safe_eval for mathematical expressions
    • Allowlisted operators: Add/Sub/Mul/Div/FloorDiv/Mod/Pow
    • Allowlisted functions: math.sin/cos/tan/sqrt/log/exp, abs/round/min/max/sum
    • Denies: assignments, imports, attribute access, eval/exec/__import__
  • SafeCodeExecutor: All code execution routed through SafeCodeExecutor or safe_execute_code() from cuga.backend.tools_env.code_sandbox.safe_exec
    • Import allowlist: Only cuga.modular.tools.* permitted
    • Import denylist: os/sys/subprocess/socket/pickle/eval/exec/compile
    • Restricted builtins: Safe operations (math/types/iteration) allowed; eval/exec/open/__import__ denied
    • Filesystem deny-default: No file operations unless explicitly allowed
    • Timeout enforcement: 30s default, configurable
    • Audit trail: All imports/executions logged with trace_id
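The AST-based approach can be illustrated with a small evaluator that walks the parse tree and rejects every node type it does not recognize. This is a sketch of the general technique, not the repository's `safe_eval_expression()`: the name `safe_eval`, the bare-name function table, and the support for unary minus are assumptions of this example, and it places no bound on the size of results such as large powers.

```python
import ast
import math
import operator

# Allowlisted binary operators, mirroring the list above.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
# Functions are exposed as bare names here, so no attribute access is needed.
_FUNCS = {"abs": abs, "round": round, "min": min, "max": max, "sqrt": math.sqrt}


def safe_eval(expression: str):
    """Evaluate an arithmetic expression; raise ValueError on any other construct."""

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*(visit(arg) for arg in node.args))
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    # mode="eval" accepts a single expression, so statements fail to parse.
    return visit(ast.parse(expression, mode="eval"))
```

Because the default branch raises, names, attribute access, subscripts, and calls to anything outside the function table are all refused without ever being executed.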

HTTP & Secrets Hardening

  • SafeClient wrapper: All HTTP requests MUST use SafeClient from cuga.security.http_client
    • Enforced timeouts: 10.0s read, 5.0s connect, 10.0s write, 10.0s total
    • Automatic retry: Exponential backoff (4 attempts max, 8s max wait)
    • URL redaction: Query params and credentials stripped from logs
  • Env-only secrets: Credentials MUST be loaded from environment variables
    • CI enforces .env.example parity validation (no missing keys)
    • Secret scanning: trufflehog + gitleaks on every push/PR
    • Hardcoded API keys/tokens trigger CI failure
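Two of the behaviours listed above, URL redaction and capped exponential backoff, are easy to show with the standard library. The helpers `redact_url` and `backoff_delays` are hypothetical and are not part of `SafeClient`; the 1-second base delay is an assumption, while the 4 attempts and 8-second cap come from the list.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Drop userinfo, query string, and fragment so the URL is safe to log."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname excludes any user:password@ prefix
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def backoff_delays(attempts: int = 4, base: float = 1.0, cap: float = 8.0) -> List[float]:
    """Waits between attempts: the delay doubles each time and never exceeds the cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts - 1)]
```

Note that `urlsplit(...).hostname` lowercases the host and drops the brackets around IPv6 literals, so this sketch is intended for log output only, not for rebuilding request URLs.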

Import & Sandbox Controls

  • Import restrictions: Dynamic imports limited to cuga.modular.tools.* namespace only
  • Profile isolation: Memory and tool access namespaced per profile; no cross-profile leakage
  • Sandbox profiles: All registry entries declare sandbox profile (py/node slim|full, orchestrator)
  • Read-only defaults: Mounts are read-only by default; /workdir pinning for exec scopes
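The import restriction amounts to a prefix check performed before any dynamic import. The sketch below uses hypothetical helpers (`is_allowed_module`, `guarded_import`) and only the namespace named above; the repository's own enforcement may differ.

```python
import importlib
from types import ModuleType

ALLOWED_NAMESPACE = "cuga.modular.tools"


def is_allowed_module(name: str) -> bool:
    """True only for the namespace itself or a dotted submodule of it."""
    return name == ALLOWED_NAMESPACE or name.startswith(ALLOWED_NAMESPACE + ".")


def guarded_import(name: str) -> ModuleType:
    """Import a module dynamically, refusing anything outside the allowlisted namespace."""
    if not is_allowed_module(name):
        raise ImportError(f"import of '{name}' is not allowlisted")
    return importlib.import_module(name)
```

Matching on the prefix plus a trailing dot prevents look-alike names such as `cuga.modular.tools_evil` from passing the check.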

See AGENTS.md for complete guardrail specifications and docs/security/ for detailed security controls.

Observability & Monitoring

CUGAR Agent provides production-grade observability with structured events, golden signals, and multi-backend export:

Observability Stack

  • Structured Events: plan_created, route_decision, tool_call_start/complete/error, budget_warning/exceeded, approval_requested/received/timeout
  • Golden Signals: Success rate (%), latency (P50/P95/P99), tool error rate (%), mean steps/task, approval wait time, budget utilization
  • Trace Propagation: trace_id flows through CLI → planner → worker → coordinator → tools with parent-child relationships
  • PII Redaction: Auto-redact sensitive keys (secret, token, password, api_key, credential, auth) before emission
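Key-based redaction of the kind described in the last bullet can be sketched as a recursive walk over an event payload. `redact` and the `[REDACTED]` placeholder are assumptions of this example; the marker list is taken from the bullet above.

```python
from typing import Any

SENSITIVE_MARKERS = ("secret", "token", "password", "api_key", "credential", "auth")
PLACEHOLDER = "[REDACTED]"


def _is_sensitive(key: Any) -> bool:
    lowered = str(key).lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def redact(payload: Any) -> Any:
    """Return a copy of the payload with values under sensitive keys replaced."""
    if isinstance(payload, dict):
        return {
            key: PLACEHOLDER if _is_sensitive(key) else redact(value)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [redact(item) for item in payload]
    return payload
```

Substring matching is deliberately broad: a key such as `author` also contains `auth` and would be redacted, which errs on the side of hiding too much.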

Monitoring Endpoints

# Prometheus metrics endpoint (scrape target)
curl http://localhost:8000/metrics

# Expected metrics:
#   cuga_requests_total                   Total requests handled
#   cuga_success_rate                     % successful requests
#   cuga_latency_ms{percentile}           P50/P95/P99 latency
#   cuga_tool_error_rate                  % failed tool calls
#   cuga_steps_per_task                   Mean planning steps
#   cuga_budget_warnings_total            Budget warnings emitted
#   cuga_budget_exceeded_total            Budget hard blocks
#   cuga_approval_requests_total{status}  Approval flow tracking

Multi-Backend Support

  • OpenTelemetry (OTLP): Set OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318 for Jaeger/Zipkin/Tempo
  • LangFuse: Set LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST for LLM tracing
  • LangSmith: Set LANGCHAIN_API_KEY, LANGCHAIN_PROJECT, LANGCHAIN_ENDPOINT
  • Console (Default): Offline-first JSON logs to stdout (no network required)

Grafana Dashboard

Import pre-built dashboard from observability/grafana_dashboard.json:

  • Request rate & success rate panels
  • Latency percentile charts (P50/P95/P99)
  • Tool error breakdown by tool/type
  • Budget utilization gauge
  • Approval queue depth
  • Event timeline with filtering

Configuration:

# Enable OTEL export
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_SERVICE_NAME="cuga-orchestrator"

# Start with observability
uv run cuga start demo

See docs/observability/OBSERVABILITY_GUIDE.md for detailed instrumentation guide and PRODUCTION_READINESS.md for metrics scraping setup.

FAQ

  • Which LLMs are supported?
    • OpenAI (GPT-4o, GPT-4 Turbo)
    • Azure OpenAI
    • Anthropic (Claude 3.5 Sonnet, Opus, Haiku)
    • IBM Watsonx / Granite 4.0 (granite-4-h-small, granite-4-h-micro, granite-4-h-tiny) — Default provider with deterministic temperature=0.0
    • Groq (Mixtral)
    • Google GenAI
    • Any LangChain-compatible model via adapters
  • Do I need a vector DB? Not for quickstarts; an in-memory store is bundled. For production use Chroma/Qdrant/Weaviate/Milvus.
  • How do I add a new tool? Implement ToolSpec in tools/registry.py or wrap an MCP server; see USAGE.md.
  • Is this production-ready? Core stack follows sandboxed, profile-scoped design with observability. Harden configs before internet-facing use.
  • How do I configure Watsonx/Granite? Set environment variables: WATSONX_API_KEY, WATSONX_PROJECT_ID, and optionally WATSONX_URL. See docs/configuration/ENVIRONMENT_MODES.md for details.

Documentation

For a complete understanding of the system execution flow, see the System Execution Narrative listed in the Documentation section above.

Roadmap Highlights

  • Streaming-first ReAct policies with beta support for Strands/semantic state machines.
  • Built-in eval harness for self-play and regression suites.
  • Optional LangServe or FastAPI hosting for SaaS-style deployments (see ROADMAP.md).

License

Apache 2.0. See LICENSE.

About

CUGA is an open-source generalist agent for enterprise, r = reduced instruction set computing, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware features.
